Update TO_DATE, TO_TIMESTAMP scalar functions to support LargeUtf8, Utf8View #12928

@Omega359

Is your feature request related to a problem or challenge?

Part of #11752 and #11790

Currently, calling TO_DATE or any of the TO_TIMESTAMP* UDFs with Utf8View arguments fails. Once this issue is fixed, such calls should succeed. The session below reproduces the failure:

> create table ts_utf8_data(ts varchar(100), format varchar(100)) as values
  ('2020-09-08 12/00/00+00:00', '%Y-%m-%d %H/%M/%S%#z'),
  ('2031-01-19T23:33:25+05:00', '%+'),
  ('08-09-2020 12:00:00+00:00', '%d-%m-%Y %H:%M:%S%#z'),
  ('1926632005', '%s'),
  ('2000-01-01T01:01:01+07:00', '%+');
0 row(s) fetched. 
Elapsed 0.062 seconds.

> create table ts_utf8view_data as
select arrow_cast(ts, 'Utf8View') as ts, arrow_cast(format, 'Utf8View') as format from ts_utf8_data;
0 row(s) fetched. 
Elapsed 0.010 seconds.

> SELECT to_timestamp(t.ts, t.format),
       to_timestamp_seconds(t.ts, t.format),
       to_timestamp_millis(t.ts, t.format),
       to_timestamp_micros(t.ts, t.format),
       to_timestamp_nanos(t.ts, t.format)
       from ts_utf8view_data as t;
Execution error: to_timestamp function unsupported data type at index 1: Utf8View

We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.

Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8. When one of them is called with a StringView argument, DataFusion casts the argument back to DataType::Utf8, which is expensive.

To realize the full speed of StringView, we need to ensure that all string functions support DataType::Utf8View directly.

Describe the solution you'd like

Update TO_DATE and the TO_TIMESTAMP* functions to support DataType::Utf8View directly.

Describe alternatives you've considered

No response

Additional context

The typical steps are:

1. Write some tests showing the function doesn't support Utf8View (see the tests in [string_view.slt](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt)) and use them to ensure the arguments are not being cast.
2. Change the Signature of the function to accept Utf8View in addition to Utf8/LargeUtf8.
3. Update the implementation of the function to operate on Utf8View.
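
To make steps 2 and 3 concrete, here is a rough, std-only sketch of the general shape of the change. It is not DataFusion or arrow code: `StringColumn`, `parse_epoch_seconds`, `to_seconds_before`, and `to_seconds_after` are hypothetical names made up for illustration, the three variants only stand in for the real Utf8, LargeUtf8, and Utf8View arrays (which have different physical layouts), and only the `%s` (epoch seconds) case is modelled. The point is that the parsing logic is written once against string slices, and the dispatch accepts every string type instead of rejecting or casting the view type.

```rust
/// Hypothetical stand-in for the three Arrow string types named in this issue.
/// Real arrays differ in physical layout; here each variant just holds strings.
#[derive(Debug, Clone, PartialEq)]
enum StringColumn {
    Utf8(Vec<Option<String>>),
    LargeUtf8(Vec<Option<String>>),
    Utf8View(Vec<Option<String>>),
}

impl StringColumn {
    /// Name of the (modelled) data type, used in error messages.
    fn type_name(&self) -> &'static str {
        match self {
            StringColumn::Utf8(_) => "Utf8",
            StringColumn::LargeUtf8(_) => "LargeUtf8",
            StringColumn::Utf8View(_) => "Utf8View",
        }
    }
}

/// Shared logic, written once against an iterator of optional string slices,
/// so it does not depend on which string type produced them.
/// Only the `%s` (epoch seconds) format is modelled; nulls pass through.
fn parse_epoch_seconds<'a>(
    values: impl Iterator<Item = Option<&'a str>>,
) -> Result<Vec<Option<i64>>, String> {
    values
        .map(|v| match v {
            None => Ok(None),
            Some(s) => s
                .parse::<i64>()
                .map(Some)
                .map_err(|e| format!("cannot parse '{s}' as seconds: {e}")),
        })
        .collect()
}

/// "Before": only Utf8 is handled, every other string type is rejected.
fn to_seconds_before(col: &StringColumn) -> Result<Vec<Option<i64>>, String> {
    match col {
        StringColumn::Utf8(v) => parse_epoch_seconds(v.iter().map(|s| s.as_deref())),
        other => Err(format!("unsupported data type: {}", other.type_name())),
    }
}

/// "After": every string type is dispatched to the same logic, with no cast.
fn to_seconds_after(col: &StringColumn) -> Result<Vec<Option<i64>>, String> {
    match col {
        StringColumn::Utf8(v)
        | StringColumn::LargeUtf8(v)
        | StringColumn::Utf8View(v) => parse_epoch_seconds(v.iter().map(|s| s.as_deref())),
    }
}
```

In this model, a `StringColumn::Utf8View` holding `'1926632005'` is rejected by `to_seconds_before` and parsed by `to_seconds_after`, while a `Utf8` column gives the same result through both paths. The actual change would use the real arrow array types and the function's real Signature rather than these stand-ins.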

Example PRs

Update to use an arrow kernel that already supports StringView: #11787
Change the implementation to support StringView directly: #11676
Change implementation (option 2): #11556

Labels: enhancement (New feature or request)