Description
Is your feature request related to a problem or challenge?
Currently, calling the TO_DATE or TO_TIMESTAMP* UDFs with Utf8View arguments fails. Once this issue is fixed, the queries below should succeed.
> create table ts_utf8_data(ts varchar(100), format varchar(100)) as values
('2020-09-08 12/00/00+00:00', '%Y-%m-%d %H/%M/%S%#z'),
('2031-01-19T23:33:25+05:00', '%+'),
('08-09-2020 12:00:00+00:00', '%d-%m-%Y %H:%M:%S%#z'),
('1926632005', '%s'),
('2000-01-01T01:01:01+07:00', '%+');
0 row(s) fetched.
Elapsed 0.062 seconds.
> create table ts_utf8view_data as
select arrow_cast(ts, 'Utf8View') as ts, arrow_cast(format, 'Utf8View') as format from ts_utf8_data;
0 row(s) fetched.
Elapsed 0.010 seconds.
> SELECT to_timestamp(t.ts, t.format),
to_timestamp_seconds(t.ts, t.format),
to_timestamp_millis(t.ts, t.format),
to_timestamp_micros(t.ts, t.format),
to_timestamp_nanos(t.ts, t.format)
from ts_utf8view_data as t;
Execution error: to_timestamp function unsupported data type at index 1: Utf8View
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8. When such a function is called with a StringView argument, DataFusion casts the argument back to DataType::Utf8, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support DataType::Utf8View directly.
Describe the solution you'd like
Update TO_DATE and the TO_TIMESTAMP* functions to support DataType::Utf8View directly.
Describe alternatives you've considered
No response
Additional context
The typical steps are:
Write some tests showing the function doesn't support Utf8View (see the tests in [string_view.slt](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt)) and, once it does, that the arguments are not being cast
Change the Signature of the function to accept Utf8View in addition to Utf8/LargeUtf8
Update the implementation of the function to operate on Utf8View
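As a rough illustration of the last two steps, here is a minimal, self-contained sketch of an argument type check that is widened to accept Utf8View. It is a toy model, not DataFusion's code: the `DataType` enum, the `validate_string_args` helper, and the `accept_view` flag are all hypothetical names invented for this example, and the error text only imitates the message shown in the description.

```rust
use std::fmt;

// Hypothetical stand-in for the Arrow data types relevant here.
// This is NOT arrow's DataType; it only models the variants we need.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

impl fmt::Display for DataType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the variant name, e.g. "Utf8View".
        write!(f, "{:?}", self)
    }
}

// Hypothetical validation helper. With `accept_view == false` it models
// the behavior before the fix (Utf8View is rejected); with
// `accept_view == true` it models the behavior after the fix.
// Positions in the error message are 0-based in this sketch.
#[allow(dead_code)]
fn validate_string_args(
    name: &str,
    args: &[DataType],
    accept_view: bool,
) -> Result<(), String> {
    for (idx, dt) in args.iter().enumerate() {
        let supported = match dt {
            DataType::Utf8 | DataType::LargeUtf8 => true,
            DataType::Utf8View => accept_view,
            _ => false,
        };
        if !supported {
            return Err(format!(
                "{name} function unsupported data type at index {idx}: {dt}"
            ));
        }
    }
    Ok(())
}
```

In this model, calling `validate_string_args("to_timestamp", &[DataType::Utf8View, DataType::Utf8View], false)` returns an error, while the same call with `true` returns `Ok(())`. The real change also has to update the function's Signature and the code that reads the string values, which this sketch does not cover.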
Example PRs
Update to use an arrow kernel that already supports StringView:
#11787
Change the implementation to support StringView directly:
#11676
Change implementation (option 2):
#11556