Conversation

@chenkovsky
Contributor

Which issue does this PR close?

Rationale for this change

In Arrow (the array path), a decimal is treated as milliseconds, but in DataFusion (the scalar path), a decimal is treated as seconds.

For Int64, DataFusion's behavior is the same as Arrow's; only the decimal case is incorrect.
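
A minimal sketch of the mismatch, using the value from the nested cast that appears later in this thread (decimal 1.00 with scale 2, target timestamp(3)). The function names are hypothetical, and the whole/fraction arithmetic is an assumption about what the scalar branch does, not a copy of the DataFusion code; only the two `scale_factor` expressions are taken from the diff in this PR.

```rust
/// Scalar-path sketch (assumed arithmetic): split a Decimal128, given as an
/// unscaled value plus scale, into a whole part and a fraction, then express
/// the result in milliseconds.
/// `extra_exponent` is 0 for the original line and 3 for the first proposed fix.
fn scalar_millis(unscaled: i128, scale: u32, extra_exponent: u32) -> i64 {
    let scale_factor = 10_i128.pow(scale + extra_exponent);
    let whole = unscaled / scale_factor; // integer division truncates toward zero
    let fraction = unscaled % scale_factor;
    (whole * 1_000 + fraction * 1_000 / scale_factor) as i64
}

/// Array-path sketch: drop the fraction and use the integer part directly as
/// the count of target time units (milliseconds for timestamp(3)).
fn array_units(unscaled: i128, scale: u32) -> i64 {
    (unscaled / 10_i128.pow(scale)) as i64
}
```

With `unscaled = 100` and `scale = 2` (that is, 1.00), `scalar_millis(100, 2, 0)` reads the value as one second, while `scalar_millis(100, 2, 3)` and `array_units(100, 2)` both read it as one millisecond. The `+ 3` variant only lines up with the array path for millisecond targets, which is presumably why the review asks about other time units.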

What changes are included in this PR?

Treat the decimal as milliseconds.

Are these changes tested?

Yes, with unit tests (UT).

Are there any user-facing changes?

Yes

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and common (Related to common crate) labels on Jun 24, 2025
DataType::Timestamp(time_unit, None),
) => {
- let scale_factor = 10_i128.pow(*scale as u32);
+ let scale_factor = 10_i128.pow(*scale as u32 + 3);
Member

Does this depend on the timestamp precision (time unit)? Can you please add tests for other units?

Do we need this special casing at all for casting decimal to timestamp?
or can we end up calling cast_with_options, just like we do for virtually all other type pairs?

Contributor

I think the underlying arrow kernel may not support casting decimal --> timestamp (I bet this is something spark related) 🤔

Contributor Author

> I think the underlying arrow kernel may not support casting decimal --> timestamp (I bet this is something spark related) 🤔

@alamb casting decimal --> timestamp is implemented here:
https://github.com/apache/arrow-rs/blob/b6240b32e235d4ca330372e3be31f784ba133252/arrow-cast/src/cast/mod.rs#L4032

Contributor Author

> Does this depend on the timestamp precision (time unit)? Can you please add tests for other units?
>
> Do we need this special casing at all for casting decimal to timestamp? or can we end up calling cast_with_options, just like we do for virtually all other type pairs?

@findepi you are right, it depends on the time unit. I updated the code.

Member

> I think the underlying arrow kernel may not support casting decimal --> timestamp

Can be!

So how does this work when the value is not constant-folded?
What code ends up executing in the 'normal query' case?

Can the same code be executed also in constant-folding case?

Comment on lines 3425 to 3426
SELECT CAST(CAST(x AS decimal(17,2)) AS timestamp(3)) = CAST(CAST(1 AS decimal(17,2)) AS timestamp(3)) from (values (1)) t(x);
----
Member

Formatting like this will help see why these should be equal. Also it's enough to project the values. (They should be equal, but also they should be particular values).

SELECT CAST(CAST(1   AS decimal(17,2)) AS timestamp(3)) AS a UNION ALL
SELECT CAST(CAST(one AS decimal(17,2)) AS timestamp(3)) AS a FROM (VALUES (1)) t(one);

@chenkovsky
Contributor Author

Should I also change this line?

let scale_factor = 10_i128.pow(*scale as u32);

Comment on lines +3425 to +3429
SELECT CAST(CAST(1 AS decimal(17,2)) AS timestamp(3)) AS a UNION ALL
SELECT CAST(CAST(one AS decimal(17,2)) AS timestamp(3)) AS a FROM (VALUES (1)) t(one);
----
1970-01-01T00:00:00.001
1970-01-01T00:00:00.001
Member

Is this the desired semantics when casting from decimal(p,s) to timestamp(q)?

For example, when casting from decimal(38,3) to timestamp(3) it could feel natural to interpret decimal's whole integer part as seconds-of-epoch, and decimal's fraction as millis-of-second.

With the current implementation, the decimal's fraction seems to be discarded.
(Should we have tests with some non-integer decimal values as well?)
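
To make the two readings concrete, here is a small sketch comparing them for a fractional value such as 1.500 stored as decimal(38,3). `TimeUnit` here is a local stand-in enum and both function names are hypothetical; the code illustrates the question being asked, not either library's implementation, and it ignores overflow for very large decimals.

```rust
/// Local stand-in for a timestamp time unit (not the Arrow type).
#[derive(Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

impl TimeUnit {
    /// Number of units in one second.
    fn per_second(self) -> i128 {
        match self {
            TimeUnit::Second => 1,
            TimeUnit::Millisecond => 1_000,
            TimeUnit::Microsecond => 1_000_000,
            TimeUnit::Nanosecond => 1_000_000_000,
        }
    }
}

/// Reading 1: drop the fraction and use the integer part as the count of
/// target units, whatever the unit is.
fn truncating_cast(unscaled: i128, scale: u32) -> i64 {
    (unscaled / 10_i128.pow(scale)) as i64
}

/// Reading 2: the integer part is seconds since the epoch and the fraction is
/// sub-second precision, kept up to the resolution of the target unit.
fn seconds_with_fraction(unscaled: i128, scale: u32, unit: TimeUnit) -> i64 {
    // Multiply before dividing so the fraction survives the integer division.
    (unscaled * unit.per_second() / 10_i128.pow(scale)) as i64
}
```

For 1.500 (`unscaled = 1500`, `scale = 3`) and a millisecond target, the first reading yields 1 unit (one millisecond after the epoch) and the second yields 1500 units (one and a half seconds after the epoch).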

Contributor

I don't know what the semantics should be when casting from decimal --> timestamp. I recommend that we add the tests @findepi suggests and file a follow on ticket to clarify this point

Contributor

@alamb left a comment

Thank you @chenkovsky and @findepi -- from my perspective this PR is an improvement as now the semantics for scalars match the semantics for arrays.

It is not 100% clear to me if this is the "right" behavior, but in my opinion consistent is better than inconsistent

Thanks again for working on this

@findepi
Member

findepi commented Jun 27, 2025

> It is not 100% clear to me if this is the "right" behavior, but in my opinion consistent is better than inconsistent

agreed

My second thought before merging:
We now have two behaviors, which is clearly a bug, and fixing it can be viewed as a breaking change (depending on point of view). We have to make this breaking change, so it's not contentious.

If we fix the bug now in one direction (whichever), and then reconsider the desired semantics and conclude we want the opposite, that will be another breaking change and may be a point of contention. Given that breaking changes should be made very judiciously (for good reasons), I would somewhat prefer to force us to decide on the desired semantics.

@alamb
Contributor

alamb commented Jun 27, 2025

> Given that breaking changes should be made very judiciously (for good reasons), I would somewhat prefer to force us to decide on the desired semantics.

Looks like the original was added as part of this PR from @jatin510

maybe they remember

@findepi
Member

findepi commented Jun 27, 2025

That PR added the ScalarValue cast, but not the runtime cast.
I can't find the code for the runtime cast. Do we delegate that to arrow?

@alamb
Contributor

alamb commented Jun 27, 2025

> That PR added the ScalarValue cast, but not the runtime cast. I can't find the code for the runtime cast. Do we delegate that to arrow?

I am not sure -- we would need to trace the code down

@findepi
Member

findepi commented Jun 27, 2025

I think i understand.

Before #15486 we were using Arrow's cast semantics for Decimal to Timestamp conversion. These semantics have the unfortunate effect of discarding the fractional part of the decimal.

In #15486, a to_timestamp(decimal) was implemented, but only for scalar values / constant-foldable values (#15486 (comment)). To match the cast behavior in scalar tests, ScalarValue::cast_to_with_options was changed (probably following the pre-existing pattern of Float64->timestamp special treatment). This introduced #16531, because the non-scalar code path was not changed.

#15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to what it was before #15486. For those who -- like me -- don't want the decimal fraction to be ignored, to_timestamp seems like the alternative to use. Further, we should remove any other type-specific handling in ScalarValue::cast_to_with_options, because it likely causes other discrepancies.

Last but not least, I am starting to appreciate engines which deliberately don't implement a cast from number to timestamp.

postgres=# SELECT 1::timestamp(3);
ERROR:  cannot cast type integer to timestamp without time zone
LINE 1: SELECT 1::timestamp(3);
trino> SELECT CAST(1 AS timestamp(3));
Query 20250627_192003_00001_zzaqu failed: line 1:8: Cannot cast integer to timestamp(3)

There is no single good way to cast a number to a timestamp, so supporting this with named functions is better.

@findepi
Member

findepi commented Jun 30, 2025

> #15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to what it was before #15486.

@alamb do we have an agreement on the direction proposed above?

@alamb
Contributor

alamb commented Jun 30, 2025

> > #15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to what it was before #15486.
>
> @alamb do we have an agreement on the direction proposed above?

> #15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to what it was before #15486. For those who -- like me -- don't want the decimal fraction to be ignored, to_timestamp seems like the alternative to use. Further, we should remove any other type-specific handling in ScalarValue::cast_to_with_options, because it likely causes other discrepancies.

I think this makes sense to me -- ideally we can document this somewhere so we don't churn / rework the semantics again in the future

@jatin510
Contributor

jatin510 commented Jun 30, 2025

Hey!

When I was working on #14612, I found that after changing the default of parse_float_as_decimal to true, the timestamp test cases started failing.

My initial impression was that the behaviour of the existing test cases should not change with this parse option.
So I worked on #15486, and the tests passed.

@findepi
Member

findepi commented Jul 1, 2025

@jatin510 as mentioned above, the Float64 to timestamp cast is broken too. I filed a separate issue for this: #16636

@findepi changed the title from "fix: The inconsistency between scalar and array on the cast of timestamp" to "fix: The inconsistency between scalar and array on the cast decimal to timestamp" on Jul 1, 2025
.to_array()?,
(
ScalarValue::Decimal128(Some(decimal_value), _, scale),
DataType::Timestamp(time_unit, None),
Member

In #16639 I tried to clear the path for more changes here.
If and when that one is merged, we should be able to remove this whole code block.

Member

#16639 is now merged.
please remove the whole let scalar_array = match (self, target_type) { ... } block, replacing it with just self.to_array()?.
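
As a rough illustration of why this removes the discrepancy, here is a toy model in which the scalar cast has no type-specific branch and simply goes through the array cast. `DecimalScalar`, `cast_decimal_array`, and `cast_to_timestamp` are hypothetical stand-ins, not DataFusion or Arrow APIs, and the truncating arithmetic is an assumption based on the behavior discussed in this thread.

```rust
/// Hypothetical stand-in for a Decimal128 scalar: unscaled value plus scale.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DecimalScalar {
    unscaled: i128,
    scale: u32,
}

/// Stand-in for the array cast: one shared implementation that drops the
/// fraction and returns the integer part of each decimal.
fn cast_decimal_array(values: &[DecimalScalar]) -> Vec<i64> {
    values
        .iter()
        .map(|d| (d.unscaled / 10_i128.pow(d.scale)) as i64)
        .collect()
}

impl DecimalScalar {
    /// Wrap the scalar in a one-element "array".
    fn to_array(&self) -> Vec<DecimalScalar> {
        vec![*self]
    }

    /// Scalar cast with no special casing: convert to an array, run the
    /// shared array cast, and read back the single result.
    fn cast_to_timestamp(&self) -> i64 {
        let scalar_array = self.to_array();
        cast_decimal_array(&scalar_array)[0]
    }
}
```

Because both paths run the same function, a constant-folded value and a column value cannot diverge in this model; any future change to the semantics would have to happen in one place.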

@alamb
Contributor

alamb commented Jul 14, 2025

What is the status of this PR? Shall we merge it? Or are there outstanding issues to resolve?

@findepi
Member

findepi commented Jul 17, 2025

> What is the status of this PR? Shall we merge it? Or are there outstanding issues to resolve?

requires an update -- #16539 (comment)

@findepi
Member

findepi commented Jul 18, 2025

@alamb PTAL

Contributor

@alamb left a comment

Thank you @findepi and @chenkovsky

_ => self.to_array()?,
};

let scalar_array = self.to_array()?;
Contributor

That is pretty nice: fixing bugs by deleting code 🏆

@findepi merged commit 28b6e9d into apache:main on Jul 18, 2025
27 checks passed
Labels

common Related to common crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Different result of decimal to timestamp cast when source value is constant

4 participants