Skip to content

Conversation

@chenkovsky
Copy link
Contributor

@chenkovsky chenkovsky commented Aug 30, 2025

Which issue does this PR close?

Rationale for this change

arguments of coalesce are evaluated eagerly.

What changes are included in this PR?

simplify coalesce to case when expr.

Are these changes tested?

UT

Are there any user-facing changes?

No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Aug 30, 2025
Copy link
Contributor

@nuno-faria nuno-faria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 1689 to 1690
# due to the reason describe in https://github.com/apache/datafusion/issues/8927,
# the following queries will fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the test below now works, I think we could move this comment to the test below this one.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments

02)--TableScan: t projection=[x, y]
physical_plan
01)ProjectionExec: expr=[coalesce(1, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(1),t.y / t.x), coalesce(2, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(2),t.y / t.x)]
01)ProjectionExec: expr=[CASE WHEN true THEN 1 ELSE CAST(y@1 / x@0 AS Int64) END as coalesce(Int64(1),t.y / t.x), CASE WHEN true THEN 2 ELSE CAST(y@1 / x@0 AS Int64) END as coalesce(Int64(2),t.y / t.x)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be able to be simplifed even more -- CASE WHEN true THEN ... should go to 1

I filed a ticket to track this idea:

));
}

let n = args.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a clever implementation.

However, I worry it may cause problems for comet which uses physical evaluation directly

@comphead or @andygrove do you know if comet uses the COALESCE implementation directly?

@mbutrovich
Copy link
Contributor

Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments

Comet was actually just thinking about coalesce:

apache/datafusion-comet#2270

Maybe @coderfender has some thoughts about this?

@alamb
Copy link
Contributor

alamb commented Sep 6, 2025

Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any more comments

Comet was actually just thinking about coalesce:

apache/datafusion-comet#2270

Maybe @coderfender has some thoughts about this?

It seems like this PR does the same thing described in the comet PR

Thus it seems like a good thing to merge

Let's wait a while to see if anyone else has comments, otherwise I'll plan to merge this PR

@coderfender
Copy link

coderfender commented Sep 6, 2025

@alamb , @mbutrovich I made changes to comet to fallback to CASE statement to replicate lazy evaluation with coalesce (and then plan to work on this PR). Glad to see that the changes are looking good and once we update DataFusion's version in comet, we should be able to undo my comet changes to leverage DF's coalesce function . I will create a github issue on the comet side for this

@alamb alamb merged commit e5dcc8c into apache:main Sep 8, 2025
29 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 8, 2025

Thanks everyone!

@alamb
Copy link
Contributor

alamb commented Sep 9, 2025

I believe this PR caused the extended tests to fail (as more queries now start woring):

alamb added a commit to alamb/datafusion that referenced this pull request Nov 9, 2025
alamb added a commit that referenced this pull request Nov 10, 2025
…ion (#18567)

Note this targets the branch-51 release branch

## Which issue does this PR close?

- part of #17558
- resolves #17801 in the 51
release branch

## Rationale for this change

- We merged some clever rewrites for `coalesce` and `nvl2` to use `CASE`
which are faster and more correct (👏 @chenkovsky @kosiew )
- However, these rewrites cause subtle schema mismatches in some cases
planning (b/c the CASE simplification nullability logic can't determine
the correct nullability in some cases - see
#17801)
- @pepijnve has some heroic efforts to fix the schema mismatch in
#17813 (comment),
but it is non trivial and I am worried about merging it so close to the
51 release and introducing new edge cases

## What changes are included in this PR?

1. Revert #17357 /
e5dcc8c
3. Revert #17991 /
ea83c26
2. Revert #18191 /
22c4214
2. Cherry-pick 6202254, a test that
reproduces the schema mismatch issue (from
#18536)
3. Cherry-pick 735cacf, a fix for the
benchmarks that regressed due to the revert (from
#17833)
4. Update datafusion-testing (see separate PR here) for extended tests
(see apache/datafusion-testing#15)

## Are these changes tested?

Yes I added a new test

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COALESCE expr in datafusion should perform lazy evaluation of the operands Evaluates COALESCE arguments past first non-NULL value

5 participants