

alamb commented Jan 22, 2024

Which issue does this PR close?

Part of #5472

Rationale for this change

We don't have good benchmark coverage for COUNT(DISTINCT <String>) on columns whose string values are short. This came up as part of #8849.

What changes are included in this PR?

  1. Add two new queries to the "extended" ClickBench suite (see Add "Extended" clickbench queries #8861) that cover COUNT DISTINCT on short string columns.

Are these changes tested?

I tested them manually.

Are there any user-facing changes?

No, this change only affects benchmarks.

alamb added the performance (Make DataFusion faster) label Jan 22, 2024
alamb requested a review from Dandandan January 24, 2024 11:46

alamb commented Jan 24, 2024

@Dandandan / @andygrove I wonder if you would have time to quickly review this PR (mostly docs + 2 queries) so I can run benchmarks on #8849 against main?


alamb commented Jan 25, 2024

Thank you @tustvold.

alamb merged commit 928162f into apache:main Jan 25, 2024
