DataFusion not using NDV stat #18628

@LiaCastaneda

Description

Describe the bug

This was brought up before in #15265.
Off the top of my head, I think we could use NDV (number of distinct values) statistics in:

  • Join swapping strategy
  • Choosing the optimal number of partitions for hash joins that use hash partitioning
  • Filter pushdown decisions: if a column has a very low NDV, it might not be worth pushing down or computing predicates?
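
To make the first two ideas a bit more concrete, here is a minimal sketch of how NDV could feed into such decisions. It is not DataFusion code: `ColStats`, `estimate_join_rows`, and `useful_hash_partitions` are hypothetical names made up for illustration, and the join formula is the common textbook estimate (rows_left * rows_right / max(ndv_left, ndv_right)), which assumes uniformly distributed keys.

```rust
/// Hypothetical, simplified statistics for a join key column.
/// (Illustration only; these are not DataFusion's actual types.)
#[derive(Debug, Clone, Copy)]
struct ColStats {
    num_rows: u64,
    /// Number of distinct values in the column, if known.
    ndv: Option<u64>,
}

/// Textbook equi-join cardinality estimate:
/// |L| * |R| / max(ndv_L, ndv_R), assuming uniformly distributed keys.
/// Returns None when either NDV is unknown or both are zero.
fn estimate_join_rows(left: ColStats, right: ColStats) -> Option<u64> {
    let max_ndv = left.ndv?.max(right.ndv?);
    if max_ndv == 0 {
        return None;
    }
    // Widen to u128 so the row-count product cannot overflow.
    let product = (left.num_rows as u128) * (right.num_rows as u128);
    Some((product / max_ndv as u128) as u64)
}

/// Hash partitioning on a key with `ndv` distinct values can fill at most
/// `ndv` partitions, so cap the configured target at that number.
/// Falls back to the target when NDV is unknown.
fn useful_hash_partitions(ndv: Option<u64>, target_partitions: usize) -> usize {
    match ndv {
        Some(n) => target_partitions.min(n.max(1) as usize),
        None => target_partitions,
    }
}

fn main() {
    let left = ColStats { num_rows: 1_000_000, ndv: Some(1_000) };
    let right = ColStats { num_rows: 10_000, ndv: Some(10_000) };
    println!("estimated join rows: {:?}", estimate_join_rows(left, right));
    println!("useful partitions: {}", useful_hash_partitions(Some(3), 16));
}
```

An estimate like this could be one input to join ordering or swapping, and the partition cap avoids creating hash partitions that would necessarily stay empty. Whether either heuristic is a good fit for DataFusion would need the further investigation mentioned below.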

This would likely require further investigation into how other query engines use NDV for optimization decisions 🤔

To Reproduce

No response

Expected behavior

NDV statistics are used in optimization decisions when they are available.

Additional context

No response

Labels: bug (Something isn't working)