Skip to content

SortPreservingMerge sanity check rejects valid ORDER BY with CASE expression #18327

@watford-ep

Description

@watford-ep

Describe the bug

Sometimes the SortPreservingMerge sanity check rejects valid ORDER BY with CASE expressions. This doesn't universally happen, but seems to happen only the columns are from joins (debug output reformatted for readability):

Plan: [
SortPreservingMergeExec: [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]
  SortExec: expr=[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 ELSE 999 END ASC NULLS LAST], preserve_partitioning=[true]
    ProjectionExec: expr=[nspname@3 as nspname, oid@0 as oid, typname@1 as typname, typtype@2 as typtype]
      CoalesceBatchesExec: target_batch_size=8192
        HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(typnamespace@2, oid@0)], projection=[oid@0, typname@1, typtype@3, nspname@5]
          CoalesceBatchesExec: target_batch_size=8192
            FilterExec: Use typtype@3 IN (SET) ([Literal { value: Utf8("b"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("r"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("m"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("e"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("d"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }])
              DataSourceExec: partitions=1, partition_sizes=[1]
          RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
            StreamingTableExec: partition_sizes=1, projection=[oid, nspname]
] does not satisfy order requirements:
    [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST].
Child-0 order:
    [[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]]

I think this shows that CASE expressions lose their ability to be compared for equality. Projecting the CASE and ordering on the projection does not fail.

I found this using a tool which wraps datafusion around the postgres wire protocol so that I could use our Iceberg data lake from a BI tool which supports Postgres.

To Reproduce

I did some bisect'ing on a query issued by npgsql at startup and found this minimal reproduction:

SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype = 'r' THEN 1 END

I initially thought a missing ELSE clause was the culprit, but it does not change anything:

SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype = 'r' THEN 1 ELSE 999 END

Using: https://github.com/watford-ep/datafusion-postgres/tree/fix-revealbi-startup-sql

$ cargo test --test revealbi_minimal_repro

Expected behavior

This query executes because it contains a valid ORDER BY which is stable across partitions, with or without the ELSE clause.

Additional context

This succeeds:

SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
    FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
    WHERE typ.typtype IN ('b','r','m','e','d')
    ORDER BY CASE WHEN COALESCE(typ.typtype, '') IN ('b','e','p') THEN 0 WHEN COALESCE(typ.typtype, '') = 'r' THEN 1 END

But applying the same fix to the original query does not.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions