- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
Description
Describe the bug
Sometimes the SortPreservingMerge sanity check rejects valid ORDER BY with CASE expressions. This doesn't universally happen, but seems to happen only the columns are from joins (debug output reformatted for readability):
Plan: [
SortPreservingMergeExec: [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]
  SortExec: expr=[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 ELSE 999 END ASC NULLS LAST], preserve_partitioning=[true]
    ProjectionExec: expr=[nspname@3 as nspname, oid@0 as oid, typname@1 as typname, typtype@2 as typtype]
      CoalesceBatchesExec: target_batch_size=8192
        HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(typnamespace@2, oid@0)], projection=[oid@0, typname@1, typtype@3, nspname@5]
          CoalesceBatchesExec: target_batch_size=8192
            FilterExec: Use typtype@3 IN (SET) ([Literal { value: Utf8("b"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("r"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("m"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("e"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, Literal { value: Utf8("d"), field: Field { name: "lit", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }])
              DataSourceExec: partitions=1, partition_sizes=[1]
          RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
            StreamingTableExec: partition_sizes=1, projection=[oid, nspname]
] does not satisfy order requirements:
    [CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST].
Child-0 order:
    [[CASE WHEN typtype@3 = b OR typtype@3 = e OR typtype@3 = p THEN 0 WHEN typtype@3 = r THEN 1 END ASC NULLS LAST]]
I think this shows that CASE expressions lose their ability to be compared for equality. Projecting the CASE and ordering on the projection does not fail.
I found this using a tool which wraps datafusion around the postgres wire protocol so that I could use our Iceberg data lake from a BI tool which supports Postgres.
To Reproduce
I did some bisect'ing on a query issued by npgsql at startup and found this minimal reproduction:
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype = 'r' THEN 1 ENDI initially thought a missing ELSE clause was the culprit, but it does not change anything:
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
WHERE typ.typtype IN ('b','r','m','e','d')
ORDER BY CASE WHEN typ.typtype IN ('b','e','p') THEN 0 WHEN typ.typtype = 'r' THEN 1 ELSE 999 ENDUsing: https://github.com/watford-ep/datafusion-postgres/tree/fix-revealbi-startup-sql
$ cargo test --test revealbi_minimal_reproExpected behavior
This query executes because it contains a valid ORDER BY which is stable across partitions, with or without the ELSE clause.
Additional context
This succeeds:
SELECT ns.nspname, typ.oid, typ.typname, typ.typtype
    FROM pg_type AS typ JOIN pg_namespace AS ns ON (ns.oid = typ.typnamespace)
    WHERE typ.typtype IN ('b','r','m','e','d')
    ORDER BY CASE WHEN COALESCE(typ.typtype, '') IN ('b','e','p') THEN 0 WHEN COALESCE(typ.typtype, '') = 'r' THEN 1 ENDBut applying the same fix to the original query does not.