Regression: Logical optimizer causes invalid query result with case expression #8942

@sergiimk

Describe the bug

Starting with v34, DataFusion produces incorrect query results when logical optimization is enabled.

To Reproduce

Here's the minimal repro case I found so far:

let config = SessionConfig::new();
let runtime = Arc::new(RuntimeEnv::default());
let state = SessionState::new_with_config_rt(config, runtime).with_optimizer_rules(vec![]);
let ctx = SessionContext::new_with_state(state);

let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));

let batch =
    RecordBatch::try_new(schema, vec![Arc::new(array::Int32Array::from(vec![0, 1]))]).unwrap();

let df = ctx.read_batch(batch).unwrap();
df.clone().show().await.unwrap();

// Add `t` column full of nulls
let df = df
    .with_column("t", cast(Expr::Literal(ScalarValue::Null), DataType::Int32))
    .unwrap();
df.clone().show().await.unwrap();

let df = df
    // (case when id = 1 then 10 else t) as t
    .with_column(
        "t",
        when(col("id").eq(lit(1)), lit(10))
            .otherwise(col("t"))
            .unwrap(),
    )
    .unwrap()
    // (case when id = 1 then 10 else t) as t2
    .with_column(
        "t2",
        when(col("id").eq(lit(1)), lit(10))
            .otherwise(col("t"))
            .unwrap(),
    )
    .unwrap();

df.clone().show().await.unwrap();

The code above prints:

+----+----+----+
| id | t  | t2 |
+----+----+----+
| 0  |    |    |
| 1  | 10 | 10 |
+----+----+----+

which is correct.

Now comment out the with_optimizer_rules(vec![]) call so that the default optimizer rules run, and the result changes:

+----+---+----+
| id | t | t2 |
+----+---+----+
| 0  |   |    |
| 1  |   | 10 |
+----+---+----+

Note that although t and t2 are defined by identical expressions, column t is now null for id = 1, where it should be 10.

Perhaps the issue is triggered by replacing column t with an expression that depends on the previous value of t.
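
For reference, here is a minimal plain-Rust sketch of what the three steps in the repro should compute for each row, regardless of optimization. It does not use DataFusion at all: `Row`, `case_expr` and `apply_steps` are hypothetical names introduced only for this illustration, and `None` stands in for SQL NULL.

```rust
// Plain-Rust model of the repro's expected semantics; no DataFusion involved.
// `Row`, `case_expr` and `apply_steps` are hypothetical, illustrative names.

#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i32,
    t: Option<i32>,  // None models SQL NULL
    t2: Option<i32>, // None models SQL NULL
}

/// Models: case when id = 1 then 10 else t end
fn case_expr(id: i32, t: Option<i32>) -> Option<i32> {
    if id == 1 {
        Some(10)
    } else {
        t
    }
}

/// Applies the repro's column steps in order for a single row.
fn apply_steps(id: i32) -> Row {
    // Step 1: add `t` as an all-NULL column.
    let t: Option<i32> = None;
    // Step 2: replace `t` with the CASE expression over the previous `t`.
    let t = case_expr(id, t);
    // Step 3: add `t2` with the same expression, which now reads the new `t`.
    let t2 = case_expr(id, t);
    Row { id, t, t2 }
}

fn main() {
    // Same two ids as the record batch in the repro.
    for id in [0, 1] {
        println!("{:?}", apply_steps(id));
    }
}
```

Under this model both t and t2 are 10 for id = 1 and null for id = 0, which matches the output obtained with the optimizer rules disabled. Replacing t with an expression over its own previous value does not change that, so the difference seen with the default rules has to come from the plan rewrite rather than from the query itself.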

Expected behavior

Logical optimization should not change query results.

Additional context

This broke in DataFusion 34; version 33 worked fine.

Metadata

Labels

bug: Something isn't working
regression: Something that used to work no longer does
