Skip to content

Predicates on time travel uses the latest schema on Iceberg #23601

@ebyhr

Description

@ebyhr

A reproducible code snippet using TestIcebergV2:

    @Test
    void testTimeTravelAfterSchemaEvolution()
    {
        String tableName = "test_time_travel_after_schema_evolution" + randomNameSuffix();

        assertUpdate("CREATE TABLE " + tableName + " AS SELECT 1 a, 2 b", 1);
        Table icebergTable = loadTable(tableName);
        long snapshotId = icebergTable.currentSnapshot().snapshotId();

        assertUpdate("ALTER TABLE " + tableName + " DROP COLUMN b");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId, "VALUES (1, 2)");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId + " WHERE a = 1", "VALUES (1, 2)");
        assertQuery("SELECT * FROM " + tableName + " FOR VERSION AS OF " + snapshotId + " WHERE b = 1", "VALUES (1, 2)");
    }

The bottom SELECT statement using a dropped b column in predicates throws an exception:

java.lang.AssertionError: Execution of 'actual' query 20240930_012955_00008_x9hnt failed: SELECT * FROM test_time_travel_after_schema_evolutionbhf2fcwom0 FOR VERSION AS OF 5192999285497503973 WHERE b = 1

	at io.trino.testing.QueryAssertions.assertDistributedQuery(QueryAssertions.java:299)
	at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:187)
	at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:160)
	at io.trino.testing.AbstractTestQueryFramework.assertQuery(AbstractTestQueryFramework.java:350)
	at io.trino.plugin.iceberg.TestIcebergV2.testTimeTravelAfterSchemaEvolution(TestIcebergV2.java:1228)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1491)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:2073)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2035)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
Caused by: io.trino.testing.QueryFailedException: Invalid metadata file for table tpch.test_time_travel_after_schema_evolutionbhf2fcwom0
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:134)
	at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565)
	at io.trino.testing.DistributedQueryRunner.executeWithPlan(DistributedQueryRunner.java:554)
	at io.trino.testing.QueryAssertions.assertDistributedQuery(QueryAssertions.java:290)
	... 10 more
	Suppressed: java.lang.Exception: SQL: SELECT * FROM test_time_travel_after_schema_evolutionbhf2fcwom0 FOR VERSION AS OF 5192999285497503973 WHERE b = 1
		at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:572)
		... 12 more
Caused by: io.trino.spi.TrinoException: Invalid metadata file for table tpch.test_time_travel_after_schema_evolutionbhf2fcwom0
	at io.trino.plugin.iceberg.IcebergExceptions.translateMetadataException(IcebergExceptions.java:51)
	at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:206)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.getNextBatch(ClassLoaderSafeConnectorSplitSource.java:43)
	at io.trino.split.ConnectorAwareSplitSource.getNextBatch(ConnectorAwareSplitSource.java:73)
	at io.trino.split.TracingSplitSource.getNextBatch(TracingSplitSource.java:64)
	at io.trino.split.BufferingSplitSource$GetNextBatch.fetchSplits(BufferingSplitSource.java:130)
	at io.trino.split.BufferingSplitSource$GetNextBatch.fetchNextBatchAsync(BufferingSplitSource.java:112)
	at io.trino.split.BufferingSplitSource.getNextBatch(BufferingSplitSource.java:61)
	at io.trino.split.TracingSplitSource.getNextBatch(TracingSplitSource.java:64)
	at io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:247)
	at io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:172)
	at io.trino.execution.scheduler.PipelinedQueryScheduler$DistributedStagesScheduler.schedule(PipelinedQueryScheduler.java:1285)
	at io.trino.$gen.Trino_testversion____20240930_012950_71.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'b' in struct: struct<1: a: optional int>
	at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
	at org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:45)
	at org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:26)
	at org.apache.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:111)
	at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:181)
	at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:135)
	at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:347)
	at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:151)
	at org.apache.iceberg.ManifestGroup.lambda$entries$8(ManifestGroup.java:254)
	at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$3(LocalLoadingCache.java:183)
	at com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:297)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1710)
	at com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:293)
	at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:112)
	at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:58)
	at org.apache.iceberg.ManifestGroup.lambda$entries$9(ManifestGroup.java:274)
	at org.apache.iceberg.io.CloseableIterable$5.shouldKeep(CloseableIterable.java:139)
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
	at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:64)
	at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
	at org.apache.iceberg.io.CloseableIterator$3.hasNext(CloseableIterator.java:91)
	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
	at org.apache.iceberg.io.CloseableIterable$ConcatCloseableIterable$ConcatCloseableIterator.hasNext(CloseableIterable.java:247)
	at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatchInternal(IcebergSplitSource.java:268)
	at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:203)
	... 18 more

Similar to #23295, but filed a new issue because SELECT without specific predicates works in this case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingicebergIceberg connector

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions