Core, Spark 4.1: Fix querying equality deletes with schema evolution #15268

Merged
aokolnychyi merged 3 commits into apache:main from aokolnychyi:fix-equality-delete-schema-evolution
Feb 10, 2026
Conversation

Contributor

@aokolnychyi aokolnychyi commented Feb 8, 2026

This PR fixes querying tables with equality deletes when the columns used to create the equality delete file are no longer part of the current or time-travel schema.

The following scenario fails without the changes in this PR:

  • Add equality deletes on column status
  • Drop status from the table
  • Query the table expecting that all equality deletes would have been applied correctly
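The failure mode in this scenario can be simulated with a small, self-contained sketch (all names and the map-based "schemas" below are invented for illustration; the real code resolves Types.NestedField objects by field ID): resolving a dropped column's field ID against the current schema alone returns null, while falling back to every schema the table has had still finds it.

```java
import java.util.Map;

// Toy model of the bug (all names hypothetical): an equality delete file
// references the field IDs it was written with. After `status` (field ID 2)
// is dropped, the current schema alone cannot resolve ID 2.
class DroppedColumnLookup {
  // current schema after `status` was dropped
  static final Map<Integer, String> CURRENT_SCHEMA = Map.of(1, "id");

  // every schema the table has ever had, keyed by schema ID
  static final Map<Integer, Map<Integer, String>> ALL_SCHEMAS =
      Map.of(0, Map.of(1, "id", 2, "status"), 1, Map.of(1, "id"));

  // old behaviour: resolve against the current schema only
  static String currentOnly(int fieldId) {
    return CURRENT_SCHEMA.get(fieldId); // null for dropped fields
  }

  // fixed behaviour: fall back to all schemas when the current one misses
  static String acrossAllSchemas(int fieldId) {
    String field = CURRENT_SCHEMA.get(fieldId);
    if (field != null) {
      return field;
    }
    for (Map<Integer, String> schema : ALL_SCHEMAS.values()) {
      String candidate = schema.get(fieldId);
      if (candidate != null) {
        return candidate;
      }
    }
    return null;
  }
}
```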

@aokolnychyi aokolnychyi force-pushed the fix-equality-delete-schema-evolution branch from 5d78e6b to 8528e5f Compare February 8, 2026 23:11
@singhpk234
Contributor

do we want this in 1.11?

@aokolnychyi
Contributor Author

@singhpk234, it would be nice to include since the use case is pretty basic. Can you help review?

@amogh-jahagirdar amogh-jahagirdar self-requested a review February 9, 2026 18:32
private static final Logger LOG = LoggerFactory.getLogger(BaseReader.class);

private final Table table;
private final Schema tableSchema;
Contributor Author


We only used tableSchema to resolve equality delete fields. We have to switch to all schemas in the table.

Contributor


Hm, have we considered just building a union schema?

Contributor


Ah nvm, we probably want to lazily do that, since on average we wouldn't expect to have to reference the historical fields.

Contributor Author


It is tricky as we don't know which IDs are used in equality delete files.
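The lazy strategy discussed here can be sketched as follows (LazyFieldLookup and the string-valued fields are hypothetical; the real lookup returns Types.NestedField): consult the current schema first, and build the historic field map at most once, only on the first miss, since fetching historic schemas is assumed expensive.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a lazy field lookup: the historic map is built
// only when a field ID misses the current schema, and then cached.
class LazyFieldLookup implements Function<Integer, String> {
  private final Map<Integer, String> currentSchema;
  private final List<Map<Integer, String>> historicSchemas;
  private Map<Integer, String> historicFields; // built lazily, at most once
  int historicLoads = 0; // exposed so the laziness is observable

  LazyFieldLookup(Map<Integer, String> current, List<Map<Integer, String>> historic) {
    this.currentSchema = current;
    this.historicSchemas = historic;
  }

  @Override
  public String apply(Integer fieldId) {
    String field = currentSchema.get(fieldId);
    return field != null ? field : historicSchemaFields().get(fieldId);
  }

  private Map<Integer, String> historicSchemaFields() {
    if (historicFields == null) {
      historicLoads++; // stands in for the expensive metadata fetch
      historicFields = new HashMap<>();
      for (Map<Integer, String> schema : historicSchemas) {
        historicFields.putAll(schema);
      }
    }
    return historicFields;
  }
}
```

On the happy path (all equality delete fields still in the current schema) the historic schemas are never touched.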

// field lookup for serializable tables that assumes fetching historic schemas is expensive
private static class FieldLookup implements Function<Integer, Types.NestedField> {
private final Table table;

Contributor Author


Maybe drop the empty line.

// this class is not meant to be exposed beyond the delete file index
private static class EqualityDeleteFile {
private final PartitionSpec spec;
private final Function<Integer, Types.NestedField> fieldLookup;
Contributor Author


We previously used spec.schema() to resolve equality fields. That's not OK.

return field != null ? field : historicSchemaFields().get(id);
}

private Map<Integer, Types.NestedField> historicSchemaFields() {
Contributor Author


Not sure about historic. Maybe previousSchemaFields() or oldSchemaFields()?

Contributor


I actually like historic, but I'm also good with previous or prior; not very opinionated here.

Contributor Author


Then historic it is.

Contributor

@singhpk234 singhpk234 left a comment


Thanks @aokolnychyi, this LGTM overall. I made some minor suggestions and linked a discussion you might be interested in.

Comment on lines +283 to +288
private static Collection<Schema> historicSchemas(Table table) {
return table.schemas().values().stream()
.filter(schema -> schema.schemaId() != table.schema().schemaId())
.collect(Collectors.toList());
}
}
Contributor

@singhpk234 singhpk234 Feb 9, 2026


This would now require loading the TableMetadata on the executor, which is absolutely fine.

Please check why this could be a potential issue: #14944

Contributor

@amogh-jahagirdar amogh-jahagirdar Feb 10, 2026


We've been loading the table metadata on the executor for a while, at least for other use cases like metadata tables, so we're not quite just doing it "now". I believe the specific concern is that, when loading table metadata from executors, executors read the table metadata JSON lazily when they need certain parts of table metadata that are not stored in SerializableTable. For use cases like server-side planning, there may not be any credentials to actually read the metadata JSON directly.

I think that's still ultimately a separate issue, and given this change, I believe we'd see it specifically when equality deletes are returned as part of scan planning.

I think we can address that as needed so that executors can resolve table metadata through different approaches (naively, broadcast the whole serializable table for those cases, or lazily resolve the metadata by issuing load-table requests), not just by reading the metadata file.

Since the immediate use cases for scan planning tend not to return equality deletes, and the worst case is a failure anyway, I don't think this is something to concern ourselves with for the 1.11 release, but it's something we should fix at some point.

Contributor


To add, I would just test our remote planning with this change to make sure things behave as expected. I was planning on doing that for the 1.11 release voting anyway.

Contributor


I don't think this is something to concern ourselves with for the 1.11 release

I am on the same page :), just posted it here as an FYI since there have been some discussions about it!

Contributor Author


Correct, that's why I implemented the happy path for when the field is in the current schema. So the only case where we will need to load the extra schemas and read the JSON file on executors is when we can't find the equality delete fields in the current schema.

What do you think about this as is, @amogh-jahagirdar @singhpk234?

Contributor


Sounds great to me! I think this is totally fine, since the case we have in mind (untrusted clients) will not return equality deletes from the server.

}
}

// indexes all fields from schemas, preferring field definitions from higher schema IDs
Contributor


Minor: maybe mention this is for cases like type promotion?
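As a sketch of what "prefer field definitions from higher schema IDs" buys for type promotion (all names are hypothetical and field types are modeled as strings; the real code indexes Types.NestedField): iterating schemas in ascending schema-ID order and letting later puts overwrite earlier ones keeps a promoted field at its newest type while dropped fields remain resolvable.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldIndexer {
  // hypothetical stand-in for a schema: ID plus field-ID -> type-name map
  record Schema(int schemaId, Map<Integer, String> fieldTypes) {}

  // index every field across schemas; later (higher-ID) definitions win,
  // so a type-promoted field keeps its newest type
  static Map<Integer, String> indexFields(List<Schema> schemas) {
    Map<Integer, String> index = new HashMap<>();
    schemas.stream()
        .sorted(Comparator.comparingInt(Schema::schemaId))
        .forEach(schema -> index.putAll(schema.fieldTypes()));
    return index;
  }
}
```

Here a field promoted from int to long in the newer schema resolves to long, while a field dropped by the newer schema is still found via the older one.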

Comment on lines -831 to +859
- Types.NestedField field = spec.schema().findField(id);
+ Types.NestedField field = fieldLookup.apply(id);
  Preconditions.checkArgument(field != null, "Cannot find field for ID %s", id);
Contributor


[doubt] What would the previous behaviour have been when the field was null in spec.schema()? Did we fail later?

Contributor

@amogh-jahagirdar amogh-jahagirdar Feb 9, 2026


I just copied the new test into a branch without the fix to check this: yes, it fails when we try to prune which equality deletes to apply based on stats, because we won't be able to resolve the field in the first place.

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java#L233 fails with an NPE

Contributor Author


Correct, the test I added previously failed.

Set<Integer> seenSchemaIds = Sets.newHashSet();

for (Schema schema : sortByIdAsc(schemas)) {
if (!seenSchemaIds.contains(schema.schemaId())) {
Contributor


Can we make it a sorted set instead?

Contributor Author


That's a good idea.
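The sorted-set suggestion could look like this (names hypothetical; the real code deduplicates Schema objects by schema ID): a TreeSet ordered by schema ID replaces the separate sort plus seen-IDs HashSet, giving deduplication and ascending iteration in one structure.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SchemaDedup {
  // hypothetical stand-in for a schema carrying only its ID
  record Schema(int schemaId) {}

  // TreeSet keyed by schema ID deduplicates and sorts in one pass
  static List<Integer> distinctIdsAscending(List<Schema> schemas) {
    TreeSet<Schema> sorted = new TreeSet<>(Comparator.comparingInt(Schema::schemaId));
    sorted.addAll(schemas); // schemas with an already-seen ID are dropped
    return sorted.stream().map(Schema::schemaId).toList();
  }
}
```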

Comment thread: api/src/main/java/org/apache/iceberg/Schema.java
Contributor

@singhpk234 singhpk234 left a comment


LGTM, thanks @aokolnychyi!

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment


Thanks @aokolnychyi, this looks good to me!

@aokolnychyi aokolnychyi merged commit 00df493 into apache:main Feb 10, 2026
33 checks passed
@aokolnychyi
Contributor Author

Thank you, @singhpk234 @amogh-jahagirdar!
