fix(table): reject reserved metadata IDs before reassignment by officialasishkumar · Pull Request #1126 · apache/iceberg-go

officialasishkumar · 2026-05-24T22:55:25Z

Summary

reject user schemas that contain reserved metadata column IDs before schema ID reassignment
validate nested struct, list, and map fields for reserved metadata IDs
keep lineage projection tests focused on appendMissingLineageFields behavior

Testing

go test ./table -run "TestNewMetadataRejectsReservedMetadataColumnIDsBeforeReassignment|TestAppendMissingLineageFieldsAlreadyHasRowID|TestNewMetadata_DuplicateValues" -count=1
go test ./table -count=1

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>

laskoviymishka

Thanks for closing this gap, silently reassigning a user-supplied _row_id was a real footgun, and validating before reassignIDs is the right spot.

Two things I'd tidy before merge:

Validator runs twice. NewMetadataWithUUID calls it pre-reassignment (load-bearing), and checkSchemaCompatibility calls it again on the already-reassigned schema (dead for this path). Pick one home, or leave a comment explaining which gate covers which caller.
Test gaps. The new test only covers top-level reserved IDs, the nested struct/list/map walk that's the bulk of the new code is untested. And the TestProjectionV3SchemaAlreadyHasRowID rename drops the end-to-end Scan.Projection() path, which still matters for tables loaded from catalogs.

Once those are in, happy to take another pass.

laskoviymishka · 2026-05-26T07:40:12Z

 	const defaultValuesMinFormatVersion = 3
 	problems := make([]IncompatibleField, 0)

+	if err := validateNoReservedMetadataColumnIDs(sc); err != nil {


I think this call is effectively dead for the NewMetadata path — by the time checkSchemaCompatibility runs, reassignIDs has already remapped every ID to a fresh non-reserved value, so the validator can't find anything. The only meaningful check is the new one in NewMetadataWithUUID before reassignment.

If the intent is to also catch callers that reach checkSchemaCompatibility without going through reassignment (e.g. MetadataBuilder.AddSchema later in a table's life), I'd move this guard into AddSchema directly and leave checkSchemaCompatibility focused on format-version compatibility. Otherwise we end up with two call sites for the same guard and an unclear invariant about which one is the actual gate. wdyt?

laskoviymishka · 2026-05-26T07:40:12Z

+			if err := validateNoReservedMetadataColumnID(child, name+"."+child.Name); err != nil {
+				return err
+			}
+		}


The three if typ == nil { return nil } arms (StructType / ListType / MapType) aren't reachable — a successful type assertion on field.Type.(type) already discriminates against the nil-interface case, and nothing in the codebase produces a typed-nil *iceberg.StructType here. validateComplexDefault in the same file does the same switch without these guards. I'd drop them.

laskoviymishka · 2026-05-26T07:40:12Z

+	}
+
+	return nil
+}


Small consistency thing: the neighboring validators (validateUnknownTypes, validateComplexTypeDefaults) wrap their error with fmt.Errorf("failed to validate ...: %w", err) at the call site, while this one returns bare. Not a regression (the inline check it replaced also returned bare), but worth wrapping at the checkSchemaCompatibility call site for symmetry if we keep that call site at all.

laskoviymishka · 2026-05-26T07:40:12Z

 	assert.ErrorIs(t, err, iceberg.ErrInvalidSchema)
 	assert.Contains(t, err.Error(), "multiple fields for name foo")
 }
+


Two thoughts on the new test:

The PR description says nested struct/list/map fields are validated, but the test only covers top-level reserved IDs. I'd add at least one case with a struct child or a list element carrying RowIDFieldID so the recursive walk is actually exercised.

Also worth adding a NestedField{ID: iceberg.RowIDFieldID, Name: "user_id"} case — that confirms the error message uses the schema's name, not the canonical constant, which is the point of the name argument in validateNoReservedMetadataColumnID.

laskoviymishka · 2026-05-26T07:40:12Z

+// behavior when a schema already includes reserved lineage fields. New table
+// metadata rejects user schemas with these IDs, but projections can still
+// contain them after the scanner adds row-lineage columns.
+func TestAppendMissingLineageFieldsAlreadyHasRowID(t *testing.T) {


This rename narrows the test from end-to-end (scan.Projection() over real v3 metadata) to a unit test of appendMissingLineageFields in isolation. The motivation is clear — NewMetadata now rejects the old setup — but it leaves the full read-path projection over a schema-with-reserved-IDs uncovered, and existing tables loaded from catalogs can still have those IDs.

I'd keep this unit test and add a sibling that constructs *Scan with a hand-built metadata (or via a path that bypasses the new validator) and exercises Projection() end-to-end. Otherwise we lose the original regression target.

laskoviymishka · 2026-05-26T07:40:12Z

 	}

+	if err := validateNoReservedMetadataColumnIDs(sc); err != nil {
+		return nil, err


Worth a one-line comment here explaining why the check runs before reassignIDs — "user-provided IDs must be validated before reassignment overwrites them with fresh non-reserved IDs". The placement is correct but non-obvious to a future reader.

Also a heads-up: Java's TableMetadata.newTableMetadata doesn't reject reserved IDs here; it silently relies on TypeUtil.assignFreshIds to remap them. We're imposing a stricter precondition than Java. I think that's the right call (silent reassignment of a user-supplied _row_id is a footgun), but worth flagging in case it's intentional vs. accidental divergence.

fix(table): reject reserved metadata IDs before reassignment

c0d62b4

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>

officialasishkumar requested a review from zeroshade as a code owner May 24, 2026 22:55

laskoviymishka requested changes May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(table): reject reserved metadata IDs before reassignment#1126

fix(table): reject reserved metadata IDs before reassignment#1126
officialasishkumar wants to merge 1 commit into
apache:mainfrom
officialasishkumar:fix/1107-reject-reserved-metadata-ids

officialasishkumar commented May 24, 2026

Uh oh!

laskoviymishka left a comment

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

laskoviymishka May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

officialasishkumar commented May 24, 2026

Uh oh!

laskoviymishka left a comment

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

laskoviymishka May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants