feat: add support for _spec_id metadata column#2695

Open
hsiang-c wants to merge 2 commits into
apache:mainfrom
hsiang-c:meta_col_spec_id


@hsiang-c


Which issue does this PR close?

What changes are included in this PR?

  • If a projection includes _spec_id, add it to RecordBatchTransformerBuilder. Like the _file metadata column, it is a constant for all rows.
  • Fix the partition value of the manifest entry. crates/iceberg/testdata/example_table_metadata_v2.json defines the following partition spec, and the x column is always 1 for all data files, so the identity partition transform value should be 1.
  "default-spec-id": 0,
  "partition-specs": [
    {
      "spec-id": 0,
      "fields": [
        {"name": "x", "transform": "identity", "source-id": 1, "field-id": 1000}
      ]
    }
  ],
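The reasoning behind the second fix can be sketched in a few lines. This is only an illustration under assumptions: `identity_partition_value` is a hypothetical helper, not part of iceberg-rust, and it models the source column as a plain `i64` slice. With an identity transform, the partition value is the source value itself, so a data file whose x column is always 1 has partition value 1.

```rust
/// Hypothetical helper (not the iceberg-rust API): returns the identity
/// partition value for one data file, given the values of the source column.
/// An identity transform passes the source value through unchanged, so the
/// result is only defined when every row holds the same value.
fn identity_partition_value(column: &[i64]) -> Option<i64> {
    // An empty file has no value to derive the partition from.
    let first = *column.first()?;
    // All rows must agree, otherwise the file spans several partitions.
    column.iter().all(|&v| v == first).then_some(first)
}
```

For a file where x is 1 in every row, `identity_partition_value(&[1, 1, 1])` yields `Some(1)`, which matches the corrected manifest entry.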

Are these changes tested?

  • Yes, unit tests.

hsiang-c commented Jun 22, 2026

Author

FYI @advancedxy @parthchandra

if task
    .project_field_ids()
    .contains(&RESERVED_FIELD_ID_SPEC_ID)
    && let Some(partition_spec) = &task.partition_spec

I think we should always add spec_id to the constant map if it's selected/projected.

The task.partition_spec should be present even for an unpartitioned table (spec_id = 0).
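A minimal sketch of that suggestion, under stated assumptions: `SketchTask` and `constant_columns` are hypothetical stand-ins rather than the types in this PR, and the constant map is modeled as a plain `HashMap` from field ID to value. The reserved ID follows the Iceberg spec's `i32::MAX - 4` for `_spec_id`. The only condition for adding the constant is that the column is projected.

```rust
use std::collections::HashMap;

// The Iceberg spec reserves field ID 2147483643 (i32::MAX - 4) for `_spec_id`.
const RESERVED_FIELD_ID_SPEC_ID: i32 = i32::MAX - 4;

/// Hypothetical, simplified stand-in for a scan task (not the real type).
struct SketchTask {
    project_field_ids: Vec<i32>,
    partition_spec_id: i32,
}

/// Builds the map of constant columns for a task. `_spec_id` is added
/// whenever it is projected, regardless of whether the table is partitioned.
fn constant_columns(task: &SketchTask) -> HashMap<i32, i32> {
    let mut constants = HashMap::new();
    if task.project_field_ids.contains(&RESERVED_FIELD_ID_SPEC_ID) {
        constants.insert(RESERVED_FIELD_ID_SPEC_ID, task.partition_spec_id);
    }
    constants
}
```

With this shape, an unpartitioned table that projects `_spec_id` still gets a constant column holding spec ID 0, rather than silently dropping it.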

Comment on lines 96 to +102
partition_spec_id: manifest_file.partition_spec_id,
bound_predicates: bound_predicates.clone(),
snapshot_schema: snapshot_schema.clone(),
delete_file_index: delete_file_index.clone(),
name_mapping: name_mapping.clone(),
case_sensitive,
partition_spec: partition_spec.clone(),

The partition spec is already in manifest_file, like the partition_spec_id. Maybe we could just use that and avoid embedding partition_spec into ManifestFileContext.

@advancedxy

@hsiang-c thanks, I think it's in good shape, just two minor comments.

Development

Successfully merging this pull request may close these issues.

Support for projecting metadata columns _pos, _spec_id, and _partition in table scan

2 participants