
Conversation

@arthurpassos
Collaborator

@arthurpassos arthurpassos commented Jan 20, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Allow MergeTree MATERIALIZED / ALIAS columns to be exported through part export. This is done in two steps:

  1. Relax the schema compatibility constraints to ignore the column classification (e.g., MATERIALIZED, ALIAS, or DEFAULT)
  2. Materialize ALIAS / DEFAULT columns during the export

Example:

arthur :) create table mt_alias (id Int64, id_alias ALIAS id, year UInt16) engine=MergeTree() partition by year order by tuple();

CREATE TABLE mt_alias
(
    `id` Int64,
    `id_alias` ALIAS id,
    `year` UInt16
)
ENGINE = MergeTree
PARTITION BY year
ORDER BY tuple()

Query id: 8a1082e5-85da-4985-a2de-b87cd3ad1ad4

Ok.

0 rows in set. Elapsed: 0.031 sec. 

arthur :) 
arthur :) insert into mt_alias values (1, 2020);

INSERT INTO mt_alias FORMAT Values

Query id: cc92c942-67a2-49e3-acc4-8343f8e76c60

Ok.

1 row in set. Elapsed: 0.059 sec. 

arthur :) 
arthur :) alter table mt_alias export part '2020_1_1_0' to table function s3(s3_conn, filename='alias_function', partition_strategy='hive', format=Parquet) partition by year;

ALTER TABLE mt_alias
    (EXPORT PART '2020_1_1_0' TO TABLE FUNCTION s3(s3_conn, filename = 'alias_function', partition_strategy = 'hive', format = Parquet) PARTITION BY year)

Query id: fe90d4c2-a237-4dd7-bd68-9b5c8e71927a

Ok.

0 rows in set. Elapsed: 1.799 sec. 

arthur :) 
arthur :) select * from s3(s3_conn, filename='alias_function/**.parquet');

SELECT *
FROM s3(s3_conn, filename = 'alias_function/**.parquet')

Query id: 7621e6e1-d134-4b81-b89d-17a78cbdbb97

   ┌─id─┬─id_alias─┬─year─┐
1. │  1 │        1 │ 2020 │
   └────┴──────────┴──────┘

1 row in set. Elapsed: 0.057 sec. 
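The transcript above covers the ALIAS case; the changelog also mentions MATERIALIZED columns. A hedged sketch of the equivalent flow (not taken from the PR; `s3_conn` is the same named collection assumed to exist, and the part name `'2020_1_1_0'` is hypothetical):

```sql
-- Sketch only: a MergeTree table with a MATERIALIZED column.
CREATE TABLE mt_mat
(
    `id` Int64,
    `id_mat` Int64 MATERIALIZED id * 2,
    `year` UInt16
)
ENGINE = MergeTree
PARTITION BY year
ORDER BY tuple();

INSERT INTO mt_mat VALUES (1, 2020);

-- Export the part; id_mat is materialized during the export,
-- so it lands in the Parquet output as an ordinary physical column.
ALTER TABLE mt_mat
    EXPORT PART '2020_1_1_0' TO TABLE FUNCTION
    s3(s3_conn, filename = 'mat_function', partition_strategy = 'hive', format = Parquet)
    PARTITION BY year;

SELECT * FROM s3(s3_conn, filename = 'mat_function/**.parquet');
```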

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@arthurpassos
Collaborator Author

I will add more details tomorrow

@github-actions

Workflow [PR], commit [ed2bb31]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed2bb31baa


Comment on lines +165 to +168
auto defaults_dag = evaluateMissingDefaults(
*current_header,
all_columns,
metadata_snapshot->getColumns(),


P2: Align export output columns with destination physical schema

This step builds defaults/reordering from all_columns, which includes ALIAS columns from the source. The export sink’s header comes from the destination’s getSampleBlock() (physical-only; see StorageInMemoryMetadata::getSampleBlock()), so if the destination table also defines an ALIAS column (same schema as source), the pipeline will output that alias column and StorageObjectStorageSink::consume() will throw because Block::cloneWithColumns requires the column count to match the physical header. This means EXPORT PART fails for tables that include aliases on the destination. Consider filtering to the destination’s physical columns (or using its sample block) when building the output DAG.


@arthurpassos
Collaborator Author


Informative, but not quite accurate. As of now, object storage tables do not support "special" columns like ALIAS or MATERIALIZED, so these scenarios can't happen.
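Assuming that statement holds, the conflicting destination schema cannot be created in the first place. A hedged sketch (the named collection `s3_conn` is assumed, and the rejection at CREATE time is the author's claim, not verified here):

```sql
-- Sketch only: per the comment above, object storage tables reject
-- ALIAS/MATERIALIZED columns, so a destination like this is expected
-- to fail at CREATE time rather than during EXPORT PART.
CREATE TABLE s3_alias
(
    `id` Int64,
    `id_alias` ALIAS id
)
ENGINE = S3(s3_conn, filename = 'alias_function', format = Parquet);
```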

@arthurpassos arthurpassos changed the title impl Allow merge tree materialized / alias columns to be exported through part export Jan 20, 2026
