@mkmkme
Collaborator

@mkmkme mkmkme commented Jan 16, 2026

Fixes #1301. Upstream PR: ClickHouse#94335

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixes an issue where Iceberg columns with dots in their names returned NULL values.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@mkmkme
Collaborator Author

mkmkme commented Jan 16, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@github-actions

github-actions bot commented Jan 16, 2026

Workflow [PR], commit [15a5d9a]

@mkmkme
Collaborator Author

mkmkme commented Jan 19, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6b0a778de

Comment on lines +151 to +155
/// Strip "current_path." prefix to get child name (preserves dots in child names)
std::string_view mapped = it->second;
if (mapped.starts_with(current_path) && mapped.size() > current_path.size()
    && mapped[current_path.size()] == '.')
    return mapped.substr(current_path.size() + 1);

P2 Badge Normalize case before prefix stripping

When case_insensitive_column_matching is enabled, node.name is rewritten to the query’s casing in processSubtree, so current_path can differ in case from the Iceberg-mapped name. The starts_with check here is case-sensitive; if the user queries a tuple/struct column with different case, the prefix won’t be stripped and child names get built with the full mapped path (e.g., mystruct.MyStruct.child), which then fails tryGetPositionByName and can yield missing tuple elements or errors. Consider normalizing current_path/mapped with the same case-insensitive rules before comparing.
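
To make the flagged behaviour concrete, here is a minimal standalone sketch of the prefix-stripping step. It is not the PR's code: `stripParentPrefix` is a hypothetical name, and returning `mapped` unchanged when the prefix does not match is an assumption based on the description above. It uses a `substr` comparison in place of `starts_with` so that it also compiles as C++17.

```cpp
#include <string_view>

// Hypothetical helper (not the PR's function): returns the child name when
// `mapped` has the form "<current_path>.<child>". The comparison is
// case-sensitive, like the quoted diff.
// Assumption: when the prefix does not match, `mapped` is returned unchanged.
constexpr std::string_view stripParentPrefix(std::string_view mapped, std::string_view current_path)
{
    const auto n = current_path.size();
    if (mapped.size() > n && mapped.substr(0, n) == current_path && mapped[n] == '.')
        return mapped.substr(n + 1);
    return mapped;
}

// Only the leading "<parent>." is removed, so dots in the child name survive.
static_assert(stripParentPrefix("my.tuple.foo.bar", "my.tuple") == "foo.bar");
// A parent that differs only in case is not recognised as a prefix.
static_assert(stripParentPrefix("MyStruct.child", "mystruct") == "MyStruct.child");
// A shared prefix without the separating dot is not a parent/child relation.
static_assert(stripParentPrefix("my.tuplex", "my.tuple") == "my.tuplex");
```

The second `static_assert` is the case the review comment describes: if `current_path` and the mapped name could ever differ in case, the full mapped path would be kept.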

Collaborator Author

The setting input_format_parquet_case_insensitive_column_matching does not apply to ColumnMapper. It is used when the names in the ClickHouse table schema differ in case from the names in the Parquet file (see https://github.com/Altinity/ClickHouse/blob/antalya-25.8/tests/queries/0_stateless/03285_orc_arrow_parquet_tuple_field_matching.sh), whereas ColumnMapper is part of the ClickHouse analyzer, which is always case-sensitive for identifiers.

Collaborator Author

@mkmkme mkmkme Jan 20, 2026

That being said, this is what this setting affects:

# Int64 on write is `foo.bar`
:) INSERT INTO function file('dots.parquet', 'parquet', '`my.tuple` Tuple(`foo.bar` Int64, `bar.baz` String)') SELECT (1, '2') SETTINGS engine_file_truncate_on_insert = 1;

# Int64 on read is `Foo.bar`, case_insensitive_column_matching = 0, no value returned
:) SELECT `my.tuple.Foo.bar`
FROM file('dots.parquet', 'parquet', '`my.tuple` Tuple(`Foo.bar` Int64, `bar.baz` String)')
SETTINGS input_format_parquet_case_insensitive_column_matching = 0

Query id: 622d67b9-06e0-44a3-9c43-1229cbf4eea4

   ┌─my.tuple.Foo.bar─┐
1. │                0 │
   └──────────────────┘

# Int64 on read is `Foo.bar`, case_insensitive_column_matching = 1, a proper value returned
:) SELECT `my.tuple.Foo.bar`
FROM file('dots.parquet', 'parquet', '`my.tuple` Tuple(`Foo.bar` Int64, `bar.baz` String)')
SETTINGS input_format_parquet_case_insensitive_column_matching = 1

Query id: 8f5f3d83-380e-46d5-97d5-af94d4b5ed17

   ┌─my.tuple.Foo.bar─┐
1. │                1 │
   └──────────────────┘

The correct value here is 1. This output comes from the current antalya-25.8 branch, not from this PR's branch.

Trying to SELECT my.tuple.Foo.bar when schema has my.tuple.foo.bar will always fail, because this setting doesn't affect what you're selecting vs what's in the schema. It affects what's in the schema vs what's in the file.
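
As a rough illustration of that distinction, the two comparisons can be modelled separately. This is only a sketch: `schemaMatchesFile`, `queryMatchesSchema`, and the other helpers are hypothetical names, not ClickHouse functions, and the ASCII-only lowercasing is an assumption made to keep the example small.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical model of the two name comparisons; not ClickHouse code.

// ASCII-only lowercasing (an assumption made for brevity).
constexpr char toLowerAscii(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

constexpr bool equalsIgnoreCase(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (toLowerAscii(a[i]) != toLowerAscii(b[i]))
            return false;
    return true;
}

// Schema vs file: the setting decides whether case matters.
constexpr bool schemaMatchesFile(std::string_view schema_name, std::string_view file_name,
                                 bool case_insensitive_column_matching)
{
    return case_insensitive_column_matching ? equalsIgnoreCase(schema_name, file_name)
                                            : schema_name == file_name;
}

// Query vs schema: identifiers are compared exactly; the setting is not consulted.
constexpr bool queryMatchesSchema(std::string_view query_name, std::string_view schema_name)
{
    return query_name == schema_name;
}

// Schema says `Foo.bar`, file says `foo.bar`: matched only with the setting enabled.
static_assert(!schemaMatchesFile("Foo.bar", "foo.bar", false));
static_assert(schemaMatchesFile("Foo.bar", "foo.bar", true));
// Query says `my.tuple.Foo.bar`, schema says `my.tuple.foo.bar`: never matched.
static_assert(!queryMatchesSchema("my.tuple.Foo.bar", "my.tuple.foo.bar"));
```

In this model only the first comparison takes the setting as a parameter, which mirrors the two queries shown earlier.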

Collaborator

@arthurpassos arthurpassos left a comment

LGTM


3 participants