[Feature] Support columnar-extend storage layout for MAP columns

### Search before asking

- [x] I searched in the [issues](https://github.com/alibaba/paimon-cpp/issues) and found nothing similar.


### Motivation

In time-series, IoT, and observability workloads, a common pattern is to store free-schema fields in a `MAP<STRING, T>` column (e.g. `metrics MAP<STRING, DOUBLE>`). The default MAP storage (a key array plus a value array) has several limitations:

- No per-key columnar access
- No per-key statistics
- No predicate pushdown on individual keys

This makes queries like `SELECT ext_map['usage'] FROM metrics WHERE ext_map['usage'] > 30` scan the entire MAP column — extremely inefficient when only 1–3 keys out of thousands are needed per query.

The [PIP: Columnar-Extend Storage Layout for MAP Columns](https://cwiki.apache.org/confluence/display/PAIMON/PIP-43%3A+Columnar-Extend+Storage+Optimization+for+MAP+Type+in+Paimon) proposes a new `extend` storage layout that stores MAP values in `K` reusable physical columns within a Struct, achieving near-full columnar access with per-key statistics and predicate pushdown — without changing the logical type (`MAP<STRING, T>`).

### Solution

#### Physical Layout

Each `MAP<STRING, T>` column marked with `map-storage-layout = extend` is physically stored as:

```
STRUCT<
  __field_mapping: FixedSizeList<Int32, K>,   -- per-row: which field_id each col holds
  __col_0: T, __col_1: T, ..., __col_{K-1}: T,  -- reusable typed columns
  __overflow: MAP<INT32, T>                    -- rare fallback for rows with > K fields
>
```

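The file footer metadata stores: the field name↔id dictionary; S, which maps each field_id to the set of physical columns that hold it in this file; O, the set of field_ids that appear in `__overflow`; K; and the maximum row width.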
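To make the layout concrete, here is a minimal C++ sketch of how one physical row could be resolved back to the value of a single field. `PhysicalRow`, `LookupField`, and the `-1` empty-slot sentinel are hypothetical names and conventions chosen for illustration, not part of paimon-cpp or the PIP; `T` is fixed to `double` and `K` to 4, and an in-memory struct stands in for the Arrow arrays.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

constexpr int kK = 4;               // K, fixed here for the sketch
constexpr int32_t kEmptySlot = -1;  // assumed sentinel: slot unused in this row

// Hypothetical in-memory view of one row of the physical STRUCT.
struct PhysicalRow {
    std::array<int32_t, kK> field_mapping;  // __field_mapping: field_id held by __col_i
    std::array<double, kK> cols;            // __col_0 .. __col_{K-1}
    std::map<int32_t, double> overflow;     // __overflow: fields beyond the K slots
};

// Returns the value stored for field_id in this row, or nullopt if the row
// does not contain that field.
std::optional<double> LookupField(const PhysicalRow& row, int32_t field_id) {
    assert(field_id >= 0);  // real ids are non-negative, so they never match kEmptySlot
    for (int i = 0; i < kK; ++i) {
        if (row.field_mapping[i] == field_id) return row.cols[i];
    }
    auto it = row.overflow.find(field_id);
    if (it != row.overflow.end()) return it->second;
    return std::nullopt;
}
```

For a row with `field_mapping = {7, -1, 3, 9}`, field 3 resolves to `__col_2`, while a field that is in neither the mapping nor `__overflow` is absent from that row.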

#### Write Path

1. **Schema conversion utilities** — Logical MAP → physical Struct schema rewriting; metadata serialization/deserialization; EXTEND column detection via field metadata marker.

2. **`FormatWriter::AddMetadata`** — New virtual method (default no-op) for writing key-value metadata to the file footer before `Finish()`. The Parquet implementation calls `AddKeyValueMetadata`.

3. **Column allocator** — Streaming per-row slot allocator (Hit/Evict/Retain/Overflow) maintaining `K` physical column assignments across batches within a file. LRU-based eviction. Accumulates file-level statistics (S, O, max row width).

4. **Logical→physical batch converter** — Parses logical MAP, encodes field names to integer IDs (file-level dictionary), invokes allocator per row, assembles physical Struct array.

5. **Writer integration** — An extended `DataFileWriter` performs the conversion before writing and injects the metadata on close. `AppendOnlyWriter` detects EXTEND columns and routes them accordingly. K is adapted across files (P99 of recent max row widths, capped by `K_max`).
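The column allocator in step 3 can be sketched roughly as follows. This is one possible reading of Hit/Evict/Retain/Overflow, not the proposed implementation: `SlotAllocator` and `RowAssignment` are hypothetical names, LRU recency is tracked per row, `-1` marks an unused slot, and the file-level statistics (S, O, max row width) are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Per-row result: field_mapping[i] is the field_id written to __col_i for this
// row (-1 if unused); overflow lists the field_ids sent to __overflow.
struct RowAssignment {
    std::vector<int32_t> field_mapping;
    std::vector<int32_t> overflow;
};

// Hypothetical streaming allocator over K reusable slots with LRU eviction.
class SlotAllocator {
public:
    explicit SlotAllocator(int k) : slots_(k, -1), last_used_(k, 0) { assert(k > 0); }

    // field_ids must be distinct within a row (MAP keys are unique).
    RowAssignment AllocateRow(const std::vector<int32_t>& field_ids) {
        ++row_no_;
        const int k = static_cast<int>(slots_.size());
        RowAssignment out{std::vector<int32_t>(k, -1), {}};
        std::vector<int32_t> misses;
        // Hit: the field already owns a slot from an earlier row; reuse it.
        for (int32_t id : field_ids) {
            auto it = slot_of_.find(id);
            if (it != slot_of_.end()) {
                last_used_[it->second] = row_no_;
                out.field_mapping[it->second] = id;
            } else {
                misses.push_back(id);
            }
        }
        for (int32_t id : misses) {
            // Evict: pick the least recently used slot not written by this row.
            int victim = -1;
            for (int s = 0; s < k; ++s) {
                if (last_used_[s] == row_no_) continue;  // claimed by this row
                if (victim < 0 || last_used_[s] < last_used_[victim]) victim = s;
            }
            // Overflow: every slot is already claimed by this row.
            if (victim < 0) { out.overflow.push_back(id); continue; }
            if (slots_[victim] >= 0) slot_of_.erase(slots_[victim]);
            slots_[victim] = id;
            slot_of_[id] = victim;
            last_used_[victim] = row_no_;
            out.field_mapping[victim] = id;
        }
        // Retain: slots untouched by this row keep their owner for later rows.
        return out;
    }

private:
    std::vector<int32_t> slots_;                // current owner of each slot, -1 if never used
    std::vector<int64_t> last_used_;            // row number of the last write to each slot
    std::unordered_map<int32_t, int> slot_of_;  // field_id -> slot index
    int64_t row_no_ = 0;
};
```

With `K = 2`, feeding rows `{10, 20}`, `{20, 30}`, `{10, 20, 30}` keeps field 20 in its slot throughout, evicts field 10 in favor of 30 on the second row, and sends field 10 to overflow on the third row because both slots are already claimed.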

#### Read Path

1. **File metadata parsing** — Parse EXTEND metadata from file footer (dictionary, S, O, K). New `GetFileKeyValueMetadata()` method on `FileBatchReader` with Parquet implementation.

2. **Predicate translation** — Translate logical predicates on MAP keys into conservative OR predicates over physical sub-columns. Requires extending `LeafPredicate` to support nested field paths and updating `PredicateConverter` to emit nested `FieldRef`.

3. **Read planning** — At `SetReadSchema` time: look up which physical columns to read (from S), decide whether `__overflow` is needed (from O), translate predicates, and pass the physical schema + physical predicate down to the inner `FileBatchReader` unchanged.

4. **Batch reconstruction** — After `NextBatch`: read `__field_mapping` per row to identify which column holds which field (fine-grained filter), gather values into logical `MAP<STRING, T>`. Merge overflow when needed. Correctness relies on per-row `__field_mapping`, not on pushdown precision.

5. **Reader integration** — A wrapper reader (implements `FileBatchReader`) sits between the upper layer and the format-level reader. Per-file instance. Compatible with varying K across files. Orthogonal to `DataEvolutionFileReader` (schema evolution).
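Steps 2 and 3 (predicate translation and read planning) might look like the sketch below for a single `map_col[key] <op> literal` predicate. All names (`ExtendMeta`, `ReadPlan`, `PlanKeyRead`) are hypothetical, and the predicate is rendered as a string only to keep the example short; a real implementation would build predicate objects with nested field references. Dropping the pushdown when the key appears in O, and the invariant that every dictionary entry appears in S or O, are assumptions of this sketch rather than something the proposal states.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parsed footer metadata for one EXTEND column in one file.
struct ExtendMeta {
    std::map<std::string, int32_t> dictionary;  // field name -> field_id
    std::map<int32_t, std::set<int>> s;         // S: field_id -> physical columns holding it
    std::set<int32_t> o;                        // O: field_ids that appear in __overflow
};

struct ReadPlan {
    std::vector<std::string> columns;  // physical sub-columns to read
    std::string pushdown;              // conservative predicate; empty means no pushdown
    bool key_absent = false;           // the key never appears in this file
};

// Plans a read of `map_col[key] <op> literal` against one file.
ReadPlan PlanKeyRead(const ExtendMeta& meta, const std::string& key,
                     const std::string& op, const std::string& literal) {
    ReadPlan plan;
    auto id_it = meta.dictionary.find(key);
    if (id_it == meta.dictionary.end()) {
        plan.key_absent = true;  // nothing to read for this key
        return plan;
    }
    const int32_t id = id_it->second;
    auto s_it = meta.s.find(id);
    const bool in_overflow = meta.o.count(id) > 0;
    assert(s_it != meta.s.end() || in_overflow);  // assumed metadata invariant

    plan.columns.push_back("__field_mapping");  // always needed for the per-row filter
    if (s_it != meta.s.end()) {
        for (int col : s_it->second) {
            const std::string name = "__col_" + std::to_string(col);
            plan.columns.push_back(name);
            // OR over every column that may hold the key: row groups can be
            // kept unnecessarily (false positives) but never wrongly skipped.
            if (!plan.pushdown.empty()) plan.pushdown += " OR ";
            plan.pushdown += name + " " + op + " " + literal;
        }
    }
    if (in_overflow) {
        plan.columns.push_back("__overflow");
        plan.pushdown.clear();  // assumption: no pushdown when values may sit in __overflow
    }
    return plan;
}
```

Because a physical column can hold different fields in different rows, the OR predicate only prunes coarsely; the exact per-row filter still comes from `__field_mapping` during batch reconstruction.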

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
