From 888e30ee6d0e101f8b6ea765145e2f6effc87820 Mon Sep 17 00:00:00 2001 From: Xuyang Date: Tue, 2 Jun 2026 16:43:41 +0800 Subject: [PATCH 1/2] [FLINK-39820][docs] Update delta join documentation with new features in Flink 2.3 --- docs/content.zh/docs/dev/table/tuning.md | 15 +++++++++++---- docs/content/docs/dev/table/tuning.md | 13 ++++++++++--- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/docs/content.zh/docs/dev/table/tuning.md b/docs/content.zh/docs/dev/table/tuning.md index 73fedeac9daa3..4a777ded0a10d 100644 --- a/docs/content.zh/docs/dev/table/tuning.md +++ b/docs/content.zh/docs/dev/table/tuning.md @@ -445,16 +445,23 @@ SET 'table.optimizer.delta-join.strategy' = 'NONE'; 1. 支持 **INSERT-only** 的表作为源表。 2. 支持不带 **DELETE 操作**的 **CDC** 表作为源表。 -3. 支持源表和 delta join 间包含 **project** 和 **filter** 算子。 +3. 支持源表和 delta join 间包含 **projection** 和 **filter** 算子。 4. Delta join 算子内支持**缓存**。 +5. 支持**级联 delta join**(cascaded delta join)—— 查询中符合条件的 join 节点从源表到结果表被依次转换为 delta join 节点。 +6. 支持 delta join 后接 **lookup join**。 +7. 支持在 delta join 与下游算子之间的 projection 和 filter 中使用**非确定性函数**。 然而,delta join 也存在几个**限制**,包含以下任何条件的作业无法优化为 delta join。 1. 表的**索引键**必须包含在 join 的**等值条件**中 2. 目前仅支持 **INNER JOIN**。 3. **下游节点**必须能够处理**冗余变更**。例如以 **UPSERT 模式**运行、不带 `upsertMaterialize` 的 sink 节点。 -4. 当消费 **CDC 流**时,**join key** 必须是**主键**的一部分。 -5. 当消费 **CDC 流**时,所有 **filter** 必须应用于 **upsert key** 上。 -6. 所有 project 和 filter 都不能包含**非确定性函数**。 +4. 当消费 **CDC 流**时,**join key** 必须是 **upsert key**[1] 的一部分。 +5. 当消费 **CDC 流**时,所有 **filter** 必须应用于 **upsert key**[1] 上。 +6. 源表和 delta join 之间的 projection 或 filter 不能包含**非确定性函数**。 + +{{< hint info >}} +[1] Flink 支持定义表级别的**不可变列**(immutable columns)约束来丰富 upsert key,从而使更多场景能够被优化为 delta join。不可变列约束声明某些列一旦为给定主键设置后便不可修改,不可变列会联合主键,作为一组新的 upsert key 传播给下游。该信息由外部存储系统提供。[Apache Fluss(Incubating)](https://fluss.apache.org/) 已在未来计划支持表级别的不可变列约束。 +{{< /hint >}} {{< top >}} diff --git a/docs/content/docs/dev/table/tuning.md b/docs/content/docs/dev/table/tuning.md index 98ede66bc9c6b..6db71142c1fbb 100644 --- a/docs/content/docs/dev/table/tuning.md +++ b/docs/content/docs/dev/table/tuning.md @@ -431,12 +431,19 @@ Delta joins are continuously evolving, and supports the following features curre 2. Support for **CDC** tables without **DELETE operations** as source tables. 3. Support for **projection** and **filter** operations between the source and the delta join. 4. Support for **caching** within the delta join operator. +5. Support for **cascaded delta joins** — eligible join nodes in a query are sequentially converted into delta join nodes from source to sink. +6. Support for **lookup join** after a delta join. +7. Support for **non-deterministic functions** in projections and filters between the delta join and downstream operators. However, Delta Joins also have several **limitations**. Jobs containing any of the following conditions cannot be optimized into a delta join: 1. The **index key** of the table must be included in the join’s **equivalence conditions**. 2. Only **INNER JOIN** is currently supported. 3. The **downstream operator** must be able to handle **duplicate changes**, such as a sink operating in **UPSERT mode** without `upsertMaterialize`. -4. When consuming a **CDC stream**, the **join key** must be part of the **primary key**. -5. When consuming a **CDC stream**, all **filters** must be applied on the **upsert key**. -6. **Non-deterministic functions** are not allowed in filters or projections. \ No newline at end of file +4. When consuming a **CDC stream**, the **join key** must be part of the **upsert key**[1]. +5. When consuming a **CDC stream**, all **filters** must be applied on the **upsert key**[1]. +6. **Non-deterministic functions** are not allowed in projections or filters between the source and the delta join. + +{{< hint info >}} +[1] Flink supports defining a table-level **immutable columns** constraint to enrich the upsert key, enabling delta join optimization in more scenarios. The immutable columns constraint declares that certain columns, once set for a given primary key, cannot be modified. The immutable columns are combined with the primary key to form a new upsert key that is propagated to downstream operators. This information is provided by the external storage system. [Apache Fluss(Incubating)](https://fluss.apache.org/) is planning to support table-level immutable columns constraint in the future. +{{< /hint >}} \ No newline at end of file From 9ba5f066272b9324265ed2733e2dd34c979a87cb Mon Sep 17 00:00:00 2001 From: Xuyang Date: Fri, 5 Jun 2026 15:24:20 +0800 Subject: [PATCH 2/2] address --- docs/content.zh/docs/dev/table/tuning.md | 4 ++-- docs/content/docs/dev/table/tuning.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/content.zh/docs/dev/table/tuning.md b/docs/content.zh/docs/dev/table/tuning.md index 4a777ded0a10d..8a23468b84be3 100644 --- a/docs/content.zh/docs/dev/table/tuning.md +++ b/docs/content.zh/docs/dev/table/tuning.md @@ -405,7 +405,7 @@ FROM TenantKafka t 该功能默认启用。当满足以下所有条件时, regular join 将自动优化为 delta join。 1. 作业拓扑结构满足优化条件。具体可以查看[支持的功能和限制]({{< ref "docs/dev/table/tuning" >}}#supported-features-and-limitations)。 -2. 源表所在的外部存储系统提供了可供 delta join 快速查询的索引信息。目前 [Apache Fluss(Incubating)](https://fluss.apache.org/blog/fluss-open-source/) 已支持在 Flink 中提供表级别的索引信息,其上的表可作为 delta join 的源表。具体可参考 [Fluss 文档](https://fluss.apache.org/docs/engine-flink/delta-joins/#flink-version-support)。 +2. 源表所在的外部存储系统提供了可供 delta join 快速查询的索引信息。目前 [Apache Fluss (Incubating)](https://fluss.apache.org/blog/fluss-open-source/) 已支持在 Flink 中提供表级别的索引信息,其上的表可作为 delta join 的源表。具体可参考 [Fluss 文档](https://fluss.apache.org/docs/engine-flink/delta-joins/#flink-version-support)。 @@ -461,7 +461,7 @@ SET 'table.optimizer.delta-join.strategy' = 'NONE'; 6. 源表和 delta join 之间的 projection 或 filter 不能包含**非确定性函数**。 {{< hint info >}} -[1] Flink 支持定义表级别的**不可变列**(immutable columns)约束来丰富 upsert key,从而使更多场景能够被优化为 delta join。不可变列约束声明某些列一旦为给定主键设置后便不可修改,不可变列会联合主键,作为一组新的 upsert key 传播给下游。该信息由外部存储系统提供。[Apache Fluss(Incubating)](https://fluss.apache.org/) 已在未来计划支持表级别的不可变列约束。 +[1] Flink 支持定义表级别的**不可变列**(immutable columns)约束来丰富 upsert key,从而使更多场景能够被优化为 delta join。不可变列约束声明某些列一旦为给定主键设置后便不可修改,不可变列会联合主键,作为一组新的 upsert key 传播给下游。该信息由外部存储系统提供。[Apache Fluss (Incubating)](https://fluss.apache.org/) 已在未来计划支持表级别的不可变列约束。 {{< /hint >}} {{< top >}} diff --git a/docs/content/docs/dev/table/tuning.md b/docs/content/docs/dev/table/tuning.md index 6db71142c1fbb..e0954e58a8a66 100644 --- a/docs/content/docs/dev/table/tuning.md +++ b/docs/content/docs/dev/table/tuning.md @@ -395,7 +395,7 @@ To mitigate these challenges, Flink introduces the delta join operator. The key This feature is enabled by default. A regular join will be automatically optimized into a delta join when all the following conditions are met: 1. The sql pattern satisfies the optimization criteria. For details, please refer to [Supported Features and Limitations]({{< ref "docs/dev/table/tuning" >}}#supported-features-and-limitations) -2. The external storage system of the source table provides index information for fast querying for delta joins. Currently, [Apache Fluss(Incubating)](https://fluss.apache.org/blog/fluss-open-source/) has provided index information at the table level for Flink, allowing such tables to be used as source tables for delta joins. Please refer to the [Fluss documentation](https://fluss.apache.org/docs/engine-flink/delta-joins/#flink-version-support) for more details. +2. The external storage system of the source table provides index information for fast querying for delta joins. Currently, [Apache Fluss (Incubating)](https://fluss.apache.org/blog/fluss-open-source/) has provided index information at the table level for Flink, allowing such tables to be used as source tables for delta joins. Please refer to the [Fluss documentation](https://fluss.apache.org/docs/engine-flink/delta-joins/#flink-version-support) for more details. ### Working Principle @@ -445,5 +445,5 @@ However, Delta Joins also have several **limitations**. Jobs containing any of t 6. **Non-deterministic functions** are not allowed in projections or filters between the source and the delta join. {{< hint info >}} -[1] Flink supports defining a table-level **immutable columns** constraint to enrich the upsert key, enabling delta join optimization in more scenarios. The immutable columns constraint declares that certain columns, once set for a given primary key, cannot be modified. The immutable columns are combined with the primary key to form a new upsert key that is propagated to downstream operators. This information is provided by the external storage system. [Apache Fluss(Incubating)](https://fluss.apache.org/) is planning to support table-level immutable columns constraint in the future. +[1] Flink supports defining a table-level **immutable columns** constraint to enrich the upsert key, enabling delta join optimization in more scenarios. The immutable columns constraint declares that certain columns, once set for a given primary key, cannot be modified. The immutable columns are combined with the primary key to form a new upsert key that is propagated to downstream operators. This information is provided by the external storage system. [Apache Fluss (Incubating)](https://fluss.apache.org/) is planning to support table-level immutable columns constraint in the future. {{< /hint >}} \ No newline at end of file