diff --git a/docs/admin-manual/cluster-management/tso.md b/docs/admin-manual/cluster-management/tso.md new file mode 100644 index 0000000000000..744e2832e90bc --- /dev/null +++ b/docs/admin-manual/cluster-management/tso.md @@ -0,0 +1,73 @@ +--- +{ + "title": "Timestamp Oracle (TSO)", + "language": "en", + "description": "Timestamp Oracle (TSO) provides globally monotonic timestamps for Doris." +} +--- + +## Overview + +Timestamp Oracle (TSO) is a service running on the **Master FE** that generates **globally monotonic** 64-bit timestamps. Doris uses TSO as a unified version reference in distributed scenarios, avoiding the correctness risks caused by physical clock skew across nodes. + +Typical use cases include: + +- A unified “transaction version” across multiple tables and nodes. +- Incremental processing / version-based reads using a single global ordering. +- Better observability: a timestamp is easier to interpret than an internal version counter. + +## Timestamp Format + +TSO is a 64-bit integer: + +- High bits: **physical time (milliseconds)** since Unix epoch +- Low bits: **logical counter** for issuing multiple unique timestamps within the same millisecond + +The core guarantee of TSO is **monotonicity**, not being an exact wall clock. + +## Architecture and Lifecycle + +- **Master FE** hosts the `TSOService` daemon. +- FE components (for example, transaction publish and metadata repair flows) obtain timestamps from `Env.getCurrentEnv().getTSOService().getTSO()`. +- The service uses a **time window lease** (window end physical time) to reduce persistence overhead while ensuring monotonicity across master failover. + +### Monotonicity Across Master Failover + +On master switch, the new Master FE replays the persisted window end and calibrates the initial physical time to ensure the first TSO it issues is strictly greater than any TSO issued by the previous master. 
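The timestamp format and the failover guarantee described above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the Doris implementation: the docs only say that the high bits hold physical milliseconds and the low bits hold a logical counter, so the 18-bit logical width, the helper names, and the "start one millisecond past the persisted window end" rule are all hypothetical choices made for illustration.

```python
# Assumption: the docs do not state how many low bits the logical counter uses.
# 18 is chosen here purely for illustration.
LOGICAL_BITS = 18
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose_tso(physical_ms: int, logical: int) -> int:
    """Pack physical time (ms since Unix epoch) and a logical counter into one integer."""
    if not 0 <= logical <= LOGICAL_MASK:
        raise ValueError("logical counter does not fit in the assumed bit width")
    return (physical_ms << LOGICAL_BITS) | logical


def extract_physical(tso: int) -> int:
    """Return the physical-time part (high bits) of a TSO."""
    return tso >> LOGICAL_BITS


def extract_logical(tso: int) -> int:
    """Return the logical-counter part (low bits) of a TSO."""
    return tso & LOGICAL_MASK


def calibrate_after_failover(now_ms: int, persisted_window_end_ms: int) -> int:
    """Pick the initial physical time for a new master (hypothetical rule).

    Assuming the previous master never issued a TSO whose physical part
    exceeded the window end it persisted, starting strictly after that
    window end makes every new TSO larger than any old one, even if the
    new master's wall clock is behind.
    """
    return max(now_ms, persisted_window_end_ms + 1)
```

Because the physical part occupies the high bits, plain integer comparison orders timestamps first by physical time and then by logical counter, which is why a later millisecond always wins regardless of the counter value.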
+ +## Configuration + +TSO is controlled by FE configuration items (see [FE Configuration](../config/fe-config.md) for how to set and persist configs): + +- `enable_feature_tso` +- `tso_service_update_interval_ms` +- `max_update_tso_retry_count` +- `max_get_tso_retry_count` +- `tso_service_window_duration_ms` +- `tso_time_offset_debug_mode` (test only) +- `enable_tso_persist_journal` (may affect rollback compatibility) +- `enable_tso_checkpoint_module` (may affect older versions reading newer images) + +## Observability and Debugging + +### FE HTTP API + +You can fetch the current TSO without consuming the logical counter via FE HTTP API: + +- `GET /api/tso` + +See [TSO Action](../open-api/fe-http/tso-action.md) for authentication, response fields, and examples. + +### System Table: `information_schema.rowsets` + +When enabled, Doris records the commit TSO into rowset metadata and exposes it via: + +- `information_schema.rowsets.COMMIT_TSO` + +See [rowsets](../system-tables/information_schema/rowsets.md). + +## FAQ + +### Can I treat TSO as a wall clock? + +No. Although the physical part is in milliseconds, the physical time may be advanced proactively (for example, to handle high logical counter usage), so TSO should be used as a **monotonic version** rather than a precise wall clock. diff --git a/docs/admin-manual/config/fe-config.md b/docs/admin-manual/config/fe-config.md index b9bd59857c349..9bc3a06ae57e0 100644 --- a/docs/admin-manual/config/fe-config.md +++ b/docs/admin-manual/config/fe-config.md @@ -360,6 +360,88 @@ Is it possible to dynamically configure: true Is it a configuration item unique to the Master FE node: false +### TSO (Timestamp Oracle) + +#### `enable_feature_tso` + +Default:false + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +Whether to enable TSO (Timestamp Oracle) related experimental features, such as recording rowset commit TSO and exposing it via system tables. 
+ +#### `tso_service_update_interval_ms` + +Default:50(ms) + +IsMutable:false + +Is it a configuration item unique to the Master FE node: true + +The update interval of the TSO service in milliseconds. The daemon periodically checks for clock drift or backward clock jumps and renews the time window. + +#### `max_update_tso_retry_count` + +Default:3 + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +Maximum retry count when the TSO service updates the global timestamp (for example, when persisting a new window end). + +#### `max_get_tso_retry_count` + +Default:10 + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +Maximum retry count when generating a new TSO. + +#### `tso_service_window_duration_ms` + +Default:5000(ms) + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +The duration of a leased TSO time window in milliseconds. The Master FE persists the window end to reduce persistence frequency while keeping monotonicity across master failover. + +#### `tso_time_offset_debug_mode` + +Default:0(ms) + +IsMutable:true + +Is it a configuration item unique to the Master FE node: false + +Time offset for the TSO service in milliseconds. For test/debug only. + +#### `enable_tso_persist_journal` + +Default:false + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +Whether to persist the TSO window end into the edit log. Enabling this may emit new operation codes and may break rollback compatibility with older versions. + +#### `enable_tso_checkpoint_module` + +Default:false + +IsMutable:true + +Is it a configuration item unique to the Master FE node: true + +Whether to include TSO information as a checkpoint image module for faster recovery. Older versions may need to ignore unknown modules when reading newer images. 
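To make the relationship between `tso_service_update_interval_ms` and `tso_service_window_duration_ms` concrete, the following sketch counts how often a window end would be persisted under the default values. It is a toy model, not the `TSOService` logic: the renewal rule (renew once the remaining lease is no longer than one update interval) and the function name are assumptions introduced only for this illustration.

```python
def count_window_persists(duration_ms: int,
                          update_interval_ms: int = 50,
                          window_duration_ms: int = 5000) -> int:
    """Count window-end persists over `duration_ms` of simulated time.

    Hypothetical renewal rule: on every daemon tick, renew (and persist)
    the window when the remaining lease is at most one update interval.
    """
    window_end_ms = 0  # no lease held at startup
    persists = 0
    for now_ms in range(0, duration_ms, update_interval_ms):
        if window_end_ms - now_ms <= update_interval_ms:
            window_end_ms = now_ms + window_duration_ms
            persists += 1
    return persists


# One simulated minute with the documented defaults (50 ms tick, 5000 ms window).
ticks = 60_000 // 50
print(ticks, count_window_persists(60_000))  # prints "1200 13"
```

Under these assumptions the daemon wakes up 1,200 times in a minute but persists only 13 window ends. A longer window reduces persistence further, at the cost of a larger jump in physical time after a master failover.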
+ ### Service #### `query_port` diff --git a/docs/admin-manual/open-api/fe-http/tso-action.md b/docs/admin-manual/open-api/fe-http/tso-action.md new file mode 100644 index 0000000000000..edbd928b35525 --- /dev/null +++ b/docs/admin-manual/open-api/fe-http/tso-action.md @@ -0,0 +1,66 @@ +--- +{ + "title": "TSO Action", + "language": "en", + "description": "Get current TSO (Timestamp Oracle) information from the Master FE." +} +--- + +# TSO Action + +## Request + +`GET /api/tso` + +## Description + +Returns the current TSO (Timestamp Oracle) information from the **Master FE**. + +- This endpoint is **read-only**: it returns the current TSO value **without increasing** it. +- Authentication is required. Use an account with **administrator privileges**. + +## Path parameters + +None. + +## Query parameters + +None. + +## Request body + +None. + +## Response + +On success, the response body has `code = 0` and the `data` field contains: + +| Field | Type | Description | +| --- | --- | --- | +| window_end_physical_time | long | The end physical time (ms) of the current TSO window on the Master FE. | +| current_tso | long | The current composed 64-bit TSO value. | +| current_tso_physical_time | long | The extracted physical time part (ms) from `current_tso`. | +| current_tso_logical_counter | long | The extracted logical counter part from `current_tso`. 
| + +Example: + +```json +{ + "code": 0, + "msg": "success", + "data": { + "window_end_physical_time": 1625097600000, + "current_tso": 123456789012345678, + "current_tso_physical_time": 1625097600000, + "current_tso_logical_counter": 123 + } +} +``` + +## Errors + +Common error cases include: + +- FE is not ready +- Current FE is not master +- Authentication failure diff --git a/docs/admin-manual/system-tables/information_schema/rowsets.md b/docs/admin-manual/system-tables/information_schema/rowsets.md index c295b76668813..4aaa9b45c6b96 100644 --- a/docs/admin-manual/system-tables/information_schema/rowsets.md +++ b/docs/admin-manual/system-tables/information_schema/rowsets.md @@ -32,4 +32,5 @@ Returns basic information about the Rowset. | DATA_DISK_SIZE | bigint | The storage space for data within the Rowset. | | CREATION_TIME | datetime | The creation time of the Rowset. | | NEWEST_WRITE_TIMESTAMP | datetime | The most recent write time of the Rowset. | -| SCHEMA_VERSION | int | The Schema version number of the table corresponding to the Rowset data. | \ No newline at end of file +| SCHEMA_VERSION | int | The Schema version number of the table corresponding to the Rowset data. | +| COMMIT_TSO | bigint | The commit TSO recorded in the Rowset metadata (64-bit). | diff --git a/docs/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md b/docs/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md index 767b1e034a0ab..54f9c2ca496b8 100644 --- a/docs/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md +++ b/docs/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md @@ -370,6 +370,7 @@ The functionality of creating synchronized materialized views with rollup is lim | enable_mow_light_delete | Whether to enable writing Delete predicate with Delete statements on Unique tables with Mow. If enabled, it will improve the performance of Delete statements, but partial column updates after Delete may result in some data errors. 
If disabled, it will reduce the performance of Delete statements to ensure correctness. The default value of this property is `false`. This property can only be enabled on Unique Merge-on-Write tables. | | Dynamic Partitioning Related Properties | For dynamic partitioning, refer to [Data Partitioning - Dynamic Partitioning](../../../../table-design/data-partitioning/dynamic-partitioning) | | enable_unique_key_skip_bitmap_column | Whether to enable the [Flexible Column Update feature](../../../../data-operate/update/update-of-unique-model.md#flexible-partial-column-updates) on Unique Merge-on-Write tables. This property can only be enabled on Unique Merge-on-Write tables. | +| enable_tso | Whether to enable TSO-related features for this table (for example, recording Rowset commit TSO and exposing `information_schema.rowsets.COMMIT_TSO`). | ## Access Control Requirements @@ -735,4 +736,4 @@ AS SELECT * FROM t1; ```sql CREATE TABLE t11 LIKE t10; -``` \ No newline at end of file +``` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/tso.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/tso.md new file mode 100644 index 0000000000000..c754e4a69d87e --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/cluster-management/tso.md @@ -0,0 +1,73 @@ +--- +{ + "title": "全局时间戳服务(TSO)", + "language": "zh-CN", + "description": "TSO(Timestamp Oracle)为 Doris 提供全局单调递增的时间戳。" +} +--- + +## 概述 + +TSO(Timestamp Oracle)是运行在 **Master FE** 上的服务,用于生成 **全局单调递增** 的 64 位时间戳。Doris 在分布式场景中将 TSO 作为统一的版本基准,从而规避多节点物理时钟偏移带来的正确性风险。 + +典型使用场景包括: + +- 跨表、跨节点的统一“事务版本号”。 +- 基于全局顺序的增量计算 / 分版本读取。 +- 更易观测:时间戳相比内部版本号更具可读性。 + +## 时间戳结构 + +TSO 是一个 64 位整数: + +- 高位:自 Unix 纪元以来的**物理时间(毫秒)** +- 低位:用于同一毫秒内发号的**逻辑计数器** + +TSO 的核心保证是**单调递增**,而不是精确反映物理时钟(wall clock)。 + +## 架构与生命周期 + +- **Master FE** 上运行 `TSOService` 守护线程。 +- FE 内部组件(例如事务发布与元数据修复流程)通过 `Env.getCurrentEnv().getTSOService().getTSO()` 获取时间戳。 +- 
服务采用“**时间窗口租约**”(窗口右界物理时间)来降低持久化开销,同时保证切主后的单调性。 + +### Master 切换时的单调性保证 + +当发生切主时,新 Master FE 会回放持久化的窗口右界并执行时间校准,确保新主发出的第一个 TSO 严格大于旧主已经发出的所有 TSO。 + +## 配置项 + +TSO 由 FE 配置项控制(如何配置与持久化请参见 [FE 配置项](../config/fe-config.md)): + +- `enable_feature_tso` +- `tso_service_update_interval_ms` +- `max_update_tso_retry_count` +- `max_get_tso_retry_count` +- `tso_service_window_duration_ms` +- `tso_time_offset_debug_mode`(仅测试/调试) +- `enable_tso_persist_journal`(可能影响回滚兼容性) +- `enable_tso_checkpoint_module`(旧版本读取新镜像可能需忽略未知模块) + +## 可观测与调试 + +### FE HTTP 接口 + +可以通过 FE HTTP 接口在不消耗逻辑计数器的情况下读取当前 TSO 信息: + +- `GET /api/tso` + +参见 [TSO Action](../open-api/fe-http/tso-action.md) 获取鉴权方式、返回字段与示例。 + +### 系统表:`information_schema.rowsets` + +在相关能力开启后,Doris 会将提交时的 commit tso 写入 Rowset 元数据,并通过系统表暴露: + +- `information_schema.rowsets.COMMIT_TSO` + +参见 [rowsets](../system-tables/information_schema/rowsets.md)。 + +## FAQ + +### TSO 能否当作物理时钟(wall clock)使用? + +不能。虽然高位包含毫秒级物理时间,但在某些情况下(例如逻辑计数器使用量较高)物理部分可能会被主动推进。因此,应将 TSO 视为**单调递增的版本**,而不是精确的物理时钟。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/fe-config.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/fe-config.md index 813131a9e91b5..8f345e399ba01 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/fe-config.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/fe-config.md @@ -361,6 +361,88 @@ heartbeat_mgr 中处理心跳事件的线程数。 是否为 Master FE 节点独有的配置项:false +### TSO(Timestamp Oracle) + +#### `enable_feature_tso` + +默认值:false + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +是否启用 TSO(全局时间戳)相关实验能力,例如记录 Rowset 的提交 TSO,并在系统表中暴露相关字段。 + +#### `tso_service_update_interval_ms` + +默认值:50(ms) + +是否可以动态配置:false + +是否为 Master FE 节点独有配置项:true + +TSO 服务的更新间隔(毫秒)。守护线程会周期性检查时钟漂移/回拨,并在需要时续租时间窗口。 + +#### `max_update_tso_retry_count` + +默认值:3 + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +TSO 服务更新全局时间戳(例如持久化新的时间窗口右界)失败时的最大重试次数。 + +#### 
`max_get_tso_retry_count` + +默认值:10 + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +获取/生成 TSO 失败时的最大重试次数。 + +#### `tso_service_window_duration_ms` + +默认值:5000(ms) + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +TSO 时间窗口时长(毫秒)。Master FE 会持久化窗口右界以降低持久化频率,并保证切主后的单调性。 + +#### `tso_time_offset_debug_mode` + +默认值:0(ms) + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:false + +TSO 服务时间偏移(毫秒),仅用于测试/调试。 + +#### `enable_tso_persist_journal` + +默认值:false + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +是否启用将 TSO 时间窗口右界写入 EditLog。开启后可能会产生新的操作码,回滚到旧版本可能不兼容。 + +#### `enable_tso_checkpoint_module` + +默认值:false + +是否可以动态配置:true + +是否为 Master FE 节点独有配置项:true + +是否启用将 TSO 信息作为 checkpoint 镜像模块参与持久化。开启后镜像中包含新模块,旧版本读取新镜像可能需要忽略未知模块。 + ### 服务 #### `query_port` diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/fe-http/tso-action.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/fe-http/tso-action.md new file mode 100644 index 0000000000000..cf8960761eccf --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/open-api/fe-http/tso-action.md @@ -0,0 +1,64 @@ +--- +{ + "title": "TSO Action", + "language": "zh-CN", + "description": "从 Master FE 获取当前 TSO(Timestamp Oracle)信息。" +} +--- + +## Request + +`GET /api/tso` + +## Description + +从 **Master FE** 获取当前 TSO(Timestamp Oracle)信息。 + +- 该接口为**只读**:返回当前 TSO,但**不会递增** TSO 值。 +- 需要鉴权,请使用具有**管理员权限**的账号访问。 + +## Path parameters + +无 + +## Query parameters + +无 + +## Request body + +无 + +## Response + +成功时,返回体 `code = 0`,并在 `data` 中包含: + +| 字段 | 类型 | 含义 | +| --- | --- | --- | +| window_end_physical_time | long | Master FE 当前 TSO 时间窗口的右界物理时间(毫秒)。 | +| current_tso | long | 当前完整的 64 位 TSO 值。 | +| current_tso_physical_time | long | 从 `current_tso` 解析出的物理时间部分(毫秒)。 | +| current_tso_logical_counter | long | 从 `current_tso` 解析出的逻辑计数器部分。 | + +示例: + +```json +{ + "code": 0, + "msg": "success", + "data": { + "window_end_physical_time": 1625097600000, + "current_tso": 
123456789012345678, + "current_tso_physical_time": 1625097600000, + "current_tso_logical_counter": 123 + } +} +``` + +## 错误 + +常见错误包括: + +- FE 尚未就绪 +- 当前 FE 不是 Master +- 鉴权失败 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/system-tables/information_schema/rowsets.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/system-tables/information_schema/rowsets.md index 821adff23711a..3ae5d64712e74 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/system-tables/information_schema/rowsets.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/system-tables/information_schema/rowsets.md @@ -32,4 +32,5 @@ | DATA_DISK_SIZE | bigint | Rowset 内数据的存储空间。 | | CREATION_TIME | datetime | Rowset 的创建时间。 | | NEWEST_WRITE_TIMESTAMP | datetime | Rowset 的最近写入时间。 | -| SCHEMA_VERSION | int | Rowset 数据对应的表 Schema 版本号。 | \ No newline at end of file +| SCHEMA_VERSION | int | Rowset 数据对应的表 Schema 版本号。 | +| COMMIT_TSO | bigint | Rowset 元数据中记录的提交 TSO(64 位)。 | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md index a5a8c5de85fb7..824758ee8bf40 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE.md @@ -371,6 +371,7 @@ rollup 可以创建的同步物化视图功能有限。已不再推荐使用。 | enable_mow_light_delete | 是否在 Unique 表 Mow 上开启 Delete 语句写 Delete predicate。若开启,会提升 Delete 语句的性能,但 Delete 后进行部分列更新可能会出现部分数据错误的情况。若关闭,会降低 Delete 语句的性能来保证正确性。此属性的默认值为 `false`。此属性只能在 Unique Merge-on-Write 表上开启。 | | 动态分区相关属性 | 动态分区相关参考[数据划分 - 动态分区](../../../../table-design/data-partitioning/dynamic-partitioning) | | enable_unique_key_skip_bitmap_column | 是否在 Unique Merge-on-Write 
表上开启[灵活列更新功能](../../../../data-operate/update/update-of-unique-model.md#灵活部分列更新)。此属性只能在 Unique Merge-on-Write 表上开启。 | +| enable_tso | 是否对该表开启 TSO 相关能力(例如记录 Rowset 的提交 TSO,并在 `information_schema.rowsets.COMMIT_TSO` 中暴露)。 | ## 权限控制 执行此 SQL 命令的[用户](../../../../admin-manual/auth/authentication-and-authorization.md)必须至少具有以下[权限](../../../../admin-manual/auth/authentication-and-authorization.md): @@ -734,4 +735,4 @@ AS SELECT * FROM t1 ```sql CREATE TABLE t11 LIKE t10 -``` \ No newline at end of file +``` diff --git a/sidebars.ts b/sidebars.ts index d450b43a68b46..a228c8bcfde29 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -724,6 +724,7 @@ const sidebars: SidebarsConfig = { 'admin-manual/cluster-management/load-balancing', 'admin-manual/cluster-management/time-zone', 'admin-manual/cluster-management/fqdn', + 'admin-manual/cluster-management/tso', ], }, { @@ -983,6 +984,7 @@ const sidebars: SidebarsConfig = { 'admin-manual/open-api/fe-http/meta-info-action-V2', 'admin-manual/open-api/fe-http/debug-point-action', 'admin-manual/open-api/fe-http/statistic-action', + 'admin-manual/open-api/fe-http/tso-action', ], }, {