diff --git a/identify-slow-queries.md b/identify-slow-queries.md index 18586fe86c160..a8340e0bbbca8 100644 --- a/identify-slow-queries.md +++ b/identify-slow-queries.md @@ -188,80 +188,148 @@ Usage example: SELECT /*+ WRITE_SLOW_LOG */ count(*) FROM t t1, t t2 WHERE t1.a = t2.b; ``` +## Use `tidb_slow_log_rules` + +[`tidb_slow_log_rules`](/system-variables.md#tidb_slow_log_rules-new-in-v900) is used to define trigger rules for slow query logs, supporting multi-dimensional metric combinations. It is suitable for "targeted sampling" and "problem reproduction" of slow logs, enabling you to filter target statements based on specific metric combinations. + +The triggering behavior of slow query logs depends on the configuration of `tidb_slow_log_rules`: + +- If `tidb_slow_log_rules` is not set, slow query log triggering still relies on [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold) (in milliseconds). +- If `tidb_slow_log_rules` is set, the configured rules take precedence, and [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold) will be ignored. + +For more information about meanings, diagnostic value, and background information of each field, see the [Fields description](#fields-description). + +### Unified rule syntax and type constraints + +- Rule capacity and separation: `SESSION` and `GLOBAL` each support a maximum of 10 rules. A single session can have up to 20 active rules. Rules are separated by `;`. +- Condition format: each condition uses the format `field_name:value`. Multiple conditions within a single rule are separated by `,`. +- Field and scope: field names are case-insensitive (underscores and other characters are preserved). `SESSION` rules do not support `Conn_ID`. Only `GLOBAL` rules support `Conn_ID`. +- Matching semantics: + - Numeric fields are matched using `>=`. String and boolean fields are matched using equality (`=`). + - Matching for `DB` and `Resource_group` is case-insensitive. + - Explicit operators such as `>`, `<`, and `!=` are not supported. + +Type constraints are as follows: + +- Numeric types (`int64`, `uint64`, `float64`) uniformly require `>= 0`. Negative values will result in a parsing error. + - `int64`: the maximum value is `2^63-1`. + - `uint64`: the maximum value is `2^64-1`. + - `float64`: the general upper limit is approximately `1.79e308`. Currently, parsing is done using Go's `ParseFloat`. While `NaN`/`Inf` can be parsed, they might lead to rules that are always true or always false. It is not recommended to use them. +- `bool`: supports `true`/`false`, `1`/`0`, and `t`/`f` (case-insensitive). +- `string`: currently does not support strings containing the separators `,` (condition separator) or `;` (rule separator), even with quotes (single or double). Escaping is not supported. +- Duplicate fields: if the same field is specified multiple times in a single rule, the last occurrence takes effect. + +### Supported fields + +For detailed field descriptions, diagnostic meanings, and background information, see the [field descriptions in `identify-slow-queries`](/identify-slow-queries.md#fields-description). + +Unless otherwise noted, the fields in the following table follow the general matching and type rules described in [Unified rule syntax and type constraints](#unified-rule-syntax-and-type-constraints). This table lists only the currently supported field names, types, units, and a few rule-specific notes. It does not repeat each field's semantic meaning. + +| Field name | Type | Unit | Notes | +| -------------------------------------- | -------- | ------ | ------------------------------ | +| `Conn_ID` | `uint` | count | Supported only in `GLOBAL` rules | +| `Session_alias` | `string` | none | - | +| `DB` | `string` | none | Case-insensitive when matched | +| `Exec_retry_count` | `uint` | count | - | +| `Query_time` | `float` | second | - | +| `Parse_time` | `float` | second | - | +| `Compile_time` | `float` | second | - | +| `Rewrite_time` | `float` | second | - | +| `Optimize_time` | `float` | second | - | +| `Wait_TS` | `float` | second | - | +| `Is_internal` | `bool` | none | - | +| `Digest` | `string` | none | - | +| `Plan_digest` | `string` | none | - | +| `Num_cop_tasks` | `int` | count | - | +| `Mem_max` | `int` | bytes | - | +| `Disk_max` | `int` | bytes | - | +| `Write_sql_response_total` | `float` | second | - | +| `Succ` | `bool` | none | - | +| `Resource_group` | `string` | none | Case-insensitive when matched | +| `KV_total` | `float` | second | - | +| `PD_total` | `float` | second | - | +| `Unpacked_bytes_sent_tikv_total` | `int` | bytes | - | +| `Unpacked_bytes_received_tikv_total` | `int` | bytes | - | +| `Unpacked_bytes_sent_tikv_cross_zone` | `int` | bytes | - | +| `Unpacked_bytes_received_tikv_cross_zone` | `int` | bytes | - | +| `Unpacked_bytes_sent_tiflash_total` | `int` | bytes | - | +| `Unpacked_bytes_received_tiflash_total` | `int` | bytes | - | +| `Unpacked_bytes_sent_tiflash_cross_zone` | `int` | bytes | - | +| `Unpacked_bytes_received_tiflash_cross_zone` | `int` | bytes | - | +| `Process_time` | `float` | second | - | +| `Backoff_time` | `float` | second | - | +| `Total_keys` | `uint` | count | - | +| `Process_keys` | `uint` | count | - | +| `cop_mvcc_read_amplification` | `float` | ratio | Ratio value (`Total_keys / Process_keys`) | +| `Prewrite_time` | `float` | second | - | +| `Commit_time` | `float` | second | - | +| `Write_keys` | `uint` | count | - | +| `Write_size` | `uint` | bytes | - | +| `Prewrite_region` | `uint` | count | - | + +### Effective behavior and matching order + +- Rule update behavior: every execution of `SET [SESSION|GLOBAL] tidb_slow_log_rules = '...'` overwrites the existing rules in that scope instead of appending to them. +- Rule clearing behavior: `SET [SESSION|GLOBAL] tidb_slow_log_rules = ''` clears the rules in the corresponding scope. +- If the current session has any applicable `tidb_slow_log_rules`, such as `SESSION` rules, `GLOBAL` rules for the current `Conn_ID`, or generic global rules without `Conn_ID`, the output of slow query logs is determined by rule matching results, and `tidb_slow_log_threshold` is no longer used. +- If the current session has no applicable rules, for example when both `SESSION` and `GLOBAL` rules are empty, or only `GLOBAL` rules that do not match the current `Conn_ID` are configured, slow query logging still depends on `tidb_slow_log_threshold`. Note that the unit is milliseconds. +- If you still want to use SQL execution time as a condition for writing slow logs, use `Query_time` in the rule and note that the unit is seconds. +- Rule matching logic: + - Multiple rules are combined with `OR`, while multiple field conditions within a single rule are combined with `AND`. + - `SESSION`-scope rules are matched first. If none matches, TiDB then matches `GLOBAL` rules for the current `Conn_ID`, followed by generic `GLOBAL` rules without `Conn_ID`. +- `SHOW VARIABLES LIKE 'tidb_slow_log_rules'` and `SELECT @@SESSION.tidb_slow_log_rules` return the `SESSION` rule text, or an empty string if unset. `SELECT @@GLOBAL.tidb_slow_log_rules` returns the `GLOBAL` rule text. + +### Examples + +- Standard format (`SESSION` scope): + + ```sql + SET SESSION tidb_slow_log_rules = 'Query_time: 0.5, Is_internal: false'; + ``` + +- Invalid format (`SESSION` scope does not support `Conn_ID`): + + ```sql + SET SESSION tidb_slow_log_rules = 'Conn_ID: 12, Query_time: 0.5, Is_internal: false'; + ``` + +- Global rule (applies to all connections): + + ```sql + SET GLOBAL tidb_slow_log_rules = 'Query_time: 0.5, Is_internal: false'; + ``` + +- Global rules for specific connections (applied separately to the two connections `Conn_ID:11` and `Conn_ID:12`): + + ```sql + SET GLOBAL tidb_slow_log_rules = 'Conn_ID: 11, Query_time: 0.5, Is_internal: false; Conn_ID: 12, Query_time: 0.6, Process_time: 0.3, DB: db1'; + ``` + +### Recommendations + +- `tidb_slow_log_rules` is designed to replace the single-threshold approach. It supports combinations of multi-dimensional metric conditions, enabling more flexible and fine-grained control over slow query logging. + +- In a well-provisioned test environment with 1 TiDB node (16 CPU cores, 48 GiB memory) and 3 TiKV nodes (each with 16 CPU cores and 48 GiB memory), repeated sysbench tests show that performance impact remains small when multi-dimensional slow query log rules generate millions of slow log entries within 30 minutes. However, when the log volume reaches tens of millions, TPS drops significantly and latency increases noticeably. Therefore, if business workload is high or CPU and memory resources are close to their limits, configure `tidb_slow_log_rules` carefully to avoid log flooding caused by overly broad rules. If you need to limit the log output rate, use [`tidb_slow_log_max_per_sec`](/system-variables.md#tidb_slow_log_max_per_sec-new-in-v900) to throttle it and reduce the impact on business performance. + ## Related system variables -* [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold): sets the threshold for the slow query log. The SQL statement whose execution time exceeds this threshold is recorded in the slow query log. The default value is 300 ms. -* [`tidb_slow_log_rules`](/system-variables.md#tidb_slow_log_rules-new-in-v900): defines trigger rules for the slow query log. It supports combining multi-dimensional metrics to provide more flexible and fine-grained logging. Introduced in v9.0.0, it gradually replaces the single-threshold approach (`tidb_slow_log_threshold`). This variable supports using the following fields as filter conditions when slow query logs are output. For more information, see [Fields description](#fields-description). - * Supported filter fields: - * Basic slow query information: - * `Query_time`, `Parse_time`, `Compile_time`, `Optimize_time`, `Wait_TS`, `Rewrite_time` - * `Digest`, `Plan_digest`, `Is_internal`, `Succ` - * `Exec_retry_count`, `Backoff_time`, `Write_sql_response_total` - * Transaction-related fields: - * `Prewrite_time`, `Commit_time`, `Write_keys`, `Write_size`, `Prewrite_region` - * User-related fields for SQL execution: - * `Conn_ID`, `DB`, `Session_alias` - * TiKV Coprocessor task-related fields: - * `Process_time`, `Total_keys`, `Process_keys`, `Num_cop_tasks` - * Memory usage related fields: - * `Mem_max` - * Disk usage related fields: - * `Disk_max` - * Resource control related fields: - * `Resource_group` - * Network transmission related fields: - * `KV_total`, `PD_total` - * `Unpacked_bytes_sent_tikv_total`, `Unpacked_bytes_received_tikv_total` - * `Unpacked_bytes_sent_tikv_cross_zone`, `Unpacked_bytes_received_tikv_cross_zone` - * `Unpacked_bytes_sent_tiflash_total`, `Unpacked_bytes_received_tiflash_total` - * `Unpacked_bytes_sent_tiflash_cross_zone`, `Unpacked_bytes_received_tiflash_cross_zone` - * If `tidb_slow_log_rules` is not set: - * Slow query logging still relies on `tidb_slow_log_threshold`. The `query_time` threshold is taken from that variable for backward compatibility. - * If `tidb_slow_log_rules` is set: - * Configured rules take precedence and `tidb_slow_log_threshold` is ignored. - * If you want to use SQL execution time as a condition for slow log output, configure the rule using `query_time` with a defined threshold. - * Rule matching logic (multiple rules are evaluated using OR logic): - * SESSION scope rules are matched first; if a SESSION rule matches, the slow query is logged. - * GLOBAL scope rules are evaluated only when no SESSION scope rules match: - * If a GLOBAL rule specifies a `ConnID` that matches the current session `ConnID`, that rule is applied. - * If a GLOBAL rule does not specify a `ConnID` (acting as a general global rule), that rule is applied. - * The behavior of displaying this variable via `SHOW VARIABLES`, `SELECT @@GLOBAL.tidb_slow_log_rules`, or `SELECT @@SESSION.tidb_slow_log_rules` is consistent with other system variables. - * Examples: - * Standard format (SESSION scope): - - ```sql - SET SESSION tidb_slow_log_rules = 'Query_time: 500, Is_internal: false'; - ``` - - * Incorrect format (SESSION scope does not support `ConnID`): - - ```sql - SET SESSION tidb_slow_log_rules = 'ConnID: 12, Query_time: 500, Is_internal: false'; - ``` - - * Global rules (applying to all connections): - - ```sql - SET GLOBAL tidb_slow_log_rules = 'Query_time: 500, Is_internal: false'; - ``` - - * Global rules for specific connections (applying to connections with `ConnID: 11` and `ConnID: 12` respectively): - - ```sql - SET GLOBAL tidb_slow_log_rules = 'ConnID: 11, Query_time: 500, Is_internal: false; ConnID: 12, Query_time: 600, Process_time: 300, DB: db1'; - ``` +* [`tidb_slow_log_rules`](/system-variables.md#tidb_slow_log_rules-new-in-v900): see [`tidb_slow_log_rules` recommendations](#recommendations) + +* [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold): sets the threshold for slow query logging. SQL statements whose execution time exceeds this threshold are recorded in the slow query log. The default value is `300ms` (milliseconds). > **Tip:** > - > - `tidb_slow_log_rules` replaces the single-threshold method to enable more flexible and fine-grained slow query control using multi-dimensional rule combinations. - > - In a resource-sufficient test environment (1 TiDB node with 16 CPU cores and 48 GiB memory, and 3 TiKV nodes each with 16 CPU cores and 48 GiB memory), multiple sysbench tests show that when multi-dimensional slow query log rules generate millions of slow query log entries within 30 minutes, the impact on performance is minimal. However, when the log volume reaches tens of millions, TPS drops significantly and latency increases noticeably. In high-workload scenarios or when CPU and memory are close to their limits, you should configure `tidb_slow_log_rules` carefully to avoid log flooding caused by overly broad rules. It is recommended to use `tidb_slow_log_max_per_sec` to limit the log output rate and reduce the impact on application performance. + > Time-related fields in `tidb_slow_log_rules`, such as `Query_time` and `Process_time`, use seconds as the unit and can include decimals, while [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold) uses milliseconds. * [`tidb_slow_log_max_per_sec`](/system-variables.md#tidb_slow_log_max_per_sec-new-in-v900): sets the maximum number of slow query log entries that can be written per second. The default value is `0`. This variable is introduced in v9.0.0. * A value of `0` means there is no limit on the number of slow query log entries written per second. * A value greater than `0` means TiDB writes at most the specified number of slow query log entries per second. Any excess log entries are discarded and not written to the slow query log file. * It is recommended to set this variable after enabling `tidb_slow_log_rules` to prevent rule-based slow query logging from being triggered too frequently. + * [`tidb_query_log_max_len`](/system-variables.md#tidb_query_log_max_len): sets the maximum length of the SQL statement recorded in the slow query log. The default value is 4096 (byte). + * [`tidb_redact_log`](/system-variables.md#tidb_redact_log): controls whether user data in SQL statements recorded in the slow query log is redacted and replaced with `?`. The default value is `0`, which means this feature is disabled. + * [`tidb_enable_collect_execution_info`](/system-variables.md#tidb_enable_collect_execution_info): controls whether to record the physical execution information of each operator in the execution plan. The default value is `1`. This feature impacts the performance by approximately 3%. After enabling this feature, you can view the `Plan` information as follows: ```sql diff --git a/system-variables.md b/system-variables.md index 79af0db79a694..bde637f23108e 100644 --- a/system-variables.md +++ b/system-variables.md @@ -5889,10 +5889,12 @@ Query OK, 0 rows affected, 1 warning (0.00 sec) - Default value: "" - Type: String - This variable defines the triggering rules for slow query logs. It supports combining multi-dimensional metrics to provide more flexible and fine-grained logging. +- For more information about how to use this system variable, see [Use `tidb_slow_log_rules`](/identify-slow-queries.md#use-tidb_slow_log_rules). > **Tip:** > -> After enabling `tidb_slow_log_rules`, it is recommended to also configure [`tidb_slow_log_max_per_sec`](#tidb_slow_log_max_per_sec-new-in-v900) to limit the slow query log output rate and prevent rule-based slow query logging from being triggered too frequently. +> - When enabling `tidb_slow_log_rules` in a production environment, it is recommended to also configure [`tidb_slow_log_max_per_sec`](#tidb_slow_log_max_per_sec-new-in-v900) to avoid excessively frequent slow query log printing. +> - It is recommended to start with stricter conditions and gradually relax them based on troubleshooting needs. For more information on performance impact, see [Recommendations](/identify-slow-queries.md#recommendations). ### tidb_slow_log_threshold