api,metrics: add changefeed operation history (#5095) #5105
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
This cherry pick PR is for a release branch and has not yet been approved by triage owners.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
@wlwilliamx This PR has conflicts, so I have put it on hold.
@ti-chi-bot: If you want to know how to resolve it, please read the guide in TiDB Dev Guide.
Code Review
This pull request introduces a new ChangefeedOperationMiddleware to track user-initiated changefeed mutations for logging and Grafana dashboards, including a bounded in-memory store and new Prometheus metrics. However, the PR contains critical issues: several files, including metric definitions and Grafana JSON configurations, contain unresolved git conflict markers that will cause compilation and parsing errors. Additionally, the middleware references an undefined constant ClientVersionHeader, and it is recommended to use seconds instead of milliseconds for Prometheus gauge values to follow industry best practices.
<<<<<<< HEAD
=======
	// ChangefeedErrorInfoGauge records the current warning or failed reason and its occurrence time
	// for each changefeed.
	ChangefeedErrorInfoGauge = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "ticdc",
			Subsystem: "owner",
			Name:      "changefeed_error_info",
			Help:      "The current warning or failed reason and occurrence time of changefeeds",
		}, []string{getKeyspaceLabel(), "changefeed", "state", "error_time", "code", "message"})

	// ChangefeedOperationTimeGauge records a bounded set of recent user initiated
	// changefeed operation timestamps for the Grafana investigation panel.
	ChangefeedOperationTimeGauge = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "ticdc",
			Subsystem: "owner",
			Name:      "changefeed_operation_time",
			Help:      "Recent user initiated changefeed operation timestamps in Unix milliseconds",
		}, []string{getKeyspaceLabel(), "changefeed", "operation", "result", "username", "details", "error", "event_id"})

>>>>>>> 3a652c164 (api,metrics: add changefeed operation history (#5095))
This file contains git conflict markers (<<<<<<< HEAD, =======, >>>>>>>). This will cause a compilation error. It appears the cherry-pick was not resolved correctly before committing.
	// ChangefeedErrorInfoGauge records the current warning or failed reason and its occurrence time
	// for each changefeed.
	ChangefeedErrorInfoGauge = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "ticdc",
			Subsystem: "owner",
			Name:      "changefeed_error_info",
			Help:      "The current warning or failed reason and occurrence time of changefeeds",
		}, []string{getKeyspaceLabel(), "changefeed", "state", "error_time", "code", "message"})
	// ChangefeedOperationTimeGauge records a bounded set of recent user initiated
	// changefeed operation timestamps for the Grafana investigation panel.
	ChangefeedOperationTimeGauge = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "ticdc",
			Subsystem: "owner",
			Name:      "changefeed_operation_time",
			Help:      "Recent user initiated changefeed operation timestamps in Unix milliseconds",
		}, []string{getKeyspaceLabel(), "changefeed", "operation", "result", "username", "details", "error", "event_id"})

<<<<<<< HEAD
=======
	registry.MustRegister(ChangefeedErrorInfoGauge)
	registry.MustRegister(ChangefeedOperationTimeGauge)
>>>>>>> 3a652c164 (api,metrics: add changefeed operation history (#5095))
| <<<<<<< HEAD | ||
| ======= | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Current warning or failed reason of each changefeed. The metric message is normalized to a single line and truncated to 256 characters.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "namespace" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "state" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "code" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "error_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 8, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 26 | ||
| }, | ||
| "id": 62010, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (namespace, changefeed, state, code, error_time, message) (ticdc_owner_changefeed_error_info{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Error Details", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "Value": true, | ||
| "__name__": true | ||
| }, | ||
| "indexByName": { | ||
| "namespace": 0, | ||
| "changefeed": 1, | ||
| "state": 2, | ||
| "error_time": 3, | ||
| "code": 4, | ||
| "message": 5 | ||
| }, | ||
| "renameByName": {} | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Recent user initiated changefeed mutations retained in memory on the coordinator for oncall investigation. Use TiCDC logs for durable history beyond the latest 100 operations.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "namespace" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| }, | ||
| { | ||
| "id": "unit", | ||
| "value": "dateTimeAsIso" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "result" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 90 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "username" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "details" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 320 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 9, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 34 | ||
| }, | ||
| "id": 62042, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [ | ||
| { | ||
| "displayName": "operation_time", | ||
| "desc": true | ||
| } | ||
| ] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (namespace, changefeed, operation, result, username, details, error, event_id) (ticdc_owner_changefeed_operation_time{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", namespace=~\"$namespace\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Operation History", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "__name__": true, | ||
| "event_id": true | ||
| }, | ||
| "indexByName": { | ||
| "namespace": 0, | ||
| "changefeed": 1, | ||
| "Value": 2, | ||
| "operation": 3, | ||
| "result": 4, | ||
| "username": 5, | ||
| "details": 6, | ||
| "error": 7 | ||
| }, | ||
| "renameByName": { | ||
| "Value": "operation_time" | ||
| } | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| >>>>>>> 3a652c164 (api,metrics: add changefeed operation history (#5095)) |
| <<<<<<< HEAD | ||
| ======= | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Current warning or failed reason of each changefeed. The metric message is normalized to a single line and truncated to 256 characters.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "keyspace_name" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "state" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "code" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "error_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 8, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 26 | ||
| }, | ||
| "id": 62010, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (keyspace_name, changefeed, state, code, error_time, message) (ticdc_owner_changefeed_error_info{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Error Details", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "Value": true, | ||
| "__name__": true | ||
| }, | ||
| "indexByName": { | ||
| "keyspace_name": 0, | ||
| "changefeed": 1, | ||
| "state": 2, | ||
| "error_time": 3, | ||
| "code": 4, | ||
| "message": 5 | ||
| }, | ||
| "renameByName": {} | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Recent user initiated changefeed mutations retained in memory on the coordinator for oncall investigation. Use TiCDC logs for durable history beyond the latest 100 operations.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "keyspace_name" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| }, | ||
| { | ||
| "id": "unit", | ||
| "value": "dateTimeAsIso" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "result" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 90 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "username" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "details" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 320 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 9, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 34 | ||
| }, | ||
| "id": 62042, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [ | ||
| { | ||
| "displayName": "operation_time", | ||
| "desc": true | ||
| } | ||
| ] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (keyspace_name, changefeed, operation, result, username, details, error, event_id) (ticdc_owner_changefeed_operation_time{k8s_cluster=\"$k8s_cluster\", sharedpool_id=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Operation History", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "__name__": true, | ||
| "event_id": true | ||
| }, | ||
| "indexByName": { | ||
| "keyspace_name": 0, | ||
| "changefeed": 1, | ||
| "Value": 2, | ||
| "operation": 3, | ||
| "result": 4, | ||
| "username": 5, | ||
| "details": 6, | ||
| "error": 7 | ||
| }, | ||
| "renameByName": { | ||
| "Value": "operation_time" | ||
| } | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| >>>>>>> 3a652c164 (api,metrics: add changefeed operation history (#5095)) |
| <<<<<<< HEAD | ||
| ======= | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Current warning or failed reason of each changefeed. The metric message is normalized to a single line and truncated to 256 characters.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "keyspace_name" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "state" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "code" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "error_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 8, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 26 | ||
| }, | ||
| "id": 62010, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (keyspace_name, changefeed, state, code, error_time, message) (ticdc_owner_changefeed_error_info{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Error Details", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "Value": true, | ||
| "__name__": true | ||
| }, | ||
| "indexByName": { | ||
| "keyspace_name": 0, | ||
| "changefeed": 1, | ||
| "state": 2, | ||
| "error_time": 3, | ||
| "code": 4, | ||
| "message": 5 | ||
| }, | ||
| "renameByName": {} | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| }, | ||
| { | ||
| "datasource": "${DS_TEST-CLUSTER}", | ||
| "description": "Recent user initiated changefeed mutations retained in memory on the coordinator for oncall investigation. Use TiCDC logs for durable history beyond the latest 100 operations.", | ||
| "fieldConfig": { | ||
| "defaults": { | ||
| "custom": { | ||
| "align": null, | ||
| "filterable": false | ||
| }, | ||
| "links": [], | ||
| "mappings": [], | ||
| "thresholds": { | ||
| "mode": "absolute", | ||
| "steps": [ | ||
| { | ||
| "color": "green", | ||
| "value": null | ||
| }, | ||
| { | ||
| "color": "red", | ||
| "value": 80 | ||
| } | ||
| ] | ||
| } | ||
| }, | ||
| "overrides": [ | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "keyspace_name" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "changefeed" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation_time" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 180 | ||
| }, | ||
| { | ||
| "id": "unit", | ||
| "value": "dateTimeAsIso" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "operation" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 100 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "result" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 90 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "username" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 120 | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "matcher": { | ||
| "id": "byName", | ||
| "options": "details" | ||
| }, | ||
| "properties": [ | ||
| { | ||
| "id": "custom.width", | ||
| "value": 320 | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| }, | ||
| "gridPos": { | ||
| "h": 9, | ||
| "w": 24, | ||
| "x": 0, | ||
| "y": 34 | ||
| }, | ||
| "id": 62042, | ||
| "options": { | ||
| "showHeader": true, | ||
| "sortBy": [ | ||
| { | ||
| "displayName": "operation_time", | ||
| "desc": true | ||
| } | ||
| ] | ||
| }, | ||
| "pluginVersion": "7.5.17", | ||
| "targets": [ | ||
| { | ||
| "expr": "max by (keyspace_name, changefeed, operation, result, username, details, error, event_id) (ticdc_owner_changefeed_operation_time{k8s_cluster=\"$k8s_cluster\", tidb_cluster=\"$tidb_cluster\", keyspace_name=~\"$keyspace_name\", changefeed=~\"$changefeed\"})", | ||
| "format": "time_series", | ||
| "instant": true, | ||
| "refId": "A" | ||
| } | ||
| ], | ||
| "title": "Changefeed Operation History", | ||
| "transformations": [ | ||
| { | ||
| "id": "labelsToFields", | ||
| "options": {} | ||
| }, | ||
| { | ||
| "id": "organize", | ||
| "options": { | ||
| "excludeByName": { | ||
| "Metric": true, | ||
| "Time": true, | ||
| "__name__": true, | ||
| "event_id": true | ||
| }, | ||
| "indexByName": { | ||
| "keyspace_name": 0, | ||
| "changefeed": 1, | ||
| "Value": 2, | ||
| "operation": 3, | ||
| "result": 4, | ||
| "username": 5, | ||
| "details": 6, | ||
| "error": 7 | ||
| }, | ||
| "renameByName": { | ||
| "Value": "operation_time" | ||
| } | ||
| } | ||
| } | ||
| ], | ||
| "type": "table" | ||
| >>>>>>> 3a652c164 (api,metrics: add changefeed operation history (#5095)) |
	zap.String("username", username),
	zap.String("ip", c.ClientIP()),
	zap.String("userAgent", c.Request.UserAgent()),
	zap.String("clientVersion", c.Request.Header.Get(ClientVersionHeader)),
The constant ClientVersionHeader is used but not defined in this file or imported. If it is defined in pkg/api, it should be referenced as api.ClientVersionHeader.
-	zap.String("clientVersion", c.Request.Header.Get(ClientVersionHeader)),
+	zap.String("clientVersion", c.Request.Header.Get(api.ClientVersionHeader)),
	// The dashboard only needs a recent investigation window. Keep this cache
	// bounded so user names and detail strings do not become unbounded metric
	// cardinality over long-running clusters.
	metrics.ChangefeedOperationTimeGauge.WithLabelValues(labels.labelValues()...).Set(float64(operationTime.UnixMilli()))
Prometheus naming conventions recommend base units, so timestamps in gauges are normally exposed in seconds. Note that the Grafana dateTimeAsIso unit used by the dashboard interprets the value as epoch milliseconds, so switching the gauge to seconds also requires multiplying by 1000 in the panel query and updating the metric's Help text; otherwise the dates will display incorrectly.
-	metrics.ChangefeedOperationTimeGauge.WithLabelValues(labels.labelValues()...).Set(float64(operationTime.UnixMilli()))
+	metrics.ChangefeedOperationTimeGauge.WithLabelValues(labels.labelValues()...).Set(float64(operationTime.Unix()))
This is an automated cherry-pick of #5095
What problem does this PR solve?
Issue Number: close #5087
What is changed and how it works?
Changefeed Operation History table panel with operation time, result, username, non-sensitive details, and error summary.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No compatibility change. The dashboard-facing metric cache is bounded to the latest 100 operations to avoid unbounded cardinality growth.
Do you need to update user documentation, design documentation or monitoring documentation?
The Grafana dashboard is updated in this PR. No separate user or design documentation change is required.
Release note