Is there an existing issue for the same bug?
Branch Name
3.0-dev
Commit ID
e6d0a67
Other Environment Information
- Hardware parameters: cloud QA deployment
- OS type: issue reproduced from a mysql client on macOS; server-side logs came from a MatrixOne QA cluster
- Others: internal Grafana/Loki logs are available and were used for the analysis below
Actual Behavior
`DROP ACCOUNT` can hang for a long time and eventually fail with `commitUnsafe`.
Observed SQL:
select now();
drop account `79307612_ae09_4fc0_81c3_e3f8b13a9927`;
select now();
Observed timestamps / error:
2026-04-01 11:25:19.104741
ERROR 1105 (HY000): context deadline exceeded
internal error: commitUnsafe
2026-04-01 11:40:21.038359
So the statement spent about 15 minutes before failing.
Expected Behavior
DROP ACCOUNT should complete successfully or fail quickly with a direct and actionable error. Cleaning account-owned internal metric rows should not make the whole account-drop transaction hang and then time out at commit.
Steps to Reproduce
1. Prepare an account that has accumulated a noticeable amount of statement CU metric data and is still subject to concurrent metric writes.
2. From the sys tenant, execute:
select now();
drop account `{account_name}`;
select now();
3. Observe that the statement can block for minutes and then fail with:
ERROR 1105 (HY000): context deadline exceeded
internal error: commitUnsafe
Additional information
Key evidence from logs
- The second failing attempt started at 2026-04-01 11:25:19, and the final visible `dropAccount ... sql:` entries reached:
  delete from mo_catalog.`mo_data_key` where account_id = 255000001;
  delete from mo_catalog.`statement_cu` where account_id = 255000001;
- The same background transaction then logged:
  2026-04-01 11:26:44.895452 +0000 lockservice/lock_table_local.go:111
  failed to lock on local
  table: 272566
  row count: 4111
  opts: Exclusive-Row-Wait
  error: row level lock is too large that need upgrade to table level lock
- Earlier in the same transaction, disttae/txn_table.go:743 had already identified table 272566 as `statement_cu`.
- The same background transaction later logged:
  2026-04-01 11:33:51.786435 +0000 frontend/back_status_stmt.go:41
  time of Exec.Run : 8m30.771138389s
- It finally ended with commit-phase timeout logs:
  2026-04-01 11:40:20.975546 +0000 client/operator.go:1204
  txn send requests failed
  error: context deadline exceeded
  2026-04-01 11:40:20.975622 +0000 client/operator.go:1428
  txn wait committed log applied failed in rc mode
  error: context deadline exceeded
Code path / root cause analysis
- pkg/frontend/authenticate.go:3787-3791 opens a background transaction for doDropAccount() and only commits it at finishTxn().
- pkg/frontend/authenticate.go:3999-4007 iterates the cluster tables and executes:
  delete from mo_catalog.`{cluster_table}` where account_id = {account_id};
- In this reproduction, one of those cluster tables is statement_cu. The failing lock log shows the delete tried to lock 4111 rows, had to upgrade from row-level locking to a table-level lock, and never acquired that lock.
- sql_statement_cu is a continuously written metric table (pkg/util/metric/mometric/metric_collector.go:316-346) and is marked with [mo_no_del_hint] in schema metadata (pkg/bootstrap/versions/v2_0_0/cluster_upgrade_list.go:264-274), so it is a particularly hot place to do synchronous account cleanup.
- pkg/frontend/txn.go:503-506 commits with CommitOrRollbackTimeout, and pkg/txn/client/operator.go:1414-1428 waits for committed logtail apply in RC mode. After the long lock wait stretched the background transaction, the final commit hit that timeout and surfaced as context deadline exceeded / commitUnsafe.
Conclusion
The root cause is that DROP ACCOUNT performs hot cluster-table cleanup (statement_cu) inside the same large background transaction as the rest of account deletion. The delete on statement_cu hit lock escalation pressure, needed a table-level lock, could not acquire it under concurrent activity, and stretched the transaction long enough that the final commit timed out.
Possible fix directions
- Special-case `statement_cu` and similar hot metric tables so they are not cleaned synchronously inside the critical `DROP ACCOUNT` transaction.
- If synchronous cleanup is required, acquire an appropriate table-level lock up front or chunk the delete more carefully.
- Reduce the scope of the single `DROP ACCOUNT` transaction so hot cluster-table cleanup does not hold the entire account-drop path hostage.