[Bug]: DROP ACCOUNT can fail with commitUnsafe after statement_cu lock escalation

### Is there an existing issue for the same bug?

- [x] I have checked the existing issues.

### Branch Name

3.0-dev

### Commit ID

e6d0a67d96

### Other Environment Information

```Markdown
- Hardware parameters: cloud QA deployment
- OS type: issue reproduced from a MySQL client on macOS; server-side logs came from a MatrixOne QA cluster
- Others: internal Grafana/Loki logs are available and were used for the analysis below
```

### Actual Behavior

`DROP ACCOUNT` can hang for a long time and finally fail with `commitUnsafe`.

Observed SQL:

```sql
select now();
drop account `79307612_ae09_4fc0_81c3_e3f8b13a9927`;
select now();
```

Observed timestamps / error:

```text
2026-04-01 11:25:19.104741

ERROR 1105 (HY000): context deadline exceeded
internal error: commitUnsafe

2026-04-01 11:40:21.038359
```

So the statement spent about 15 minutes before failing.

### Expected Behavior

`DROP ACCOUNT` should complete successfully or fail quickly with a direct and actionable error. Cleaning account-owned internal metric rows should not make the whole account-drop transaction hang and then time out at commit.

### Steps to Reproduce

```Markdown
1. Prepare an account that has accumulated a noticeable amount of statement CU metric data and is still subject to concurrent metric writes.
2. From the sys tenant, execute:

   select now();
   drop account `{account_name}`;
   select now();

3. Observe that the statement can block for minutes and then fail with:

   ERROR 1105 (HY000): context deadline exceeded
   internal error: commitUnsafe
```

### Additional information

#### Key evidence from logs

1. The second failing attempt started at `2026-04-01 11:25:19`. The last visible `dropAccount ... sql:` log entries were:

   ```text
   delete from mo_catalog.`mo_data_key` where account_id = 255000001;
   delete from mo_catalog.`statement_cu` where account_id = 255000001;
   ```

2. The same background transaction then logged:

   ```text
   2026-04-01 11:26:44.895452 +0000 lockservice/lock_table_local.go:111
   failed to lock on local
   table: 272566
   row count: 4111
   opts: Exclusive-Row-Wait
   error: row level lock is too large that need upgrade to table level lock
   ```

3. Earlier in the same transaction, `disttae/txn_table.go:743` had already identified table `272566` as `statement_cu`.

4. The same background transaction later logged:

   ```text
   2026-04-01 11:33:51.786435 +0000 frontend/back_status_stmt.go:41
   time of Exec.Run : 8m30.771138389s
   ```

5. It finally ended with commit-phase timeout logs:

   ```text
   2026-04-01 11:40:20.975546 +0000 client/operator.go:1204
   txn send requests failed
   error: context deadline exceeded

   2026-04-01 11:40:20.975622 +0000 client/operator.go:1428
   txn wait committed log applied failed in rc mode
   error: context deadline exceeded
   ```

#### Code path / root cause analysis

- `pkg/frontend/authenticate.go:3787-3791` opens a background transaction for `doDropAccount()` and only commits it at `finishTxn()`.
- `pkg/frontend/authenticate.go:3999-4007` iterates cluster tables and executes:

  ```sql
  delete from mo_catalog.`{cluster_table}` where account_id = {account_id};
  ```

- In this reproduction, one of those cluster tables is `statement_cu`. The failing lock log shows that the delete tried to lock 4111 rows, which required upgrading from row-level locks to a table-level lock, and that the table-level lock was not acquired.
- `sql_statement_cu` is a continuously written metric table (`pkg/util/metric/mometric/metric_collector.go:316-346`) and is marked with `[mo_no_del_hint]` in schema metadata (`pkg/bootstrap/versions/v2_0_0/cluster_upgrade_list.go:264-274`), so it is a particularly hot place to do synchronous account cleanup.
- `pkg/frontend/txn.go:503-506` commits with `CommitOrRollbackTimeout`, and `pkg/txn/client/operator.go:1414-1428` waits for committed logtail apply in RC mode. After the long lock wait stretched the background transaction, the final commit hit that timeout and surfaced as `context deadline exceeded / commitUnsafe`.

#### Conclusion

The root cause is that `DROP ACCOUNT` cleans up a hot cluster table (`statement_cu`) inside the same large background transaction as the rest of account deletion. The delete on `statement_cu` tried to take too many row locks (4111 rows), needed to escalate to a table-level lock, could not acquire it under concurrent activity, and stretched the transaction long enough that the final commit timed out.

#### Possible fix directions

1. Special-case `statement_cu` / similar hot metric tables so they are not cleaned synchronously inside the critical `DROP ACCOUNT` transaction.
2. If synchronous cleanup is required, acquire an appropriate table-level lock up front or chunk the delete more carefully.
3. Reduce the scope of the single `DROP ACCOUNT` transaction so hot cluster-table cleanup does not hold the entire account-drop path hostage.

