Skip to content

fix: check lock table binds during load#24898

Merged
mergify[bot] merged 5 commits into
matrixorigin:mainfrom
aptend:check-lock-table-binds-load
Jun 16, 2026
Merged

fix: check lock table binds during load#24898
mergify[bot] merged 5 commits into
matrixorigin:mainfrom
aptend:check-lock-table-binds-load

Conversation

@aptend

@aptend aptend commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #24897

What this PR does / why we need it:

A long-running LOAD DATA can hold stale lock table bind information while data is being read and written. If the lock-table owner CN shuts down during that window, the transaction may only discover the bind change at commit time and fail after a long run.

This change adds a LOAD-time lock table bind check before external batches are read. The check is throttled at the txn operator level, so concurrent LOAD readers in the same transaction share the same interval. It reuses the existing GetLockTableBind path instead of adding a new lockservice/proto RPC. Once a transaction detects a bind change, subsequent checks in the same transaction keep returning ErrLockTableBindChanged.

Tests cover changed bind detection, sticky bind-change failure, and reset of the txn-level throttle state when a txn operator is reused.

@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 9, 2026
@mergify mergify Bot added the kind/bug Something isn't working label Jun 9, 2026
@aptend aptend force-pushed the check-lock-table-binds-load branch from b502ee7 to ff15822 Compare June 9, 2026 09:43

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a failure mode where long-running LOAD DATA operations may hold stale lock-table bind metadata and only discover a bind change at commit time (after significant work). It introduces a transaction-operator-level bind check (throttled) and invokes it during LOAD execution so bind changes are detected earlier and fail fast with ErrLockTableBindChanged.

Changes:

  • Add TxnOperator.CheckLockTableBinds(ctx) and implement throttled, “sticky” bind-change detection in txnOperator.
  • Add LockService.GetLatestLockTableBind(bind) to fetch allocator-latest bind and use it for bind validation.
  • Invoke bind checks in LOAD execution paths and add tests covering bind-change detection, stickiness, and throttle reset on operator reuse.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/vm/engine/test/testutil/disttae_engine.go Extend mock lockservice to satisfy new GetLatestLockTableBind API.
pkg/vm/engine/entire_engine_test.go Extend test operator to satisfy new CheckLockTableBinds API.
pkg/txn/storage/memorystorage/storage_txn_client.go Extend storage txn operator to satisfy new CheckLockTableBinds API.
pkg/txn/client/types.go Extend TxnOperator interface with CheckLockTableBinds.
pkg/txn/client/operator.go Implement throttled bind-check logic and sticky bind-changed behavior in txnOperator.
pkg/txn/client/operator_test.go Add unit tests for bind-change detection, sticky failure, and init/reset of throttling state.
pkg/sql/compile/compile_test.go Update txn operator mocks to expect CheckLockTableBinds.
pkg/sql/colexec/external/external.go Call txn bind check before reading external batches for LOAD.
pkg/lockservice/types.go Extend lockservice interface with GetLatestLockTableBind.
pkg/lockservice/service_observability.go Implement allocator-latest bind fetch via existing GetBind path.
pkg/incrservice/store_sql_test.go Extend test lockservice/txn operator stubs to satisfy new interfaces.
pkg/frontend/txn_test.go Extend frontend test txn operator to satisfy new CheckLockTableBinds API.
pkg/frontend/test/txn_mock.go Regenerate/update mock txn operator with CheckLockTableBinds.
pkg/frontend/test/mock_lock/types.go Regenerate/update mock lockservice with GetLatestLockTableBind.
pkg/frontend/mysql_cmd_executor.go Add bind checks around LOAD LOCAL read/write loop.
pkg/cdc/table_change_stream_test.go Extend noop txn operator to satisfy new CheckLockTableBinds API.
pkg/bootstrap/service_test.go Extend bootstrap test txn operator to satisfy new CheckLockTableBinds API.
Files not reviewed (2)
  • pkg/frontend/test/mock_lock/types.go: Language not supported
  • pkg/frontend/test/txn_mock.go: Language not supported

Comment thread pkg/txn/client/operator.go
Comment thread pkg/frontend/mysql_cmd_executor.go

@aunjgr aunjgr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix. Throttled at 3-min, sticky on first detection, uses allocator-latest bind (bypasses local stale cache). The lockTables snapshot outside the mutex is safe since lock tables are append-only during a txn lifetime. Tests cover changed bind, stale local cache, sticky failure, and throttle reset on reuse.

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not confirm a blocking logic bug in the implementation itself, but I still think the unhappy-path coverage is not strong enough for this change.

The new bind-check behavior is added to multiple critical LOAD paths, but several of the newly introduced failure/guard branches are still effectively untested:

  1. processLoadLocal now calls CheckLockTableBinds() before the first packet read and again inside the main loop, but the current load-local tests build ExecCtx with proc == nil, so both new call sites are dead in test coverage. We still need coverage for bind-change failure before the first read and after at least one loop iteration.

  2. txnOperator.CheckLockTableBinds() adds important unhappy-path state handling that is currently not pinned: RPC error should clear the throttle timestamp so later calls retry; cancelled context should also clear the throttle; a closed txn should fail before any bind lookup; and a clean second call inside the 3-minute window should no-op instead of re-querying. These branches are now part of the core contract for the new throttled check.

  3. External.Call() now propagates bind-check failures for LOAD, but there is still no direct test that a CheckLockTableBinds() error is surfaced from the external LOAD path.

Since this PR is specifically about failing early on stale bind changes during long-running LOAD, I think these negative paths should be covered before approval.

@XuPeng-SH XuPeng-SH left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked the latest revision and I do not see a blocking issue now.

The earlier unhappy-path coverage gaps are addressed in the current head: processLoadLocal now has bind-check failure coverage both before the first read and after entering the loop; txnOperator.CheckLockTableBinds now has direct tests for RPC error, canceled context, closed txn, sticky failure, throttle reset, and clean throttled re-check behavior; and the external LOAD path now has a direct bind-check error propagation test.

LGTM.

@LeftHandCold LeftHandCold left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I reviewed the latest head around the new LOAD-time bind checks, the txn-level throttled/sticky CheckLockTableBinds implementation, and the added negative-path tests. The previous coverage gaps are addressed, and the PR checks are green.

@mergify

mergify Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

  • Entered queue2026-06-15 08:32 UTC · Rule: main
  • Checks failed · in-place
  • 🚫 Left the queue2026-06-15 09:22 UTC · at 2fc3660970d8bfaf02044591f89cdd0207fb30a7

This pull request spent 49 minutes 48 seconds in the queue, with no time running CI.

Waiting for
  • any of: [🛡 GitHub branch protection]
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of: [🛡 GitHub branch protection]
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage
    • check-success = Matrixone Utils CI / Coverage
All conditions
  • any of [🛡 GitHub branch protection]:
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage
    • check-success = Matrixone Utils CI / Coverage
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)

Reason

The merge conditions cannot be satisfied due to failing checks

Failing checks:

Hint

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio queue comment.

@mergify

mergify Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

  • Entered queue2026-06-16 04:05 UTC · Rule: main
  • Checks skipped · PR is already up-to-date
  • Merged2026-06-16 04:05 UTC · at 4bb38c526a0087037065ec77bb5356f467db1189 · squash

This pull request spent 23 seconds in the queue, including 2 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • github-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/L Denotes a PR that changes [500,999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants