fix(proxy): clear transfer latch after unsafe attempt by aptend · Pull Request #24979 · matrixorigin/matrixone

aptend · 2026-06-15T03:02:17Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

When a proxy scaling transfer cannot start safely, for example because the session is still in a transaction, the async transfer attempt returned before finishTransfer was registered. Deliver had already marked the tunnel as inTransfer, so later drain polls skipped the tunnel forever and the CN drain could remain blocked until the connection closed.

This PR clears the transfer latch on that unsafe early-return path while preserving transferIntent, so later retries or intent-driven transfer can still happen. It also makes transfer intent metric updates idempotent and has sync transfer acquire the same transfer latch to avoid overlapping async/sync migration attempts.
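As a rough illustration of the latch behavior described above, here is a minimal sketch. It is not the code in pkg/proxy/tunnel.go: the `tunnel` struct, `tryAcquire`, `release`, `attemptTransfer`, and the `canStart` parameter are hypothetical stand-ins, and the real Deliver/async split is collapsed into one function. The point it shows is that the latch is released on every return path, including the unsafe early return, while the intent flag survives for a later retry.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tunnel is a hypothetical stand-in for the proxy tunnel; only the two
// flags discussed in the PR description are modelled.
type tunnel struct {
	inTransfer     atomic.Bool // latch: at most one transfer attempt at a time
	transferIntent atomic.Bool // remembers that a transfer is still wanted
}

// tryAcquire claims the latch and reports false if another attempt owns it.
func (t *tunnel) tryAcquire() bool {
	return t.inTransfer.CompareAndSwap(false, true)
}

// release clears the latch so a later drain poll can enqueue the tunnel again.
func (t *tunnel) release() {
	t.inTransfer.Store(false)
}

// attemptTransfer models a single attempt. canStart stands in for the safety
// check (for example "the session is not inside a transaction"). It returns
// true only if a transfer actually ran.
func (t *tunnel) attemptTransfer(canStart bool) bool {
	if !t.tryAcquire() {
		return false // another attempt owns the latch
	}
	defer t.release() // runs on every return path, including the unsafe one
	if !canStart {
		t.transferIntent.Store(true) // keep the intent for a later retry
		return false
	}
	t.transferIntent.Store(false)
	return true
}

func main() {
	t := &tunnel{}
	// Unsafe attempt: nothing is transferred, the latch is free, intent is kept.
	fmt.Println(t.attemptTransfer(false), t.inTransfer.Load(), t.transferIntent.Load())
	// Later retry once the session is safe: the transfer runs and intent clears.
	fmt.Println(t.attemptTransfer(true), t.inTransfer.Load(), t.transferIntent.Load())
}
```

Using one acquire/release pair for both the async and the sync path is what prevents the two from overlapping in this sketch: whichever caller loses the compare-and-swap simply skips its attempt.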

Tests cover:

unsafe async transfer clears inTransfer and can be re-enqueued;
sync transfer skips while another transfer attempt owns the latch;
sync cannot-start releases the latch;
repeated transfer intent true/false updates do not skew the gauge.

qodo-code-review · 2026-06-15T03:02:22Z

Qodo reviews are paused for this user.

Copilot

Pull request overview

This PR fixes a proxy tunnel transfer deadlock: Deliver() could mark a tunnel as inTransfer, but an unsafe async transfer attempt could return early, before finishTransfer() could clear the latch. Subsequent drain polls then skipped the tunnel indefinitely, which blocked CN draining. The PR also makes transfer-intent metric updates idempotent and ensures sync transfers respect the same inTransfer latch to prevent overlapping migration attempts.

Changes:

Make setTransferIntent idempotent via atomic swap to avoid double-inc/dec of the transfer-intent gauge.
Add explicit helpers to acquire/release the inTransfer latch for transfer attempts, and ensure unsafe async early-return clears the latch.
Ensure transferSync also acquires/releases the latch and add tests covering unsafe async retryability, sync latch behavior, and metric idempotency.
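The idempotent gauge update in the first item can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the `gauge` type is a hypothetical stand-in for the real metric, the `setTransferIntent` signature is guessed, and any policy checks the real method performs are omitted. The idea is that an atomic swap returns the previous value, so the gauge is touched only when the flag actually changes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for the transfer-intent metric.
type gauge struct{ v atomic.Int64 }

func (g *gauge) Inc()         { g.v.Add(1) }
func (g *gauge) Dec()         { g.v.Add(-1) }
func (g *gauge) Value() int64 { return g.v.Load() }

// tunnel models only the intent flag and the gauge it feeds.
type tunnel struct {
	transferIntent atomic.Bool
	intentGauge    *gauge
}

// setTransferIntent updates the flag and adjusts the gauge only on a real
// false->true or true->false transition. Repeated calls with the same value
// leave the gauge unchanged.
func (t *tunnel) setTransferIntent(want bool) {
	old := t.transferIntent.Swap(want)
	if old == want {
		return // no transition, so no metric update
	}
	if want {
		t.intentGauge.Inc()
	} else {
		t.intentGauge.Dec()
	}
}

func main() {
	t := &tunnel{intentGauge: &gauge{}}
	t.setTransferIntent(true)
	t.setTransferIntent(true) // repeated true: gauge stays at 1
	fmt.Println(t.intentGauge.Value())
	t.setTransferIntent(false)
	t.setTransferIntent(false) // repeated false: gauge stays at 0
	fmt.Println(t.intentGauge.Value())
}
```

A separate load followed by a store would leave a window in which two callers both observe the old value and both adjust the gauge; the single swap closes that window.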

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
pkg/proxy/tunnel.go	Clears `inTransfer` on unsafe async early-return, adds sync transfer latch acquisition, and makes transfer-intent gauge updates idempotent.
pkg/proxy/tunnel_test.go	Adds tests for retry behavior after unsafe transfer, sync latch behavior, and idempotent gauge updates.

XuPeng-SH

I found one substantive unhappy-path issue.

In pkg/proxy/tunnel.go, the unsafe async transfer path now calls finishTransferAttempt() before setTransferIntent(true). That opens a gap where another Deliver(..., transferByRebalance) can reacquire the latch and overwrite transferType to rebalance. On passive-policy proxies, setTransferIntent(true) then becomes a no-op, so the original scaling attempt loses the intent that should drive the later sync retry.

This means the stuck-drain fix can still be bypassed under scaling+rebalance interleaving. Please preserve/publish the scaling intent while the attempt still owns the latch, or make the intent update use the attempted transfer type instead of rereading mutable t.transferType.
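One way to read the two suggestions above is sketched below. It is only an illustration under stated assumptions: the struct layout, the mutex, `deliver`, `finishUnsafeAttempt`, and the constant name `transferByScaling` are hypothetical (only `transferByRebalance` appears in this thread), and the rule "only a scaling attempt records intent" is inferred from the review comment. The attempt passes in the transfer type it captured when it acquired the latch, and the intent is published in the same critical section that releases the latch.

```go
package main

import (
	"fmt"
	"sync"
)

type transferType int

const (
	transferByRebalance transferType = iota
	transferByScaling                // hypothetical name for the scaling type
)

// tunnel models the latch, the mutable transfer type, and the intent flag.
type tunnel struct {
	mu             sync.Mutex
	inTransfer     bool
	transferType   transferType
	transferIntent bool
}

// deliver models Deliver: it acquires the latch and records the type.
// It reports false if another attempt already owns the latch.
func (t *tunnel) deliver(typ transferType) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.inTransfer {
		return false
	}
	t.inTransfer = true
	t.transferType = typ
	return true
}

// finishUnsafeAttempt ends an attempt that could not start safely. It decides
// on intent from the attempted type captured by the caller, not from
// t.transferType, and it publishes intent before the latch is released.
func (t *tunnel) finishUnsafeAttempt(attempted transferType) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if attempted == transferByScaling {
		t.transferIntent = true
	}
	t.inTransfer = false
}

func main() {
	t := &tunnel{}
	t.deliver(transferByScaling)
	attempted := transferByScaling // captured while the attempt owns the latch
	t.finishUnsafeAttempt(attempted)
	// A rebalance delivery may now take the latch and overwrite transferType,
	// but the scaling intent has already been recorded.
	ok := t.deliver(transferByRebalance)
	fmt.Println(ok, t.transferIntent)
}
```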

fix(proxy): clear transfer latch after unsafe attempt

325d2a3

Copilot AI review requested due to automatic review settings June 15, 2026 03:02

aptend requested review from gouhongshen and volgariver6 as code owners June 15, 2026 03:02

matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 15, 2026

Copilot started reviewing on behalf of aptend June 15, 2026 03:02

Merge branch 'main' into fix/proxy-transfer-intent-stuck

a177cef

mergify Bot added the kind/bug Something isn't working label Jun 15, 2026

Copilot AI reviewed Jun 15, 2026

Comment thread pkg/proxy/tunnel.go

fix(proxy): publish transfer intent after releasing latch

c9829c8

XuPeng-SH requested changes Jun 16, 2026

fix(proxy): clear transfer latch after unsafe attempt #24979
aptend wants to merge 3 commits into matrixorigin:main from aptend:fix/proxy-transfer-intent-stuck

4 participants
