activator/client: handle device full when tunnel IDs exhausted #3266

Open
martinsander00 wants to merge 4 commits into main from martin/cli-error-device-full
@martinsander00
Contributor

Summary of Changes

  • Cap tunnel ID allocation per device (500–627) so the activator gracefully rejects users when a device is full instead of allocating invalid tunnel IDs
  • Update the CLI to stop retrying on rejection and surface the rejection reason as an error, so operators see a clear "device full" message instead of hanging indefinitely

Diff Breakdown

| Category   | Files | Lines (+/-) | Net |
|------------|-------|-------------|-----|
| Core logic | 5     | +37 / -16   | +21 |
| Tests      | 1     | +48 / -7    | +41 |

Majority of new lines are tests for the bounded allocator; core logic is compact.

Key files
  • activator/src/idallocator.rs — add with_max constructor and Option<u16> return from next_available; 3 new test cases for bounded allocation
  • client/doublezero/src/command/connect.rs — stop polling on Rejected status; return rejection reasons as an error instead of silently succeeding
  • activator/src/process/user.rs — handle None from tunnel ID allocation by rejecting the user with a descriptive onchain log message
  • activator/src/states/devicestate.rs — cap tunnel IDs at 627 via IDAllocator::with_max
  • activator/src/process/iface_mgr.rs — unwrap Option with descriptive panic for segment routing IDs
  • activator/src/process/link.rs — unwrap Option with descriptive panic for link IDs
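
As a rough illustration of the bounded allocation described in the list above, here is a minimal sketch. It is not the PR's code: the type name `BoundedIdAllocator`, its fields, and the `release` method are assumptions made for the example; only the `with_max` constructor shape and the `Option<u16>` return from `next_available` are taken from the description.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the PR's `IDAllocator`; the fields are assumptions.
pub struct BoundedIdAllocator {
    first: u16,
    max: u16, // inclusive upper bound
    assigned: BTreeSet<u16>,
}

impl BoundedIdAllocator {
    /// Mirrors the `with_max(first, max, already_assigned)` shape quoted in the PR.
    pub fn with_max(first: u16, max: u16, assigned: Vec<u16>) -> Self {
        Self {
            first,
            max,
            assigned: assigned.into_iter().collect(),
        }
    }

    /// Returns the lowest free ID in `first..=max`, filling gaps first,
    /// or `None` once every ID in the range is taken.
    pub fn next_available(&mut self) -> Option<u16> {
        let id = (self.first..=self.max).find(|id| !self.assigned.contains(id))?;
        self.assigned.insert(id);
        Some(id)
    }

    /// Hypothetical helper: returns an ID to the pool.
    pub fn release(&mut self, id: u16) {
        self.assigned.remove(&id);
    }
}
```

With this shape, a caller such as the user-processing path can match on `None` and reject the user, while callers that treat exhaustion as a bug can use `expect` with a descriptive message.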

Testing Verification

  • All 13 IDAllocator unit tests pass, including 3 new tests covering bounded allocation (upper bound respected, gap filling, exhaustion)
  • CLI rejection path returns Err with reason detail, ensuring doublezero connect exits non-zero on device-full rejection
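
The CLI behaviour described above could look roughly like the following sketch. The `UserStatus` enum, the `wait_for_activation` function, and the error strings are hypothetical and only illustrate the idea of treating a rejection as a terminal state; the actual client types in connect.rs are not shown on this page.

```rust
/// Hypothetical status type; the real client's types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum UserStatus {
    Pending,
    Activated,
    Rejected { reason: String },
}

/// Polls `fetch_status` up to `max_polls` times and stops as soon as a
/// terminal status (activated or rejected) is observed.
pub fn wait_for_activation<F>(mut fetch_status: F, max_polls: usize) -> Result<(), String>
where
    F: FnMut() -> UserStatus,
{
    for _ in 0..max_polls {
        match fetch_status() {
            UserStatus::Activated => return Ok(()),
            // Rejection is terminal: surface the reason instead of retrying.
            UserStatus::Rejected { reason } => {
                return Err(format!("user rejected: {reason}"));
            }
            // A real client would sleep between polls here.
            UserStatus::Pending => continue,
        }
    }
    Err("timed out waiting for activation".to_string())
}
```

Returning `Err` from the command is what lets the process exit non-zero, so scripts wrapping the CLI can detect a device-full rejection.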

Add max capacity to IDAllocator (tunnel IDs 500-627) so the activator
gracefully rejects users when a device runs out of tunnel IDs instead
of allocating beyond the valid range. Update the CLI to detect rejection
status during provisioning and surface the rejection reason as an error.
Comment on lines +53 to 54
tunnel_ids: IDAllocator::with_max(500, 499 + device.max_users, vec![]),
tunnel_endpoints_in_use: HashMap::new(),
Contributor

This isn't going to work if it's done at start time. If a device is currently drained (i.e. max users = 0), the activator restarts, and the contributor then changes max users to 128, the tunnel ID allocator will still be sized for 0 users.
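
To make the concern concrete, here is a small sketch of an allocator whose upper bound is frozen at construction, following the `with_max(500, 499 + device.max_users, vec![])` line under review. `SnapshotAllocator` and `drained_then_raised` are hypothetical names used only for illustration.

```rust
use std::collections::BTreeSet;

const FIRST_TUNNEL_ID: u16 = 500;

/// Hypothetical allocator whose bound is computed once, at startup.
pub struct SnapshotAllocator {
    max: u16, // inclusive; 499 when max_users is 0, so the range is empty
    assigned: BTreeSet<u16>,
}

impl SnapshotAllocator {
    pub fn new(max_users_at_start: u16) -> Self {
        Self {
            max: (FIRST_TUNNEL_ID - 1).saturating_add(max_users_at_start),
            assigned: BTreeSet::new(),
        }
    }

    pub fn next_available(&mut self) -> Option<u16> {
        let id = (FIRST_TUNNEL_ID..=self.max).find(|id| !self.assigned.contains(id))?;
        self.assigned.insert(id);
        Some(id)
    }
}

/// Simulates: the device is drained at startup, then max users is raised
/// to 128 onchain without the allocator being rebuilt.
pub fn drained_then_raised() -> (Option<u16>, Option<u16>) {
    let mut ids = SnapshotAllocator::new(0);
    let before = ids.next_available();
    // max_users is now 128 onchain, but `ids.max` still reflects 0.
    let after = ids.next_available();
    (before, after)
}
```

In this sketch both allocations fail, because nothing updates the bound after the onchain value changes.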

Two alternatives both seem like better fixes: either leave the implementation as is and rely on Steven's fix to deallocate tunnel IDs properly, or, if activate requests are always serialized, use only onchain state to allocate the next tunnel.
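
The second option might be sketched as a pure function over onchain state. This is only an illustration: `next_tunnel_id` is a hypothetical name, `in_use` stands in for the tunnel IDs read from the device's current onchain users, and the base of 500 is taken from the PR description. It is only safe under the stated condition that activate requests are serialized.

```rust
/// Hypothetical: picks the lowest free tunnel ID from current onchain state.
/// `in_use` is assumed to hold the tunnel IDs of users already on the device,
/// and `max_users` is assumed to be read fresh from the device account.
pub fn next_tunnel_id(in_use: &[u16], max_users: u16) -> Option<u16> {
    let first: u16 = 500;
    // Inclusive upper bound; 499 when max_users is 0, giving an empty range.
    let last = (first - 1).saturating_add(max_users);
    (first..=last).find(|id| !in_use.contains(id))
}
```

Because nothing is cached in the activator, a change to max users takes effect on the next allocation, and a restart needs no allocator state to be rebuilt.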

2 participants