Skip to content

fix(ec2-worker): post-launch EIP allocate+associate on multi-ENI launches#82

Merged
skullcrushercmd merged 1 commit intomainfrom
fix/ec2-multi-eni-public-ip-post-launch
Apr 28, 2026
Merged

fix(ec2-worker): post-launch EIP allocate+associate on multi-ENI launches#82
skullcrushercmd merged 1 commit intomainfrom
fix/ec2-multi-eni-public-ip-post-launch

Conversation

@skullcrushercmd
Copy link
Copy Markdown
Contributor

Summary

PR #79's ANYSCAN_EC2_ASSOCIATE_PUBLIC_IP=true on a multi-ENI launch is hard-rejected by AWS:

InvalidParameterCombination — The associatePublicIPAddress parameter cannot be specified when launching with multiple network interfaces.

AWS only honors AssociatePublicIpAddress on NetworkInterfaces[] when exactly one entry is supplied, even if the field appears only on the primary entry of a multi-NIC payload. The entire RunInstances call fails. Reported in PR #65 issuecomment-4339242358 (anygpt-52).

Fix: when len(NetworkInterfaces) > 1, suppress the field inline and allocate-address + associate-address on the primary ENI post-launch. The recreate path now also releases the previously-recorded EIP before terminating the old instance so we don't leak Elastic IPs on every recreate.

Changes

  • build_network_interfaces only emits AssociatePublicIpAddress when the payload is single-NIC (target_count == 1).
  • Ec2WorkerManager._associate_public_ip_post_launch allocates an EIP (Domain=vpc), associates it with the primary ENI (DeviceIndex=0), records AllocationId / AssociationId in self.state. Failures surface in eni_attach.public_ip but do not abort the recreate — the worker is still usable on private IPs.
  • Ec2WorkerManager._release_recorded_eip disassociates + releases any previously-recorded EIP at the start of recreate_instance.

Test plan

  • python3 -m unittest tools.test_ec2_worker_manager -v — 57 tests pass
  • New unit test: launch payload free of AssociatePublicIpAddress on multi-ENI; allocate_address + associate_address called post-launch with the primary ENI's NetworkInterfaceId; allocation_id persisted in state.
  • New unit test: AllocateAddress failure does not abort recreate (instance still launched, error surfaces in eni_attach.public_ip).
  • New unit test: AssociateAddress failure still records AllocationId so the next recreate can release it.
  • New unit test: previously-recorded EIP is disassociated + released before terminating the old instance on the next recreate.
  • Updated tests that asserted the broken inline-flag behavior on multi-NIC.

🤖 Generated with Claude Code

…ches

PR #79's ANYSCAN_EC2_ASSOCIATE_PUBLIC_IP=true on a multi-ENI launch is
hard-rejected by AWS:

  InvalidParameterCombination — The associatePublicIPAddress parameter
  cannot be specified when launching with multiple network interfaces.

AWS only honors AssociatePublicIpAddress on NetworkInterfaces[] when
exactly one entry is supplied, even if the field appears only on the
primary entry of a multi-NIC payload. The entire RunInstances call
fails. Reported in PR #65 issuecomment-4339242358 (anygpt-52).

Fix: when len(NetworkInterfaces) > 1, suppress the field inline and
allocate-address + associate-address on the primary ENI post-launch.
The recreate path now also releases the previously-recorded EIP
before terminating the old instance so we don't leak Elastic IPs on
every recreate.

- build_network_interfaces only emits AssociatePublicIpAddress when
  the resulting payload is single-NIC (target_count == 1).
- Ec2WorkerManager._associate_public_ip_post_launch allocates an EIP
  (Domain=vpc), associates it with the primary ENI (DeviceIndex=0),
  records AllocationId/AssociationId in self.state. Allocate or
  associate failures are surfaced in eni_attach.public_ip but do not
  abort the recreate — the worker is still usable on private IPs.
- Ec2WorkerManager._release_recorded_eip disassociates and releases
  any previously-recorded EIP at the start of recreate_instance.

Tests:
- New: launch payload free of AssociatePublicIpAddress on multi-ENI;
  allocate_address + associate_address called post-launch with the
  primary ENI's NetworkInterfaceId; allocation_id persisted.
- New: AllocateAddress failure does not abort recreate.
- New: AssociateAddress failure still records AllocationId so the
  next recreate can release it.
- New: previously-recorded EIP is disassociated + released before
  terminating the old instance on the next recreate.
- Updated: prior tests that asserted the broken inline-flag behavior
  on multi-NIC now assert the field is suppressed everywhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@skullcrushercmd skullcrushercmd merged commit 0283ad7 into main Apr 28, 2026
@skullcrushercmd skullcrushercmd deleted the fix/ec2-multi-eni-public-ip-post-launch branch April 28, 2026 21:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb2cc3d3a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1061 to +1064
try:
self.ec2.disassociate_address(AssociationId=association_id)
except botocore.exceptions.ClientError as exc:
disassoc_error = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Catch non-ClientError exceptions in EIP release path

_release_recorded_eip is documented to log cleanup errors without blocking recreate, but both AWS calls only catch botocore.exceptions.ClientError. Transient transport failures (for example EndpointConnectionError/ReadTimeoutError) will escape this method, and run_once only handles ClientError, so the daemon can crash before terminating/relaunching the worker. This makes recreate fragile exactly when AWS/network is degraded.

Useful? React with 👍 / 👎.

"public_ip": public_ip,
}
association_id = assoc.get("AssociationId")
self.state["eip_allocation_id"] = allocation_id
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve old allocation ID when prior EIP release failed

If _release_recorded_eip fails to release the previously recorded EIP, recreate_instance still proceeds and _associate_public_ip_post_launch overwrites self.state["eip_allocation_id"] with the new allocation. That drops the only pointer to the unreleased old allocation, so the next recreate cannot clean it up and the old EIP leaks permanently in this failure scenario.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant