fix(ec2-worker): post-launch EIP allocate+associate on multi-ENI launches#82
Conversation
…ches PR #79's ANYSCAN_EC2_ASSOCIATE_PUBLIC_IP=true on a multi-ENI launch is hard-rejected by AWS: InvalidParameterCombination — The associatePublicIPAddress parameter cannot be specified when launching with multiple network interfaces. AWS only honors AssociatePublicIpAddress on NetworkInterfaces[] when exactly one entry is supplied, even if the field appears only on the primary entry of a multi-NIC payload. The entire RunInstances call fails. Reported in PR #65 issuecomment-4339242358 (anygpt-52). Fix: when len(NetworkInterfaces) > 1, suppress the field inline and allocate-address + associate-address on the primary ENI post-launch. The recreate path now also releases the previously-recorded EIP before terminating the old instance so we don't leak Elastic IPs on every recreate. - build_network_interfaces only emits AssociatePublicIpAddress when the resulting payload is single-NIC (target_count == 1). - Ec2WorkerManager._associate_public_ip_post_launch allocates an EIP (Domain=vpc), associates it with the primary ENI (DeviceIndex=0), records AllocationId/AssociationId in self.state. Allocate or associate failures are surfaced in eni_attach.public_ip but do not abort the recreate — the worker is still usable on private IPs. - Ec2WorkerManager._release_recorded_eip disassociates and releases any previously-recorded EIP at the start of recreate_instance. Tests: - New: launch payload free of AssociatePublicIpAddress on multi-ENI; allocate_address + associate_address called post-launch with the primary ENI's NetworkInterfaceId; allocation_id persisted. - New: AllocateAddress failure does not abort recreate. - New: AssociateAddress failure still records AllocationId so the next recreate can release it. - New: previously-recorded EIP is disassociated + released before terminating the old instance on the next recreate. - Updated: prior tests that asserted the broken inline-flag behavior on multi-NIC now assert the field is suppressed everywhere. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb2cc3d3a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try: | ||
| self.ec2.disassociate_address(AssociationId=association_id) | ||
| except botocore.exceptions.ClientError as exc: | ||
| disassoc_error = { |
There was a problem hiding this comment.
Catch non-ClientError exceptions in EIP release path
_release_recorded_eip is documented to log cleanup errors without blocking recreate, but both AWS calls only catch botocore.exceptions.ClientError. Transient transport failures (for example EndpointConnectionError/ReadTimeoutError) will escape this method, and run_once only handles ClientError, so the daemon can crash before terminating/relaunching the worker. This makes recreate fragile exactly when AWS/network is degraded.
Useful? React with 👍 / 👎.
| "public_ip": public_ip, | ||
| } | ||
| association_id = assoc.get("AssociationId") | ||
| self.state["eip_allocation_id"] = allocation_id |
There was a problem hiding this comment.
Preserve old allocation ID when prior EIP release failed
If _release_recorded_eip fails to release the previously recorded EIP, recreate_instance still proceeds and _associate_public_ip_post_launch overwrites self.state["eip_allocation_id"] with the new allocation. That drops the only pointer to the unreleased old allocation, so the next recreate cannot clean it up and the old EIP leaks permanently in this failure scenario.
Useful? React with 👍 / 👎.
Summary
PR #79's
ANYSCAN_EC2_ASSOCIATE_PUBLIC_IP=trueon a multi-ENI launch is hard-rejected by AWS:AWS only honors
AssociatePublicIpAddressonNetworkInterfaces[]when exactly one entry is supplied, even if the field appears only on the primary entry of a multi-NIC payload. The entire RunInstances call fails. Reported in PR #65 issuecomment-4339242358 (anygpt-52).Fix: when
len(NetworkInterfaces) > 1, suppress the field inline andallocate-address+associate-addresson the primary ENI post-launch. The recreate path now also releases the previously-recorded EIP before terminating the old instance so we don't leak Elastic IPs on every recreate.Changes
build_network_interfacesonly emitsAssociatePublicIpAddresswhen the payload is single-NIC (target_count == 1).Ec2WorkerManager._associate_public_ip_post_launchallocates an EIP (Domain=vpc), associates it with the primary ENI (DeviceIndex=0), recordsAllocationId/AssociationIdinself.state. Failures surface ineni_attach.public_ipbut do not abort the recreate — the worker is still usable on private IPs.Ec2WorkerManager._release_recorded_eipdisassociates + releases any previously-recorded EIP at the start ofrecreate_instance.Test plan
python3 -m unittest tools.test_ec2_worker_manager -v— 57 tests passAssociatePublicIpAddresson multi-ENI;allocate_address+associate_addresscalled post-launch with the primary ENI'sNetworkInterfaceId; allocation_id persisted in state.AllocateAddressfailure does not abort recreate (instance still launched, error surfaces ineni_attach.public_ip).AssociateAddressfailure still recordsAllocationIdso the next recreate can release it.🤖 Generated with Claude Code