Skip to content

[max_turns] relay: WebSocket heartbeat — RFC 6455 ping/pong every 30s, close at 60s#27

Merged
ilmoniemi merged 5 commits into
mainfrom
feature/7
May 10, 2026
Merged

[max_turns] relay: WebSocket heartbeat — RFC 6455 ping/pong every 30s, close at 60s#27
ilmoniemi merged 5 commits into
mainfrom
feature/7

Conversation

@ilmoniemi
Copy link
Copy Markdown
Contributor

Auto-salvaged from max_turns

The developer agent hit max_turns (71 turns, $6.14) on #7 while work was in progress. The dispatcher auto-committed the uncommitted changes and opened this draft PR for human triage.

Build status at salvage: clean (go vet ./... + go build ./... all passed). Tests were not run as a salvage gate — failing tests are often the signal the agent was chasing.

Last messages from the agent (may include unresolved findings):


To investigate:

This PR is a draft — auto-merge is disabled until a reviewer marks it ready (or closes it). Ticket label error:max_turns_salvaged indicates triage required.

Closes #7

ilmoniemi and others added 3 commits May 9, 2026 23:49
Auto-committed by dispatcher when developer hit max_turns. Build was clean (vet + build); work preserved as draft PR for human triage.

Session: c7fc64e5-8fd9-4291-93c3-62c82b012286
The developer's max_turns salvage on #7 included a `pyrycode-relay`
Mach-O binary because `go build ./...` (a documented salvage gate)
emits it at repo root and `.gitignore` listed `dist/` + `bin/`
containers but not the bare binary name. Add `/pyrycode-relay`
anchored to repo root and `git rm` the existing copy so the PR ships
only source.

Same shape as the canonical `.codegraph` self-ref symlink fix shipped
2026-05-09 in agent-dispatcher's consumers — auto-commit safety net
plus gitignore gap equals unwanted artifact in tree. Two instances
within 24 hours, hardened rule: audit `.gitignore` against every
artifact path the agent's build commands can produce in the working
tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ilmoniemi ilmoniemi marked this pull request as ready for review May 10, 2026 07:05
…exist

Conflicts resolved:
- docs/PROJECT-MEMORY.md: keep #7 heartbeat row + main's three frame-forwarder rows
  (#25 split the old 'Frame forwarding | Not started' row into specific entries)
- docs/knowledge/INDEX.md: keep heartbeat (#7) + phone-forwarder (#25) entries;
  client-endpoint description uses main's post-#25 wording
- docs/knowledge/features/ws-conn-adapter.md: heartbeat policy clarification
  (#7 ships, ping policy lives in heartbeat.go) + main's read-loop/size-cap notes
- docs/knowledge/features/client-endpoint.md (×2): combine — handler hands conn
  to StartPhoneForwarder (#25) AND launches heartbeat goroutine alongside (#7);
  out-of-scope section keeps both #25 inner-frame note and #7 heartbeat policy
- internal/relay/client_endpoint.go: heartbeat goroutine (cancelHB defer first
  under LIFO) launches BEFORE StartPhoneForwarder call replaces the old
  CloseRead+Done placeholder. WSConn is concurrent-safe (closeOnce + writeMu).

Build clean (go vet + go build); tests pass (internal/relay 1.577s).
Combined diff to be reviewed by code-review (security-sensitive label triggers
the security-review pass per architect/security-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ilmoniemi
Copy link
Copy Markdown
Contributor Author

Code Review: #7

Decision: PASS

Findings

None blocking.

Notes (informational)

  • internal/relay/heartbeat_test.go:82teardownOnce is a non-atomic bool rather than sync.Once. Safe in practice because each test calls teardown only via defer from the test goroutine; documenting in case future tests want to call it from multiple sites.
  • internal/relay/heartbeat_test.go:48serverWS is set inside the handler goroutine and read in the test goroutine; the <-handlerReady channel after close(handlerReady) provides happens-before, so this is race-free (and -race confirms). Worth knowing if the helper grows additional shared state later.

Summary

Implementation matches the spec one-to-one. The defer ordering is correct in both endpoints (cancelHB registered after the release/unregister defer → runs first under LIFO → heartbeat exits via <-ctx.Done() before wsconn.Close() fires); runHeartbeat's structure has exactly one CloseWithCode call site gated on ctx.Err() == nil, so AC bullet (c) is enforced structurally. WSConn.Close and WSConn.CloseWithCode correctly share the same closeOnce, which preserves the single-close-frame-on-the-wire invariant.

Security-sensitive checklist: spec carries a PASS security review section (architect's pre-implementation pass); the diff introduces no tokens/secrets/log leaks (close reason is the fixed literal "heartbeat timeout"), no new untrusted-input boundary (heartbeat operates post-claim), bounded concurrency (1 goroutine / accepted upgrade), bounded write rate (server-driven ticker), bounded write size (library-default empty ping payload), and defer cancelHB() is unconditional once the goroutine has been launched.

go vet, go test -race ./..., and make build all pass locally.

Knowledge artefacts (ADR-0007, features/heartbeat.md, INDEX.md and PROJECT-MEMORY.md updates) are present and substantive. The architect's "logging on heartbeat-induced close" open question was deliberately not adopted, with a justification that aligns with the project's evidence-based pacing rule — a good call.

The salvaged pyrycode-relay binary was removed and .gitignore was updated to prevent recurrence; nice to see that closed out as part of the same PR.

- Add docs/knowledge/codebase/7.md (per-ticket implementation summary
  per the convention bootstrapped in #1).
- Revert salvage commit's append to PROJECT-MEMORY.md "What's Built";
  per-ticket files are the new index.
- Patterns established: LIFO defer ordering for per-conn goroutines,
  active-conn vs stillborn-conn close-code split (ADR-0005 / ADR-0007).
- Lessons: nhooyr's Conn.Close 5s close-handshake gates goroutine exit;
  a peer that doesn't read cannot pong (test-shape implication).

Heartbeat feature doc, ADR-0007, INDEX entry, and the three feature
docs (heartbeat, ws-conn-adapter, server-endpoint, client-endpoint)
were already authored by the salvage commit and are unchanged.
@ilmoniemi ilmoniemi merged commit c5e9648 into main May 10, 2026
2 checks passed
@ilmoniemi ilmoniemi deleted the feature/7 branch May 10, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

relay: WebSocket heartbeat — RFC 6455 ping/pong every 30s, close at 60s

1 participant