fix(agent): observe and unblock silent Slack inbox relay drops by posthog[bot] · Pull Request #2772 · PostHog/code

posthog · 2026-06-19T10:19:31Z

Problem

The PostHog team building the Tuesday 2026-06-23 Slack launch is losing personal inbox notifications silently — they just don't arrive, so users assume a quiet inbox rather than a broken pipe ("thought there was something wrong with my inbox being empty"). #2765 root-caused and fixed one cause (the Bash-only PR-detection gate), but a second cause remained and there was zero delivery-side observability to find it.

Why: Notification delivery is posted server-side off the agent relay path and is gated on attributing a freshly created PR to the run. Two mechanisms drop notifications with no observable signal.

Changes

Make relay drops observable. relayAgentResponse previously swallowed every relayMessage failure at debug level and early-returned on a missing message with no signal. It now emits a stably-named slack_inbox_relay telemetry shape capturing the full attempt → delivered/dropped funnel. Drops (relay request failure, no agent message) are warn-level so they surface in agent logs instead of only as user reports.
Unblock attribution on real runs. Widened PR_CREATION_RECENCY_MS from 15 minutes to 24 hours. The 15-minute window failed closed on any run where the PR surfaced later than 15 minutes (multi-turn runs, late/coalesced terminal flushes), so the PR was never attributed and the attribution-gated notification never fired. The window still excludes genuinely stale PRs the agent merely views.

How did you test this?

pnpm --filter @posthog/agent test pr-url-detector — 11 pass, including new coverage that a PR created an hour earlier is still attributed under the default window.
POSTHOG_ALLOW_UNSIGNED_GIT=1 pnpm --filter @posthog/agent test agent-server.test — added 3 relay-telemetry tests (attempt→delivered, warn-level drop on relay failure, drop on missing message); all pass. The 3 pre-existing buildCloudSystemPrompt failures are unrelated (they fail identically on clean main).
pnpm --filter @posthog/agent typecheck and Biome lint on changed files — clean.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

Created with PostHog Code from an inbox report.

Inbox/Slack notifications are posted server-side off the agent relay path, so a silent drop reads to the user as an empty inbox rather than a broken pipe (the reported symptom). This addresses the second, unidentified cause of dropped notifications that #2765 did not fully fix. - Emit stably-named attempt/delivered/dropped telemetry on `relayAgentResponse`, replacing the debug-only logs that swallowed every failure. Drops (relay request failure, no agent message) are now warn-level so they surface in agent logs instead of only as user reports. - Widen `PR_CREATION_RECENCY_MS` from 15 minutes to 24 hours. Notification delivery is gated on attributing a freshly created PR to the run, and the 15-minute window failed closed on any run where the PR surfaced later (multi-turn runs, late/coalesced terminal flushes), so the PR was never attributed and the notification never fired. Generated-By: PostHog Code Task-Id: c18c5b19-a5e4-4c98-95f2-13a68a331e4b

github-actions · 2026-06-19T10:19:57Z

React Doctor found no issues in the changed files. 🎉

_{Reviewed by React Doctor for commit cd04536.}

greptile-apps · 2026-06-19T10:22:27Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/agent/src/server/agent-server.test.ts:1345-1379
**Prefer parameterised tests**

The three `Slack inbox relay telemetry` cases share the same harness (`setup → relayAgentResponse → inspect logger`) and exercise a finite set of `(input, expected outcome)` combinations. Per the project's stated preference, these should be expressed as a single `it.each` block rather than three separate `it` calls. Each row would carry `message`, an optional `relayMessage` factory, and the expected `info`/`warn` assertion data, letting the table surface the full funnel in one glance.

### Issue 2 of 2
packages/agent/src/server/agent-server.ts:2289-2305
**Flush failure loses root cause in the `slack_inbox_relay` telemetry shape**

When `flush` throws, the raw `logger.warn("Failed to flush logs before Slack relay", …)` sits outside the `slack_inbox_relay` structured shape. If the flush failure then causes `getFullAgentResponse` to return `undefined`, the funnel records `dropped: no_agent_message` — hiding the real upstream cause. An engineer querying `telemetry:slack_inbox_relay` will see a `no_agent_message` drop and have no signal that a flush failure preceded it. Passing `reason: "flush_failed"` (and the serialised error) into `recordRelayTelemetry` when the flush throws would keep the root cause inside the queryable shape.

_{Reviews (1): Last reviewed commit: "fix(agent): observe and unblock silent S..." | Re-trigger Greptile}

greptile-apps · 2026-06-19T10:22:32Z

+    it("records attempt then delivered on a successful relay", async () => {
+      const s = setup({ message: "hello" });
+      await s.relayAgentResponse(payload);
+      expect(s.posthogAPI.relayMessage).toHaveBeenCalledWith("t", "r", "hello");
+      expect(outcomesFrom(s.logger.info)).toEqual(["attempt", "delivered"]);
+      expect(s.logger.warn).not.toHaveBeenCalled();
+    });
+
+    it("records a warn-level dropped event when the relay request throws", async () => {
+      const relayMessage = vi.fn(async () => {
+        throw new Error("boom");
+      });
+      const s = setup({ message: "hello", relayMessage });
+      await s.relayAgentResponse(payload);
+      expect(s.logger.warn).toHaveBeenCalledTimes(1);
+      const event = s.logger.warn.mock.calls[0][1] as {
+        outcome: string;
+        reason: string;
+      };
+      expect(event.outcome).toBe("dropped");
+      expect(event.reason).toBe("relay_request_failed");
+    });
+
+    it("records a dropped event when there is no agent message to relay", async () => {
+      const s = setup({ message: undefined });
+      await s.relayAgentResponse(payload);
+      expect(s.posthogAPI.relayMessage).not.toHaveBeenCalled();
+      expect(s.logger.warn).toHaveBeenCalledTimes(1);
+      const event = s.logger.warn.mock.calls[0][1] as {
+        outcome: string;
+        reason: string;
+      };
+      expect(event.outcome).toBe("dropped");
+      expect(event.reason).toBe("no_agent_message");
+    });


Prefer parameterised tests

The three Slack inbox relay telemetry cases share the same harness (setup → relayAgentResponse → inspect logger) and exercise a finite set of (input, expected outcome) combinations. Per the project's stated preference, these should be expressed as a single it.each block rather than three separate it calls. Each row would carry message, an optional relayMessage factory, and the expected info/warn assertion data, letting the table surface the full funnel in one glance.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: packages/agent/src/server/agent-server.test.ts Line: 1345-1379 Comment: **Prefer parameterised tests** The three `Slack inbox relay telemetry` cases share the same harness (`setup → relayAgentResponse → inspect logger`) and exercise a finite set of `(input, expected outcome)` combinations. Per the project's stated preference, these should be expressed as a single `it.each` block rather than three separate `it` calls. Each row would carry `message`, an optional `relayMessage` factory, and the expected `info`/`warn` assertion data, letting the table surface the full funnel in one glance. **Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0)) How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-19T10:22:33Z

    try {
      await this.session.logWriter.flush(payload.run_id, { coalesce: true });
    } catch (error) {
-      this.logger.debug("Failed to flush logs before Slack relay", {
+      this.logger.warn("Failed to flush logs before Slack relay", {
        taskId: payload.task_id,
        runId: payload.run_id,
-        error,
+        error: serializeError(error),
      });
    }

    const message = this.session.logWriter.getFullAgentResponse(payload.run_id);
    if (!message) {
-      this.logger.debug("No agent message found for Slack relay", {
-        taskId: payload.task_id,
-        runId: payload.run_id,
+      this.recordRelayTelemetry("dropped", payload, {
+        reason: "no_agent_message",
        sessionRegistered: this.session.logWriter.isRegistered(payload.run_id),
      });
      return;


Flush failure loses root cause in the slack_inbox_relay telemetry shape

When flush throws, the raw logger.warn("Failed to flush logs before Slack relay", …) sits outside the slack_inbox_relay structured shape. If the flush failure then causes getFullAgentResponse to return undefined, the funnel records dropped: no_agent_message — hiding the real upstream cause. An engineer querying telemetry:slack_inbox_relay will see a no_agent_message drop and have no signal that a flush failure preceded it. Passing reason: "flush_failed" (and the serialised error) into recordRelayTelemetry when the flush throws would keep the root cause inside the queryable shape.

Prompt To Fix With AI

This is a comment left during a code review. Path: packages/agent/src/server/agent-server.ts Line: 2289-2305 Comment: **Flush failure loses root cause in the `slack_inbox_relay` telemetry shape** When `flush` throws, the raw `logger.warn("Failed to flush logs before Slack relay", …)` sits outside the `slack_inbox_relay` structured shape. If the flush failure then causes `getFullAgentResponse` to return `undefined`, the funnel records `dropped: no_agent_message` — hiding the real upstream cause. An engineer querying `telemetry:slack_inbox_relay` will see a `no_agent_message` drop and have no signal that a flush failure preceded it. Passing `reason: "flush_failed"` (and the serialised error) into `recordRelayTelemetry` when the flush throws would keep the root cause inside the queryable shape. How can I resolve this? If you propose a fix, please make it concise.

- Fix the failing `quality` check: run biome formatter on the relay telemetry test (the line that broke `biome ci`). - Keep flush failures inside the `slack_inbox_relay` telemetry shape via a new `flush_failed` outcome, and tag the subsequent `no_agent_message` drop with `flushFailed` so a cascaded drop's root cause stays queryable instead of reading as a bare drop (Greptile issue 2). - Express the three relay funnel cases as a parameterised `it.each` table per the project's test convention; keep the flush-cascade case separate since its setup/assertions differ (Greptile issue 1). Generated-By: PostHog Code Task-Id: c18c5b19-a5e4-4c98-95f2-13a68a331e4b

greptile-apps Bot reviewed Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): observe and unblock silent Slack inbox relay drops#2772

fix(agent): observe and unblock silent Slack inbox relay drops#2772
posthog[bot] wants to merge 2 commits into
mainfrom
posthog-code/observe-slack-inbox-relay-drops

posthog Bot commented Jun 19, 2026

Uh oh!

github-actions Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 19, 2026

Uh oh!

greptile-apps Bot Jun 19, 2026

Uh oh!

greptile-apps Bot Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

posthog Bot commented Jun 19, 2026

Problem

Changes

How did you test this?

Automatic notifications

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 19, 2026

Uh oh!

greptile-apps Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

github-actions Bot commented Jun 19, 2026 •

edited

Loading