
Add monitor tuning meta-guide for run-volume and false-positive avoidance#53

Draft
samgutentag wants to merge 1 commit into main from sam-gutentag/monitor-tuning


@samgutentag
Member

Summary

  • New page: flaky-tests/detection/tuning-monitors.mdx
  • Adds a docs.json nav entry under the Flaky test detection group, slotted after the three monitor-type pages
  • Ties together run-volume → monitor-type recommendations, single-failure-flap avoidance, branch coverage, recovery vs activation, monitor states (active / inactive / disabled), and a pre-auto-quarantine checklist
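The summary pairs single-failure-flap avoidance with recovery vs activation. As an illustration only, here is a minimal TypeScript sketch of why separating the two rules keeps a test from flipping on every result. It assumes a generic "N failures within a window" activation rule and a failure-free recovery period; `MonitorConfig`, `evaluate`, and all field names are hypothetical and are not Trunk's monitor settings or implementation.

```typescript
// Hypothetical config shape for illustration; these are not Trunk setting names.
interface MonitorConfig {
  activationFailures: number; // failures needed inside the window to flag a test
  windowMs: number;           // lookback window used for activation
  recoveryMs: number;         // failure-free time needed before a flagged test clears
}

type TestStatus = "healthy" | "flagged";

// Evaluate one test at time `now` given its failure timestamps (ms since epoch).
// Activation and recovery use different rules, so a single pass or failure
// does not flip the test back and forth.
function evaluate(
  status: TestStatus,
  failureTimes: number[],
  now: number,
  cfg: MonitorConfig,
): TestStatus {
  if (status === "healthy") {
    // Count only failures inside the activation window ending at `now`.
    const recent = failureTimes.filter((t) => t <= now && now - t <= cfg.windowMs).length;
    return recent >= cfg.activationFailures ? "flagged" : "healthy";
  }
  // Already flagged: clear only after a full failure-free recovery period.
  const past = failureTimes.filter((t) => t <= now);
  const lastFailure = past.length > 0 ? Math.max(...past) : -Infinity;
  return now - lastFailure >= cfg.recoveryMs ? "healthy" : "flagged";
}
```

With `activationFailures: 2`, a lone failure leaves the test healthy, and once flagged it clears only after the full recovery period (for example 7 days, echoing the Pass-on-Retry default cited in the review items).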

Why

Sourced from customer feedback mining (cluster monitor-tuning-thresholds, verdict partial + first-class IA candidate, 15 pairs across 7 customers). The individual monitor pages already document each monitor type, but customers consistently ask the same set of system-level tuning questions: when to use failure-count vs. failure-rate, how to avoid single-failure flips, why a monitor scoped to main misses queue-branch failures, what "inactive" means in the UI, and what to check before turning on auto-quarantine.
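One recurring question is why a monitor scoped to main misses queue-branch failures: merge-queue runs (other than GitLab Merge Trains, which run on the target branch) happen on separate branches, so a branch filter listing only main never sees them. A hypothetical glob matcher makes this concrete. It assumes `*` matches any run of characters, including `/`; `globToRegExp`, `monitorCoversBranch`, and the example branch name are illustrative, and Trunk's actual pattern semantics may differ. The trunk-merge/* and graphite-merge/* patterns are the ones named in the verification comments.

```typescript
// Convert a simple branch glob to a RegExp. Assumption: "*" matches any run of
// characters, including "/"; Trunk's actual pattern semantics may differ.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*");                 // "*" becomes "match anything"
  return new RegExp(`^${escaped}$`);
}

// True if any of the monitor's branch patterns matches the branch name.
function monitorCoversBranch(patterns: string[], branch: string): boolean {
  return patterns.some((p) => globToRegExp(p).test(branch));
}

// Hypothetical branch name; real merge-queue branch naming may differ.
const queueBranch = "trunk-merge/pr-123";
console.log(monitorCoversBranch(["main"], queueBranch));                  // false
console.log(monitorCoversBranch(["main", "trunk-merge/*"], queueBranch)); // true
```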

Items flagged for review

  • Page location. Slotted under flaky-tests/detection/ rather than flaky-tests/management/ because the page is about tuning detection behavior, not managing already-detected tests. The cluster suggestion mentioned either location; this felt cleaner since every link inside the page points at detection pages. Confirm or move.
  • Auto-quarantine recommended window: "1-3 days." Lifted from the cluster Q&A (Caseware thread). Confirm this still matches current eng guidance.
  • Pass-on-Retry default recovery = 7 days, range 1-15. Pulled from pass-on-retry-monitor.mdx and matches the cluster Gusto thread.
  • Branch patterns table (Trunk Merge Queue / GitHub Merge Queue / Graphite Merge Queue) mirrors the table in failure-rate-monitor.mdx. GitLab Merge Trains intentionally omitted since the cluster didn't surface a question about them — failure-rate-monitor.mdx notes they run on the target branch directly.
  • The "gap" section explicitly calls out that, at the monitor level, there's no way to distinguish "flakes detected in MQ" from "bad PR in MQ", and proposes a failure-count threshold of >=2 failures within 1 hour on queue branches as a proxy. This came directly from the Gusto thread reply ("Higher-threshold failure count monitor that marks broken is the right pattern... No good way to distinguish flakes-detected-in-MQ from actual-bad-PRs-in-MQ today."). Confirm the proxy guidance is still accurate.
  • "Inactive" state definition. The cluster note said "Copy will be improved" for this state in the UI; the doc currently defines it as "previously triggered, no longer triggered, still enabled." Confirm this matches the latest UI state and whether the copy change has shipped.
  • Pre-auto-quarantine cross-link points at ../agents/autofix-flaky-tests. That page exists but its content is more about the auto-investigation/PR flow than the auto-quarantine toggle. If there's a better target page for the auto-quarantine setting itself, swap it.
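The proxy proposed in the "gap" item (a failure-count threshold of at least 2 failures within 1 hour on queue branches) can be expressed as a sliding-window check. This is a minimal TypeScript sketch assuming any 1-hour span counts; `hasFailureBurst` is a hypothetical helper, and Trunk's failure-count monitor may instead evaluate a window ending at the current time.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Returns true if at least `minFailures` failures fall within any span of
// `windowMs` (inclusive). Timestamps are ms since epoch, in any order.
function hasFailureBurst(failureTimesMs: number[], minFailures: number, windowMs: number): boolean {
  const sorted = [...failureTimesMs].sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Shrink the window from the left until it spans at most windowMs.
    while (sorted[end] - sorted[start] > windowMs) start++;
    if (end - start + 1 >= minFailures) return true;
  }
  return false;
}
```

A single queue failure does not trip the threshold; two failures within the same hour do.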

Customer signal

@samgutentag added the "needs review" label (PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge) on May 20, 2026
@mintlify
Contributor

mintlify Bot commented May 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| trunk | 🟢 Ready | View Preview | May 20, 2026, 11:05 PM |


@samgutentag
Member Author

samgutentag commented May 26, 2026

Verification status (2026-05-29): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Flag state: LaunchDarkly not consulted (no eng PR, no flag to read), although LD MCP was reachable this sweep.
  • Eng PR: none referenced in PR body
  • Flag: n/a (no eng work to verify)
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body
    • No TRUNK-XXX Linear ticket linked
    • PR scope is content (meta-guide for tuning existing flaky test monitors), not a flag-gated new feature

This is a content-accuracy PR documenting existing behavior. The code-verification comment found one contradiction: the Pass-on-Retry recovery range is documented as "1-15", but the code enforces 1-14. Correct this before merge. There is no rollout dependency to wait on.

@samgutentag marked this pull request as draft on May 26, 2026, 18:39
@samgutentag
Member Author

samgutentag commented May 26, 2026

Code verification (2026-06-01): 1 contradicted (unresolved from prior sweep)

| Claim | Verdict | Source |
|---|---|---|
| Pass-on-Retry recovery days range "1-15" | contradicted | flake-detection.ts |

The earlier code-verification findings still hold. Re-checked this sweep: the recovery-days range is still stated as "1-15" in the diff, but the source still enforces a maximum of 14. The frontend validation throws "Recovery days must be between 1 and 14" when recoveryDays > 14. Correct the doc to "range 1-14" before merge.

All other claims from the prior sweep remain confirmed (recovery_days default 7, trunk-merge/* pattern, resolution-vs-activation threshold separation, failure-count resolve_after_minutes, stale timeout, auto-quarantine excluding broken tests, active/inactive monitor states). The graphite-merge/* and minimum-sample-size items remain unverifiable against Trunk source (Graphite-defined and behavioral, respectively); they are not contradictions.


Source #1 — Pass-on-Retry recovery days max is 14, not 15 (contradicted)

File: trunk-io/trunk2/ts/apps/frontend/src/lib/services/flake-detection.ts

```ts
if (recoveryDays < 1 || recoveryDays > 14) {
  throw new Error("Recovery days must be between 1 and 14");
}
```

Reasoning: The validation rejects any value above 14, so the valid range is 1-14, not 1-15. The doc's "range 1–15" is off by one at the top of the range.
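To make the off-by-one concrete, here is a minimal TypeScript sketch that mirrors the quoted bounds check and adds the documented default of 7. The constant and function names are hypothetical; only the 1-14 bounds and the error message come from the quoted source.

```typescript
// Bounds as enforced by the quoted frontend check; the default of 7 is the
// documented Pass-on-Retry default. Names are illustrative, not Trunk's code.
const RECOVERY_DAYS_MIN = 1;
const RECOVERY_DAYS_MAX = 14;
const RECOVERY_DAYS_DEFAULT = 7;

// Returns the effective recovery days, or throws for out-of-range input.
function resolveRecoveryDays(input?: number): number {
  if (input === undefined) return RECOVERY_DAYS_DEFAULT;
  if (input < RECOVERY_DAYS_MIN || input > RECOVERY_DAYS_MAX) {
    throw new RangeError(
      `Recovery days must be between ${RECOVERY_DAYS_MIN} and ${RECOVERY_DAYS_MAX}`,
    );
  }
  return input;
}
```

Under this check, 14 is the largest accepted value and 15 is rejected, which is why "1-15" in the doc is off by one.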

@samgutentag added the "needs eng review" label (verify-docs-against-code: at least one claim contradicts source) on May 26, 2026
Member Author

Verification status (May 31, 2026): unknown

This is editorial monitor-tuning content, but internal content contradictions prevent confident auto-verification.

  • Flag state: not consulted
  • Eng PR: none found
  • Flag: none
  • Signals: content appears to describe existing behavior, but review flagged contradictions that require a human check before publishing

Next action: author or reviewer to resolve content contradictions and confirm accuracy, then merge.


Generated by Claude Code

Member Author

samgutentag commented Jun 1, 2026

Verification status (2026-06-01): unknown

Could not determine rollout state from available signals. Chaining to verify-docs-against-code for content-accuracy check.

  • Flag state: LaunchDarkly not consulted (no eng PR, no flag to read).
  • Eng PR: none referenced in PR body.
  • Flag: n/a (no eng work to verify).
  • Signals checked:
    • No trunk-io/<repo>#NNN or PR URL references in PR body.
    • No TRUNK-XXX Linear ticket linked.
    • Scope is a content meta-guide tying together existing monitor-tuning behavior, not a flag-gated new feature.

This is a content-accuracy PR. The publish gate is the open content contradiction tracked in the verify-docs-against-code comment (recovery-days range). The PR is currently CONFLICTING (needs a rebase); this does not affect the verdict.

Unchanged from prior sweep: still unknown.



Member Author

Verification status: unknown - June 2, 2026

Could not determine rollout state from available signals.

  • Flag state: LaunchDarkly not consulted; no feature flag identified
  • Eng PR links: None
  • Flag: none
  • Signals checked: PR body; no eng refs to verify
  • Suggested next action: This covers existing monitor tuning behavior. Confirm all referenced behaviors (inactive monitor state definition, branch-list patterns, auto-quarantine recommended window) match current production. Human review needed before publishing.



Labels

  • needs eng review: verify-docs-against-code: at least one claim contradicts source.
  • needs review: PR sourced from customer-feedback-mining; needs human scrutiny for accuracy before merge
