**Problem:** When the executor launches `claude -p`, it logs "output will stream below" but no output actually streams to the terminal — all output is silently redirected to the log file via `>> "${LOG_FILE}" 2>&1`.

**Files Analyzed:**

- `scripts/night-watch-helpers.sh` — `log()` function (writes ONLY to file)
- `scripts/night-watch-cron.sh` — provider dispatch (lines 545-577, 624-637)
- `scripts/night-watch-audit-cron.sh` — provider dispatch (lines 163-189)
- `packages/core/src/utils/shell.ts` — `executeScriptWithOutput()` (already streams child stdout/stderr to terminal)

**Current Behavior:**

- `log()` writes ONLY to `LOG_FILE` (`echo ... >> "${log_file}"`) — not to stdout or stderr
- Provider commands redirect ALL output to file: `claude -p ... >> "${LOG_FILE}" 2>&1`
- Node's `executeScriptWithOutput()` listens on the bash child's stdout/stderr pipes but receives nothing because the bash script sends everything to the file
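The failure mode is easy to reproduce outside the scripts; a minimal sketch (the file name and message are placeholders):

```bash
# Reproduce the symptom: `>> file 2>&1` sends everything to the file,
# so the parent process's stdout/stderr pipes receive nothing.
log_file=$(mktemp)
echo "provider output" >> "${log_file}" 2>&1   # nothing reaches the terminal
cat "${log_file}"                              # the output landed only in the file
rm -f "${log_file}"
```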
## 2. Solution

**Approach:**

- Replace `>> "${LOG_FILE}" 2>&1` with `2>&1 | tee -a "${LOG_FILE}"` for provider dispatch — output goes to both the log file AND stdout (which propagates through Node's pipe to the terminal)
- Modify `log()` to also write to stderr so diagnostic messages are visible in the terminal during interactive `night-watch run`
- All scripts already use `set -euo pipefail`, so pipe exit codes propagate correctly (if `claude` fails with code 1 and `tee` succeeds with 0, pipefail returns 1)
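The `tee` approach can be sanity-checked in isolation — a sketch, not the actual provider dispatch:

```bash
set -euo pipefail
log_file=$(mktemp)
# Output goes to stdout (visible in the terminal) AND is appended to the log file
echo "streamed line" 2>&1 | tee -a "${log_file}"
grep -q "streamed line" "${log_file}" && echo "also captured in log"
rm -f "${log_file}"
```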

**Key Decisions:**

- `tee -a` (append mode) preserves the existing log file behavior
- Provider output goes to stdout via tee; diagnostic messages go to stderr via log — keeps them on separate channels
- No changes to `executeScriptWithOutput()` needed — it already streams both pipes to the terminal
### Phase 1: Fix `log()` to also write to stderr + use `tee` for provider output

**Files (5):**

- `scripts/night-watch-helpers.sh` — make `log()` also write to stderr
- `scripts/night-watch-cron.sh` — replace `>> "${LOG_FILE}" 2>&1` with `2>&1 | tee -a "${LOG_FILE}"` (3 occurrences: main dispatch, codex dispatch, fallback)
- `scripts/night-watch-audit-cron.sh` — same replacement (2 occurrences)
**Implementation:**

- [ ] In `night-watch-helpers.sh`, modify `log()` to also echo to stderr:

```bash
log() {
local log_file="${LOG_FILE:?LOG_FILE not set}"
  local msg="$*"
  echo "${msg}" >> "${log_file}"
  # also mirror to stderr so messages appear in the terminal
  # (exact body partially elided in the diff view; shape assumed from the plan above)
  echo "${msg}" >&2
}
```

**Pattern for each replacement:**

Before:

```bash
if (
cd "${WORKTREE_DIR}" && timeout "${SESSION_MAX_RUNTIME}" \
    claude -p … >> "${LOG_FILE}" 2>&1
); then
```

After:

```bash
if (
cd "${WORKTREE_DIR}" && timeout "${SESSION_MAX_RUNTIME}" \
    claude -p … 2>&1 | tee -a "${LOG_FILE}"
); then
```

**Exit code behavior with `pipefail`:**

- All scripts use `set -euo pipefail` (line 2)
- If `timeout ... claude` exits 124 (timeout) and `tee` exits 0 → pipe returns 124 ✓
- If `timeout ... claude` exits 1 (failure) and `tee` exits 0 → pipe returns 1 ✓
- If `timeout ... claude` exits 0 (success) and `tee` exits 0 → pipe returns 0 ✓
- The `if (...); then` construct disables `set -e` for the condition, so non-zero exits are captured correctly
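These exit-code claims can be verified with a toy pipeline, where `(exit 3)` stands in for a failing `claude` invocation:

```bash
set -uo pipefail   # pipefail on; -e off so we can inspect the status
tmp_log=$(mktemp)
if (exit 3) 2>&1 | tee -a "${tmp_log}"; then
  echo "pipe returned 0"
else
  echo "pipe returned $?"   # pipefail surfaces the left command's status: 3
fi
rm -f "${tmp_log}"
```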

**Rate-limit detection still works:**

- `check_rate_limited` greps the LOG_FILE — `tee -a` still writes everything to the file, so this is unchanged

**Tests Required:**
| Type | Test | Expected |
|------|------|----------|
| Manual | Smoke test: `bash -n scripts/night-watch-pr-reviewer-cron.sh` | No syntax errors |

**User Verification:**

- Action: Run `night-watch run` (or trigger executor)
- Expected: Diagnostic log messages AND claude's streaming output visible in the terminal in real time


## Files to Modify

| File | Change |
| ----------------------------------------- | -------------------------------------------------- |
| `scripts/night-watch-helpers.sh` | `log()` also writes to stderr |
| `scripts/night-watch-cron.sh` | 3× replace `>> LOG 2>&1` with `2>&1 \| tee -a LOG` |
| `scripts/night-watch-audit-cron.sh` | 2× same replacement |
| `scripts/night-watch-qa-cron.sh` | 2× same replacement |
| `scripts/night-watch-pr-reviewer-cron.sh` | 2× same replacement |
---
PRD execution is failing consistently across projects (night-watch-cli, autopilo…).

When the proxy returns 429, the system correctly triggers a native Claude fallback. **However, if native Claude is also rate-limited**, the fallback exits with code 1 and the system records `provider_exit` instead of `rate_limited`.

**Evidence from `logs/executor.log`:**

```
API Error: 429 {"error":{"code":"1308","message":"Usage limit reached for 5 hour..."}}
RATE-LIMITED: Proxy quota exhausted — triggering native Claude fallback
FAIL: Night watch exited with code 1 while processing 69-ux-revamp...
```

**Impact:** The system records a `failure` with `reason=provider_exit` instead of `rate_limited`, which:

- Triggers a long cooldown (max_runtime-based) instead of a rate-limit-appropriate retry
- Sends misleading failure notifications
- Prevents the PRD from being retried once the rate limit resets
The `latest_failure_detail()` function scans `tail -50` of the shared `executor.log` file, but log entries from **previous runs** can bleed into the current run's error detail.

**Evidence:** Issue #70's failure detail contains issue #69's error message:

```
detail=[2026-03-07 00:40:59] [PID:75449] FAIL: Night watch exited with code 1 while processing 69-ux-revamp...
```
This happens because the log is append-only and `latest_failure_detail()` doesn't filter to lines written during the current run.

In filesystem mode, `code-cleanup-q1-2026.md` was selected and executed despite the work already being merged to master. Claude correctly identified the work was done but didn't create a PR. The cron script then recorded `failure_no_pr_after_success`.

**Evidence:**

```
OUTCOME: exit_code=0 total_elapsed=363s prd=code-cleanup-q1-2026.md
WARN: claude exited 0 but no open/merged PR found on night-watch/code-cleanup-q1-2026
```

This is a pre-existing filesystem mode issue (stale PRDs not moved to `done/`).

After the native Claude fallback runs (line ~626), check if the fallback also hit a rate limit before falling through to the generic failure handler.

**Implementation:**

1. After `RATE_LIMIT_FALLBACK_TRIGGERED` block (lines 603-632), if `EXIT_CODE != 0`, scan fallback output for rate-limit indicators (`"hit your limit"`, `429`, `"Usage limit"`)
2. If detected, set a new flag `DOUBLE_RATE_LIMITED=1`
3. In the outcome handler (lines 711-726), when `DOUBLE_RATE_LIMITED=1`:
**Specific changes in `night-watch-cron.sh`:**

After line 632 (`fi` closing the fallback block), add:

```bash
# Detect double rate-limit: both proxy AND native Claude exhausted
DOUBLE_RATE_LIMITED=0
  # … scan fallback output for "hit your limit" / 429 / "Usage limit" (elided in diff view) …
fi
```

In the outcome handler, add a new branch before the generic `else` on line 711:

```bash
elif [ "${DOUBLE_RATE_LIMITED}" = "1" ]; then
if [ -n "${ISSUE_NUMBER}" ]; then
    # … record `rate_limited` outcome instead of `provider_exit` (remainder elided in diff view) …
```

Modify `latest_failure_detail()` to accept an optional `since_line` parameter that filters to only lines written during the current run.

**Implementation:**

1. Change `latest_failure_detail()` (lines 79-92) to accept a second parameter `since_line`
2. Use `tail -n +${since_line}` instead of `tail -50` when `since_line` is provided
3. At the call site (line 712), pass the `LOG_LINE_BEFORE` captured at the start of the current attempt
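The `LOG_LINE_BEFORE` mechanism in step 3 can be sketched standalone (variable names are taken from the plan; the log contents here are made up):

```bash
set -euo pipefail
log=$(mktemp)
echo "FAIL: error from a previous run" >> "${log}"
# Capture the first line number that will belong to the current run
LOG_LINE_BEFORE=$(( $(wc -l < "${log}") + 1 ))
echo "FAIL: error from the current run" >> "${log}"
tail -n "+${LOG_LINE_BEFORE}" "${log}"   # prints only the current run's line
rm -f "${log}"
```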

**Specific changes:**

Replace `latest_failure_detail()`:

```bash
latest_failure_detail() {
local log_file="${1:?log_file required}"
  local since_line="${2:-}"
  # When since_line is provided, scan only the current run's lines:
  #   tail -n "+${since_line}" "${log_file}" …
  # otherwise keep the previous `tail -50` behavior (remainder elided in diff view)
}
```

Update call site at line 712:

```bash
PROVIDER_ERROR_DETAIL=$(latest_failure_detail "${LOG_FILE}" "${LOG_LINE_BEFORE}")
```
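For illustration, a minimal stand-in showing the optional-parameter shape described above (`detail_scan` is a hypothetical name; the real grep/formatting logic of `latest_failure_detail()` is omitted):

```bash
# Hypothetical sketch of the two call modes of the scoped-detail helper
detail_scan() {
  local log_file="${1:?log_file required}"
  local since_line="${2:-}"
  if [ -n "${since_line}" ]; then
    tail -n "+${since_line}" "${log_file}"   # scoped to the current run
  else
    tail -n 50 "${log_file}"                 # legacy behavior
  fi
}

log=$(mktemp)
printf 'old line\nnew line\n' > "${log}"
detail_scan "${log}" 2   # prints only "new line"
rm -f "${log}"
```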
This is already handled — the `code-cleanup-q1-2026.md` issue was a one-time stale-file case (the PRD had not been moved to `done/`).

## Files to Modify

| File | Change |
| ----------------------------- | -------------------------------------------------------------------- |
| `scripts/night-watch-cron.sh` | Add double-rate-limit detection, scope failure detail to current run |
---

docs/PRDs/job-registry.md → docs/PRDs/done/job-registry.md (37 changes: 27 additions, 10 deletions)
**Problem:** Adding a new job type (e.g., analytics) requires touching 15+ files across 4 packages — types, constants, config normalization, env parsing, CLI command, server routes, API client, Scheduling UI, Settings UI, schedule templates, and more. Each job's state shape is inconsistent (executor/reviewer use top-level flat fields, qa/audit/analytics use nested config objects, slicer lives inside `roadmapScanner`).

**Files Analyzed:**

- `packages/core/src/types.ts` — `JobType`, `IJobProviders`, `INightWatchConfig`, `IQaConfig`, `IAuditConfig`, `IAnalyticsConfig`
- `packages/core/src/shared/types.ts` — duplicated type definitions for web contract
- `packages/core/src/constants.ts` — `DEFAULT_*` per job, `VALID_JOB_TYPES`, `DEFAULT_QUEUE_PRIORITY`, `LOG_FILE_NAMES`
- `web/store/useStore.ts` — minimal Zustand, no job-specific state

**Current Behavior:**

- 6 job types exist: `executor`, `reviewer`, `qa`, `audit`, `slicer`, `analytics`
- Executor/reviewer use flat top-level config fields (`cronSchedule`, `reviewerSchedule`, `executorEnabled`, `reviewerEnabled`)
- QA/audit/analytics use nested config objects (`config.qa`, `config.audit`, `config.analytics`) with common shape: `{ enabled, schedule, maxRuntime, ...extras }`
## 2. Solution

**Approach:**

1. Create a **Job Registry** in `packages/core/src/jobs/` that defines each job's metadata, defaults, config access patterns, and env parsing rules in a single object
2. Extract a **`IBaseJobConfig`** interface (`{ enabled, schedule, maxRuntime }`) that all job configs extend
3. Replace per-job boilerplate in `config-normalize.ts` and `config-env.ts` with generic registry-driven loops
4. Create a **web-side Job Registry** (`web/utils/jobs.ts`) with UI metadata (icons, labels, trigger functions, config accessors) that Scheduling/Settings pages iterate over
5. Add a **Zustand `jobs` slice** that provides computed job state derived from `status.config` so components don't need to know each job's config shape

**Architecture Diagram:**

```mermaid
flowchart TB
subgraph Core["@night-watch/core"]
    %% … (diagram body elided in diff view)
  end
```

**Key Decisions:**

- **Migrate executor/reviewer to nested config**: All jobs will use `config.jobs.{id}: { enabled, schedule, maxRuntime, ...extras }`. Auto-detect legacy flat format and migrate on load. This is a breaking config change but gives uniform access patterns.
- **Registry is a const array, not DI**: Simple, testable, no runtime overhead
- **Web job registry stores React components directly** for icons (type-safe, tree-shakeable)
- **Generic `triggerJob(jobId)`** replaces per-job `triggerRun()`, `triggerReview()` etc. (keep old functions as thin wrappers for backward compat)

**Data Changes:**

- `INightWatchConfig` gains `jobs: Record<JobType, IBaseJobConfig & extras>` — replaces flat executor/reviewer fields and nested qa/audit/analytics objects
- Legacy flat fields (`cronSchedule`, `reviewerSchedule`, `executorEnabled`, `reviewerEnabled`) and nested objects (`qa`, `audit`, `analytics`, `roadmapScanner.slicerSchedule`) auto-detected and migrated on config load
- Config file rewritten in new format on first save after migration
**User-visible outcome:** Job registry exists and is the single source of truth for job metadata. All constants derived from it. Tests prove registry drives normalization.

**Files (5):**

- `packages/core/src/jobs/job-registry.ts` — **NEW** — `IJobDefinition` interface + `JOB_REGISTRY` array + accessor utilities
- `packages/core/src/jobs/index.ts` — **NEW** — barrel exports
- `packages/core/src/types.ts` — add `IBaseJobConfig` interface

- [ ] Define `IBaseJobConfig` interface: `{ enabled: boolean; schedule: string; maxRuntime: number }`
- [ ] Define `IJobDefinition<TConfig extends IBaseJobConfig = IBaseJobConfig>` interface with:

```typescript
interface IJobDefinition<TConfig extends IBaseJobConfig = IBaseJobConfig> {
id: JobType;
name: string; // "Executor", "QA", "Auditor"
description: string; // "Creates implementation PRs from PRDs"
cliCommand: string; // "run", "review", "qa", "audit", "planner", "analytics"
logName: string; // "executor", "reviewer", "night-watch-qa", etc.
lockSuffix: string; // ".lock", "-r.lock", "-qa.lock", etc.
queuePriority: number; // 50, 40, 30, 20, 10

// Env var prefix for NW_* overrides
envPrefix: string; // "NW_EXECUTOR", "NW_QA", "NW_AUDIT", etc.

// Extra config field normalizers (beyond enabled/schedule/maxRuntime)
extraFields?: IExtraFieldDef[]; // e.g., QA's branchPatterns, artifacts, etc.

// Defaults
defaultConfig: TConfig;
  // … (fields elided in diff view)
migrateLegacy?: (raw: Record<string, unknown>) => Partial<TConfig> | undefined;
}
```

- [ ] Create `JOB_REGISTRY` const array with entries for all 6 job types
- [ ] Create utility functions: `getJobDef(id)`, `getAllJobDefs()`, `getJobDefByCommand(cmd)`
- [ ] Derive `VALID_JOB_TYPES`, `DEFAULT_QUEUE_PRIORITY`, `LOG_FILE_NAMES` from registry (keep exports stable)
**User-visible outcome:** `config-normalize.ts` and `config-env.ts` use generic loops. Legacy flat config auto-migrated. Adding a job config section no longer requires per-job blocks.

**Files (5):**

- `packages/core/src/jobs/job-registry.ts` — add `normalizeJobConfig()` and `buildJobEnvOverrides()` generic helpers
- `packages/core/src/config-normalize.ts` — replace per-job normalization blocks with registry loop + legacy migration
- `packages/core/src/config-env.ts` — replace per-job env blocks with registry loop
- `packages/core/src/config.ts` — `loadConfig()` detects legacy format and migrates in-memory (optionally rewrites file)
- `packages/core/src/__tests__/config-normalize.test.ts` — verify normalization + migration

**Implementation:**

- [ ] Add `normalizeJobConfig(rawConfig, jobDef)` that reads raw object, applies defaults, validates fields
- [ ] Each `IJobDefinition` declares `extraFields` for job-specific fields beyond `{ enabled, schedule, maxRuntime }`
- [ ] Add `migrateLegacyConfig(raw)` that detects old format (e.g., `cronSchedule` exists at top level) and transforms to new `jobs: { executor: { ... }, ... }` shape
**User-visible outcome:** Scheduling and Settings pages read job definitions from a registry instead of hardcoded arrays. Zustand provides computed job state.

**Files (4):**

- `web/utils/jobs.ts` — **NEW** — Web-side job registry with UI metadata
- `web/store/useStore.ts` — add `jobs` computed slice derived from `status.config`
- `web/api.ts` — add generic `triggerJob(jobId)` function
- `web/utils/cron.ts` — derive schedule template keys from registry

**Implementation:**

- [ ] Create `IWebJobDefinition` extending core `IJobDefinition` with UI fields:
```typescript
interface IWebJobDefinition extends IJobDefinition {
icon: string; // lucide icon component name
triggerEndpoint: string; // '/api/actions/qa'
scheduleTemplateKey: string; // key in IScheduleTemplate.schedules
settingsSection?: 'general' | 'advanced'; // where in Settings to show
}
```

**User-visible outcome:** Scheduling page renders job cards from the registry. Adding a new job automatically shows it in Scheduling.

**Files (3):**

- `web/pages/Scheduling.tsx` — replace hardcoded `agents` array with registry-driven rendering
- `web/components/scheduling/ScheduleConfig.tsx` — use registry for form fields
- `web/utils/cron.ts` — update `IScheduleTemplate` to be extensible

**Implementation:**

- [ ] Replace the hardcoded `agents: IAgentInfo[]` array with `WEB_JOB_REGISTRY.map(job => ...)`
- [ ] Replace `handleJobToggle` if/else chain with generic `job.buildEnabledPatch(enabled, config)`
- [ ] Replace `handleTriggerJob` map with generic `triggerJob(job.id)`
**User-visible outcome:** Settings page job config sections rendered from registry. Adding a new job auto-shows its settings.

**Files (3):**

- `web/pages/Settings.tsx` — replace per-job settings JSX blocks with registry loop
- `web/components/dashboard/AgentStatusBar.tsx` — use registry for process status
- `packages/server/src/routes/action.routes.ts` — generate routes from registry

**Implementation:**

- [ ] Settings: iterate `WEB_JOB_REGISTRY` to render job config sections
- [ ] Each `IWebJobDefinition` can declare its settings fields: `settingsFields: ISettingsField[]`
- [ ] `AgentStatusBar`: derive process list from registry instead of hardcoded