
feat(seo-review): score table, trend arrows, unlimited scan with batching #1349

Merged
felixwidjaja merged 1 commit into main from feat/seo-review-score-trends-batching on May 28, 2026

@AriaEdo (Collaborator) commented May 28, 2026

Summary

Overhauls the seo-review workflow with a deterministic score, week-over-week trends, unlimited page coverage, and Claude batching for large sites.

Score & trends

  • Adds a deterministic score table (overall grade 0–100 + 8 issue tallies) at the top of each weekly issue
  • Adds trend arrows (▲ improved / ▼ worsened / ▬ unchanged) vs the previous weekly audit
  • Previous scores are persisted in a hidden <!-- seo-scores: {...} --> comment inside the issue body — no extra storage required
  • Workflow fetches the latest closed seo-audit issue and passes its body as PREVIOUS_AUDIT_BODY env
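The hidden-comment persistence and the trend arrows described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names, the `Scores` shape, and the rule that a lower issue tally counts as an improvement are all assumptions.

```typescript
// Hypothetical shape: the PR does not list the actual metric names.
type Scores = Record<string, number>;

// Matches the hidden comment format named in the PR: <!-- seo-scores: {...} -->
const SCORE_COMMENT = /<!--\s*seo-scores:\s*(\{[\s\S]*?\})\s*-->/;

// Serialize scores into a hidden HTML comment for the issue body.
function embedScores(scores: Scores): string {
  return `<!-- seo-scores: ${JSON.stringify(scores)} -->`;
}

// Recover scores from a previous issue body (e.g. PREVIOUS_AUDIT_BODY).
// Returns null when the body is missing, has no comment, or the JSON is malformed.
function extractScores(body: string | undefined): Scores | null {
  if (!body) return null;
  const match = SCORE_COMMENT.exec(body);
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as Scores;
  } catch {
    return null;
  }
}

// Pick a trend arrow. `higherIsBetter` is assumed to be true for the overall
// grade and false for issue tallies, where fewer issues is an improvement.
function trendArrow(
  current: number,
  previous: number | undefined,
  higherIsBetter: boolean,
): string {
  if (previous === undefined) return ""; // no previous audit: no arrow
  if (current === previous) return "▬";
  const improved = higherIsBetter ? current > previous : current < previous;
  return improved ? "▲" : "▼";
}
```

Keeping the scores in the issue body means the next run only needs the previous issue's text, which fits the "no extra storage" goal stated above.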

Scan coverage

  • Removes the MAX_PAGES cap: the weekly audit now covers all built pages (currently 92). The MAX_PAGES env var remains as an opt-in escape hatch when set to a positive integer.
  • Workflow drops the MAX_PAGES: '25' env
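One way the "unlimited by default, positive int as escape hatch" rule could be expressed is shown below. The helper names are hypothetical, and treating any non-positive or non-numeric value as "unlimited" is an assumption about the intended behavior.

```typescript
// Parse the raw MAX_PAGES value (e.g. process.env.MAX_PAGES).
// Returns a page limit, or null meaning "no cap".
function parseMaxPages(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const trimmed = raw.trim();
  // Only plain digit strings qualify; "", "abc", "-3" and "2.5" are ignored.
  if (!/^\d+$/.test(trimmed)) return null;
  const n = Number(trimmed);
  return Number.isSafeInteger(n) && n > 0 ? n : null;
}

// Apply the limit to the list of built pages; null leaves the list untouched.
function applyPageLimit<T>(pages: T[], limit: number | null): T[] {
  return limit === null ? pages : pages.slice(0, limit);
}
```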

Claude integration

  • Chunking by char budget: when the report doesn't fit in a single call, splits into batches, runs a "notes-only" prompt per batch, then a final synthesis call merges results into one unified review
  • Safety fallback: if the combined batch notes approach the cap, the notes are concatenated deterministically instead of being synthesized, so the run still produces a review
  • Prompts now require a 💡 Fix suggestion for every issue (Critical / Improvements / Per-page notes) — output went from 16 → 33 actionable fix lines per weekly run
  • max_tokens raised 2048 → 8192 (output was being silently truncated)
  • Token usage logging — per-call and grand-total stdout lines for cost observability
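The batching and fallback decisions above might look something like this sketch. It assumes a greedy packing of per-page report strings and a `headroom` fraction for deciding when notes "approach" the cap; neither the function names nor the 0.9 default come from the PR.

```typescript
// Greedy packing: keep adding page reports to the current batch until the
// next one would push it past the character budget.
function splitIntoBatches(reports: string[], charBudget: number): string[][] {
  if (charBudget <= 0) throw new RangeError("charBudget must be positive");
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const report of reports) {
    if (current.length > 0 && used + report.length > charBudget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    // A single report larger than the budget still gets its own batch
    // rather than being dropped.
    current.push(report);
    used += report.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Decide how to merge per-batch notes: a synthesis call when they fit
// comfortably, a plain concat when they get close to the cap.
// `headroom` is an assumed tuning knob, not a value from the PR.
function mergeStrategy(
  notes: string[],
  charBudget: number,
  headroom = 0.9,
): "synthesis" | "concat" {
  const combined = notes.reduce((sum, note) => sum + note.length, 0);
  return combined >= charBudget * headroom ? "concat" : "synthesis";
}
```

With a single resulting batch, the caller would presumably take the single-call path and skip both the notes-only prompts and the merge step.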

Refactor (perf + cleanup)

  • cheerio.load() is called once per page; the resulting $ is shared between extractMeta and extractLinksAndRefresh (full mode was parsing each page twice)
  • File reads parallelized via Promise.all (was sequential await loop)
  • Cache repeated selector queries in extractMeta
  • Drop dead siteHost plumbing
  • Rename local path shadowing node:path in findRedirectChains
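The move from a sequential await loop to Promise.all for file reads can be illustrated with the sketch below. The reader is injected so the example stays self-contained; in the actual script it would presumably wrap something like `fs.promises.readFile(path, "utf8")`. All names here are hypothetical.

```typescript
// Hypothetical reader signature standing in for the real file read.
type ReadFile = (path: string) => Promise<string>;

// Before: each read waits for the previous one to finish.
async function readAllSequential(paths: string[], read: ReadFile): Promise<string[]> {
  const contents: string[] = [];
  for (const p of paths) {
    contents.push(await read(p));
  }
  return contents;
}

// After: all reads start immediately and resolve together.
// Promise.all preserves input order in its result array.
async function readAllParallel(paths: string[], read: ReadFile): Promise<string[]> {
  return Promise.all(paths.map((p) => read(p)));
}
```

Promise.all rejects as soon as any read fails, so a caller that wants to tolerate individual unreadable files would need Promise.allSettled or per-file error handling instead.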

Verification

Smoke-tested three paths against the live dist/client (92 pages):

| Path | Wall-clock | Tokens | Output |
|---|---|---|---|
| Single-call (default 640K cap) | ~34s | in=21,944 out=3,923 (25,867 total) | 12.7 KB |
| Batch + synthesis (forced 50K cap, 2 batches) | ~45s | in=26,301 out=6,144 (32,445 total) | 7.2 KB |
| Batch + fallback concat (forced 8K cap, 8 batches) | ~50s | ~80K combined | 9.6 KB |

Score for current main: 🟡 89/100 — surfaces real issues including a systemic 2-H1 bug across ~55 /handbook/** pages, a typo in /changelog/ og:title ("shippped"), and relative og:image URLs across 8 /brand/* pages.

Issue: #1348

Test plan

  • Trigger workflow manually via workflow_dispatch (mode: full) and verify the issue body contains:
    • Hidden <!-- seo-scores: {...} --> comment
    • ### 📊 Score summary table with grade + 12 metric rows
    • Trend legend ("No previous audit found" on first run; arrows on subsequent runs)
    • AI review section with 💡 Fix: lines per issue
  • Confirm token usage line appears in workflow logs (Token usage — N call(s): ...)
  • Open a PR that modifies an src/pages/** file and verify changed-only mode still posts a sticky PR comment with the score table
  • After the second weekly run, verify trend arrows compare against the previous run

…hing

Score & trends:
- Deterministic score table (0-100 grade + 8 issue tallies) prepended to output
- Trend arrows (▲/▼/▬) vs previous weekly issue, persisted via hidden JSON
  comment <!-- seo-scores: {...} --> embedded in the issue body
- Workflow fetches previous seo-audit issue body and passes as
  PREVIOUS_AUDIT_BODY env for diffing

Scan coverage:
- Remove MAX_PAGES cap (default = unlimited; positive int still acts as
  escape hatch via env)
- Workflow drops MAX_PAGES: '25' env

Claude integration:
- Chunk pages by char budget; >1 batch triggers per-batch "notes-only"
  prompt + a synthesis call that merges results into a unified review
- Fallback to deterministic concat if combined notes approach the cap
- Prompts now require a 💡 Fix suggestion for every issue (Critical /
  Improvements / Per-page notes)
- max_tokens raised 2048 → 8192 (review was being truncated)
- Per-call and grand-total token usage logged to stdout

Refactor (perf + cleanup):
- Cheerio load() once per page, share $ between extractMeta and
  extractLinksAndRefresh (was parsing twice in full mode)
- Parallel file reads via Promise.all (was sequential await loop)
- Cache repeated selector queries in extractMeta
- Drop dead siteHost plumbing
- Rename local `path` shadowing node:path in findRedirectChains
@AriaEdo AriaEdo requested review from felixwidjaja and ronggur May 28, 2026 10:43
@github-actions (Contributor)

🔎 SEO & Meta Review

Skipped (mode: skipped-no-changes).

No CHANGED_FILES provided in changed-only mode.

@felixwidjaja felixwidjaja merged commit 7bd68b0 into main May 28, 2026
6 checks passed
@felixwidjaja felixwidjaja deleted the feat/seo-review-score-trends-batching branch May 28, 2026 10:46