feat(seo-review): score table, trend arrows, unlimited scan with batching by AriaEdo · Pull Request #1349 · datum-cloud/datum.net

AriaEdo · 2026-05-28T10:41:25Z

Summary

Overhauls the seo-review workflow with a deterministic score, week-over-week trends, unlimited page coverage, and Claude batching for large sites.

Score & trends

- Adds a deterministic score table (overall grade 0–100 plus 8 issue tallies) at the top of each weekly issue
- Adds trend arrows (▲ improved / ▼ worsened / ▬ unchanged) relative to the previous weekly audit
- Persists previous scores in a hidden JSON comment inside the issue body, so no extra storage is required
- The workflow fetches the latest closed seo-audit issue and passes its body in the PREVIOUS_AUDIT_BODY env var
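Here is a minimal sketch of how scores could round-trip through a hidden comment and drive the trend arrows. It assumes scores are stored as a flat map of metric name to number. The marker name `seo-score` and the helper names are hypothetical, as is the rule that a higher grade but lower issue tallies count as improvement. The PR does not show the actual marker or code. In the workflow, the previous body would come from PREVIOUS_AUDIT_BODY.

```typescript
// Hypothetical marker; the PR does not show the real comment marker.
const MARKER = "seo-score";

// Assumed shape: flat map of metric name to value, e.g. { grade: 85, multipleH1: 3 }.
type Scores = Record<string, number>;

// Serialize scores into an HTML comment, which GitHub does not render in issue bodies.
function embedScores(scores: Scores): string {
  return `<!-- ${MARKER}: ${JSON.stringify(scores)} -->`;
}

// Recover scores from a previous issue body (e.g. the PREVIOUS_AUDIT_BODY value).
// Returns null when the body is missing, has no marker, or holds malformed JSON.
function extractScores(body: string | undefined): Scores | null {
  if (!body) return null;
  const match = body.match(new RegExp(`<!-- ${MARKER}: (\\{.*?\\}) -->`));
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as Scores;
  } catch {
    return null;
  }
}

// ▲ improved / ▼ worsened / ▬ unchanged; empty string when there is no previous value.
// higherIsBetter would be true for the grade and false for issue tallies (assumption).
function trendArrow(current: number, previous: number | undefined, higherIsBetter: boolean): string {
  if (previous === undefined) return "";
  if (current === previous) return "▬";
  const improved = higherIsBetter ? current > previous : current < previous;
  return improved ? "▲" : "▼";
}
```

Because the payload lives in the issue body itself, the "storage" is just the previous issue, which is why no extra artifact or database is needed.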

Scan coverage

- Removes the MAX_PAGES cap: the weekly audit now covers every built page (92 today, and more as the site grows). MAX_PAGES remains an opt-in escape hatch when set to a positive integer.
- The workflow drops its MAX_PAGES: '25' env setting
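One way the MAX_PAGES escape hatch could be resolved is sketched below. It assumes that anything other than a positive integer means "no cap": unset, empty, 0, negative, or malformed values all count. The function names are hypothetical, and the caller would pass process.env.MAX_PAGES.

```typescript
// Hypothetical helper: turn a raw MAX_PAGES value into a page limit.
// Only a positive integer string sets a cap; everything else means unlimited.
function resolveMaxPages(raw: string | undefined): number {
  const trimmed = raw?.trim() ?? "";
  if (!/^\d+$/.test(trimmed)) return Infinity;
  const n = Number.parseInt(trimmed, 10);
  return n > 0 ? n : Infinity;
}

// Apply the cap. Array.prototype.slice clamps an Infinity end to the array
// length, so an unlimited cap keeps every page.
function selectPages<T>(pages: T[], maxPages: number): T[] {
  return pages.slice(0, maxPages);
}
```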

Claude integration

- Chunking by character budget: when the report doesn't fit in a single call, pages are split into batches, each batch gets a "notes-only" prompt, and a final synthesis call merges the results into one unified review
- Safety fallback: if the combined batch notes approach the cap, the notes are concatenated deterministically instead of synthesized, so the run still completes
- Prompts now require a 💡 Fix suggestion for every issue (Critical / Improvements / Per-page notes); output went from 16 to 33 actionable fix lines per weekly run
- max_tokens raised from 2048 to 8192 (output was being silently truncated)
- Token usage logging: per-call and grand-total lines on stdout for cost observability
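The batching flow above is sketched roughly below. The sketch assumes each page contributes a pre-rendered text section and that the budget is measured in characters. If `chunkByCharBudget` returns a single batch, the single-call path applies. Otherwise each batch gets a notes-only call, and `mergeStrategy` chooses between the synthesis call and the deterministic concat fallback. All names, the 0.9 "approaching the cap" threshold, and the batch heading format are hypothetical, not taken from the PR.

```typescript
// Hypothetical shape: one rendered report section per page.
interface PageReport {
  url: string;
  text: string;
}

// Greedily pack page reports into batches whose combined text length stays
// within charBudget. A page larger than the budget gets a batch of its own
// rather than being split mid-page (assumed behavior).
function chunkByCharBudget(pages: PageReport[], charBudget: number): PageReport[][] {
  const batches: PageReport[][] = [];
  let current: PageReport[] = [];
  let size = 0;
  for (const page of pages) {
    const len = page.text.length;
    if (current.length > 0 && size + len > charBudget) {
      batches.push(current); // close the batch before it would exceed the budget
      current = [];
      size = 0;
    }
    current.push(page);
    size += len;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// After the per-batch notes-only calls: synthesize if the notes fit comfortably,
// otherwise fall back to concatenation. 0.9 is a hypothetical threshold.
function mergeStrategy(batchNotes: string[], charBudget: number): "synthesize" | "concat" {
  const combined = batchNotes.reduce((sum, n) => sum + n.length, 0);
  return combined >= 0.9 * charBudget ? "concat" : "synthesize";
}

// Deterministic fallback: join notes in batch order under stable headings.
function concatNotes(batchNotes: string[]): string {
  return batchNotes.map((n, i) => `#### Batch ${i + 1}\n${n.trim()}`).join("\n\n");
}
```

Greedy packing preserves page order, so per-page notes stay grouped the way the pages were scanned, and the fallback output is stable from run to run.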

Refactor (perf + cleanup)

- Call Cheerio load() once per page and share $ between extractMeta and extractLinksAndRefresh (full mode previously parsed each page twice)
- Parallelize file reads via Promise.all (previously a sequential await loop)
- Cache repeated selector queries in extractMeta
- Drop dead siteHost plumbing
- Rename the local path variable that shadowed node:path in findRedirectChains
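The two main refactors can be illustrated without external dependencies by injecting the reader and parser as hypothetical parameters. In the real script they would presumably be (p) => readFile(p, "utf8") from node:fs/promises and Cheerio's load(). The variable is named filePath rather than path, so it does not shadow a node:path import, mirroring the findRedirectChains rename.

```typescript
// Hypothetical injected reader; in Node this would presumably wrap fs/promises readFile.
type Reader = (filePath: string) => Promise<string>;

interface LoadedPage {
  filePath: string; // named filePath, not path, to avoid shadowing node:path
  html: string;
}

// Read every file concurrently. Promise.all resolves to results in input order
// and rejects if any single read fails.
async function readAllPages(filePaths: string[], read: Reader): Promise<LoadedPage[]> {
  const htmls = await Promise.all(filePaths.map((filePath) => read(filePath)));
  return filePaths.map((filePath, i) => ({ filePath, html: htmls[i] }));
}

// Parse a page once and share the parsed document between both extractors,
// instead of each extractor re-parsing the raw HTML.
function analyzePage<Doc, Meta, Links>(
  html: string,
  parse: (html: string) => Doc,
  extractMeta: (doc: Doc) => Meta,
  extractLinks: (doc: Doc) => Links,
): { meta: Meta; links: Links } {
  const doc = parse(html); // the only parse for this page
  return { meta: extractMeta(doc), links: extractLinks(doc) };
}
```

Promise.all issues all reads at once. At the current ~92 pages that is cheap, though a much larger site might warrant a concurrency limit.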

Verification

Smoke-tested three paths against the live dist/client (92 pages):

| Path | Wall-clock | Tokens | Output size |
|---|---|---|---|
| Single call (default 640K cap) | ~34s | in=21,944 out=3,923 (25,867 total) | 12.7 KB |
| Batch + synthesis (forced 50K cap, 2 batches) | ~45s | in=26,301 out=6,144 (32,445 total) | 7.2 KB |
| Batch + fallback concat (forced 8K cap, 8 batches) | ~50s | ~80K combined | 9.6 KB |

Score for current main: 🟡 89/100. The audit surfaces real issues, including a systemic duplicate-H1 bug (two H1s per page) across ~55 /handbook/** pages, a typo in the /changelog/ og:title ("shippped"), and relative og:image URLs on 8 /brand/* pages.

Issue: #1348

Test plan

- Trigger the workflow manually via workflow_dispatch (mode: full) and verify that the issue body contains:
  - The hidden score comment
  - A ### 📊 Score summary table with the grade and 12 metric rows
  - A trend legend ("No previous audit found" on the first run; arrows on subsequent runs)
  - An AI review section with a 💡 Fix: line for each issue
- Confirm that the token usage line appears in the workflow logs (Token usage — N call(s): ...)
- Open a PR that modifies a src/pages/** file and verify that changed-only mode still posts a sticky PR comment with the score table
- After the second weekly run, verify that the trend arrows compare against the previous run


github-actions · 2026-05-28T10:44:31Z

🔎 SEO & Meta Review

Skipped (mode: skipped-no-changes).

No CHANGED_FILES provided in changed-only mode.

AriaEdo requested review from felixwidjaja and ronggur May 28, 2026 10:43

felixwidjaja approved these changes May 28, 2026


felixwidjaja merged commit 7bd68b0 into main May 28, 2026
6 checks passed

felixwidjaja deleted the feat/seo-review-score-trends-batching branch May 28, 2026 10:46

felixwidjaja merged 1 commit into main from feat/seo-review-score-trends-batching