feat(web): markdown content negotiation via Accept header #21
- Add `lexicalToMarkdown` utility that converts Payload's Lexical rich-text node tree to a Markdown string (headings, paragraphs, lists, blockquotes, links, images, code blocks, inline formatting)
- Add `[lang]/[...path].md.ts` Astro static endpoint that pre-generates a `.md` file with YAML front-matter for every article in every locale at build time
- Update `web/vercel.json` to prepend a `has`-based rewrite rule that transparently serves the pre-generated `.md` file to any request whose `Accept` header contains `text/markdown`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
- Remove articles-only filter: `getStaticPaths` now generates `.md` files for every page collection (articles, projects, authors, pages)
- Add per-collection frontmatter and rich-text extraction:
  - projects: body rich text + customer/tags metadata
  - authors: description rich text + name/profession metadata
  - pages: extract all RichTextBlocks from `sections[]`, with section titles (`##`) and subtitles (`###`) as Markdown headings
- Add `richTextToMarkdown()` helper to reduce duplication
- Guard `project.tags` with null-safety and use an empty-string join for `contentParts` to avoid collapsing existing double newlines

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
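The Lexical-to-Markdown conversion described in the two commits above (an approach the PR later replaced) is essentially a recursive walk over the node tree. A minimal sketch follows. The simplified `LexicalNode` shape, including a top-level `url` on links, is an assumption: real Payload nodes carry more fields and differ in places. The sketch covers only headings, paragraphs, quotes, lists, links, and bold/italic/inline-code text. The bit values match Lexical's text-format flags (bold = 1, italic = 2, code = 16).

```typescript
// Hypothetical, simplified node shape; real Payload Lexical JSON has more fields.
type LexicalNode = {
  type: string;
  text?: string;
  format?: number;
  tag?: string;
  listType?: "bullet" | "number";
  url?: string;
  children?: LexicalNode[];
};

// Lexical text-format bit flags.
const IS_BOLD = 1;
const IS_ITALIC = 2;
const IS_CODE = 16;

// Renders inline children (text runs, links) to a single Markdown string.
function inline(nodes: LexicalNode[] = []): string {
  return nodes
    .map((node): string => {
      if (node.type === "text") {
        let text = node.text ?? "";
        const format = node.format ?? 0;
        if (format & IS_CODE) return `\`${text}\``;
        if (format & IS_ITALIC) text = `*${text}*`;
        if (format & IS_BOLD) text = `**${text}**`;
        return text;
      }
      if (node.type === "link") return `[${inline(node.children)}](${node.url ?? ""})`;
      // Unknown inline nodes: fall back to their children.
      return inline(node.children);
    })
    .join("");
}

// Converts the root's block-level children, separated by blank lines.
export function lexicalToMarkdown(root: LexicalNode): string {
  const blocks = (root.children ?? []).map((node): string => {
    switch (node.type) {
      case "heading": {
        // "h2" -> level 2
        const level = Number(node.tag?.slice(1) ?? 1);
        return `${"#".repeat(level)} ${inline(node.children)}`;
      }
      case "quote":
        return `> ${inline(node.children)}`;
      case "list":
        return (node.children ?? [])
          .map((item, i) => `${node.listType === "number" ? `${i + 1}.` : "-"} ${inline(item.children)}`)
          .join("\n");
      default:
        return inline(node.children);
    }
  });
  return blocks.join("\n\n") + "\n";
}
```

Nested lists, images, and code blocks would need extra cases. This is one reason a format-agnostic HTML-to-Markdown pass is attractive once many CMS fields are involved.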
… conversion

- Remove `[lang]/[...path].md.ts` Astro endpoint and `lexicalToMarkdown.ts` utility (Lexical-specific approach, limited to certain CMS fields)
- Remove `has`-based markdown rewrite from `vercel.json`
- Add `scripts/html-to-markdown.mjs`: post-build script that scans `.vercel/output/static/` for all HTML pages, extracts `<main>` content, converts it to Markdown via turndown (removing nav/aside/header/footer), prepends YAML frontmatter (title + description from `<head>`), and saves it as `.md` alongside the `.html`; it is generic, so it works for all current and future pages
- Add `src/middleware.ts`: Astro Edge Middleware (`edgeMiddleware: true` was already set) that rewrites requests carrying `Accept: text/markdown` to `/path.md`; skips known static asset extensions to avoid false rewrites
- Add turndown@7.2.2 and node-html-parser@7.1.0 to devDependencies
- Update build script: `astro build && node scripts/html-to-markdown.mjs`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
…ation

- Add `web/src/integrations/htmlToMarkdown.ts`: TypeScript Astro integration using the `astro:build:done` hook; uses async `fs/promises` APIs throughout and `AstroIntegrationLogger` for diagnostics
- Register `htmlToMarkdown()` in `web/astro.config.mjs` under `integrations`
- Remove `web/scripts/html-to-markdown.mjs` (replaced by the integration)
- Revert `web/package.json` build script to just `astro build`
- Add `@types/turndown@^5.0.6` as a devDependency; restore removed `dotenv`

Co-authored-by: jhb-dev <20878653+jhb-dev@users.noreply.github.com>
- Replace Astro edge middleware with a native Vercel `middleware.ts` using `rewrite` from `@vercel/functions`, avoiding an upstream bug (withastro/astro#16156) where Astro's edge middleware drops the HTTP method and body
- Lazy-load TurndownService and node-html-parser inside the build hook to avoid unnecessary work during dev server startup
- Improve YAML frontmatter escaping to handle newlines, carriage returns, and tabs in addition to backslashes and quotes
- Restore accidentally removed eslint devDependency
- Add `@vercel/functions` as an explicit dependency
- Remove `edgeMiddleware: true` from the Astro Vercel adapter config

https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
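The escaping described in the YAML bullet above can also be delegated to `JSON.stringify`. A JSON string literal is also a valid YAML 1.2 double-quoted scalar, so quotes, backslashes, newlines, carriage returns, tabs, and other control characters all come out escaped. The sketch below shows this alternative technique, not the PR's own escaping code. `buildFrontmatter` is a hypothetical helper, and skipping empty fields (for example, a page without a description) is an assumed behavior.

```typescript
// Builds a YAML frontmatter block from string fields.
// JSON.stringify yields a double-quoted scalar that YAML 1.2 parses back to the
// same string, so no hand-written escape table is needed.
export function buildFrontmatter(fields: Record<string, string | undefined>): string {
  const lines = Object.entries(fields)
    // Drop missing or empty values instead of emitting `key: ""`.
    .filter((entry): entry is [string, string] => entry[1] !== undefined && entry[1] !== "")
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return ["---", ...lines, "---", ""].join("\n");
}
```

For example, a title containing `"` and a line break is emitted as `title: "Say \"hi\"\nnow"`, a single valid line of YAML.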
PR Review

Potential Bugs

Root path
The matcher regex
Fix: change

Performance Considerations

These run on every matched request in production. The

Minor / Style

Everything else looks good.
The root path is not a problem, since there is no root page (all pages are localized).
Review

Potential Bug: URL path mismatch between middleware and integration
The middleware rewrites
This only works if Astro is configured with
If output is

Missing Content-Type response header
The middleware rewrites to the static
Consider adding a
- Add `<link rel="alternate" type="text/markdown">` to every page's `<head>` so crawlers/agents can discover the Markdown version
- Add `/llms.txt` endpoint listing all pages per locale with Markdown negotiation instructions, following the llmstxt.org convention
- Reference `llms.txt` in `robots.txt` for discoverability

https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
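The llmstxt.org convention is an H1 title, a blockquote summary, optional free text, and H2 sections containing Markdown link lists. The sketch below generates such a file with one section per locale. The `buildLlmsTxt` name, the `PageEntry` shape, and the wording of the negotiation hint are assumptions, not the PR's endpoint.

```typescript
// One page to list; a hypothetical shape, e.g. collected from the CMS or build output.
interface PageEntry {
  title: string;
  url: string;
}

// Renders an llms.txt body: H1 site name, blockquote summary, a note on
// content negotiation, then one H2 section per locale with a link list.
export function buildLlmsTxt(
  siteName: string,
  summary: string,
  pagesByLocale: Record<string, PageEntry[]>,
): string {
  const lines: string[] = [
    `# ${siteName}`,
    "",
    `> ${summary}`,
    "",
    "Send `Accept: text/markdown` with any page request to receive its Markdown version.",
    "",
  ];
  for (const [locale, pages] of Object.entries(pagesByLocale)) {
    lines.push(`## ${locale}`, "");
    for (const page of pages) lines.push(`- [${page.title}](${page.url})`);
    lines.push("");
  }
  return lines.join("\n");
}
```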
- Add `Content-Type: text/markdown` header for `.md` files in `vercel.json`
- Remove `console.log`/`console.warn` from middleware (production hot path)
- Remove unused `method` variable

https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
PR Review: Markdown Content Negotiation

Overall this is a clean, well-structured implementation. A few issues worth addressing:

Bugs / Correctness
1. Redundant exclusion logic in middleware
2.
3. YAML frontmatter escaping is incomplete
4.

Performance
5. Sequential file I/O in the build integration
6. Two HTML parsers in the build toolchain

Security / Edge Cases
7. No file-size guard in the build integration
8.

The middleware design rationale (native Vercel vs Astro edge middleware) is well-documented, and the upstream bug reference is a helpful trail for future debugging.
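Item 1 above (redundant exclusion logic) is easier to judge when all exclusion rules sit in one predicate. The sketch below assumes the exclusion list from the PR description: `/api`, `/preview`, `/_astro`, `/assets`, `/favicon`, plus any path whose last segment has a file extension. `isMarkdownCandidate` is a hypothetical name. Matching prefixes per path segment is one possible reading of the rules, not necessarily what the middleware does.

```typescript
// Path prefixes the PR description lists as never rewritten to Markdown.
const EXCLUDED_PREFIXES = ["/api", "/preview", "/_astro", "/assets", "/favicon"] as const;

// A trailing file extension such as ".js", ".png" or ".ico" in the last segment.
const FILE_EXTENSION = /\.[a-z0-9]+$/i;

// True if a request path looks like a page that may have a Markdown twin.
// Prefixes match whole segments, so "/apiary" is not treated as "/api".
export function isMarkdownCandidate(pathname: string): boolean {
  const underExcludedPrefix = EXCLUDED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  if (underExcludedPrefix) return false;
  // "/favicon.ico" and hashed assets are caught here rather than by the prefix list.
  const lastSegment = pathname.split("/").pop() ?? "";
  return !FILE_EXTENSION.test(lastSegment);
}
```

One trade-off: a page slug whose last segment ends in something like `.2` would be treated as a file. If the middleware's `config.matcher` already excludes these paths, one of the two checks is redundant.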
Prevent search engines from indexing the Markdown versions of pages and the llms.txt endpoint via an `X-Robots-Tag: noindex` header.

https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
Review

Overall this is a well-structured feature. A few issues worth addressing:

Bugs

Trailing slash → wrong
Paths like

Root path
The matcher pattern
The

Performance

Sequential file I/O in the build hook (
The

Nits
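The trailing-slash and root-path points can be handled in one place when computing the rewrite target. The sketch below is hedged accordingly: `markdownRewriteUrl` is a hypothetical helper, and mapping the root to `/index.md` assumes the build writes a root `index.md`. The resulting URL would then be passed to `rewrite()` from `@vercel/functions` in the middleware.

```typescript
// Maps an incoming page URL to the URL of its Markdown twin:
// - trailing slashes are dropped, so "/en/about/" becomes "/en/about.md", not "/en/about/.md";
// - the bare root maps to "/index.md" (an assumed convention);
// - query string and fragment are cleared, since the static .md file ignores them.
export function markdownRewriteUrl(requestUrl: string): URL {
  const url = new URL(requestUrl);
  const trimmed = url.pathname.replace(/\/+$/, "");
  url.pathname = trimmed === "" ? "/index.md" : `${trimmed}.md`;
  url.search = "";
  url.hash = "";
  return url;
}
```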
Dynamic import() fails in astro:build:done because Vite's module runner is already closed by that point. Move turndown and node-html-parser back to top-level imports. https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
The Vercel adapter copies static files to .vercel/output/static/ before astro:build:done fires, so .md files written only to the Astro output dir are not included in the deployment. Now also processes the Vercel static output directory if it exists. https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q
PR Review

Overall: Clean implementation. A few items worth addressing before merge.

Bugs / Issues
1. Fragile YAML escaping (`htmlToMarkdown.ts`)
2. Possible invalid `rel=alternate` href (`SEOMetadata.astro`)

Performance
Sequential file I/O in the integration

Security / Visibility
Static `.md` files are publicly accessible by direct URL
Accept header matching is a substring check

Minor
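A plain substring check also matches `text/markdown;q=0`, which RFC 9110 defines as "not acceptable". It also serves Markdown to a client that ranks HTML higher. The sketch below shows a stricter check that parses media ranges and q-values and serves Markdown only when the client ranks it at least as high as `text/html`. Wildcards are deliberately ignored, and `prefersMarkdown` is a hypothetical name.

```typescript
// Highest q-value the Accept header assigns to an exact media type, or 0 if
// the type is not listed. A range without "q=" counts as q=1.
function qualityOf(accept: string, mediaType: string): number {
  let best = 0;
  for (const range of accept.split(",")) {
    // Drop all whitespace so "text/markdown ; q = 0.5" parses like "text/markdown;q=0.5".
    const [type, ...params] = range.split(";").map((part) => part.replace(/\s+/g, "").toLowerCase());
    if (type !== mediaType) continue;
    const qParam = params.find((param) => param.startsWith("q="));
    const q = qParam === undefined ? 1 : Number(qParam.slice(2));
    if (Number.isFinite(q) && q > best) best = q;
  }
  return best;
}

// True when text/markdown is accepted with q > 0 and ranked at least as high
// as text/html. Wildcard ranges are ignored on purpose, so a typical browser
// header that only lists HTML types plus a wildcard keeps getting HTML.
export function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  const markdown = qualityOf(accept, "text/markdown");
  return markdown > 0 && markdown >= qualityOf(accept, "text/html");
}
```

In the middleware, this would take `request.headers.get("accept")` in place of the substring test.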
Code Review

Overall this is well-structured. A few things worth addressing:

Potential Bugs
Inconsistent logging in integration (
Redundant file extension check in middleware (

Performance
Sequential file I/O (
`await Promise.all(htmlFiles.map(async (htmlPath) => { ... }))`
Double processing: The integration runs

Security
No significant concerns. The YAML frontmatter escaping is adequate for double-quoted values.

Minor
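There is a middle ground between the sequential loop and an unbounded `Promise.all`, which opens every file at once and can hit the process's open-file limit on large sites. A small worker pool keeps I/O parallel but bounded. The sketch below uses `mapWithConcurrency`, a hypothetical helper. The limit is a tuning knob, not a measured value.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once,
// returning results in input order (like Promise.all, but bounded).
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane repeatedly claims the next unprocessed index until none remain.
  // Claiming is safe without locks because JS runs lanes on one thread.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

In the build hook, this would replace the per-file loop, e.g. `await mapWithConcurrency(htmlFiles, 16, convertFile)`, where `convertFile` stands for the existing per-file conversion.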
PR Review: feat: markdown content negotiation via Accept header

Overall this is a clean implementation. A few issues worth addressing:

Bugs

1. Root
The `index.html` to parent-dir `.md` logic breaks for the site root. For
The middleware would rewrite

2. Pages without
When no

3. Query parameters preserved in rewrite (

Performance

4. Sequential HTML conversion (
The main loop processes files one at a time with

5. Double conversion for Vercel output
The integration converts all HTML twice when the Vercel output dir exists. Converting once and copying the

Minor
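Bug 1 above concerns the `index.html` to `.md` mapping. The sketch below handles the site root explicitly. `markdownPathFor` is hypothetical. It assumes Astro's directory-style output (`<route>/index.html`) and writes the root page to `index.md`, so a middleware can rewrite `/` to `/index.md`.

```typescript
import { posix as path } from "node:path";

// Maps an emitted HTML file to the .md file a request for the page should hit.
// Directory-style output: "<outDir>/en/about/index.html" serves "/en/about",
// so its Markdown twin goes to "<outDir>/en/about.md" next to the folder.
// For the site root, a naive "dirname + '.md'" rule would yield "<outDir>.md",
// outside the output directory, so the root falls back to "<outDir>/index.md".
export function markdownPathFor(htmlPath: string, outDir: string): string {
  const relative = path.relative(outDir, htmlPath);
  const dir = path.dirname(relative);
  const base = path.basename(relative, ".html");
  if (base === "index") {
    return dir === "." ? path.join(outDir, "index.md") : path.join(outDir, `${dir}.md`);
  }
  // File-style output such as "<outDir>/en/404.html" keeps its own name.
  return path.join(outDir, dir, `${base}.md`);
}
```

Inside `astro:build:done`, `outDir` would come from `fileURLToPath(dir)`, because the hook passes `dir` as a `URL`.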
Summary
Adds Markdown content negotiation: when a client sends `Accept: text/markdown`, the server returns a Markdown version of the page instead of HTML.

How it works
- `web/src/integrations/htmlToMarkdown.ts`: Astro build integration that hooks into `astro:build:done`. After all static HTML files are emitted, it extracts `<main>` content, converts it to Markdown via `turndown`, prepends YAML frontmatter (title + description), and writes a `.md` file alongside each `.html` file.
- `web/middleware.ts`: native Vercel middleware using `@vercel/functions` `rewrite()`. When `Accept: text/markdown` is present, it rewrites the request path to the `.md` file. Excludes `/api`, `/preview`, `/_astro`, `/assets`, `/favicon`, and paths with file extensions.

Usage
Test plan
- `pnpm build` in `web/` succeeds and `.md` files are generated alongside `.html`
- `curl -H "Accept: text/markdown" <url>` returns Markdown
- `/preview`, `/api`, and static assets are excluded from rewriting
- `pnpm lint` and `pnpm check` pass in `web/`

Closes #15, supersedes #16
https://claude.ai/code/session_01CwURAMew6D9fXyr4kA1w2Q