feat(llms): replace MDX source stripping with HTML -> Markdown pipeline for .md generation #339
Open
viktorkombov wants to merge 17 commits into
…inks to absolute URLs
- Introduced a new module `html-to-md.ts` for converting built Astro HTML pages to Markdown.
- Integrated `turndown` and `turndown-plugin-gfm` for Markdown formatting.
- Updated `package.json` to include new dependencies for Markdown conversion.
- Modified `integration.ts` to use the new HTML-to-Markdown conversion, replacing the previous MDX stripping logic.
- Enhanced logging for the Markdown generation process and added a selector guard for code blocks.
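The selector guard and the empty-output check amount to a pass/fail decision over per-build counters. The PR description gives the rule: exit code 1 when Shiki selectors stop matching, or when more than 10% of pages produce empty output. Below is a minimal sketch of that rule.

Everything except the 10% figure is an assumption:
- the `ConversionStats` shape and the `diagnosticExitCode` name are hypothetical;
- "stop matching" is interpreted here as "matched no code blocks at all".

```typescript
// Hypothetical per-build counters; the real integration's bookkeeping is not shown in the PR.
interface ConversionStats {
  totalPages: number;
  emptyPages: number; // pages whose conversion returned ''
  pagesWithCode: number; // pages containing at least one <pre> block
  pagesWithShikiMatch: number; // of those, pages where the Shiki selector matched
}

// Empty-page ratio above which the build fails (10%, per the PR description).
const EMPTY_PAGE_THRESHOLD = 0.1;

/** Returns exit code 1 plus reasons when conversion looks broken, otherwise 0. */
function diagnosticExitCode(stats: ConversionStats): { code: 0 | 1; reasons: string[] } {
  const reasons: string[] = [];
  // Selector guard (assumed interpretation): code blocks exist, but the Shiki
  // selector matched none of them, which suggests the highlighter markup changed.
  if (stats.pagesWithCode > 0 && stats.pagesWithShikiMatch === 0) {
    reasons.push("Shiki selectors matched no code blocks");
  }
  const emptyRatio = stats.totalPages === 0 ? 0 : stats.emptyPages / stats.totalPages;
  if (emptyRatio > EMPTY_PAGE_THRESHOLD) {
    reasons.push(`${(emptyRatio * 100).toFixed(1)}% of pages produced empty Markdown`);
  }
  return { code: reasons.length > 0 ? 1 : 0, reasons };
}
```

In an Astro `astro:build:done` hook the result would typically be applied as `process.exitCode = diagnosticExitCode(stats).code`. Using `exitCode` rather than calling `process.exit()` lets the rest of the build finish and flush its logs.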
- generate Markdown from rendered HTML
- create full, abridged, and topic-specific LLM bundles
- improve Unicode, code fence, link, and metadata handling
- add conversion diagnostics and update related documentation
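The Unicode handling (the `TYPOGRAPHIC_MAP` described under Changes) can be pictured as a lookup that rewrites only the listed code points, so CJK text passes through untouched. This is a minimal sketch.

Assumptions:
- the entries are an illustrative subset, not the PR's actual table;
- the replacement strings, such as `(TM)` for ™, are assumed.

```typescript
// Illustrative subset; the real TYPOGRAPHIC_MAP in html-to-md.ts may differ.
const TYPOGRAPHIC_MAP: Readonly<Record<string, string>> = {
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark / apostrophe
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
  "\u2013": "-", // en dash
  "\u2014": "--", // em dash
  "\u2026": "...", // horizontal ellipsis
  "\u00A0": " ", // no-break space
  "\u202F": " ", // narrow no-break space
  "\u00AE": "(R)", // registered sign
  "\u2122": "(TM)", // trade mark sign
};

// One character class built from the map keys, so every listed code point is
// replaced in a single pass. Characters not in the map (including CJK) are kept.
const TYPOGRAPHIC_RE = new RegExp(`[${Object.keys(TYPOGRAPHIC_MAP).join("")}]`, "g");

function normalizeTypography(text: string): string {
  return text.replace(TYPOGRAPHIC_RE, (ch) => TYPOGRAPHIC_MAP[ch] ?? ch);
}
```

This uses an allow-list rather than a blanket "strip non-ASCII" or transliteration step. That is what keeps Japanese and Korean builds readable.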
…github.com/IgniteUI/docs-template into vkombov/convert-api-links-to-md-in-llms-md
ChronosSF reviewed Jun 23, 2026
Description fixes in MDX source files:
- Replace dash-prefixed API-list and changelog-fragment llms.description
values with proper prose sentences (6 files: map-api EN/xplat, slider-ticks
EN/JP, general-changelog-dv-react/wc JP)
- Remove stray trailing `{` from 14 JP llms.description fields (partial
copy-paste of a template token)
- Fix AI toolchain description in xplat JP to use {ProductName} instead of
hardcoded platform list (was incorrect for Blazor JP)
- Fix toc.json typo: チャートのテータ注釈 → チャートのデータ注釈 ("Chart Data Annotations")
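Both `llms.description` defects fixed here can be caught mechanically: dash-prefixed list fragments and a stray trailing `{`. Below is a hedged sketch of such a check. `lintLlmsDescription` is hypothetical and not part of this PR.

```typescript
// Hypothetical lint rule for llms.description front-matter values.
function lintLlmsDescription(description: string): string[] {
  const problems: string[] = [];
  const trimmed = description.trim();
  if (trimmed.length === 0) {
    problems.push("description is empty");
    return problems;
  }
  // API lists or changelog fragments pasted as "- item - item" instead of prose.
  if (trimmed.startsWith("-")) {
    problems.push("starts with a dash; write a prose sentence instead of a list");
  }
  // Leftover from a partially copied template token, e.g. a dangling "{".
  if (trimmed.endsWith("{")) {
    problems.push("ends with a stray '{'");
  }
  return problems;
}
```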
llms.ts / buildLlmsTxt localization:
- Add JP navigation-bucket labels to IGDOCS_BROAD_SECTIONS so 概要 ("Overview") and
other JP toc headers are stripped from label prefixes (fixes "概要 ... Overview"
double-label in angular-jp llms.txt)
- Add LLMS_TXT_STRINGS map with JP translations for the Documentation sets
section heading and the two built-in doc-set links
- Add navLang and localizedDescription parameters to buildLlmsTxt so JP
builds emit localized header blockquote and section labels
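The label-prefix stripping works roughly like this: when a generated label begins with a known navigation-bucket header, that header is dropped. The PR only states that JP headers such as 概要 were added to `IGDOCS_BROAD_SECTIONS`. Everything else below is assumed for illustration:
- the set contents;
- the `" / "` separator;
- the `stripBroadSectionPrefix` name.

```typescript
// Hypothetical subset of IGDOCS_BROAD_SECTIONS; the real list lives in llms.ts.
const IGDOCS_BROAD_SECTIONS: ReadonlySet<string> = new Set([
  "Overview",
  "\u6982\u8981", // 概要, the JP "Overview" bucket
]);

// Assumed separator between a navigation bucket and the page title in a label.
const LABEL_SEPARATOR = " / ";

function stripBroadSectionPrefix(label: string): string {
  const idx = label.indexOf(LABEL_SEPARATOR);
  if (idx === -1) return label;
  const head = label.slice(0, idx).trim();
  // Drop the prefix only when it is a broad bucket, not a meaningful parent topic.
  return IGDOCS_BROAD_SECTIONS.has(head) ? label.slice(idx + LABEL_SEPARATOR.length) : label;
}
```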
integration.ts / astro configs:
- Thread localizedDescription through CreateDocsSiteOptions → createDocsSite
→ siteMetaIntegration → buildLlmsTxt
- Supply JP localizedDescription in docs/angular/astro.config.ts and
docs/xplat/astro.config.ts (per-platform JP descriptions)
- Make diagnosticSourceCandidates async (fsp.access instead of fs.existsSync)
and replace hardcoded grid-type regex with generic segment-based lookup
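Switching from `fs.existsSync` to `fsp.access` turns each candidate check into a promise that rejects when the file is missing. The candidates can then be probed concurrently without blocking the event loop. Below is a sketch under that assumption.

Notes on the sketch:
- `firstExistingCandidate` is hypothetical.
- The segment-based candidate generation is not reproduced.
- The access function is injected to keep the sketch self-contained. In the integration it would be `fsp.access` from `node:fs/promises`.

```typescript
// Compatible with fsp.access(path): resolves if the path is accessible, rejects otherwise.
type AccessFn = (path: string) => Promise<void>;

/**
 * Probes all candidates concurrently and returns the first one, in list order,
 * that exists, or undefined when none do.
 */
async function firstExistingCandidate(
  candidates: readonly string[],
  access: AccessFn,
): Promise<string | undefined> {
  // Wrap the call so Array.prototype.map does not pass the index as fsp.access's `mode`.
  const results = await Promise.allSettled(candidates.map((p) => access(p)));
  const index = results.findIndex((r) => r.status === "fulfilled");
  return index === -1 ? undefined : candidates[index];
}
```

`Promise.allSettled` waits for every probe, but the function still returns the highest-priority existing path. A faster probe for a lower-priority candidate therefore cannot win.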
…tion references (#354)
- Fix malformed tags, platform inconsistencies, and incorrect documentation references
- Revert unneeded change in igniteui licensing
- Revert unneeded changes
- Additional fixed code snippets
- Resolve comments from code review
…o vkombov/convert-api-links-to-md-in-llms-md
Closes #338
Replaces the previous MDX-stripping approach (`stripMdxForLlms`) with a full HTML→Markdown conversion pipeline that operates on the already-rendered Astro output. Every `.md` file is now generated from the final rendered HTML. In that HTML:
- all JSX components are resolved;
- ApiLinks point to real URLs;
- platform-specific content is inlined;
- Shiki syntax highlighting is stripped to raw code.
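The link handling can be sketched with the WHATWG `URL` API. Each `href` is resolved against the page URL. Same-site page links then get a `.md` ending, so an LLM reading one `.md` file can follow links to others.

Assumptions:
- the `toLlmsHref` name is hypothetical;
- the URL layout (trailing-slash, `.html`, or extensionless page URLs) is assumed, not taken from the PR.

```typescript
/**
 * Resolves an href from rendered HTML to an absolute URL. Same-origin page links
 * are pointed at their .md counterpart. External links and assets are only made
 * absolute, and in-page anchors are returned unchanged.
 */
function toLlmsHref(href: string, pageUrl: string): string {
  if (href.startsWith("#")) return href; // in-page anchor: keep as-is
  const url = new URL(href, pageUrl);
  if (url.origin !== new URL(pageUrl).origin) return url.href; // external site
  const path = url.pathname;
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  if (path.endsWith("/")) {
    // Assumed layout: /docs/grid/ has its Markdown at /docs/grid.md
    url.pathname = path === "/" ? "/index.md" : path.slice(0, -1) + ".md";
  } else if (lastSegment.endsWith(".html")) {
    url.pathname = path.slice(0, -".html".length) + ".md";
  } else if (!lastSegment.includes(".")) {
    url.pathname = path + ".md"; // extensionless page URL
  } // otherwise an asset such as .png or .zip: leave the path alone
  return url.href; // query string and #hash are preserved
}
```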
Why HTML→Markdown
The previous approach worked on MDX source files and required reverse-engineering what each component renders to. Each new component or edge case meant another fragile regex. Examples:
- code fences;
- DocsAside callouts;
- inline `<br>` and `<div>` wrappers.

Working on the rendered HTML eliminates that class of problem entirely: the browser-facing content is the source of truth.
Changes
New:
`src/html-to-md.ts`
- `buildHtmlToMdConverter()`: configured Turndown + GFM tables, created once per build and shared across all pages for performance
- `htmlPageToMd(htmlPath, siteUrl, td, sourceRef)`: converts a single built HTML file to Markdown; returns `''` when the file is missing or has no content
- … token-span soup never reaches Turndown, and code content is never mutated
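- DocsAside callouts rendered as blockquotes (e.g. `> **Info:**`)
- Demo iframes replaced with `[title](url)` links (fixes empty "And the result is:" seams)
- `TYPOGRAPHIC_MAP`: converts characters such as the following to ASCII without romanizing CJK characters, which preserves Japanese/Korean builds:
  - curly quotes and dashes;
  - NBSP and narrow no-break space (`\u202F`);
  - `®`, `™`, etc.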
- `<meta charset>` stripped before JSDOM to prevent double-encoding mojibake
- … `.md` extension for LLM-to-LLM navigation
- `sourceRef` parameter: warnings point at the source `.mdx` file, not the built `.html` artifact

Updated:
`src/integration.ts`
- `generateLlmsMdFiles()` extracted with a typed `GenerateLlmsMdOptions` interface
- `withUtf8Bom()` utility: idempotent BOM prepend, enabled for non-English builds (`navLang !== 'en'`) so static preview servers render JP/KR correctly
- … `exitCode 1` when Shiki selectors stop matching or >10% of pages produce empty output
- `llms-small.txt` fence regex uses a backreference to handle nested Markdown correctly

Acceptance checklist
- … `.md` files
- … `> **Info:**` / `> **Warning:**` blockquotes
- `[title](demo-url)` links where iframes were
- … `.md` suffix
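The `llms-small.txt` fix depends on a regex backreference, so a fence closes only on the same marker string that opened it. For example, a four-backtick fence wrapping a Markdown sample that contains its own three-backtick fence is treated as one block, instead of ending at the inner fence. The sketch below applies that idea to dropping code blocks from an abridged bundle.

Assumptions:
- the PR does not say what the regex is used for, so this use is assumed;
- `stripFencedBlocks` and the placeholder text are hypothetical;
- requiring an identical closing marker is stricter than CommonMark's "at least as long" rule.

```typescript
// Group 1 captures the opening fence: 3+ backticks or 3+ tildes at the start of a line.
// \1 requires the closing line to repeat that exact marker, so an inner ``` inside
// a ```` block does not end it early. The optional group allows empty blocks.
const FENCED_BLOCK_RE = /^(`{3,}|~{3,})[^\n]*\n(?:[\s\S]*?\n)?\1[ \t]*$/gm;

function stripFencedBlocks(markdown: string, placeholder = "[code omitted]"): string {
  // String.prototype.replace resets lastIndex, so reusing the global regex is safe.
  return markdown.replace(FENCED_BLOCK_RE, placeholder);
}
```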