[v2.0.39] Add native Typst math output format (LaTeX/MathML → Typst)#405
Draft
OlgaRedozubova wants to merge 51 commits into master
Implement SerializedTypstVisitor that converts MathJax's internal MathML
tree into native Typst math syntax, enabling direct LaTeX → Typst and
MathML → Typst conversion.
Key features:
- Full token handling: mi, mo, mn, mtext, mspace with font variants,
operator detection, and context-aware spacing
- Script/limit constructs: msub, msup, msubsup, munder, mover,
munderover, mmultiscripts with movablelimits-aware placement
- Structural elements: mfrac, msqrt, mroot, mtable (matrix/cases/
equation arrays with alignment), mfenced, menclose, mphantom
- Delimiter handling: paired/unpaired bracket detection via pre-serialization
tree walk, lr() wrapping, abs/norm/floor/ceil shorthand with separator
fallback to lr()
- Equation numbering: auto-numbered and \tag{} equations, numcases/
subnumcases grid layout with per-row counters and labels
- Symbol mapping: 500+ Unicode → Typst symbol mappings including arrows,
relations, accents, Greek, operators, geometry, and suits
- Escape handling: unified scanner for comma/semicolon/colon escaping
in function calls, string literal skipping, bracket depth tracking
- Dual output: block (typst) and inline (typst_inline) variants
- Context menu integration for copying Typst math output
Architecture:
- Modular handler files: token-handlers, script-handlers,
structural-handlers, table-handlers
- Shared utilities: common.ts, consts.ts, types.ts, escape-utils.ts,
bracket-utils.ts, typst-symbol-map.ts
- Strict TypeScript typing throughout, no any casts
Extract serializeThousandSepChain into common.ts, replacing duplicated chain logic in index.ts and structural-handlers.ts. Add tree mutation order comment, forLatex/include_typst comments, and remove test logging.
…content spacing
- isDerivativePattern now checks for actual prime chars (′ ″ ‴) instead of any mo
- Move SCRIPT_NODE_KINDS and PRIME_CHARS to consts.ts, remove duplicate SCRIPT_PARENT_KINDS
- Revert post-content loop to needsTokenSeparator (fixes tau_(i,j)(t) regression)
- Add comment about prevNode in \left...\right delimiter handling
- Add test for f^{(n)}(a) → f^((n))(a) (TeXAtom derivative, no space)
- \left.\right\} now produces lr(mat(delim: #none, ...) \}) instead of losing the brace
- Mismatched pairs like \left[\right) use lr() wrapping instead of mat(delim:)
- Matched pairs (same char or standard open→close) still use compact mat(delim: ...)
- Add tests for \left.\right\} in align* and mismatched \left[\right) on array
Two MathJax patterns for \not:
- Overlay: TeXAtom(REL) > mpadded[width=0] > mtext(⧸). Detected in mrow/inferredMrow loops; the next sibling is wrapped in cancel()
- Combining char: U+0338 appended to mi/mo text. The combining character is stripped and the token is wrapped in cancel() in token handlers
- Fix cancel() loss on early returns in mo handler (multiword/namedOp)
- Add tests for \not 7,60 and \not k + \not q
…t-handling API
- Add two-pass escapeUnpairedBrackets (reuses scanBracketTokens + findUnpairedIndices)
- Integrate into escapeContentSeparators for all function-call arguments
- Integrate replaceUnpairedBrackets into escapeCasesSeparators for consistent API
- Remove manual replaceUnpairedBrackets calls from table-handlers
- Add unit tests for escape-utils
… lr()
The mrow handler incorrectly delegated to mtable when ANY child was a table, even when other content (arrows, operators) sat alongside it. Now hasTableChild is true only when mtable is the sole content child. The mtable handler also checks parent delimiters only for sole-content case, preventing double lr() wrapping. Extract getContentChildren into common.ts and containsTable as a standalone helper to eliminate duplication between the two handlers.
…call parsing
In Typst math mode, identifier( is parsed as a function call (see typst/typst#7274). Insert a space before ( when the preceding token is a multi-char name not in TYPST_BUILTIN_OPS (e.g. emptyset, sigma, Gamma, psi). Single-char identifiers (f, g) and built-in operators (sin, cos, ln, arg) keep no space.
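The spacing rule described above could be sketched roughly as follows. This is an illustration only: `BUILTIN_OPS_SAMPLE`, `needsSpaceBeforeParen`, and `joinWithParen` are hypothetical names, and the set holds just the four operators named in the commit message, not the real contents of TYPST_BUILTIN_OPS.

```typescript
// Hypothetical sample of built-in operators; the real TYPST_BUILTIN_OPS
// set in consts.ts is larger and is not reproduced here.
const BUILTIN_OPS_SAMPLE: ReadonlySet<string> = new Set(["sin", "cos", "ln", "arg"]);

// True when a space must separate `prevToken` from a following "(" so that
// Typst does not parse `name(` as a function call.
function needsSpaceBeforeParen(prevToken: string): boolean {
  // Only plain multi-letter names are affected in this sketch.
  if (!/^[A-Za-z]+$/.test(prevToken)) return false;
  // Single-char identifiers (f, g) keep no space.
  if (prevToken.length === 1) return false;
  // Built-in operators (sin, cos, ...) are meant to be called, so no space.
  return !BUILTIN_OPS_SAMPLE.has(prevToken);
}

// Joins a preceding token and a parenthesized group using the rule above.
function joinWithParen(prevToken: string, parenContent: string): string {
  const sep = needsSpaceBeforeParen(prevToken) ? " " : "";
  return `${prevToken}${sep}(${parenContent})`;
}
```

Under these assumptions, `joinWithParen("sigma", "x")` yields `sigma (x)`, while `f` and `sin` keep the compact `f(x)` / `sin(x)` form.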
…rsing
Add escapeColon to escapeContentSeparators so word: inside any Typst function call becomes word : (space prevents named-arg syntax). Apply escapeContentSeparators to abs(), norm(), floor(), ceil() content which previously had no escaping.
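A minimal sketch of the colon-escaping idea, assuming ASCII identifiers and double-quoted string literals only. `escapeColonSketch` is a hypothetical stand-in and does not reproduce the actual escapeColon in escape-utils.ts.

```typescript
// Inserts a space between an identifier and a directly following colon so
// that Typst does not read `word:` as a named argument. Quoted string
// literals are left untouched.
function escapeColonSketch(content: string): string {
  // Splitting on a capturing group keeps the literals: even indices are
  // plain segments, odd indices are the quoted strings.
  const parts = content.split(/("(?:[^"\\]|\\.)*")/);
  return parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/([A-Za-z][A-Za-z0-9]*):/g, "$1 :")
    )
    .join("");
}
```

For example, `escapeColonSketch("g: K_0")` gives `g : K_0`, and a colon inside `"a:b"` is preserved.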
…content
- Extract resolveDelimiterMo helper to access texClass on delimiter nodes
- Reject ‖ pairing when opener has CLOSE texClass (surrounding pair context)
- Reject ‖ pairing when content contains PUNCT (comma between standalone ‖)
- Reject ‖ pairing spanning entire row when content has REL operator (=)
- Apply escapeContentSeparators to bare delimiter func-call content (norm,
floor, ceil) to prevent commas/semicolons/colons breaking Typst parsing
- Add explicit isFuncCall flag to BARE_DELIM_PAIRS instead of endsWith('(')
- Add 4 test cases: standalone ‖, complex ‖ with comma/number/variable
…ith ; separators
Typst ignores \\ linebreaks inside mat() cells. When aligned/gathered environments are nested inside a matrix or cases cell, convert them to mat(delim: #none, ...) using ; row separators instead.
- Add isInsideMatrixCell() recursive parent-chain walker
- Wrap nested mat() in display() for block output to reset scriptlevel
- Propagate typst_inline (without display()) through cell/row pipeline
- Determine alignment from column usage: gathered→center, rl-pairs→right/left
- Extract buildMatExpr helper to deduplicate block/inline mat construction
…gh lr()
- Wrap cases() and plain matrices in display() when inside a mat() cell to prevent Typst scriptlevel reduction (block only, not inline)
- Route eqnArrays with rowlines/columnlines through mat() format to preserve augment: #(hline/vline); add stroke: (dash: "dashed") when all separator lines are dashed
- Propagate typst_inline through structural-handlers lr() path by building parallel contentInline and extracting buildLrExpr() helper
- Extract computeAugment() and buildEqnArrayAsMat() helpers to deduplicate augment computation and eqnArray-as-mat construction
- Detect eqnArray-with-lines parents in isInsideMatrixCell()
- Cache isInsideMatrixCell() result to avoid redundant parent walks
- Use separate needsSpaceBetweenNodes() calls for block/inline content
…lose
Brackets inside these nodes are now paired independently from brackets outside, preventing false pairing when content is split across Typst function-call arguments (e.g. \sqrt( arg ) where ( and ) end up in different scopes). Each child of a scope-boundary node is processed as a separate pairing scope. SCOPE_BOUNDARIES set is module-level.
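To illustrate why per-scope pairing matters, here is a rough stack-based sketch. It works on plain strings rather than the MathML tree, and `findUnpaired` / `findUnpairedPerScope` are hypothetical names that only approximate what findUnpairedIndices and the scope-boundary logic are described as doing.

```typescript
const CLOSE_TO_OPEN = new Map<string, string>([[")", "("], ["]", "["], ["}", "{"]]);
const OPENERS = new Set<string>(["(", "[", "{"]);

// Returns the indices of brackets in `text` that have no matching partner.
function findUnpaired(text: string): number[] {
  const stack: number[] = [];    // indices of openers still waiting for a close
  const unpaired: number[] = []; // closers that found no matching opener
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (OPENERS.has(ch)) {
      stack.push(i);
      continue;
    }
    const expectedOpen = CLOSE_TO_OPEN.get(ch);
    if (expectedOpen === undefined) continue; // not a bracket
    const top = stack[stack.length - 1];
    if (top !== undefined && text[top] === expectedOpen) {
      stack.pop();
    } else {
      unpaired.push(i);
    }
  }
  // Openers left on the stack never found a close.
  return [...unpaired, ...stack].sort((a, b) => a - b);
}

// Each scope (e.g. each child of a scope-boundary node) is paired on its own.
function findUnpairedPerScope(scopes: string[]): number[][] {
  return scopes.map(findUnpaired);
}
```

Treated as one string, `f(x)` has no unpaired brackets; split into the scopes `["f(", "x)"]`, both brackets are reported as unpaired, which is the situation the commit guards against.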
…mrows
- Detect \left.\aligned\right\} as cases(reverse: #true, ...) for
eqnArray-like tables (displaystyle rows); regular arrays keep matrix form
- Add hasTableFirst in structural-handlers: \left\{ table extra \right.
lets the table inherit { as cases(), extra content follows outside
- Add isFirstWithInvisibleClose in table-handlers so the table picks up
the open delimiter from the parent mrow when close is invisible
- Track contentInline in the hasTableChild/hasTableFirst mrow branch so
typst_inline propagates correctly when children return differing inline
- Add tests for reverse cases and cases() + stretch() patterns
- Digits before ( (.4() are no longer treated as function calls —
only ASCII letters qualify (isFuncCallParen)
- When a supposed function-call ( has no matching ), backtrack so the
for-loop re-scans the range and picks up any [, ], {, } inside
- Use non-whitespace check for spacing around symbol names (paren.l,
bracket.r, etc.) instead of \w — fixes missing space after quoted
strings ("л"paren.l) and other non-\w tokens
- Extend RE_WORD_CHAR, RE_WORD_DOT_END, RE_WORD_START with \p{L} for
Unicode letter support
- Move RE_ASCII_LETTER, RE_TRAILING_WS, RE_LEADING_WS to consts.ts
- Add tests: unpaired brackets across matrix rows with digits, letters,
real functions, and inner brackets inside failed function-call scans
escapeLrSemicolons now also escapes colons after identifiers (g: → g :), matching the behavior already present in escapeCasesSeparators and escapeContentSeparators. Without this, lr(g: K_0 ]) would be parsed by Typst as a named argument. Add tests for colon escaping in lr(), abs(), and general lr() paths.
MathJax splits \mathrm{टेक} into individual mi nodes per character,
breaking Devanagari/Arabic combining sequences. serializeCombiningMiChain
merges consecutive non-Latin mi nodes with the same mathvariant into a
single font-wrapped quoted string. Known math symbols (∂, ψ, ∅) are
excluded via typstSymbolMap lookup. Uses Unicode script properties
(\p{Script=Latin}) for robust Latin vs non-Latin classification.
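The grouping step might look roughly like the sketch below. `MiToken`, `KNOWN_SYMBOLS_SAMPLE`, and `mergeNonLatinRuns` are hypothetical; the real serializeCombiningMiChain works on MathJax nodes and consults typstSymbolMap rather than a three-entry set. The regular expression is built with the RegExp constructor only so the sketch compiles under older TypeScript targets.

```typescript
interface MiToken {
  text: string;
  mathvariant: string;
}

// Unicode script property check, as mentioned in the commit message.
const RE_LATIN = new RegExp("\\p{Script=Latin}", "u");

// Hypothetical stand-in for the typstSymbolMap lookup.
const KNOWN_SYMBOLS_SAMPLE: ReadonlySet<string> = new Set(["∂", "ψ", "∅"]);

// A token can join a run when it is non-Latin and not a known math symbol.
function isMergeable(tok: MiToken): boolean {
  return !RE_LATIN.test(tok.text) && !KNOWN_SYMBOLS_SAMPLE.has(tok.text);
}

// Merges consecutive mergeable tokens that share the same mathvariant.
function mergeNonLatinRuns(tokens: MiToken[]): MiToken[] {
  const out: MiToken[] = [];
  for (const tok of tokens) {
    const prev = out[out.length - 1];
    if (
      prev !== undefined &&
      isMergeable(prev) &&
      isMergeable(tok) &&
      prev.mathvariant === tok.mathvariant
    ) {
      out[out.length - 1] = { ...prev, text: prev.text + tok.text };
    } else {
      out.push({ ...tok });
    }
  }
  return out;
}
```

With this sketch, the three code points of टेक collapse back into one token, while a preceding Latin `x` or a known symbol such as ψ stays separate.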
… bases
Remove overline/underline from RE_SPECIAL_FN_CALL — they do not imply
below/above placement like overbrace/underbrace do. Add overbracket/
underbracket which were missing. Now \underset{...}{\underline{x}}
correctly produces limits(underline(x))_(...).
- escapeLrBrackets: escapes bare bracket chars matching the lr() delimiter type so Typst doesn't auto-scale inner brackets (e.g. \left[ [...] \right] → lr([ \[...\] ])). Only same-type brackets are escaped.
- isSyntaxParen: renamed from isFuncCallParen, now also skips _() and ^() script grouping parens in scanBracketTokens.
- Fix RE_SPECIAL_FN_CALL: remove overline/underline (they don't imply below/above placement), add overbracket/underbracket.
Wrap #box(stroke:...) and #circle(inset:...) with #align(center, ...)
for block display so they center like LaTeX \boxed and \enclose{circle}.
Inline variant remains unwrapped. Add integral.surf (\oiint), slash.o
(\oslash), lt.approx (\lessapprox), gt.approx (\gtrapprox) to symbol map.
Rewrite escapeUnbalancedParens to use scanBracketTokens + findUnpairedIndices instead of single-pass scanExpression — handles both unbalanced ( and ) (previously only )). Add mover/munder to SCOPE_BOUNDARIES so brackets inside accents don't pair with brackets outside. Remove dead escapeUnbalancedCloseParen option from scanExpression. Fixes \overline(x), \underline(x), \hat(x) producing unescaped parens.
Replace overline(")"content) / underline(")"content) with
overline(lr(\) content)) / underline(lr(\) content)) so the )
delimiter auto-scales via lr() instead of rendering at fixed size.
- \xcancel → cancel(cross: #true, ...) when both diagonal strikes present
- Script children (sub/sup) of msub/msup/msubsup are now separate scopes
in markUnpairedBrackets, while base stays in parent scope — fixes
\cancelto{5(y}x) where ( in script paired with ) outside
- safeFormatScript wrapper in script-handlers applies escapeUnbalancedParens
to ^(…)/_(…) content; removes escape-utils import from common.ts
\underset and \overset create munder/mover without accentunder/accent
attributes — they must use the general limits() path, not accent handlers.
Previously \underset{\rightarrow}{r} produced attach(r, b: arrow.r)
instead of limits(r)_(arrow.r).
MathJax builds \longrightleftharpoons and \longleftrightarrows from
mover with harpoon/arrow pieces. Detect these patterns via
CONSTRUCTED_LONG_ARROWS map and emit single Typst symbols
(harpoons.rtlb, arrows.lr).
Flatten mover(munder(...), over) via unwrapToScriptNode so
\stackrel{k_1}{\underset{k_2}{...}} produces limits(base)_(k_2)^(k_1)
instead of nested limits(limits(base)_(k_2))^(k_1).
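A toy version of the flattening, using a simplified tree type instead of MathJax nodes. `ScriptNode` and `serializeLimits` are hypothetical and ignore everything unwrapToScriptNode has to do with TeXAtom/mrow wrappers.

```typescript
// Simplified stand-in for the munder/mover part of the MathML tree.
type ScriptNode =
  | { kind: "leaf"; text: string }
  | { kind: "munder"; base: ScriptNode; under: string }
  | { kind: "mover"; base: ScriptNode; over: string };

function serializeLimits(node: ScriptNode): string {
  if (node.kind === "leaf") return node.text;
  if (node.kind === "mover" && node.base.kind === "munder") {
    // Flatten mover(munder(base, under), over) into one limits() call.
    const inner = node.base;
    return `limits(${serializeLimits(inner.base)})_(${inner.under})^(${node.over})`;
  }
  if (node.kind === "mover") {
    return `limits(${serializeLimits(node.base)})^(${node.over})`;
  }
  return `limits(${serializeLimits(node.base)})_(${node.under})`;
}
```

A mover whose base is a munder then serializes to `limits(A)_(k_2)^(k_1)` rather than the nested `limits(limits(A)_(k_2))^(k_1)`.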
… notation matching
- menclose with border-side notation (left/right/top/bottom combos from
\begin{array}{|l|}\hline) now generates #box(stroke: (...)) with per-side
strokes instead of overline()/underline()
- Cap vline augment indices at actual column count to prevent out-of-bounds
when column spec has more columns than data cells
- Refactor menclose notation checks from String.includes() to Set-based
word-boundary safe matching via parseNotation()/hasNotation()
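The motivation for the Set-based check can be shown with a small sketch. The function names come from the commit message, but the signatures and bodies below are assumptions, not the actual implementation.

```typescript
// Splits a whitespace-separated menclose notation attribute into a set of
// individual notation names.
function parseNotation(notation: string): Set<string> {
  return new Set(notation.trim().split(/\s+/).filter((word) => word.length > 0));
}

// Exact-word lookup; unlike String.includes(), "roundedbox" does not match "box".
function hasNotation(parsed: Set<string>, name: string): boolean {
  return parsed.has(name);
}
```

For instance, `"roundedbox".includes("box")` is true, whereas `hasNotation(parseNotation("roundedbox"), "box")` is false, so a rounded box is no longer mistaken for a plain one.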
…airs
serializeRange used needsTokenSeparator, which lacks the script+bracket spacing check. Switched to needsSpaceBetweenNodes so that e.g. \|L_N^n(\Delta S)\|_\infty produces norm(L_N^n (Delta S)) with a space before ( to prevent Typst from parsing n( as a function call.
…ntent
Narrow gathered-like detection so that gathered directly inside align* (sole cell content) keeps \\-separated rows, while gathered with siblings in an aligned cell becomes display(mat(delim: #none, ...)). Also propagate typst_inline through eqnArray row-building so that display() never leaks into the inline variant.
Add documentation for \not negation, \xcancel, constructed long arrow collapsing, accent attribute gating, longdiv lr(\) stretchy delimiter, selective border strokes, vline capping, nested eqnArray/gathered detection, reverse cases, non-Latin script grouping, function-call-aware bare delimiter scanning, colon/bracket escaping in lr(), scope boundaries in bracket pairing, new symbol mappings, and display()/typst_inline propagation through eqnArray rows.
Central registry (custom-cmd-map.ts) for LaTeX commands that expand into visual-hack subtrees but need clean symbol output. Custom handler in my-BaseMappings stamps data-custom-cmd on the MathML node at parse time; MathML visitors, ASCII serializer, and Typst serializer all look up the symbol from the shared map. Typed MmlVisitorProto interface replaces proto: any in patchVisitorTeXAtom. PR spec updated.
- Symbol map: integral.vol, integral.cont.cw/ccw, angstrom, tack.t.double
- Custom-cmd: llbracket/rrbracket via makeCustomCmdHandler factory
- Delimiter-pairing guard: getBigDelimInfo/resolveDelimiterMo skip data-custom-cmd nodes to prevent false [/] pairing
- PR spec updated with all changes
setProperty suffices for all visitor/serializer lookups and does not leak data-custom-cmd into SVG or MathML attribute output. PR spec updated.
- Add binary operators: uplus, dotplus, Cap, Cup, leftthreetimes, rightthreetimes, boxminus, circleddash, circledast, boxdot, circledcirc, boxplus, divideontimes
- Add relations: approxeq, lessgtr, gtrless, lesseqgtr, gtreqless, Doteq, risingdotseq, fallingdotseq, backsim, Subset, Supset, curlyeqprec, curlyeqsucc, precapprox, succapprox, Vdash, eqsim, lneq, gneq, lneqq, gneqq, lnsim, gnsim, lnapprox, gnapprox, precneqq, succneqq, precnsim, succnsim, precnapprox, succnapprox, nparallel, nvdash, nvDash, nVdash, ntrianglelefteq, ntrianglerighteq
- Add arrows: Lleftarrow, Rrightarrow, leftarrowtail, looparrowleft, looparrowright, curvearrowleft, curvearrowright, upuparrows, downdownarrows, leftrightsquigarrow, nleftrightarrow, nLeftrightarrow
- Add misc: blacktriangle, blacktriangledown, bigstar, checkmark, maltese, blacktriangleleft, blacktriangleright
- Fix U+21C4/U+21C6 swap: rightleftarrows→arrows.rl, leftrightarrows→arrows.lr
- Fix STRETCH_BASE_SYMBOLS: arrows.rr→arrows.rl for \xtofrom
- \backprime → prime.rev (U+2035)
- \backsimeq → tilde.eq.rev (U+22CD)
- \Join / \bowtie (U+22C8) → join (was raw ⋈)
- Distinguish \atop (bare mfrac, no fence) from \binom (fenced mfrac) by checking parent mrow for OPEN/CLOSE texClass
- Add backprime → prime.rev, backsimeq → tilde.eq.rev, Join → join
- Update PR spec: \atop conversion, batch 2 symbols, ~370 symbol count
MathJax merges adjacent mo nodes without operands into a single text
node (e.g. \approx \approxeq → "≈≊"). findTypstSymbol now maps each
character separately when the string is all non-ASCII, preserving
ASCII operator names ("lim", "sin") intact.
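A rough sketch of the per-character mapping described in this commit message. `SYMBOLS_SAMPLE` and `mapMergedOperator` are hypothetical, the three Typst names in the sample map are illustrative assumptions rather than entries copied from typst-symbol-map.ts, and the space used to join the mapped names is likewise an assumption.

```typescript
// Illustrative sample only; the real symbol map has hundreds of entries.
const SYMBOLS_SAMPLE = new Map<string, string>([
  ["≈", "approx"],
  ["≊", "approx.eq"],
  ["≤", "lt.eq"],
]);

// Maps the text of a (possibly merged) mo node to Typst symbol names.
function mapMergedOperator(text: string): string {
  const whole = SYMBOLS_SAMPLE.get(text);
  if (whole !== undefined) return whole;

  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  const allNonAscii = chars.every((ch) => ch.codePointAt(0)! > 0x7f);
  // ASCII operator names such as "lim" or "sin" stay intact.
  if (!allNonAscii) return text;

  // Map each character on its own; unknown characters pass through unchanged.
  return chars.map((ch) => SYMBOLS_SAMPLE.get(ch) ?? ch).join(" ");
}
```

Here `mapMergedOperator("≈≊")` gives `approx approx.eq`, and `"lim"` is returned unchanged. The sketch follows the all-non-ASCII guard exactly as stated in this message.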
Symbols (~30): omicron, kappa.alt, digamma, gimel, daleth, beth, backepsilon, bigcirc, blacklozenge, yen, section, circledR, lgroup/rgroup, lmoustache/rmoustache, npreceq, nsucceq, ntriangleleft, ntriangleright, circlearrowleft/right, dashleftarrow/right, overparen accent
Fixes:
- findTypstSymbol: use ALL_ALPHA guard instead of ALL_NON_ASCII so merged mo strings with ASCII operators (e.g. >) are split correctly
- stripDashes: extract regexes to HIDE_PATTERN/DASH_CHARS constants, strip #hide(...) so \longLeftrightharpoons collapses to harpoons.rtlb
- Update PR spec with batch 3, merged mo fix, ~400 symbol count
mstyle handler detects tripledash pattern (mathsize<1em + mtext dashes) and replaces "-" with hyph. mpadded handler suppresses #hide() output when width=0 phantom is adjacent to a tripledash mstyle sibling. Removed unused tripledash entry from custom-cmd-map.
…ymbol)
\underleftarrow, \underrightarrow, \underleftrightarrow now produce limits(...)_arrow.l etc. without extra parens around the symbol.
- Sync TYPST_MATH_OPERATORS / TYPST_BUILTIN_OPS, remove non-built-in ops
- Extract TEX_ATOM and MLABELEDTR constants into consts.ts
- Rename SCRIPT_KINDS → SUBSCRIPT_KINDS for clarity
- Extract escapeTypstString into common.ts, remove duplicate
- Add needsTokenSeparator in handleAll for regular children
- Add findTypstSymbol guard in big-delimiter pattern
- Add console.warn in getBigDelimInfo/resolveDelimiterMo catch blocks
- Fix duplicate "Pattern 6" → "Pattern 7" comment
- Fix close variable shadowing in mrow handler (→ closeMapped)
- scanExpression: string concat → parts.push() + join()
- Extract escapeAtPositions helper, use in 3 escape functions
- findUnpairedIndices: single-pass instead of two passes
- Add JSDoc to ANCESTOR_MAX_DEPTH and SHALLOW_TREE_MAX_DEPTH
- Add /pr-specs to .npmignore
- Add edge-case tests (empty, invalid, deep nesting, separators, etc.)
- Update PR spec
mfrac handler now checks the actual delimiter on the parent mrow
via unwrapToMoText (TeXAtom > inferredMrow > mo). Only ) produces
binom(); { and [ produce mat(delim: "{"/"]", ...; ...).
…e tests
- Return empty typst with error field for merror nodes instead of serializing error text into typstmath/typstmath_inline (findMerror in toTypstData)
- Clean up tree mutations after serialization using removeProperty() instead of setProperty(undefined) or any-cast delete
- Deduplicate TYPST_MATH_OPERATORS: single source in consts.ts, imported by token-handlers.ts; TYPST_BUILTIN_OPS built as union with TYPST_MATH_FUNCTIONS
- Remove unused optionTypst parameter and type toTypstData with MathNode
- Rename SUBSCRIPT_KINDS → IDOTSINT_SCRIPT_KINDS for clarity
- Make RE_TAG_EXTRACT_G local to extractTagFromConditionCell (no lastIndex leak)
- Remove console.warn from catch blocks (silent error handling for library)
- Extract BOX_STROKE/BOX_INSET constants for boxed/circle/border styling
- Remove extra parens wrapping toTypstData arrow function
- Add edge-case tests: 15-level frac, unbalanced delimiters, unknown command, escape injection; error tests verify error field presence
- Update PR spec with error handling section
…ypstData
- README: add typst to format list, HTML output tags, JSON output examples, include_typst in all options blocks, conversion examples with real object format
- Changelog: add [2.0.39] entry for Typst math format
- index.ts: extract ITypstConvertResult interface, type TexConvertToTypstData
Underscores are invalid in HTML tag names. Renamed to hyphenated form in tag generation, parsing, and documentation.
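As a small illustration of the rename, a helper along these lines would derive the tag name from the format key. `formatKeyToTagName`, `escapeHtmlText`, and `wrapInFormatTag` are hypothetical names, not the project's actual tag-generation code.

```typescript
// Converts a format key such as "typstmath_inline" into a valid, hyphenated
// custom tag name ("typstmath-inline").
function formatKeyToTagName(formatKey: string): string {
  return formatKey.replace(/_/g, "-");
}

// Minimal text escaping so that math content cannot break the markup.
function escapeHtmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps serialized output in its format tag.
function wrapInFormatTag(formatKey: string, content: string): string {
  const tag = formatKeyToTagName(formatKey);
  return `<${tag}>${escapeHtmlText(content)}</${tag}>`;
}
```

So `wrapInFormatTag("typstmath_inline", "x^2")` would produce `<typstmath-inline>x^2</typstmath-inline>`.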
- Add include_typst: true to node-examples (math, tabular, tabular_include/not_include_sub_math)
- Add typst checkbox and option to react-app form.jsx
- Fix react-app webpack 5 build: add react-app-rewired + path-browserify polyfill for postcss's path dependency
- Add MMD_TYPES const (abstract, theorem, proof, align) to consts.ts
- Tag tokens with token.meta.mmd_type in begin-align, block-rule, mdPluginText
- Export MMD_TYPES from index.tsx for external consumers
Spec: pr-specs/2026-02-add-typst-math-format.md
Summary
- MathJax.TexConvertToTypstData(latex) API, returning { typstmath, typstmath_inline, error? }
- include_typst: true option in outMath: adds <typstmath> and <typstmath-inline> HTML tags alongside existing MathML/AsciiMath formats
Key design decisions
- typstmath (block; may include code-mode wrappers like #box(), #align()) and typstmath_inline (pure math-mode, safe for inline $...$)
- merror nodes (invalid LaTeX) produce { typstmath: '', typstmath_inline: '', error: '...' }; errors never leak into Typst output
- custom-cmd-map.ts: handles commands like \Varangle, \llbracket, \pounds that need special Typst dispatch
Docs & examples
- [2.0.39]
- include_typst added to node-examples and react-app demo with typst checkbox
Test plan
- tests/_data/_typst/data.js covering all supported constructs
- [x] Dedicated tests for escape utils (tests/_escape-utils.js) and custom command map (tests/_custom_cmd.js)
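For reviewers, a consumer of the result object might look roughly like this. The interface mirrors the `{ typstmath, typstmath_inline, error? }` shape from the summary; `TypstResultSketch` and `toTypstSource` are hypothetical, and the sketch takes an already-produced result object instead of calling MathJax.TexConvertToTypstData itself.

```typescript
// Shape taken from the summary; the exported interface name in index.ts
// (ITypstConvertResult) may differ in detail.
interface TypstResultSketch {
  typstmath: string;
  typstmath_inline: string;
  error?: string;
}

// Picks the block or inline variant and surfaces conversion errors.
function toTypstSource(result: TypstResultSketch, inline: boolean): string {
  if (result.error !== undefined) {
    // merror results carry empty strings plus an error message.
    throw new Error(`Typst conversion failed: ${result.error}`);
  }
  // The inline variant is pure math-mode, so it can be placed between $...$.
  // The block variant may contain code-mode wrappers and is returned as-is;
  // how to embed it is left to the caller.
  return inline ? `$${result.typstmath_inline}$` : result.typstmath;
}
```

A caller would pass the object returned by the conversion API and decide on `inline` from the surrounding context.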