[v2.0.39] Add native Typst math output format (LaTeX/MathML → Typst)#405

Draft
OlgaRedozubova wants to merge 51 commits into master from
dev/olga/Add-typst-fotmat-for-math-v1

@OlgaRedozubova OlgaRedozubova commented Mar 10, 2026

Spec: pr-specs/2026-02-add-typst-math-format.md

Summary

  • Add LaTeX-to-Typst math conversion via new MathJax.TexConvertToTypstData(latex) API, returning { typstmath, typstmath_inline, error? }
  • Integrate Typst output into the rendering pipeline with include_typst: true option in outMath — adds <typstmath> and <typstmath-inline> HTML tags alongside existing MathML/AsciiMath formats
  • Implement a full Typst serializer covering: fractions, roots, matrices, cases, delimiters (floor/ceil/norm/abs), blackboard bold shorthands, custom operators, accents, cancellations, colors, \boxed, tags/labels, aligned/gathered environments, and 500+ symbol mappings

Key design decisions

  • Two output modes: typstmath (block — may include code-mode wrappers like #box(), #align()) and typstmath_inline (pure math-mode, safe for inline $...$)
  • Modular serializer architecture: split into token-handlers, script-handlers, structural-handlers, table-handlers, bracket-utils, escape-utils, consts, and typst-symbol-map
  • Error propagation: merror nodes (invalid LaTeX) produce { typstmath: '', typstmath_inline: '', error: '...' } — errors never leak into Typst output
  • Custom command map (custom-cmd-map.ts): handles commands like \Varangle, \llbracket, \pounds that need special Typst dispatch
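
The error-propagation contract above could be consumed roughly as follows. This is a minimal sketch: the `TypstResult` interface mirrors the `{ typstmath, typstmath_inline, error? }` shape described in this PR, while `pickTypst` is a hypothetical helper that is not part of the library.

```typescript
// Result shape as described in the PR summary.
interface TypstResult {
  typstmath: string;        // block variant, may contain code-mode wrappers
  typstmath_inline: string; // pure math-mode variant
  error?: string;           // set when the input produced an merror node
}

// Hypothetical helper (assumption, not a library function): returns the
// requested variant, or null when the conversion reported an error.
function pickTypst(result: TypstResult, inline: boolean): string | null {
  if (result.error !== undefined) {
    return null;
  }
  return inline ? result.typstmath_inline : result.typstmath;
}
```

Because both output fields are empty strings on error, checking `error` first avoids embedding an empty formula by accident.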

Docs & examples

  • README: Typst conversion examples, format list, HTML/JSON output docs
  • Changelog: [2.0.39]
  • Examples: include_typst added to node-examples and react-app demo with typst checkbox

Test plan

  • 4200+ test cases in tests/_data/_typst/data.js covering all supported constructs
  • Dedicated tests for escape utils (tests/_escape-utils.js) and custom command map (tests/_custom_cmd.js)
  • npm test passes

@OlgaRedozubova OlgaRedozubova self-assigned this Mar 10, 2026
@OlgaRedozubova OlgaRedozubova force-pushed the dev/olga/Add-typst-fotmat-for-math-v1 branch 2 times, most recently from 39b65d8 to d81d7f4 Compare March 11, 2026 13:52
Implement SerializedTypstVisitor that converts MathJax's internal MathML
tree into native Typst math syntax, enabling direct LaTeX → Typst and
MathML → Typst conversion.

Key features:
- Full token handling: mi, mo, mn, mtext, mspace with font variants,
  operator detection, and context-aware spacing
- Script/limit constructs: msub, msup, msubsup, munder, mover,
  munderover, mmultiscripts with movablelimits-aware placement
- Structural elements: mfrac, msqrt, mroot, mtable (matrix/cases/
  equation arrays with alignment), mfenced, menclose, mphantom
- Delimiter handling: paired/unpaired bracket detection via pre-serialization
  tree walk, lr() wrapping, abs/norm/floor/ceil shorthand with separator
  fallback to lr()
- Equation numbering: auto-numbered and \tag{} equations, numcases/
  subnumcases grid layout with per-row counters and labels
- Symbol mapping: 500+ Unicode → Typst symbol mappings including arrows,
  relations, accents, Greek, operators, geometry, and suits
- Escape handling: unified scanner for comma/semicolon/colon escaping
  in function calls, string literal skipping, bracket depth tracking
- Dual output: block (typst) and inline (typst_inline) variants
- Context menu integration for copying Typst math output

Architecture:
- Modular handler files: token-handlers, script-handlers,
  structural-handlers, table-handlers
- Shared utilities: common.ts, consts.ts, types.ts, escape-utils.ts,
  bracket-utils.ts, typst-symbol-map.ts
- Strict TypeScript typing throughout, no any casts
Extract serializeThousandSepChain into common.ts, replacing duplicated
chain logic in index.ts and structural-handlers.ts. Add tree mutation
order comment, forLatex/include_typst comments, and remove test logging.
…content spacing

- isDerivativePattern now checks for actual prime chars (′ ″ ‴) instead of any mo
- Move SCRIPT_NODE_KINDS and PRIME_CHARS to consts.ts, remove duplicate SCRIPT_PARENT_KINDS
- Revert post-content loop to needsTokenSeparator (fixes tau_(i,j)(t) regression)
- Add comment about prevNode in \left...\right delimiter handling
- Add test for f^{(n)}(a) → f^((n))(a) (TeXAtom derivative, no space)
- \left.\right\} now produces lr(mat(delim: #none, ...) \}) instead of losing the brace
- Mismatched pairs like \left[\right) use lr() wrapping instead of mat(delim:)
- Matched pairs (same char or standard open→close) still use compact mat(delim: ...)
- Add tests for \left.\right\} in align* and mismatched \left[\right) on array
Two MathJax patterns for \not:
- Overlay: TeXAtom(REL) > mpadded[width=0] > mtext(⧸) — detected in
  mrow/inferredMrow loops, next sibling wrapped in cancel()
- Combining char: U+0338 appended to mi/mo text — stripped and wrapped
  in cancel() in token handlers
- Fix cancel() loss on early returns in mo handler (multiword/namedOp)
- Add tests for \not 7,60 and \not k + \not q
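
The combining-character case can be illustrated with a short sketch. It assumes the token text arrives as a plain string; `wrapNegated` is a hypothetical name, and the real token handlers also map the base text to a Typst symbol, which is omitted here.

```typescript
// U+0338 COMBINING LONG SOLIDUS OVERLAY, appended by MathJax for \not.
const COMBINING_SOLIDUS = "\u0338";

// Hypothetical helper: strip the overlay character and wrap what is left
// in cancel(). Text without the overlay is returned unchanged.
function wrapNegated(text: string): string {
  if (text.indexOf(COMBINING_SOLIDUS) === -1) {
    return text;
  }
  const base = text.split(COMBINING_SOLIDUS).join("");
  return "cancel(" + base + ")";
}
```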
@OlgaRedozubova OlgaRedozubova force-pushed the dev/olga/Add-typst-fotmat-for-math-v1 branch from d81d7f4 to f673fb3 Compare March 13, 2026 11:31
…t-handling API

- Add two-pass escapeUnpairedBrackets (reuses scanBracketTokens + findUnpairedIndices)
- Integrate into escapeContentSeparators for all function-call arguments
- Integrate replaceUnpairedBrackets into escapeCasesSeparators for consistent API
- Remove manual replaceUnpairedBrackets calls from table-handlers
- Add unit tests for escape-utils
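
The two-pass idea (first find unpaired brackets, then escape them) can be sketched with a stack. The function names below are hypothetical stand-ins, not the real scanBracketTokens / findUnpairedIndices, and the sketch ignores string literals and function-call parentheses that the actual scanner accounts for.

```typescript
const CLOSE_TO_OPEN: Record<string, string> = { ")": "(", "]": "[", "}": "{" };
const OPENERS = "([{";

// Pass 1: collect indices of brackets that have no matching partner.
function findUnpairedSketch(input: string): number[] {
  const stack: number[] = [];    // indices of currently open brackets
  const unpaired: number[] = []; // closers that matched nothing
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const expectedOpen = CLOSE_TO_OPEN[ch];
    if (OPENERS.indexOf(ch) !== -1) {
      stack.push(i);
    } else if (expectedOpen !== undefined) {
      const top = stack[stack.length - 1];
      if (top !== undefined && input.charAt(top) === expectedOpen) {
        stack.pop();
      } else {
        unpaired.push(i);
      }
    }
  }
  // Openers still on the stack never found a closer.
  return unpaired.concat(stack).sort((a, b) => a - b);
}

// Pass 2: prefix every unpaired bracket with a backslash.
function escapeUnpairedSketch(input: string): string {
  const unpaired = findUnpairedSketch(input);
  let out = "";
  for (let i = 0; i < input.length; i++) {
    out += (unpaired.indexOf(i) !== -1 ? "\\" : "") + input.charAt(i);
  }
  return out;
}
```

Separating detection from rewriting keeps the index list reusable, which is presumably why the same helpers can serve several escape functions.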
… lr()

The mrow handler incorrectly delegated to mtable when ANY child was a
table, even when other content (arrows, operators) sat alongside it.
Now hasTableChild is true only when mtable is the sole content child.
The mtable handler also checks parent delimiters only for sole-content
case, preventing double lr() wrapping.

Extract getContentChildren into common.ts and containsTable as a
standalone helper to eliminate duplication between the two handlers.
…call parsing

In Typst math mode, identifier( is parsed as a function call (see
typst/typst#7274). Insert a space before ( when the preceding token
is a multi-char name not in TYPST_BUILTIN_OPS (e.g. emptyset, sigma,
Gamma, psi). Single-char identifiers (f, g) and built-in operators
(sin, cos, ln, arg) keep no space.
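
The spacing rule can be sketched as a small predicate. The operator list here is an illustrative subset taken from the examples in this message, not the real TYPST_BUILTIN_OPS, and both function names are hypothetical.

```typescript
// Illustrative subset only (assumption); the real set is larger.
const BUILTIN_OPS_SKETCH: string[] = ["sin", "cos", "ln", "arg"];

// True when a space is needed between `prevToken` and a following "(",
// i.e. the token is a multi-letter name that is not a built-in operator.
function needsSpaceBeforeParen(prevToken: string): boolean {
  const isMultiCharName = /^[A-Za-z]{2,}$/.test(prevToken);
  const isBuiltin = BUILTIN_OPS_SKETCH.indexOf(prevToken) !== -1;
  return isMultiCharName && !isBuiltin;
}

// Join a token with a parenthesized group, applying the rule above.
function joinWithParen(prevToken: string, parenContent: string): string {
  const sep = needsSpaceBeforeParen(prevToken) ? " " : "";
  return prevToken + sep + "(" + parenContent + ")";
}
```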
…rsing

Add escapeColon to escapeContentSeparators so word: inside any Typst
function call becomes word : (space prevents named-arg syntax).
Apply escapeContentSeparators to abs(), norm(), floor(), ceil()
content which previously had no escaping.
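
A minimal sketch of the colon rule, assuming a plain regex is enough for illustration. `escapeColonSketch` is a hypothetical name; unlike the real escapeColon, it does not skip string literals or track bracket depth.

```typescript
// Insert a space between an identifier and a directly following colon so
// that "word:" becomes "word :" and is not read as a named argument.
function escapeColonSketch(content: string): string {
  return content.replace(/([A-Za-z][A-Za-z0-9]*):/g, "$1 :");
}
```

A colon that already has a space before it does not match the pattern, so applying the function twice gives the same result as applying it once.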
…content

- Extract resolveDelimiterMo helper to access texClass on delimiter nodes
- Reject ‖ pairing when opener has CLOSE texClass (surrounding pair context)
- Reject ‖ pairing when content contains PUNCT (comma between standalone ‖)
- Reject ‖ pairing spanning entire row when content has REL operator (=)
- Apply escapeContentSeparators to bare delimiter func-call content (norm,
  floor, ceil) to prevent commas/semicolons/colons breaking Typst parsing
- Add explicit isFuncCall flag to BARE_DELIM_PAIRS instead of endsWith('(')
- Add 4 test cases: standalone ‖, complex ‖ with comma/number/variable
…ith ; separators

Typst ignores \\ linebreaks inside mat() cells. When aligned/gathered
environments are nested inside a matrix or cases cell, convert them to
mat(delim: #none, ...) using ; row separators instead.

- Add isInsideMatrixCell() recursive parent-chain walker
- Wrap nested mat() in display() for block output to reset scriptlevel
- Propagate typst_inline (without display()) through cell/row pipeline
- Determine alignment from column usage: gathered→center, rl-pairs→right/left
- Extract buildMatExpr helper to deduplicate block/inline mat construction
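
The block and inline construction described above might look like the following sketch. `buildMatSketch` is a hypothetical stand-in whose signature is an assumption; the real buildMatExpr helper is not shown in this PR text, and alignment arguments are left out.

```typescript
// Build a delimiter-less matrix from rows of already serialized cells.
// Cells are separated by ", " and rows by "; ".
function buildMatSketch(rows: string[][], inline: boolean): string {
  const body = rows.map((cells) => cells.join(", ")).join("; ");
  const mat = "mat(delim: #none, " + body + ")";
  // Block output is wrapped in display() to reset scriptlevel;
  // the inline variant is returned without the wrapper.
  return inline ? mat : "display(" + mat + ")";
}
```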
…gh lr()

- Wrap cases() and plain matrices in display() when inside a mat() cell
  to prevent Typst scriptlevel reduction (block only, not inline)
- Route eqnArrays with rowlines/columnlines through mat() format to
  preserve augment: #(hline/vline); add stroke: (dash: "dashed") when
  all separator lines are dashed
- Propagate typst_inline through structural-handlers lr() path by
  building parallel contentInline and extracting buildLrExpr() helper
- Extract computeAugment() and buildEqnArrayAsMat() helpers to
  deduplicate augment computation and eqnArray-as-mat construction
- Detect eqnArray-with-lines parents in isInsideMatrixCell()
- Cache isInsideMatrixCell() result to avoid redundant parent walks
- Use separate needsSpaceBetweenNodes() calls for block/inline content
…lose

Brackets inside these nodes are now paired independently from brackets
outside, preventing false pairing when content is split across Typst
function-call arguments (e.g. \sqrt( arg ) where ( and ) end up in
different scopes). Each child of a scope-boundary node is processed
as a separate pairing scope. SCOPE_BOUNDARIES set is module-level.
…mrows

- Detect \left.\aligned\right\} as cases(reverse: #true, ...) for
  eqnArray-like tables (displaystyle rows); regular arrays keep matrix form
- Add hasTableFirst in structural-handlers: \left\{ table extra \right.
  lets the table inherit { as cases(), extra content follows outside
- Add isFirstWithInvisibleClose in table-handlers so the table picks up
  the open delimiter from the parent mrow when close is invisible
- Track contentInline in the hasTableChild/hasTableFirst mrow branch so
  typst_inline propagates correctly when children return a differing inline variant
- Add tests for reverse cases and cases() + stretch() patterns
- A ( preceded by a digit, as in .4(, is no longer treated as a function call;
  only ASCII letters qualify (isFuncCallParen)
- When a supposed function-call ( has no matching ), backtrack so the
  for-loop re-scans the range and picks up any [, ], {, } inside
- Use non-whitespace check for spacing around symbol names (paren.l,
  bracket.r, etc.) instead of \w — fixes missing space after quoted
  strings ("л"paren.l) and other non-\w tokens
- Extend RE_WORD_CHAR, RE_WORD_DOT_END, RE_WORD_START with \p{L} for
  Unicode letter support
- Move RE_ASCII_LETTER, RE_TRAILING_WS, RE_LEADING_WS to consts.ts
- Add tests: unpaired brackets across matrix rows with digits, letters,
  real functions, and inner brackets inside failed function-call scans
escapeLrSemicolons now also escapes colons after identifiers (g: → g :),
matching the behavior already present in escapeCasesSeparators and
escapeContentSeparators. Without this, lr(g: K_0 ]) would be parsed
by Typst as a named argument.

Add tests for colon escaping in lr(), abs(), and general lr() paths.
MathJax splits \mathrm{टेक} into individual mi nodes per character,
breaking Devanagari/Arabic combining sequences. serializeCombiningMiChain
merges consecutive non-Latin mi nodes with the same mathvariant into a
single font-wrapped quoted string. Known math symbols (∂, ψ, ∅) are
excluded via typstSymbolMap lookup. Uses Unicode script properties
(\p{Script=Latin}) for robust Latin vs non-Latin classification.
… bases

Remove overline/underline from RE_SPECIAL_FN_CALL — they do not imply
below/above placement like overbrace/underbrace do. Add overbracket/
underbracket which were missing. Now \underset{...}{\underline{x}}
correctly produces limits(underline(x))_(...).
- escapeLrBrackets: escapes bare bracket chars matching the lr() delimiter
  type so Typst doesn't auto-scale inner brackets (e.g. \left[ [...] \right]
  → lr([ \[...\] ])). Only same-type brackets are escaped.
- isSyntaxParen: renamed from isFuncCallParen, now also skips _() and ^()
  script grouping parens in scanBracketTokens.
- Fix RE_SPECIAL_FN_CALL: remove overline/underline (they don't imply
  below/above placement), add overbracket/underbracket.
Wrap #box(stroke:...) and #circle(inset:...) with #align(center, ...)
for block display so they center like LaTeX \boxed and \enclose{circle}.
Inline variant remains unwrapped. Add integral.surf (\oiint), slash.o
(\oslash), lt.approx (\lessapprox), gt.approx (\gtrapprox) to symbol map.
Rewrite escapeUnbalancedParens to use scanBracketTokens + findUnpairedIndices
instead of single-pass scanExpression — handles both unbalanced ( and )
(previously only )). Add mover/munder to SCOPE_BOUNDARIES so brackets
inside accents don't pair with brackets outside. Remove dead
escapeUnbalancedCloseParen option from scanExpression.

Fixes \overline(x), \underline(x), \hat(x) producing unescaped parens.
Replace overline(")"content) / underline(")"content) with
overline(lr(\) content)) / underline(lr(\) content)) so the )
delimiter auto-scales via lr() instead of rendering at fixed size.
- \xcancel → cancel(cross: #true, ...) when both diagonal strikes present
- Script children (sub/sup) of msub/msup/msubsup are now separate scopes
  in markUnpairedBrackets, while base stays in parent scope — fixes
  \cancelto{5(y}x) where ( in script paired with ) outside
- safeFormatScript wrapper in script-handlers applies escapeUnbalancedParens
  to ^(…)/_(…) content; removes escape-utils import from common.ts
\underset and \overset create munder/mover without accentunder/accent
attributes — they must use the general limits() path, not accent handlers.
Previously \underset{\rightarrow}{r} produced attach(r, b: arrow.r)
instead of limits(r)_(arrow.r).
MathJax builds \longrightleftharpoons and \longleftrightarrows from
mover with harpoon/arrow pieces. Detect these patterns via
CONSTRUCTED_LONG_ARROWS map and emit single Typst symbols
(harpoons.rtlb, arrows.lr).

Flatten mover(munder(...), over) via unwrapToScriptNode so
\stackrel{k_1}{\underset{k_2}{...}} produces limits(base)_(k_2)^(k_1)
instead of nested limits(limits(base)_(k_2))^(k_1).
… notation matching

- menclose with border-side notation (left/right/top/bottom combos from
  \begin{array}{|l|}\hline) now generates #box(stroke: (...)) with per-side
  strokes instead of overline()/underline()
- Cap vline augment indices at actual column count to prevent out-of-bounds
  when column spec has more columns than data cells
- Refactor menclose notation checks from String.includes() to Set-based
  word-boundary safe matching via parseNotation()/hasNotation()
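
The reason for moving away from String.includes() can be seen in a short sketch: a substring test reports "box" for the notation "roundedbox", while a set of whole words does not. The two functions below are hypothetical simplifications of parseNotation()/hasNotation(), whose real signatures are not shown here.

```typescript
// Split a menclose notation attribute into a set of whole words.
function parseNotationSketch(notation: string): Set<string> {
  return new Set(notation.split(/\s+/).filter((w) => w.length > 0));
}

// Whole-word membership test; substrings of other notations do not match.
function hasNotationSketch(parsed: Set<string>, name: string): boolean {
  return parsed.has(name);
}
```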
…airs

serializeRange used needsTokenSeparator which lacks the script+bracket
spacing check. Switched to needsSpaceBetweenNodes so that e.g.
\|L_N^n(\Delta S)\|_\infty produces norm(L_N^n (Delta S)) with a space
before ( to prevent Typst from parsing n( as a function call.
…ntent

Narrow gathered-like detection so that gathered directly inside align*
(sole cell content) keeps \\-separated rows, while gathered with siblings
in an aligned cell becomes display(mat(delim: #none, ...)).

Also propagate typst_inline through eqnArray row-building so that
display() never leaks into the inline variant.
Add documentation for \not negation, \xcancel, constructed long arrow
collapsing, accent attribute gating, longdiv lr(\) stretchy delimiter,
selective border strokes, vline capping, nested eqnArray/gathered
detection, reverse cases, non-Latin script grouping, function-call-aware
bare delimiter scanning, colon/bracket escaping in lr(), scope boundaries
in bracket pairing, new symbol mappings, and display()/typst_inline
propagation through eqnArray rows.
Central registry (custom-cmd-map.ts) for LaTeX commands that expand
into visual-hack subtrees but need clean symbol output.  Custom handler
in my-BaseMappings stamps data-custom-cmd on the MathML node at parse
time; MathML visitors, ASCII serializer, and Typst serializer all look
up the symbol from the shared map.  Typed MmlVisitorProto interface
replaces proto: any in patchVisitorTeXAtom.  PR spec updated.
- Symbol map: integral.vol, integral.cont.cw/ccw, angstrom, tack.t.double
- Custom-cmd: llbracket/rrbracket via makeCustomCmdHandler factory
- Delimiter-pairing guard: getBigDelimInfo/resolveDelimiterMo skip
  data-custom-cmd nodes to prevent false [/] pairing
- PR spec updated with all changes
setProperty suffices for all visitor/serializer lookups and does not
leak data-custom-cmd into SVG or MathML attribute output.  PR spec
updated.
- Add binary operators: uplus, dotplus, Cap, Cup, leftthreetimes,
  rightthreetimes, boxminus, circleddash, circledast, boxdot,
  circledcirc, boxplus, divideontimes
- Add relations: approxeq, lessgtr, gtrless, lesseqgtr, gtreqless,
  Doteq, risingdotseq, fallingdotseq, backsim, Subset, Supset,
  curlyeqprec, curlyeqsucc, precapprox, succapprox, Vdash, eqsim,
  lneq, gneq, lneqq, gneqq, lnsim, gnsim, lnapprox, gnapprox,
  precneqq, succneqq, precnsim, succnsim, precnapprox, succnapprox,
  nparallel, nvdash, nvDash, nVdash, ntrianglelefteq, ntrianglerighteq
- Add arrows: Lleftarrow, Rrightarrow, leftarrowtail, looparrowleft,
  looparrowright, curvearrowleft, curvearrowright, upuparrows,
  downdownarrows, leftrightsquigarrow, nleftrightarrow, nLeftrightarrow
- Add misc: blacktriangle, blacktriangledown, bigstar, checkmark,
  maltese, blacktriangleleft, blacktriangleright
- Fix U+21C4/U+21C6 swap: rightleftarrows→arrows.rl, leftrightarrows→arrows.lr
- Fix STRETCH_BASE_SYMBOLS: arrows.rr→arrows.rl for \xtofrom
- \backprime → prime.rev (U+2035)
- \backsimeq → tilde.eq.rev (U+22CD)
- \Join / \bowtie (U+22C8) → join (was raw ⋈)
- Distinguish \atop (bare mfrac, no fence) from \binom (fenced mfrac)
  by checking parent mrow for OPEN/CLOSE texClass
- Add backprime → prime.rev, backsimeq → tilde.eq.rev, Join → join
- Update PR spec: \atop conversion, batch 2 symbols, ~370 symbol count
MathJax merges adjacent mo nodes without operands into a single text
node (e.g. \approx \approxeq → "≈≊"). findTypstSymbol now maps each
character separately when the string is all non-ASCII, preserving
ASCII operator names ("lim", "sin") intact.
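
The per-character mapping can be sketched as follows. The two map entries are illustrative, the function names are hypothetical, and joining the results with a space is an assumption about the output format. The sketch handles only characters in the Basic Multilingual Plane.

```typescript
// Illustrative entries only (assumption); the real map is far larger.
const SYMBOLS_SKETCH: Record<string, string> = {
  "\u2248": "approx",   // ≈
  "\u224A": "approx.eq" // ≊
};

// True when the string is non-empty and contains no ASCII characters.
function isAllNonAscii(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) < 128) {
      return false;
    }
  }
  return text.length > 0;
}

// Map a merged operator string character by character. Strings that
// contain ASCII (such as "lim") are returned intact.
function mapMergedOperators(text: string): string {
  if (!isAllNonAscii(text)) {
    return text;
  }
  const parts: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const mapped = SYMBOLS_SKETCH[ch];
    parts.push(mapped !== undefined ? mapped : ch);
  }
  return parts.join(" ");
}
```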
Symbols (~30): omicron, kappa.alt, digamma, gimel, daleth, beth,
  backepsilon, bigcirc, blacklozenge, yen, section, circledR,
  lgroup/rgroup, lmoustache/rmoustache, npreceq, nsucceq,
  ntriangleleft, ntriangleright, circlearrowleft/right,
  dashleftarrow/right, overparen accent

Fixes:
- findTypstSymbol: use ALL_ALPHA guard instead of ALL_NON_ASCII so
  merged mo strings with ASCII operators (e.g. >) are split correctly
- stripDashes: extract regexes to HIDE_PATTERN/DASH_CHARS constants,
  strip #hide(...) so \longLeftrightharpoons collapses to harpoons.rtlb
- Update PR spec with batch 3, merged mo fix, ~400 symbol count
mstyle handler detects tripledash pattern (mathsize<1em + mtext dashes)
and replaces "-" with hyph. mpadded handler suppresses #hide() output
when width=0 phantom is adjacent to a tripledash mstyle sibling.
Removed unused tripledash entry from custom-cmd-map.
…ymbol)

\underleftarrow, \underrightarrow, \underleftrightarrow now produce
limits(...)_arrow.l etc. without extra parens around the symbol.
- Sync TYPST_MATH_OPERATORS / TYPST_BUILTIN_OPS, remove non-built-in ops
- Extract TEX_ATOM and MLABELEDTR constants into consts.ts
- Rename SCRIPT_KINDS → SUBSCRIPT_KINDS for clarity
- Extract escapeTypstString into common.ts, remove duplicate
- Add needsTokenSeparator in handleAll for regular children
- Add findTypstSymbol guard in big-delimiter pattern
- Add console.warn in getBigDelimInfo/resolveDelimiterMo catch blocks
- Fix duplicate "Pattern 6" → "Pattern 7" comment
- Fix close variable shadowing in mrow handler (→ closeMapped)
- scanExpression: string concat → parts.push() + join()
- Extract escapeAtPositions helper, use in 3 escape functions
- findUnpairedIndices: single-pass instead of two passes
- Add JSDoc to ANCESTOR_MAX_DEPTH and SHALLOW_TREE_MAX_DEPTH
- Add /pr-specs to .npmignore
- Add edge-case tests (empty, invalid, deep nesting, separators, etc.)
- Update PR spec
mfrac handler now checks the actual delimiter on the parent mrow
via unwrapToMoText (TeXAtom > inferredMrow > mo). Only ) produces
binom(); { and [ produce mat(delim: "{"/"]", ...; ...).
…e tests

- Return empty typst with error field for merror nodes instead of serializing
  error text into typstmath/typstmath_inline (findMerror in toTypstData)
- Clean up tree mutations after serialization using removeProperty() instead
  of setProperty(undefined) or any-cast delete
- Deduplicate TYPST_MATH_OPERATORS: single source in consts.ts, imported by
  token-handlers.ts; TYPST_BUILTIN_OPS built as union with TYPST_MATH_FUNCTIONS
- Remove unused optionTypst parameter and type toTypstData with MathNode
- Rename SUBSCRIPT_KINDS → IDOTSINT_SCRIPT_KINDS for clarity
- Make RE_TAG_EXTRACT_G local to extractTagFromConditionCell (no lastIndex leak)
- Remove console.warn from catch blocks (silent error handling for library)
- Extract BOX_STROKE/BOX_INSET constants for boxed/circle/border styling
- Remove extra parens wrapping toTypstData arrow function
- Add edge-case tests: 15-level frac, unbalanced delimiters, unknown command,
  escape injection; error tests verify error field presence
- Update PR spec with error handling section
…ypstData

- README: add typst to format list, HTML output tags, JSON output examples,
  include_typst in all options blocks, conversion examples with real object format
- Changelog: add [2.0.39] entry for Typst math format
- index.ts: extract ITypstConvertResult interface, type TexConvertToTypstData
Underscores are invalid in HTML tag names. Renamed to hyphenated form
in tag generation, parsing, and documentation.
- Add include_typst: true to node-examples (math, tabular, tabular_include/not_include_sub_math)
- Add typst checkbox and option to react-app form.jsx
- Fix react-app webpack 5 build: add react-app-rewired + path-browserify
  polyfill for postcss's path dependency
@OlgaRedozubova OlgaRedozubova changed the title PR into master from dev/olga/Add-typst-fotmat-for-math-v1 [v2.0.39] Add native Typst math output format (LaTeX/MathML → Typst) Mar 19, 2026
- Add MMD_TYPES const (abstract, theorem, proof, align) to consts.ts
- Tag tokens with token.meta.mmd_type in begin-align, block-rule, mdPluginText
- Export MMD_TYPES from index.tsx for external consumers