md4go is a Markdown parser for Go that uses a push-based, event-driven model and does not build an AST. It is CommonMark 0.31 compliant (all 652 spec examples pass) and fully supports the GFM extensions (tables / strikethrough / task lists / autolinks).
```bash
go get github.com/userpro/md4go
```

```go
package main

import (
	"os"

	"github.com/userpro/md4go/parser"
	"github.com/userpro/md4go/text"
)

func main() {
	src := []byte("# Hello\n\n- item1\n- item2\n")
	text.Convert(src, os.Stdout, text.WithFlags(parser.DialectGitHub))
}

// Output:
// Hello
//
// item1
// item2
```

```go
package main

import (
	"os"

	"github.com/userpro/md4go/html"
	"github.com/userpro/md4go/parser"
)

func main() {
	src := []byte("# Hello\n\n- item1\n- item2\n")
	html.Convert(src, os.Stdout, html.WithFlags(parser.DialectGitHub))
}

// Output:
// <h1>Hello</h1>
// <ul>
// <li>item1</li>
// <li>item2</li>
// </ul>
```

```bash
# Build
go build -o md4go ./cmd/md4go

# Markdown → plain text (GFM mode by default)
echo "# Hello" | ./md4go

# Markdown → HTML
echo "# Hello" | ./md4go -html

# Streaming input (low memory, plain text mode only)
cat large.md | ./md4go -stream

# goldmark compatibility mode
echo "| a | b |" | ./md4go -compat goldmark
```

```
┌─────────────────────────────────────────────┐
│  Convenience Layer (one-line wrappers)      │
│    text.Convert()    html.Convert()         │
├─────────────────────────────────────────────┤
│  Core Layer (parsing API)                   │
│    md4go.Parser.Parse(src, renderer)        │
│    renderer.Renderer interface              │
├─────────────────────────────────────────────┤
│  Custom Layer (user-defined renderers)      │
│    Implement the 5 methods of Renderer      │
└─────────────────────────────────────────────┘
```
- Convenience Layer (`text` / `html` packages): convert in a single call
- Core Layer (`md4go` root package): full parsing API that pushes events to any `Renderer`
- Custom Layer: implement the `renderer.Renderer` interface for custom output formats
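To make the push model concrete, here is a toy, self-contained sketch of what happens between a parser and a renderer. Everything in it (`BlockType`, `Renderer`, `parse`, `plainText`) is a hypothetical, simplified stand-in rather than md4go code: span events and error returns are omitted, and only `# ` headings and blank-line-separated paragraphs are recognized. The point is that the parser calls the renderer as soon as it recognizes each block and never materializes a tree.

```go
package main

import (
	"fmt"
	"strings"
)

// BlockType and Renderer are hypothetical, simplified stand-ins for
// md4go's ast.BlockType and renderer.Renderer (no spans, no error returns).
type BlockType int

const (
	BlockDoc BlockType = iota
	BlockH
	BlockP
)

type Renderer interface {
	EnterBlock(t BlockType)
	LeaveBlock(t BlockType)
	Text(text []byte)
}

// parse recognizes "# " headings and blank-line-separated paragraphs and
// pushes events to r while scanning; no tree is built.
func parse(src string, r Renderer) {
	r.EnterBlock(BlockDoc)
	inPara := false
	closePara := func() {
		if inPara {
			r.LeaveBlock(BlockP)
			inPara = false
		}
	}
	for _, line := range strings.Split(src, "\n") {
		switch {
		case strings.TrimSpace(line) == "":
			closePara()
		case strings.HasPrefix(line, "# "):
			closePara()
			r.EnterBlock(BlockH)
			r.Text([]byte(strings.TrimPrefix(line, "# ")))
			r.LeaveBlock(BlockH)
		default:
			if inPara {
				r.Text([]byte("\n")) // keep the soft line break as a word boundary
			} else {
				r.EnterBlock(BlockP)
				inPara = true
			}
			r.Text([]byte(line))
		}
	}
	closePara()
	r.LeaveBlock(BlockDoc)
}

// plainText keeps only text and ends each heading or paragraph with a newline.
type plainText struct{ sb strings.Builder }

func (pt *plainText) EnterBlock(BlockType) {}

func (pt *plainText) LeaveBlock(t BlockType) {
	if t == BlockH || t == BlockP {
		pt.sb.WriteByte('\n')
	}
}

func (pt *plainText) Text(b []byte) { pt.sb.Write(b) }

func main() {
	pt := &plainText{}
	parse("# Hello\n\nfirst line\nsecond line\n", pt)
	fmt.Print(pt.sb.String())
}
```

Because the renderer only ever sees a sequence of enter / text / leave calls, swapping `plainText` for an HTML or link-collecting renderer needs no change to `parse`; that separation is what lets the layers above stack on one another.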
```go
// Extract plain text from Markdown, stripping all formatting markers
var buf bytes.Buffer
text.Convert(markdownBytes, &buf, text.WithFlags(parser.DialectGitHub))
plainText := buf.String()
```

Typical uses:
- Document preprocessing for RAG systems
- Content indexing for full-text search engines
- Plain-text versions of Markdown emails / notifications
- Text summaries of chat messages
```go
// Generate HTML in XHTML mode
var buf bytes.Buffer
html.Convert(markdownBytes, &buf,
	html.WithFlags(parser.DialectGitHub),
	html.WithRendererFlags(html.FlagXHTML),
)
```

```go
// Read line by line; memory use does not grow with document size
file, err := os.Open("large.md")
if err != nil {
	log.Fatal(err)
}
defer file.Close()
text.ConvertStream(file, os.Stdout, text.WithFlags(parser.DialectGitHub))
```

Note: In streaming mode, reference link definitions (refdefs) follow a "first-seen-first-served" rule: a reference that appears before its definition degrades to literal text. One-shot parsing (`Convert`) has no such limitation.
md4go compiles to WebAssembly for browser-side Markdown parsing. See wasm/README.md for details.
```html
<script type="module">
import { initMd4go } from './wasm/md4go.js';
const { parseToHTML, parseToText } = await initMd4go();
console.log(parseToHTML("# Hello **world**"));
</script>
```

```bash
# Build
GOOS=js GOARCH=wasm go build -o md4go.wasm ./wasm
cp "$(go env GOROOT)/lib/wasm/wasm_exec.js" .
```

```go
// Extract all links
type LinkExtractor struct {
	links  []string
	inLink bool // true while inside a link span
}

func (e *LinkExtractor) EnterBlock(ast.BlockType, any) error { return nil }
func (e *LinkExtractor) LeaveBlock(ast.BlockType, any) error { return nil }

func (e *LinkExtractor) EnterSpan(s ast.SpanType, d any) error {
	if s == ast.SpanLink {
		if detail, ok := d.(*ast.LinkDetail); ok {
			e.links = append(e.links, string(detail.Href.Text))
		}
		e.inLink = true
	}
	return nil
}

func (e *LinkExtractor) LeaveSpan(s ast.SpanType, _ any) error {
	if s == ast.SpanLink {
		e.inLink = false // reset when the link span closes
	}
	return nil
}

func (e *LinkExtractor) Text(ast.TextType, []byte) error { return nil }

// Usage
p := md4go.New(md4go.WithFlags(parser.DialectGitHub))
ext := &LinkExtractor{}
p.Parse(src, ext)
fmt.Println(ext.links) // [https://example.com ...]
```

```go
// Create a parser
p := md4go.New(
	md4go.WithFlags(parser.DialectGitHub),    // set parse flags
	md4go.WithExtensions(&extension.Table{}), // register extensions
)

// Parse []byte → push events to renderer
p.Parse(src, myRenderer)

// Stream-parse io.Reader → push events to renderer
p.ParseStream(lineSource, myRenderer)
```

```go
// One-shot conversion
text.Convert(src, writer, text.WithFlags(...), text.WithExtensions(...))

// Streaming conversion
text.ConvertStream(reader, writer, text.WithFlags(...))

// Get a renderer instance (advanced)
pt := text.NewPlainText(writer)
p.Parse(src, pt)
pt.Flush() // write out whatever is still buffered
```

```go
// One-shot conversion
html.Convert(src, writer, html.WithFlags(...), html.WithExtensions(...), html.WithRendererFlags(...))

// Alternative constructors (pick one):

// XHTML mode (default)
h := html.NewHTML(writer)

// Specify renderer flags
h := html.NewWithFlags(writer, html.FlagXHTML|html.FlagVerbatimEntities)

// Advanced usage
h := html.NewHTMLWithWriter(renderer.NewBufWriter(writer))
```

HTML renderer flags:
| Flag | Value | Description |
|---|---|---|
| `FlagDebug` | `0x0001` | Debug output |
| `FlagVerbatimEntities` | `0x0002` | Output entities verbatim (not translated to UTF-8) |
| `FlagSkipUTF8BOM` | `0x0004` | Skip a leading UTF-8 BOM in the input |
| `FlagXHTML` | `0x0008` | XHTML self-closing tags (`<br />`) |
| `FlagNoXHTMLEscaping` | `0x0010` | Escape only `&`, `<`, `>` (goldmark-compatible; `'` and `"` are not escaped) |
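Renderer flags are bit values, so they combine with `|` and are tested with `&`. The sketch below copies the values from the table into local constants (the `uint32` type is an assumption) and shows how two of them could change output: minimal versus full escaping, and XHTML-style line breaks. `escapeText` and `lineBreak` are hypothetical helpers, and Go's `html.EscapeString` (which writes `&#34;` and `&#39;`) may not produce the same entity forms md4go emits.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Local copies of the values in the table; real code should use the
// constants exported by md4go's html package. The uint32 type is assumed.
const (
	FlagDebug            uint32 = 0x0001
	FlagVerbatimEntities uint32 = 0x0002
	FlagSkipUTF8BOM      uint32 = 0x0004
	FlagXHTML            uint32 = 0x0008
	FlagNoXHTMLEscaping  uint32 = 0x0010
)

// minimalEscaper escapes only &, < and >, leaving quotes untouched.
var minimalEscaper = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

// escapeText picks an escaping strategy from the flag set.
func escapeText(s string, flags uint32) string {
	if flags&FlagNoXHTMLEscaping != 0 {
		return minimalEscaper.Replace(s)
	}
	return html.EscapeString(s) // also escapes ' and "
}

// lineBreak returns the tag used for a hard line break.
func lineBreak(flags uint32) string {
	if flags&FlagXHTML != 0 {
		return "<br />"
	}
	return "<br>"
}

func main() {
	flags := FlagXHTML | FlagNoXHTMLEscaping // combine flags with bitwise OR
	fmt.Printf("flags = %#x\n", flags)
	fmt.Println(escapeText(`a < "b"`, flags) + lineBreak(flags))
	fmt.Println(escapeText(`a < "b"`, 0) + lineBreak(0))
}
```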
```go
type Renderer interface {
	EnterBlock(t ast.BlockType, detail any) error
	LeaveBlock(t ast.BlockType, detail any) error
	EnterSpan(t ast.SpanType, detail any) error
	LeaveSpan(t ast.SpanType, detail any) error
	Text(t ast.TextType, text []byte) error
}
```

| Mode | Constant | Use Case |
|---|---|---|
| CommonMark | `parser.DialectCommonMark` | Standard Markdown, strict compliance |
| GitHub Flavored | `parser.DialectGitHub` | GFM extensions (tables / strikethrough / task lists / autolinks), plus admonitions and footnotes |

```go
DialectGitHub = PermissiveAutolinks | FlagTables | FlagStrikethrough | FlagTasklists | FlagAdmonitions | FlagFootnotes
```
| Mode | API | Memory | Forward References |
|---|---|---|---|
| One-shot `[]byte` | `Parse` / `Convert` | O(n) | ✅ Fully supported |
| Streaming `io.Reader` | `ParseStream` / `ConvertStream` | O(line) | ❌ First-seen-first-served |
Recommendation: use Convert for documents < 10 MB; use ConvertStream for very large documents.
| Target | Package | Characteristics |
|---|---|---|
| Plain text | `text` | Strips all formatting, preserves text content and semantic boundaries |
| HTML | `html` | Full HTML output, XHTML / HTML5 selectable |
| Custom | `renderer` | Implement the `Renderer` interface |
- Reuse the Parser: a `Parser` created by `md4go.New()` can be used for multiple `Parse()` calls
- Save memory with streaming: `ConvertStream` reads line by line, so memory usage is independent of document size
- Automatic BufWriter buffering: `text.NewPlainText(w)` and `html.NewHTML(w)` use a 4 KB internal buffer
- Enable extensions on demand: register only the extensions you need to reduce parsing overhead
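Buffering has a practical consequence: output sits in the buffer until it fills or is flushed, which is why the advanced plain-text example calls `pt.Flush()`. The sketch below demonstrates the effect with the standard library's `bufio.Writer` at the same 4 KB size. md4go's own `BufWriter` is a separate type, so treat this as an analogy rather than its implementation; `writeBuffered` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// writeBuffered writes s through a 4 KB bufio.Writer and reports how many
// bytes had reached the destination before Flush, plus the final contents.
func writeBuffered(s string) (before int, after string, err error) {
	var dst strings.Builder
	w := bufio.NewWriterSize(&dst, 4096)
	if _, err = w.WriteString(s); err != nil {
		return 0, "", err
	}
	before = dst.Len() // short writes are still held in the buffer
	if err = w.Flush(); err != nil {
		return before, "", err
	}
	return before, dst.String(), nil
}

func main() {
	before, after, err := writeBuffered("hello, buffered world\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(before) // 0
	fmt.Print(after)    // hello, buffered world
}
```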
```go
// Enable only tables and strikethrough
p := md4go.New(md4go.WithExtensions(
	&extension.Table{},
	&extension.Strikethrough{},
))

// All GFM extensions (shortcut)
p := md4go.New(md4go.WithFlags(parser.DialectGitHub))

// Equivalent to:
p := md4go.New(md4go.WithExtensions(extension.GFM...))
```

Available extensions:
| Extension | Syntax |
|---|---|
| `extension.Strikethrough` | `~~strikethrough~~` |
| `extension.Table` | GFM tables |
| `extension.Tasklist` | `- [x] task` |
| `extension.PermissiveAutolinks` | URL / email / WWW autolinks |
| `extension.Footnote` | `[^1]` footnotes |
| `extension.LatexMath` | `$inline$` / `$$block$$` |
| `extension.Wikilink` | `[[link]]` |
| `extension.Superscript` | `^superscript^` |
| `extension.Subscript` | `~subscript~` |
| `extension.Spoiler` | |
| `extension.Highlight` | `==highlight==` |
| `extension.Admonition` | `> [!NOTE]` admonition blocks |
| Standard | Result |
|---|---|
| CommonMark 0.31 | 652/652 ✅ |
| GFM tables / strikethrough / task lists / autolinks | All passing ✅ |
md4go follows the GFM / CommonMark standards by default. Differences from other implementations fall into two categories: intentional improvements (active by default, no flag needed) and differences alignable via compatibility flags.
| ID | Scenario | Default Behavior | Notes |
|---|---|---|---|
| S-01 | `\|` in table cells | | |
| S-02/04 | Tight list paragraph separation | Preserves `\n` word boundaries, emits P events | Better for text extraction |
| S-03 | `[[target\|label]]` wikilink | Recognized as a wikilink | Supports wikilinks with labels |
| S-05 | Footnote references | Outputs `[N]` | Preserves the reference number |
| S-06 | Code spans containing NULL | Recognized and replaced with U+FFFD | Follows CommonMark |
Major differences can be aligned via the GoldmarkCompat preset:
| Scenario | Alignment Flag | Example |
|---|---|---|
| Tables cannot interrupt a paragraph (GFM standard) | `FlagTableInterruptParagraph` | Paragraph followed by a table: not recognized as a table by default; with the flag, the last line of the paragraph is promoted to the table header |
| HTML entity decoding | `FlagDecodeEntities` | `&amp; &copy;`: entities kept as text by default; decoded to `& ©` with the flag |
| Leading UTF-8 BOM stripping | `FlagStripBOM` | `\ufeffHello`: BOM preserved by default; stripped with the flag (goldmark behavior) |
| Intraword `~~` strikethrough (md4c is stricter than cmark-gfm) | `FlagStrikethroughPermissive` | `foo~~bar~~baz`: `~~` not recognized intraword by default (md4c behavior); recognized with the flag (cmark-gfm / goldmark behavior) |
| Inline HTML tag stripping (text renderer) | `FlagStripHTMLTags` | `<span>html</span>`: raw HTML preserved by default; with the flag, tags are stripped, leaving `html`. Non-visible elements (`<script>`, `<style>`, etc.) have their entire content removed, matching goquery DOM text extraction |
| Strict table column count validation | `FlagStrictTableColumns` | Header row with 3 columns, delimiter row with 2: loosely recognized by default; not recognized as a table with the flag |
| Table interrupted by adjacent header row | `FlagTableInterruptByHeaders` | Header row adjacent to another heading: recognized as a heading by default; recognized as a table header with the flag |
| XHTML entity encoding in HTML renderer | `FlagNoXHTMLEntityEncoding` | `"` and `'` encoded as HTML entities by default; left as plain characters with the flag (goldmark behavior) |
| Inline span / bracket extra spaces | (none) | Side effect of goldmark's DOM traversal; should not be replicated |
See `DIFFCHECK_REPORT.md` for the full comparison report.
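To give a feel for what two of the compatibility flags change, here is a standalone sketch using only the standard library. `stripBOM` and `decodeEntities` are hypothetical helpers approximating the `FlagStripBOM` and `FlagDecodeEntities` behavior on whole strings; they are not md4go's implementation, and a real parser applies such rules only in the right contexts (CommonMark, for example, does not decode entities inside code spans).

```go
package main

import (
	"bytes"
	"fmt"
	"html"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM removes one leading UTF-8 BOM, approximating FlagStripBOM.
func stripBOM(src []byte) []byte {
	return bytes.TrimPrefix(src, utf8BOM)
}

// decodeEntities replaces entity references with the characters they name,
// approximating FlagDecodeEntities with Go's html.UnescapeString.
func decodeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	fmt.Printf("%q\n", stripBOM([]byte("\ufeffHello"))) // "Hello"
	fmt.Println(decodeEntities("&amp; &copy;"))           // & ©
}
```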
| Document | Location | Description |
|---|---|---|
| README.md | root | Quick start + API reference (English) |
| README.zh.md | root | Quick start + API reference (Chinese) |
| ARCHITECTURE.md | root | Architecture design |
| DESIGN.md | root | Algorithm design details |
| TESTING.md | root | Testing system overview |
| wasm/README.md | wasm/ | WebAssembly browser usage guide |
| diffcheck/README.md | diffcheck/ | Engine cross-comparison tool docs |
| DIFFCHECK_REPORT.md | root | Engine comparison report (md4go / md4c / goldmark) |
| benchmark/README.md | benchmark/ | Benchmark suite guide (md4go / md4c / goldmark) |
| benchmark/BENCHMARK_REPORT.md | benchmark/ | Latest benchmark results |
This project was originally ported to Go based on the algorithm design of md4c v0.5.3, with subsequent engineering improvements and standards-compliance enhancements on top.