Version: 0.1.0 Date: 2025-12-29 Related: See SPEC.md §8.4 for normative requirements
3md adopts a fail-fast philosophy: parsers should detect errors early and provide clear, actionable error messages to help authors fix issues quickly.
- Strict validation: Catch errors during parsing, not rendering
- Clear error messages: Include line numbers, context, and suggested fixes
- Helpful feedback: Explain what's wrong and how to correct it
- Non-fatal warnings: Alert authors to potential issues without blocking
All errors are assigned codes for consistent identification and handling:
| Code | Category | Severity | Error Type |
|---|---|---|---|
| E001 | Syntax | Fatal | Missing Language Declaration |
| E002 | Syntax | Fatal | Invalid Language Codes |
| E003 | Syntax | Fatal | Wrong Language Count |
| E004 | Syntax | Fatal | Whitespace in Language Declaration |
| E005 | Validation | Fatal | Mismatched Variant Count |
| E006 | Validation | Fatal | Language Order Inconsistency |
| E007 | Validation | Fatal | Mixed Separators |
| E008 | Syntax | Fatal | Malformed YAML Frontmatter |
| E009 | Syntax | Fatal | Unclosed Entity Reference |
| E010 | Security | Fatal | Unsafe YAML Construct |
| E011 | Security | Fatal | Document Size Exceeded |
| E012 | Validation | Fatal | Invalid Status Value |
| E013 | Syntax | Fatal | Reference-Style Links Not Supported |
| W001 | Warning | Non-Fatal | Potential Mono Block Ambiguity |
| W002 | Warning | Non-Fatal | Empty Variant Without Marker |
| W003 | Warning | Non-Fatal | Undefined Entity Reference |
| W004 | Warning | Non-Fatal | Unused Entity Definition |
| S001 | Style | Informational | Line Too Long |
| S002 | Style | Informational | Inconsistent Separator Usage |
E001: Missing Language Declaration
Defined in: SPEC.md §2.3
Invalid Example:
# Heading without {{langs}} declaration
Content here.
Output:
[ERROR E001] Missing language declaration at line 1
# Heading without {{langs}} declaration
^
Every 3md document must begin with a language declaration (after optional frontmatter).
Suggested fix:
{{langs|si|ta|en}}
# සිරස්තලය~தலைப்பு~Heading
Content here.
Conformance: Parsers MUST reject (SPEC.md §2.3)
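A minimal sketch of how a parser might locate the declaration and raise E001. It assumes that frontmatter opens with a `---` line at the very top and closes at the next `---` line, and that blank lines may come before the declaration. `find_language_declaration` and `MissingLanguageDeclarationError` are hypothetical names that SPEC.md does not define.

```python
class MissingLanguageDeclarationError(ValueError):
    """Hypothetical exception type for E001."""


def find_language_declaration(lines: list[str]) -> int:
    """Return the 1-based line number of the {{langs|...}} declaration."""
    i = 0
    if lines and lines[0].strip() == '---':
        # Skip the optional frontmatter, including its closing '---' line
        i = 1
        while i < len(lines) and lines[i].strip() != '---':
            i += 1
        i += 1
    # Assumption: blank lines between frontmatter and declaration are allowed
    while i < len(lines) and not lines[i].strip():
        i += 1
    # Ignore spaces so '{{ langs|...}}' counts as present and is left to E004
    if i >= len(lines) or not lines[i].replace(' ', '').startswith('{{langs'):
        line_no = i + 1 if i < len(lines) else max(len(lines), 1)
        raise MissingLanguageDeclarationError(
            f"Missing language declaration at line {line_no}"
        )
    return i + 1
```

On the invalid example above, this raises `Missing language declaration at line 1`.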
E002: Invalid Language Codes
Defined in: SPEC.md §2.3
Invalid Example:
{{langs|sin|tam|eng}}
Output:
[ERROR E002] Invalid language codes at line 1
{{langs|sin|tam|eng}}
^^^
Invalid language codes. Use ISO 639-1 codes: 'si', 'ta', 'en'
Found: 'sin', 'tam', 'eng'
Suggested fix:
{{langs|si|ta|en}}
Conformance: Parsers MUST reject (SPEC.md §2.3)
Implementation:
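class InvalidLanguageCodesError(ValueError):
    """Raised for E002: codes outside the supported set."""
VALID_LANG_CODES = {'si', 'ta', 'en'}
def validate_language_codes(codes: list[str]) -> None:
    invalid = set(codes) - VALID_LANG_CODES
    if invalid:
        # Sort so the message is stable (set iteration order is arbitrary)
        raise InvalidLanguageCodesError(
            f"Invalid codes: {', '.join(sorted(invalid))}"
        )
E003: Wrong Language Count
Defined in: SPEC.md §2.3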
Invalid Example 1 (Too Few):
{{langs|si|en}}
Output:
[ERROR E003] Wrong language count at line 1
{{langs|si|en}}
^
3md requires exactly 3 languages. Found 2.
Suggested fix:
{{langs|si|ta|en}}
Invalid Example 2 (Too Many):
{{langs|si|ta|en|fr}}
Output:
[ERROR E003] Wrong language count at line 1
{{langs|si|ta|en|fr}}
^^^
3md requires exactly 3 languages. Found 4.
Suggested fix:
{{langs|si|ta|en}}
Conformance: Parsers MUST reject (SPEC.md §2.3)
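Once the declaration line is known to be well-formed, you can extract the codes and enforce the count by splitting on `|`. This sketch assumes the E001 and E004 checks have already passed. `parse_lang_codes` and `WrongLanguageCountError` are hypothetical names.

```python
class WrongLanguageCountError(ValueError):
    """Hypothetical exception type for E003."""


def parse_lang_codes(declaration: str) -> list[str]:
    """Extract the codes from '{{langs|a|b|c}}' and require exactly three."""
    text = declaration.strip()
    if not (text.startswith('{{langs|') and text.endswith('}}')):
        raise ValueError("Not a language declaration")
    # Drop the '{{langs|' prefix and the '}}' suffix, then split on '|'
    codes = text[len('{{langs|'):-2].split('|')
    if len(codes) != 3:
        raise WrongLanguageCountError(
            f"3md requires exactly 3 languages. Found {len(codes)}."
        )
    return codes
```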
E004: Whitespace in Language Declaration
Defined in: SPEC.md §2.3
Invalid Example 1 (Around Separators):
{{langs|si | ta | en}}
Output:
[ERROR E004] Invalid whitespace in language declaration at line 1
{{langs|si | ta | en}}
^ ^
Language declaration must not contain whitespace around separators.
Suggested fix:
{{langs|si|ta|en}}
Invalid Example 2 (Inside Braces):
{{ langs|si|ta|en }}
Output:
[ERROR E004] Invalid whitespace in language declaration at line 1
{{ langs|si|ta|en }}
^ ^
Language declaration must not contain whitespace inside braces.
Suggested fix:
{{langs|si|ta|en}}
Conformance: Parsers MUST reject (SPEC.md §2.3)
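The caret line in the E004 output needs the column of each offending space. This sketch only considers whitespace between the opening `{{` and the last `}}`, which covers both cases shown above. Both helper names are hypothetical.

```python
def find_declaration_whitespace(declaration: str) -> list[int]:
    """Return 1-based columns of whitespace between '{{' and the last '}}'."""
    start = declaration.find('{{')
    end = declaration.rfind('}}')
    if start == -1 or end == -1:
        return []  # Not a declaration at all; E001 covers this case
    return [
        col + 1
        for col in range(start, end + 2)
        if declaration[col].isspace()
    ]


def caret_line(columns: list[int]) -> str:
    """Place a '^' under each reported column."""
    if not columns:
        return ''
    return ''.join('^' if c in columns else ' ' for c in range(1, max(columns) + 1))
```

An empty list means the declaration passes E004.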
E005: Mismatched Variant Count
Defined in: SPEC.md §8.2
Invalid Example:
{{langs|si|ta|en}}
සිංහල පෙළ~தமிழ் உரை
Output:
[ERROR E005] Mismatched variant count at line 3
සිංහල පෙළ~தமிழ் உரை
^
Multi Block must have exactly 3 variants to match language declaration.
Found 2 variants, expected 3 (si, ta, en).
Suggested fix (option 1 - add missing variant):
සිංහල පෙළ~தமிழ் உரை~English text
Suggested fix (option 2 - use {{empty}} marker):
සිංහල පෙළ~தமிழ் உரை~{{empty}}
Conformance: Parsers MUST reject (SPEC.md §8.2)
Implementation:
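class VariantCountError(ValueError):
    """Raised for E005: variant count differs from the declaration."""
def validate_variant_count(variants: list[str], langs: list[str]) -> None:
    if len(variants) != len(langs):
        raise VariantCountError(
            f"Found {len(variants)} variants, expected {len(langs)} "
            f"({', '.join(langs)})"
        )
E006: Language Order Inconsistency
Defined in: SPEC.md §2.3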
Invalid Example:
{{langs|si|ta|en}}
English text~සිංහල පෙළ~தமிழ் உரை
Output:
[ERROR E006] Language order inconsistency at line 3
English text~සිංහල පෙළ~தமிழ் உரை
^
Variant order must match language declaration order.
Expected order: si, ta, en
Detected order: en, si, ta (based on Unicode script detection)
Suggested fix:
සිංහල පෙළ~தமிழ் உரை~English text
Conformance: Parsers MUST reject if order can be detected (SPEC.md §2.3)
Note: This error may not be detectable for all content (e.g., numerical data, mixed-script content). Parsers SHOULD attempt detection using Unicode script analysis.
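A sketch of order detection by Unicode script. It assumes the fixed mapping si to Sinhala, ta to Tamil and en to Latin (the only codes E002 allows), and it counts letters only. Variants with no detectable script, such as numbers or `{{empty}}`, are skipped, in line with the note that E006 applies only when order can be detected. All names here are hypothetical.

```python
from typing import Optional

# Assumed mapping from the declared codes to the script each variant uses
EXPECTED_SCRIPT = {'si': 'Sinhala', 'ta': 'Tamil', 'en': 'Latin'}


def char_script(ch: str) -> Optional[str]:
    code = ord(ch)
    if 0x0D80 <= code <= 0x0DFF:
        return 'Sinhala'
    if 0x0B80 <= code <= 0x0BFF:
        return 'Tamil'
    if code <= 0x00FF and ch.isalpha():
        return 'Latin'
    return None  # Whitespace, digits, punctuation, other scripts


def primary_script(text: str) -> Optional[str]:
    counts: dict[str, int] = {}
    for ch in text:
        script = char_script(ch)
        if script is not None:
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None


def detect_order_mismatch(variants: list[str], langs: list[str]) -> Optional[list]:
    """Return the detected order (None where unknown) if it contradicts langs."""
    # Assumes langs already passed E002, so every code is in EXPECTED_SCRIPT
    script_to_lang = {EXPECTED_SCRIPT[code]: code for code in langs}
    detected = []
    for variant in variants:
        script = None if variant.strip() == '{{empty}}' else primary_script(variant)
        detected.append(script_to_lang.get(script))
    mismatch = any(d is not None and d != expected
                   for d, expected in zip(detected, langs))
    return detected if mismatch else None
```

For the invalid example above, this returns `['en', 'si', 'ta']`, the detected order shown in the E006 output.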
E007: Mixed Separators
Defined in: SPEC.md §4.4
Invalid Example:
{{langs|si|ta|en}}
සිංහල පෙළ~தமிழ் உரை
෴
English text
Output:
[ERROR E007] Mixed separators in same Multi Block at line 3
සිංහල පෙළ~தமிழ் உரை
^
෴
^
Cannot use both inline (~) and block (\n෴\n) separators in same Multi Block.
Suggested fix (option 1 - use inline only):
සිංහල පෙළ~தமிழ் உரை~English text
Suggested fix (option 2 - use block only):
සිංහල පෙළ
෴
தமிழ் உரை
෴
English text
Conformance: Parsers MUST reject (SPEC.md §4.4)
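A minimal E007 check for one Multi Block. It makes three assumptions: the block arrives as its list of source lines, `෴` (U+0DF4) on a line by itself is the block separator, and `~` is never escaped inside variant text. `has_mixed_separators` is a hypothetical name.

```python
BLOCK_SEPARATOR = '෴'  # Used alone on a line between block-format variants


def has_mixed_separators(block_lines: list[str]) -> bool:
    """True if one Multi Block uses both '~' and a standalone separator line."""
    uses_block = any(line.strip() == BLOCK_SEPARATOR for line in block_lines)
    # Assumption: any '~' outside a separator line is an inline separator
    uses_inline = any('~' in line for line in block_lines
                      if line.strip() != BLOCK_SEPARATOR)
    return uses_block and uses_inline
```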
E008: Malformed YAML Frontmatter
Defined in: SPEC.md §2.2
Invalid Example:
---
project:
  title: "Document Title
  # Missing closing quote
---
Output:
[ERROR E008] Malformed YAML frontmatter at line 3
  title: "Document Title
^
Invalid YAML syntax: Unclosed quoted string
Suggested fix:
---
project:
  title: "Document Title"
---
Conformance: Parsers MUST reject (SPEC.md §2.2)
E009: Unclosed Entity Reference
Defined in: SPEC.md §5.8
Invalid Example:
[[Geoffrey Bawa|ජෙෆ්රි බාවා
Output:
[ERROR E009] Unclosed entity reference at line 1
[[Geoffrey Bawa|ජෙෆ්රි බාවා
^
Entity reference missing closing ']]'
Suggested fix:
[[Geoffrey Bawa|ජෙෆ්රි බාවා]]
Conformance: Parsers MUST reject (SPEC.md §5.8)
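A sketch of the E009 scan for a single line. It assumes entity references neither nest nor span lines. An opener counts as unclosed when no `]]` follows it, or when another `[[` appears before the next `]]`. The function name is hypothetical.

```python
from typing import Optional


def find_unclosed_entity_ref(line: str) -> Optional[int]:
    """Return the 1-based column of the first unclosed '[[', or None."""
    pos = 0
    while (start := line.find('[[', pos)) != -1:
        end = line.find(']]', start + 2)
        next_open = line.find('[[', start + 2)
        # Unclosed if nothing closes it, or a new '[[' starts before ']]'
        if end == -1 or (next_open != -1 and next_open < end):
            return start + 1
        pos = end + 2
    return None
```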
E010: Unsafe YAML Construct
Defined in: SPEC.md §11.1
Invalid Example:
---
!!python/object/apply:os.system
args: ['rm -rf /']
---
Output:
[ERROR E010] Unsafe YAML construct detected at line 2
!!python/object/apply:os.system
^
YAML frontmatter contains potentially unsafe construct.
Only basic YAML types are allowed (strings, numbers, lists, maps).
Security risk: Code execution vulnerability
Conformance: Parsers MUST reject (SPEC.md §11.1)
Implementation:
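import yaml
class UnsafeYAMLError(ValueError):
    """E010: construct outside the safe YAML subset."""
class MalformedYAMLError(ValueError):
    """E008: frontmatter is not valid YAML."""
def safe_load_frontmatter(yaml_text: str) -> dict:
    """
    Load YAML safely, rejecting dangerous constructs.
    """
    try:
        # Use safe_load, not load: it only constructs basic types
        data = yaml.safe_load(yaml_text)
    except yaml.constructor.ConstructorError as e:
        # Tags such as !!python/object/apply have no safe constructor
        raise UnsafeYAMLError(f"Unsafe YAML construct: {e}") from e
    except yaml.YAMLError as e:
        raise MalformedYAMLError(f"Invalid YAML: {e}") from e
    # Empty or non-mapping frontmatter carries no metadata
    return data if isinstance(data, dict) else {}
E011: Document Size Exceeded
Defined in: SPEC.md §11.2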
Invalid Example:
# (52.4 MB document)
Output:
[ERROR E011] Document size exceeded at line 1
Document size: 52.4 MB
Maximum allowed: 10 MB
Large documents may cause performance issues or denial-of-service.
Consider splitting into multiple files.
Conformance: Parsers SHOULD implement size limits (SPEC.md §11.2)
Recommended Limits:
- Document size: 10 MB
- Block depth: 100 levels
- Variant length: 1 MB
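A sketch of the recommended document size limit. The check runs on the raw bytes before any parsing. It assumes "10 MB" means 10 × 1024 × 1024 bytes, because the source does not say whether decimal or binary megabytes are intended. Counting bytes rather than characters matters here: Sinhala and Tamil characters take three bytes each in UTF-8. The names are hypothetical.

```python
class DocumentSizeError(ValueError):
    """Hypothetical exception type for E011."""


# Assumption: binary megabytes; adjust if SPEC.md means 10 * 10**6 bytes
MAX_DOCUMENT_BYTES = 10 * 1024 * 1024


def check_document_size(raw: bytes, limit: int = MAX_DOCUMENT_BYTES) -> None:
    """Reject oversized input before any parsing work is done."""
    if len(raw) > limit:
        raise DocumentSizeError(
            f"Document size: {len(raw) / (1024 * 1024):.1f} MB, "
            f"maximum allowed: {limit / (1024 * 1024):.0f} MB"
        )
```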
E012: Invalid Status Value
Defined in: SPEC.md §7.2
Invalid Example:
---
status:
  si: completed  # Invalid
  ta: synced
  en: source
---
Output:
[ERROR E012] Invalid status value at line 3
  si: completed
^^^^^^^^^
Invalid status value 'completed'.
Valid values: source, synced, fuzzy, untranslated, machine
Suggested fix:
status:
  si: synced
  ta: synced
  en: source
Conformance: Parsers MUST reject (SPEC.md §7.2)
Valid status values:
- source: Authoritative content
- synced: Translation verified
- fuzzy: Needs review
- untranslated: Not translated
- machine: Machine-translated
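A minimal sketch of the E012 check. It assumes the frontmatter has already been loaded (for example with `yaml.safe_load`) and that its `status` key maps each language code to a value. `find_invalid_statuses` is a hypothetical helper.

```python
VALID_STATUSES = {'source', 'synced', 'fuzzy', 'untranslated', 'machine'}


def find_invalid_statuses(status: dict) -> dict:
    """Return {lang: value} for every value outside the allowed set."""
    return {
        lang: value
        for lang, value in status.items()
        # Non-strings (e.g. YAML booleans) are rejected as well
        if not isinstance(value, str) or value not in VALID_STATUSES
    }
```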
E013: Reference-Style Links Not Supported
Defined in: SPEC.md §5.7.5
Invalid Example:
{{langs|si|ta|en}}
See [documentation][ref] for details.
[ref]: https://example.com
Output:
[ERROR E013] Reference-style links not supported at line 3
See [documentation][ref] for details.
^^^^^
Reference-style links ([text][ref] with [ref]: url) are not supported in 3md
due to syntax conflict with entity references [[entity-id]].
Suggested fix (option 1 - inline link):
See [documentation](https://example.com) for details.
Suggested fix (option 2 - entity reference):
# Define in frontmatter:
entities:
  docs:
    primary: "Documentation"
    url: "https://example.com"
# Use in content:
See [[docs|documentation]] for details.
Conformance: Parsers MUST reject (SPEC.md §5.7.5)
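A line-based sketch of E013 detection that uses two regular expressions. One matches uses such as `[documentation][ref]`; the other matches definitions such as `[ref]: https://example.com`. These patterns are assumptions, not taken from SPEC.md. They are written so that entity references like `[[docs|documentation]]` and inline links do not match. They do not catch shortcut references written as a bare `[ref]`.

```python
import re

# Use such as [text][ref] or [text][]; the lookbehind and the [^\[\]]
# classes keep entity references like [[id|text]] from matching
REF_LINK_USE = re.compile(r'(?<!\[)\[[^\[\]]+\]\[[^\[\]]*\]')
# Definition such as "[ref]: https://example.com" (up to 3 leading spaces)
REF_LINK_DEF = re.compile(r'^ {0,3}\[[^\[\]]+\]:\s*\S')


def find_reference_links(lines: list[str]) -> list[int]:
    """Return 1-based line numbers that use or define reference-style links."""
    return [
        n for n, line in enumerate(lines, start=1)
        if REF_LINK_USE.search(line) or REF_LINK_DEF.match(line)
    ]
```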
W001: Potential Mono Block Ambiguity
Defined in: SPEC.md §3.3
Example:
{{langs|si|ta|en}}
සිංහල පෙළ පමණි.
Output:
[WARNING W001] Potential Mono Block ambiguity at line 3
සිංහල පෙළ පමණි.
^
Content appears to be in a single language (Sinhala) without separators.
This will be parsed as Mono Block (language-invariant).
If this is intended as:
- Language-invariant content → No action needed
- Incomplete multilingual content → Add variants or use {{empty}}
Suggested fix for incomplete translation:
සිංහල පෙළ පමණි.~{{empty}}~{{empty}}
Conformance: Parsers SHOULD warn (SPEC.md §8.4)
Implementation:
def detect_script(char: str) -> str:
    """
    Detect the Unicode script of a single character.
    """
    code = ord(char)
    if 0x0D80 <= code <= 0x0DFF:
        return 'Sinhala'
    elif 0x0B80 <= code <= 0x0BFF:
        return 'Tamil'
    elif code <= 0x00FF and char.isalpha():
        # Only letters count as Latin; spaces, digits and punctuation do not
        return 'Latin'
    return 'Unknown'
def check_mono_ambiguity(content: str) -> bool:
    """
    Check if mono block appears to be single-language.
    """
    scripts = set()
    for char in content:
        script = detect_script(char)
        if script in {'Sinhala', 'Tamil', 'Latin'}:
            scripts.add(script)
    return len(scripts) == 1
W002: Empty Variant Without Marker
Defined in: SPEC.md §8.3
Example:
{{langs|si|ta|en}}
සිංහල පෙළ.~~English text.
Output:
[WARNING W002] Empty variant without {{empty}} marker at line 3
සිංහල පෙළ.~~English text.
^
Empty variant detected (position 2 of 3).
Consider using explicit {{empty}} marker for clarity.
Suggested fix:
සිංහල පෙළ.~{{empty}}~English text.
Conformance: Parsers SHOULD warn (SPEC.md §8.4)
Note: See the section on the {{empty}} marker below for details.
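Splitting an inline Multi Block on `~` is enough to locate blank variants for W002. This sketch assumes the line is already known to be a Multi Block and that `~` is never escaped. An explicit `{{empty}}` is not blank, so it is never reported. The function name is hypothetical.

```python
def empty_variant_positions(line: str) -> list[int]:
    """Return 1-based positions of blank variants in an inline Multi Block."""
    return [
        pos for pos, variant in enumerate(line.split('~'), start=1)
        if not variant.strip()
    ]
```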
W003: Undefined Entity Reference
Defined in: SPEC.md §5.8
Example:
{{langs|si|ta|en}}
[[unknown-entity]] සඳහන් කර ඇත.~[[unknown-entity]] குறிப்பிடப்பட்டுள்ளது.~[[unknown-entity]] is referenced here.
Output:
[WARNING W003] Undefined entity reference at line 3
[[unknown-entity]] සඳහන් කර ඇත.
^^^^^^^^^^^^^^
Entity 'unknown-entity' not defined in frontmatter.
Link will use default /term/unknown-entity URL.
Suggested fix - add to frontmatter:
---
entities:
  unknown-entity:
    primary: "Entity Name"
    si: "ආයතන නම"
    ta: "நிறுவன பெயர்"
---
Conformance: Parsers SHOULD warn (SPEC.md §8.4)
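A sketch of W003 detection. It assumes the entity id is the part of `[[...]]` before any `|` (as in `[[docs|documentation]]`) and that the set of ids defined under `entities:` in the frontmatter is already known. The regex and the function name are hypothetical.

```python
import re

# [[entity-id]] or [[entity-id|display text]]; group 1 is the id
ENTITY_REF = re.compile(r'\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]')


def find_undefined_entities(lines: list[str], defined: set[str]) -> list[tuple[int, str]]:
    """Return (line number, entity id) for each reference not in `defined`."""
    found = []
    for line_no, line in enumerate(lines, start=1):
        for match in ENTITY_REF.finditer(line):
            entity_id = match.group(1).strip()
            if entity_id not in defined:
                found.append((line_no, entity_id))
    return found
```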
W004: Unused Entity Definition
Defined in: SPEC.md §7.3
Example:
---
entities:
  bawa:
    primary: "Geoffrey Bawa"
    si: "ජෙෆ්රි බාවා"
    ta: "ஜெஃப்ரி பாவா"
  unused-entity:
    primary: "Never Referenced"
---
{{langs|si|ta|en}}
[[bawa]] is a renowned architect.
Output:
[WARNING W004] Unused entity definition in frontmatter
entities:
  unused-entity:
  ^^^^^^^^^^^^^
Entity 'unused-entity' defined in frontmatter but never referenced in document.
Conformance: Parsers SHOULD warn (SPEC.md §8.4)
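W004 is the reverse of W003: collect every referenced id, then subtract that set from the defined ids. This is a sketch under the same assumption that the id is the part of `[[...]]` before any `|`. The names are hypothetical.

```python
import re

# [[entity-id]] or [[entity-id|display text]]; group 1 is the id
ENTITY_REF = re.compile(r'\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]')


def find_unused_entities(body: str, defined: list[str]) -> list[str]:
    """Return ids defined under 'entities:' that the body never references."""
    referenced = {m.group(1).strip() for m in ENTITY_REF.finditer(body)}
    return [entity_id for entity_id in defined if entity_id not in referenced]
```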
S001: Line Too Long
Example:
{{langs|si|ta|en}}
This is an extremely long line that exceeds the recommended 120 character limit which makes it harder to read and edit in most text editors.~මෙය නිර්දේශිත අක්ෂර 120 සීමාව ඉක්මවන ඉතා දිගු රේඛාවකි.~இது பரிந்துரைக்கப்பட்ட 120 எழுத்துக்கு மேல் செல்லும் மிக நீண்ட வரி.
Output:
[STYLE S001] Line exceeds 120 characters at line 3
Length: 287 characters
Recommended: ≤120 characters
Consider breaking into multiple lines or using block format.
Suggested fix (use block format):
This is an extremely long line that exceeds the recommended
120 character limit which makes it harder to read and edit
in most text editors.
෴
මෙය නිර්දේශිත අක්ෂර 120 සීමාව ඉක්මවන ඉතා දිගු රේඛාවකි.
෴
இது பரிந்துரைக்கப்பட்ட 120 எழுத்துக்கு மேல் செல்லும் மிக நீண்ட வரி.
Conformance: Parsers MAY suggest (informational only)
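A sketch of the S001 check. It counts length in Unicode code points with `len()`. That is an assumption: Sinhala and Tamil vowel signs are separate code points, so this count can exceed what a reader perceives as characters. The names are hypothetical.

```python
MAX_LINE_LENGTH = 120


def long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> list[tuple[int, int]]:
    """Return (line number, length in code points) for lines over `limit`."""
    return [
        (n, len(line))
        for n, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```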
S002: Inconsistent Separator Usage
Example:
{{langs|si|ta|en}}
# සිරස්තලය~தலைப்பு~Heading
පළමු ඡේදය.
෴
முதல் பத்தி.
෴
First paragraph.
දෙවන ඡේදය.~இரண்டாவது பத்தி.~Second paragraph.
Output:
[STYLE S002] Inconsistent separator usage
Document uses both inline (~) and block (\n෴\n) separators for similar content types.
Line 3: Inline separator for heading
Line 5: Block separator for paragraph
Line 11: Inline separator for paragraph
Consider using consistent separator style throughout document.
Conformance: Parsers MAY suggest (informational only)
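A sketch of S002 that groups Multi Blocks by content type and flags any type that uses both separator styles. It assumes blocks are already split into lists of lines, and that a block whose first line starts with `#` is a heading while anything else is a paragraph. The function names are hypothetical.

```python
from typing import Optional

BLOCK_SEPARATOR = '෴'


def separator_style(block: list[str]) -> Optional[str]:
    """Classify a Multi Block as 'block', 'inline', or None (no separators)."""
    if any(line.strip() == BLOCK_SEPARATOR for line in block):
        return 'block'
    if any('~' in line for line in block):
        return 'inline'
    return None


def inconsistent_kinds(blocks: list[list[str]]) -> list[str]:
    """Return the content kinds that use more than one separator style."""
    styles: dict[str, set] = {}
    for block in blocks:
        # Assumption: a block whose first line starts with '#' is a heading
        is_heading = bool(block) and block[0].lstrip().startswith('#')
        kind = 'heading' if is_heading else 'paragraph'
        style = separator_style(block)
        if style is not None:
            styles.setdefault(kind, set()).add(style)
    return sorted(kind for kind, used in styles.items() if len(used) > 1)
```

On the example above, only `paragraph` is reported: the single heading is consistent with itself. A fuller implementation might also list the heading line for context, as the sample output does.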
The {{empty}} Marker
The {{empty}} marker is a special placeholder that indicates intentionally missing content in a Multi Block variant.
Purpose:
- Makes incomplete translations explicit
- Distinguishes from truly language-invariant (Mono Block) content
- Helps parsers and validators understand author intent
Use {{empty}} when:
- Translation is pending:
  සිංහල පෙළ.~{{empty}}~English text.
  (Tamil translation not yet available)
- Content doesn't apply in a language:
  Cultural reference specific to Sri Lanka.~{{empty}}~{{empty}}
  (Only meaningful in Sinhala context)
- Placeholder for future content:
  {{empty}}~{{empty}}~Draft English version (translations pending)
Inline format:
{{langs|si|ta|en}}
Content 1~{{empty}}~Content 3
සිංහල පෙළ.~தமிழ் உரை~{{empty}}
Block format:
{{langs|si|ta|en}}
Content in first language
෴
{{empty}}
෴
Content in third language
Empty variant without marker:
Content 1~~Content 3 # Warning: middle variant empty
සිංහල පෙළ.~தமிழ் உரை~ # Warning: trailing empty
Should be:
Content 1~{{empty}}~Content 3
සිංහල පෙළ.~தமிழ் உரை~{{empty}}
Parsing:
- Recognize {{empty}} as a special marker
- Count it as a valid variant (it satisfies the variant count requirement)
- Do not emit a W002 warning for an explicit {{empty}}
Rendering:
def render_variant(content: str, lang: str) -> str:
    """
    Render a variant, handling {{empty}} marker.
    """
    if content.strip() == '{{empty}}':
        return ''  # Render as empty content
    # render_markdown: the parser's own Markdown renderer (not shown here)
    return render_markdown(content)
Output:
- HTML: Empty element or skip rendering
- Per-language Markdown: Empty line or omit
- JSON AST: null or empty string
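A sketch of these output rules for a single Multi Block. It assumes the variants arrive as a dict keyed by language code and that Markdown rendering happens elsewhere, so the text is only HTML-escaped here. The function names are hypothetical; the comment format for skipped variants mirrors the example that follows.

```python
import html
import json

EMPTY_MARKER = '{{empty}}'


def to_json_variants(variants: dict[str, str]) -> str:
    """Serialize variants, mapping {{empty}} to JSON null."""
    return json.dumps(
        {lang: None if text.strip() == EMPTY_MARKER else text
         for lang, text in variants.items()},
        ensure_ascii=False,  # Keep Sinhala/Tamil readable in the output
    )


def to_html_variants(variants: dict[str, str]) -> str:
    """Emit one <p lang=...> per variant; {{empty}} becomes an HTML comment."""
    parts = []
    for lang, text in variants.items():
        if text.strip() == EMPTY_MARKER:
            parts.append(f'<!-- {lang}: empty, skipped -->')
        else:
            parts.append(f'<p lang="{lang}">{html.escape(text)}</p>')
    return '\n'.join(parts)
```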
Example:
# Input
variants = {
'si': 'සිංහල පෙළ',
'ta': '{{empty}}',
'en': 'English text'
}
# HTML output
<p lang="si">සිංහල පෙළ</p>
<!-- ta: empty, skipped -->
<p lang="en">English text</p>
# JSON output
{
"si": "සිංහල පෙළ",
"ta": null,
"en": "English text"
}
Conformance requirements for parsers, based on SPEC.md §10 and RFC 2119:
- Reject documents with critical errors:
  - All E001-E013 errors MUST cause parsing to fail (E006 only when the variant order can be detected; E011 only where a size limit is implemented)
  - Provide clear error messages with line numbers
  - Include context (surrounding lines) and caret indicators
- Validate core requirements:
  - Language declaration present and valid (E001, E002)
  - Exactly 3 languages (E003)
  - No whitespace in declaration (E004)
  - Variant count matches declaration (E005)
  - Valid YAML frontmatter if present (E008)
- Provide helpful feedback:
  - Suggest fixes for common errors
  - Include error codes for programmatic handling
  - Explain what's wrong and why
- Emit warnings for non-fatal issues:
  - Potential Mono Block ambiguity (W001)
  - Empty variants without {{empty}} marker (W002)
  - Undefined entity references (W003)
  - Unused entity definitions (W004)
- Implement security checks:
  - Detect unsafe YAML constructs (E010)
  - Enforce document size limits (E011)
  - Validate frontmatter schema (E012)
- Use Unicode script detection:
  - For W001 (Mono Block ambiguity detection)
  - For E006 (language order validation, when possible)
- Support {{empty}} marker:
  - Recognize as valid placeholder
  - Render appropriately in output
  - Don't warn when used explicitly
- Provide style suggestions:
  - Line length recommendations (S001)
  - Consistent separator usage (S002)
  - Formatting improvements
- Implement error recovery:
  - Attempt to parse despite errors (with warnings)
  - Suggest automatic fixes
  - Generate partial output
- Enhanced validation:
  - Check entity reference consistency
  - Validate URL formats
  - Detect content duplication
All error messages SHOULD follow this template for consistency:
[LEVEL CODE] Error description at line N[, column M]
<code showing problematic line(s)>
<caret indicator (^) pointing to issue>
<Clear explanation of what's wrong>
<Suggested fix (if applicable):>
<corrected code example>
Levels:
- ERROR: Fatal errors (E001-E013)
- WARNING: Non-fatal warnings (W001-W004)
- STYLE: Style suggestions (S001-S002)
Example:
[ERROR E005] Mismatched variant count at line 3
සිංහල පෙළ~தமிழ் உரை
^
Multi Block must have exactly 3 variants to match language declaration.
Found 2 variants, expected 3 (si, ta, en).
Suggested fix:
සිංහල පෙළ~தமிழ் உரை~English text
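A sketch of a formatter for this template. The signature is hypothetical: `column` is 1-based, and `width` is the number of carets. Caret alignment assumes one terminal cell per code point, which does not hold for all Sinhala and Tamil text, so carets under those scripts may drift.

```python
from typing import Optional


def format_diagnostic(level: str, code: str, description: str, line_no: int,
                      source_line: str, column: int, width: int,
                      explanation: str, fix: Optional[str] = None) -> str:
    """Build a diagnostic following the [LEVEL CODE] template."""
    parts = [
        f"[{level} {code}] {description} at line {line_no}",
        source_line,
        ' ' * (column - 1) + '^' * width,  # Caret under the offending span
        explanation,
    ]
    if fix is not None:
        parts += ["Suggested fix:", fix]
    return '\n'.join(parts)
```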
When parsers encounter errors, they MAY attempt recovery:
Empty variants:
# emit_warning / emit_error stand for the parser's own diagnostic hooks
if variant.strip() == '':
    variant = '{{empty}}'  # Auto-insert marker
    emit_warning('W002')
Extra variants:
if len(variants) > len(langs):
    variants = variants[:len(langs)]  # Truncate with warning
    emit_warning("Extra variants ignored")
Missing variants:
if len(variants) < len(langs):
    emit_error('E005')  # Still reported: E005 is fatal
    while len(variants) < len(langs):
        variants.append('{{empty}}')  # Pad so partial output can be generated
For W001 and E006, use Unicode ranges:
def get_unicode_script(char: str) -> str:
    """
    Detect Unicode script for a character.
    """
    code = ord(char)
    # Sinhala: U+0D80–U+0DFF
    if 0x0D80 <= code <= 0x0DFF:
        return 'Sinhala'
    # Tamil: U+0B80–U+0BFF
    elif 0x0B80 <= code <= 0x0BFF:
        return 'Tamil'
    # Latin letters: Basic Latin + Latin-1 Supplement
    elif code <= 0x00FF and char.isalpha():
        return 'Latin'
    # Common/Unknown: whitespace, digits, punctuation, other scripts
    else:
        return 'Common'
def detect_primary_script(text: str) -> str:
    """
    Detect primary script in text block.
    """
    script_counts = {}
    for char in text:
        script = get_unicode_script(char)
        # Count only script-bearing characters
        if script != 'Common':
            script_counts[script] = script_counts.get(script, 0) + 1
    # Return most common non-Common script
    if script_counts:
        return max(script_counts.items(), key=lambda x: x[1])[0]
    return 'Unknown'
enum ErrorSeverity {
  FATAL,    // E001-E013: Parsing fails
  WARNING,  // W001-W004: Parsing succeeds with warnings
  INFO      // S001-S002: Informational suggestions
}
interface ParseError {
  code: string;         // "E001", "W002", etc.
  severity: ErrorSeverity;
  line: number;
  column?: number;
  context: string;      // Surrounding code
  message: string;      // Human-readable explanation
  suggestion?: string;  // Suggested fix
}
When implementing a 3md parser, ensure:
- All E001-E013 errors are detected and rejected
- Error messages include line numbers
- Error messages include context (code snippet)
- Error messages include caret (^) indicators
- Error messages suggest fixes
- W001-W004 warnings are emitted (optional but recommended)
- {{empty}} marker is recognized and handled
- Unicode script detection is implemented (for W001, E006)
- YAML frontmatter is safely parsed (no code execution)
- Document size limits are enforced (recommended: 10 MB)
- Terminology is consistent (Multi Block, Mono Block)
- Error codes match this specification
Last Updated: 2025-12-29 Maintainers: TriText Team See Also: SPEC.md, IMPLEMENTATION.md