v0.5.0 — Schema Refs, Smart Extraction

hooman released this 20 Mar 23:58

What's New

Pydantic Schema Refs

All worker configs have been migrated from inline JSON Schema to input_schema_ref / output_schema_ref fields that point to typed Pydantic models in src/docman/contracts.py. Schemas are resolved at config load time via Loom's resolve_schema_refs().
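
As a rough illustration of resolving schema refs at load time, the sketch below turns a ref string into a JSON Schema dict. It is not Loom's resolve_schema_refs(): the dotted "module.ClassName" ref format, the expand_schema_refs and resolve_ref helpers, the demo_contracts module, and the ExtractInput class are all assumptions made for this example. The only real API it relies on is model_json_schema(), the classmethod that Pydantic v2 models expose; a stand-in class provides it here so the snippet runs without extra dependencies.

```python
import importlib
import sys
import types
from typing import Any


def resolve_ref(ref: str) -> dict[str, Any]:
    """Import 'package.module.ClassName' and return its JSON Schema."""
    module_path, _, class_name = ref.rpartition(".")
    model = getattr(importlib.import_module(module_path), class_name)
    # Pydantic v2 models expose model_json_schema() as a classmethod.
    return model.model_json_schema()


def expand_schema_refs(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: swap *_schema_ref keys for resolved schemas."""
    resolved = dict(config)  # shallow copy, the caller's dict is untouched
    for key in ("input_schema", "output_schema"):
        ref = resolved.pop(f"{key}_ref", None)
        if ref is not None:
            resolved[key] = resolve_ref(ref)
    return resolved


class ExtractInput:
    """Stand-in for a Pydantic model (hypothetical, not from docman)."""

    @classmethod
    def model_json_schema(cls) -> dict[str, Any]:
        return {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        }


# Register a throwaway module so the ref can be imported by name.
demo_module = types.ModuleType("demo_contracts")
demo_module.ExtractInput = ExtractInput
sys.modules["demo_contracts"] = demo_module

config = {"name": "extractor", "input_schema_ref": "demo_contracts.ExtractInput"}
resolved = expand_schema_refs(config)
print(resolved["input_schema"])
```

In a real project the ref would point at an actual Pydantic model, so the worker config and the Python code share one typed definition instead of keeping a hand-written JSON Schema in sync.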

Smart Extraction

SmartExtractorBackend is a composite backend that tries MarkItDown first (fast, no ML) and falls back to Docling (deep OCR, table recognition) when needed. The fallback thresholds are configurable.
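
The fast-then-deep pattern described above can be sketched as follows. This is a minimal illustration, not the SmartExtractorBackend implementation: the FallbackExtractor class, the min_chars threshold, and the fake extractors are hypothetical, and the real backend may use different signals to decide when to fall back.

```python
from dataclasses import dataclass
from typing import Callable

# An extractor takes a file path and returns the extracted text.
Extractor = Callable[[str], str]


@dataclass
class FallbackExtractor:
    """Try a cheap extractor first; use the deep one if the result is thin."""

    fast: Extractor
    deep: Extractor
    min_chars: int = 200  # hypothetical threshold, not a docman default

    def extract(self, path: str) -> tuple[str, str]:
        """Return (backend_used, text)."""
        try:
            text = self.fast(path)
        except Exception:
            # Treat a failed fast pass like an empty result.
            text = ""
        if len(text.strip()) >= self.min_chars:
            return "fast", text
        return "deep", self.deep(path)


def fake_fast(path: str) -> str:
    # Pretend scanned PDFs yield no text without OCR.
    return "" if path.endswith(".scan.pdf") else "x" * 500


def fake_deep(path: str) -> str:
    return "deep text for " + path


extractor = FallbackExtractor(fast=fake_fast, deep=fake_deep, min_chars=200)
born_digital = extractor.extract("report.docx")
scanned = extractor.extract("invoice.scan.pdf")
print(born_digital[0], scanned[0])
```

Returning the backend name alongside the text makes it easy to log how often the slower path is taken, which helps when tuning the threshold.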

Built on Loom v0.8.0

Requires Loom v0.8.0 or later.

Installation

# Requires the Loom repository cloned into an adjacent directory
git clone https://github.com/IranTransitionProject/docman.git
cd docman
uv sync --extra dev
uv run pytest tests/ -v   # 63 tests

Stats

  • 5 worker configs, 3 pipeline variants, 1 MCP gateway config
  • 63 unit tests
  • 3 extraction backends (MarkItDown, Docling, Smart)