Releases: IranTransitionProject/docman

v0.5.0 — Schema Refs, Smart Extraction

20 Mar 23:58

What's New

Pydantic Schema Refs

All worker configs have been migrated from inline JSON Schema to input_schema_ref / output_schema_ref fields, which point to typed Pydantic models in src/docman/contracts.py. The refs are resolved to schemas at config load time by Loom's resolve_schema_refs().
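As a rough illustration of what resolving a schema ref at load time can look like, here is a minimal standard-library sketch. It is not Loom's resolve_schema_refs(): the "module:Name" ref format, the helper names resolve_ref and resolve_refs_sketch, and the example config keys other than input_schema_ref / output_schema_ref are all assumptions. Standard-library classes stand in for the Pydantic models in src/docman/contracts.py so that the snippet runs without extra dependencies.

```python
import importlib
from typing import Any


def resolve_ref(ref: str) -> Any:
    """Resolve a 'package.module:Name' string to the object it names.

    The 'module:Name' format is an assumption made for this sketch.
    """
    module_path, sep, attr = ref.partition(":")
    if not (module_path and sep and attr):
        raise ValueError(f"expected 'module:Name', got {ref!r}")
    return getattr(importlib.import_module(module_path), attr)


def resolve_refs_sketch(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with *_schema_ref strings replaced by objects.

    Hypothetical helper: 'input_schema_ref' becomes 'input_schema', and
    'output_schema_ref' becomes 'output_schema'.
    """
    resolved = dict(config)
    for ref_key in ("input_schema_ref", "output_schema_ref"):
        if ref_key in resolved:
            target_key = ref_key[: -len("_ref")]
            resolved[target_key] = resolve_ref(resolved.pop(ref_key))
    return resolved


# Stand-in refs: stdlib classes play the role of typed contract models.
worker_config = {
    "name": "example_worker",
    "input_schema_ref": "decimal:Decimal",
    "output_schema_ref": "fractions:Fraction",
}
loaded = resolve_refs_sketch(worker_config)
```

Resolving at load time means a mistyped ref fails when the config is read rather than when the first message reaches the worker. With real Pydantic v2 models, the resolved class could then be turned into JSON Schema via its model_json_schema() method.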

Smart Extraction

SmartExtractorBackend is a composite backend that tries MarkItDown first (fast, no ML) and falls back to Docling (deep OCR, table recognition) when the fast result is insufficient. The fallback thresholds are configurable.
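The try-fast-then-fall-back pattern described above can be sketched as follows. This is not the actual SmartExtractorBackend: the class name FallbackExtractor, the min_chars threshold, and the stub extractors are hypothetical, and the release notes do not say which criteria the real backend uses to decide on a fallback. Plain callables stand in for the MarkItDown and Docling backends.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FallbackExtractor:
    """Composite extractor: run the fast path, fall back to the deep path."""

    fast: Callable[[str], str]   # stand-in for a MarkItDown-style backend
    deep: Callable[[str], str]   # stand-in for a Docling-style backend
    min_chars: int = 200         # hypothetical threshold, not from docman

    def extract(self, path: str) -> tuple[str, str]:
        """Return (text, backend_used) for the document at path."""
        try:
            text = self.fast(path)
        except Exception:
            # Treat a fast-path failure like an empty result.
            text = ""
        if len(text.strip()) >= self.min_chars:
            return text, "fast"
        return self.deep(path), "deep"


# Stub backends for illustration only.
def fast_stub(path: str) -> str:
    # Pretend scanned files yield no text without OCR.
    return "" if path.endswith(".scan.pdf") else "plain text " * 50


def deep_stub(path: str) -> str:
    return "ocr text recovered from " + path


extractor = FallbackExtractor(fast=fast_stub, deep=deep_stub)
text_a, used_a = extractor.extract("report.docx")
text_b, used_b = extractor.extract("invoice.scan.pdf")
```

Here report.docx is served by the fast path, while invoice.scan.pdf yields too little text and is routed to the deep path. A character-count threshold is only one possible signal; the real backend may use different ones.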

Built on Loom v0.8.0

Requires Loom v0.8.0 or later.

Installation

# Requires the Loom repository cloned into a sibling directory
git clone https://github.com/IranTransitionProject/docman.git
cd docman
uv sync --extra dev
uv run pytest tests/ -v   # 63 tests

Stats

  • 5 worker configs, 3 pipeline variants, 1 MCP gateway config
  • 63 unit tests
  • 3 extraction backends (MarkItDown, Docling, Smart)