🧠 GdoczAI v1.0.0 — Initial Public Release
Transform PDFs, images, and other documents into structured JSON data with multi-engine OCR.
✨ Features
- Multi-Engine OCR: OlmOCR, Qwen2.5-VL-7B, Gemini 2.0/2.5, and Chandra, with intelligent routing between engines
- Schema-Based Extraction: define a schema once, then extract complex nested JSON structures
- Intelligent Chunking: automatic segmentation, plus manual splitting for edge cases
- Email Integration: SMTP/IMAP monitoring for automatic document ingestion
- JWT Security: token refresh, API keys, and rate limiting
- Dual Storage: local filesystem or AWS S3, with date-based organization
- Webhooks: real-time processing notifications
- Cross-Page Extraction: extraction of data that spans multiple pages
- Background Processing: async jobs with priority queues
- PostgreSQL: full audit trails and metadata logging
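These notes do not describe GdoczAI's schema format. The following is a minimal sketch under an assumed format, showing what "define once, extract nested JSON" can look like: a schema is a dict mapping field names to a Python type, a nested dict, or a one-element list describing list items, and an extracted result is checked against it. INVOICE_SCHEMA and matches_schema are hypothetical names, not part of the GdoczAI API.

```python
from typing import Any, Dict

# Hypothetical schema format (not GdoczAI's own): values are Python types,
# nested dicts for sub-objects, or one-element lists for "list of X".
INVOICE_SCHEMA: Dict[str, Any] = {
    "invoice_number": str,
    "total": float,
    "vendor": {"name": str, "address": str},
    "line_items": [{"description": str, "quantity": int, "unit_price": float}],
}


def matches_schema(data: Any, schema: Any) -> bool:
    """Return True if `data` has the shape described by `schema`."""
    if isinstance(schema, dict):
        # Every declared field must be present and match its sub-schema.
        return isinstance(data, dict) and all(
            key in data and matches_schema(data[key], sub)
            for key, sub in schema.items()
        )
    if isinstance(schema, list):
        # A one-element list means "a list whose items all match schema[0]".
        return isinstance(data, list) and all(
            matches_schema(item, schema[0]) for item in data
        )
    if schema in (int, float) and isinstance(data, bool):
        return False  # JSON booleans are not numbers
    if schema is float:
        # JSON has one number type, so accept ints where floats are expected.
        return isinstance(data, (int, float))
    return isinstance(data, schema)


extracted = {
    "invoice_number": "INV-001",
    "total": 120,
    "vendor": {"name": "Acme", "address": "1 Main St"},
    "line_items": [{"description": "Widget", "quantity": 2, "unit_price": 60.0}],
}
print(matches_schema(extracted, INVOICE_SCHEMA))  # True
```

Treating the schema as plain data keeps it reusable across documents: the same INVOICE_SCHEMA can validate output from any OCR engine.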
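Priority-ordered background processing can be sketched with Python's standard heapq module. The JobQueue class below is hypothetical and in-memory. It only illustrates the ordering rule (a lower number runs first, and equal priorities run in submission order). It does not model GdoczAI's actual async workers or how jobs are persisted.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class JobQueue:
    """Hypothetical in-memory priority queue: lower number = higher priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter breaks ties so equal-priority jobs stay FIFO.
        self._counter = itertools.count()

    def submit(self, job_id: str, priority: int = 5) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def next_job(self) -> Optional[str]:
        # Return the highest-priority job, or None when the queue is empty.
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        return job_id


q = JobQueue()
q.submit("bulk-scan-1", priority=5)
q.submit("urgent-invoice", priority=1)
q.submit("bulk-scan-2", priority=5)
order = [q.next_job() for _ in range(3)]
print(order)  # ['urgent-invoice', 'bulk-scan-1', 'bulk-scan-2']
```

The tie-breaking counter matters: without it, equal priorities would fall back to comparing job IDs, which would order jobs alphabetically rather than by arrival.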
🚀 Quick Start
See README.md for installation instructions.
📦 Requirements
- Python 3.9+
- PostgreSQL 12+
- DeepInfra API Key (OlmOCR/Qwen)
- Datalab API Key (Chandra)
- Gemini API Key (Google)
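Before you follow README.md, a quick preflight check can confirm the Python version and that the three API keys are available. The environment variable names below are assumptions for illustration only, so consult README.md for the names GdoczAI actually reads. Checking for PostgreSQL 12+ requires a live database connection, so this sketch leaves it out.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Sequence

# Hypothetical variable names; the real ones are defined by GdoczAI's config.
REQUIRED_KEYS: Dict[str, str] = {
    "DEEPINFRA_API_KEY": "OlmOCR/Qwen",
    "DATALAB_API_KEY": "Chandra",
    "GEMINI_API_KEY": "Gemini",
}


def missing_requirements(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Sequence[int]] = None,
) -> List[str]:
    """List unmet requirements; defaults to the current process environment."""
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems: List[str] = []
    if tuple(version[:2]) < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    for var, engine in REQUIRED_KEYS.items():
        if not env.get(var):  # unset or empty both count as missing
            problems.append(f"{var} is not set (needed for {engine})")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```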