# document-intelligence

Here are 331 public repositories matching this topic...

PaddlePaddle / PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

nlp search-engine compression sentiment-analysis transformers information-extraction question-answering llama pretrained-models embedding bert semantic-analysis distributed-training ernie neural-search uie document-intelligence paddlenlp llm

Updated May 23, 2026
Python

kreuzberg-dev / kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

Updated Jun 22, 2026
Rust

shcherbak-ai / contextgem

ContextGem: Effortless LLM extraction from documents

nlp ai text-analysis docx data-extraction contract-analysis legaltech docx2txt unstructured-data document-intelligence llm docx2md prompt-engineering llms generative-ai llm-framework llm-pipeline llm-extraction

Updated Jun 6, 2026
Python

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Updated Mar 17, 2026
C++

enoch3712 / ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

python nlp pdf machine-learning ocr ai openai pdf-to-text document-processing document-image-analysis document-intelligence llm document-parsing langchain

Updated Aug 27, 2025
Python

tstanislawek / awesome-document-understanding

A curated list of resources on Document Understanding (DU)

Updated Jun 2, 2023

Azure / AI-in-a-Box

AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.

machine-learning ai azure chatbot openai chat-bot edge-computing custom-vision edge-ai azd document-intelligence azd-templates chatgpt langchain semantic-kernel

Updated Dec 12, 2024
Jupyter Notebook

mantisfury / ArkhamMirror

Local-first AI-powered document intelligence platform for investigative journalism

Updated Jan 25, 2026
Python

infly-ai / INF-MLLM

INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced document intelligence.

vlm document-intelligence mllm multimodal-rl

Updated May 15, 2026
Python

Orbifold / knwler

Knwler is a lightweight, single-file Python tool that extracts structured knowledge graphs from documents using AI. Feed it a PDF or text file and receive a richly connected network of entities, relationships, and topics — complete with an interactive HTML report and exports ready for your favorite graph analytics platform.

knowledge-graph document-intelligence

Updated Apr 3, 2026
Python

Azure-Samples / azure-ai-document-processing-samples

A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.

redaction translation ai azure extraction embeddings openai classification gpt document-intelligence

Updated Oct 27, 2025
Bicep

doc-analysis / ReadingBank

ReadingBank: A Benchmark Dataset for Reading Order Detection

nlp natural-language-processing ocr document-understanding document-ai document-intelligence

Updated Aug 26, 2024

Mattral / RAG-Multimodal-Financial-Doc-Analysis-and-Recall

World-class multimodal RAG system for financial document analysis. Built to production standards: async, observable, secure, multi-tenant, CI-gated.

machine-learning observability multimodal production-system rag pydantic document-intelligence llm financial-ai llama-index retrieval-augmented-generation enterprise-ai async-processing

Updated Jun 21, 2026
Python

Azure-Samples / doc-intelligence-in-a-box

The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.

ai azure accelerator text-extraction cognitive-services azd document-intelligence form-analysis azd-templates

Updated Mar 27, 2026
Bicep

https-deeplearning-ai / sc-landingai

Course Website

aws document-processing document-intelligence document-processing-pipeline agentic-workflow

Updated Jun 5, 2026
Jupyter Notebook

knowledgestack / excel-parser

XLSX parser for LLMs, RAG, LangChain, LangGraph, CrewAI, Claude, MCP — turns Excel (.xlsx) into citation-ready JSON with formulas, charts, dependency graphs, and token-counted chunks. Open-source Python library (MIT).

Updated Jun 10, 2026
Python

vectorlessflow / vectorless

Knowing by reasoning, not vectors. ⭐ Star this repo if you find it useful.

python rust question-answering reasoning document-intelligence context-engineering no-vectors

Updated Jun 4, 2026
Rust

qyhou / curated-table-structure-recognition

A curated list of resources on Table Structure Recognition

table-recognition table-structure-recognition document-ai document-intelligence

Updated Jul 31, 2025

jamesmcroft / azure-document-intelligence-markdown-to-openai-data-extraction-sample

This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.

azure openai gpt document-intelligence

Updated May 12, 2026
Jupyter Notebook

ENDEVSOLS / LongParser

Privacy-first document intelligence engine — parse PDFs, DOCX, PPTX, XLSX & CSV into AI-ready chunks for RAG pipelines. Includes HITL review, 3-layer memory chat, and a production FastAPI server.

python ocr parsing openai chunking human-in-the-loop pdf-parser rag fastapi vector-database document-intelligence llm document-parsing langchain retrieval-augmented-generation docling

Updated May 5, 2026
Python
