A comprehensive learning journey through modern document intelligence techniques, from traditional OCR to advanced agentic extraction systems
- Overview
- Key Features
- Architecture
- Course Structure
- Prerequisites
- Installation
- Quick Start
- Lessons Overview
- RAG Pipeline with AWS
- Helper Utilities
- Project Structure
- Technical Resources
- Troubleshooting
- Contributing
- License
This repository contains a comprehensive course on Document AI and Intelligent Document Processing (IDP), covering the complete evolution from traditional OCR to modern agentic document extraction systems. Learn how to build production-ready document understanding pipelines using cutting-edge technologies.
- 📄 Traditional OCR: Understanding Tesseract and foundational OCR techniques
- 🧠 Deep Learning OCR: PaddleOCR and neural network-based text recognition
- 📐 Layout Analysis: LayoutLM and LayoutReader for document structure understanding
- 🤖 Agentic Extraction: LandingAI's ADE (Agentic Document Extraction)
- ☁️ Cloud Deployment: Building RAG pipelines with AWS Bedrock and Lambda
- 💬 Conversational AI: Creating document-based chatbots with Strands Agents
| Feature | Description |
|---|---|
| Multi-Modal Processing | Handle PDFs, images, tables, and complex layouts |
| Visual Grounding | Maintain bounding box information for precise chunk extraction |
| Production-Ready | AWS Lambda integration for scalable document processing |
| RAG Pipeline | Complete Retrieval-Augmented Generation system |
| Interactive Learning | Jupyter notebooks with hands-on examples |
| Real-World Use Cases | Medical documents, invoices, receipts, forms, and more |
┌───────────────────────────────────────────────────────────────────┐
│                       Document AI Pipeline                        │
└───────────────────────────────────────────────────────────────────┘
                                │
                                ▼
               ┌──────────────────────────────────────┐
               │      Input Document Processing       │
               │   (PDF, Images, Scanned Documents)   │
               └──────────────────────────────────────┘
                                │
             ┌──────────────────┴──────────────────┐
             │                                     │
             ▼                                     ▼
    ┌──────────────────┐                 ┌──────────────────┐
    │ Traditional OCR  │                 │  Deep Learning   │
    │   (Tesseract)    │                 │   OCR (Paddle)   │
    └──────────────────┘                 └──────────────────┘
             │                                     │
             └──────────────────┬──────────────────┘
                                │
                                ▼
               ┌──────────────────────────────────────┐
               │         Layout Understanding         │
               │       (LayoutLM, LayoutReader)       │
               └──────────────────────────────────────┘
                                │
                                ▼
               ┌──────────────────────────────────────┐
               │     Agentic Document Extraction      │
               │           (LandingAI ADE)            │
               └──────────────────────────────────────┘
                                │
             ┌──────────────────┴──────────────────┐
             │                                     │
             ▼                                     ▼
    ┌──────────────────┐                 ┌──────────────────┐
    │    Structured    │                 │   RAG Pipeline   │
    │      Output      │                 │  (AWS Bedrock)   │
    │ (Markdown, JSON) │                 └──────────────────┘
    └──────────────────┘                           │
                                                   ▼
                                         ┌──────────────────┐
                                         │  Chatbot Agent   │
                                         │ (Strands Agents) │
                                         └──────────────────┘
┌─────────────┐       ┌──────────────┐       ┌─────────────────┐
│  S3 Bucket  │──────▶│    Lambda    │──────▶│    LandingAI    │
│ (PDF Upload)│       │   Function   │       │       ADE       │
└─────────────┘       └──────────────┘       └─────────────────┘
                                                      │
                                                      │ Process
                                                      ▼
                                             ┌─────────────────┐
                                             │  Extract Chunks │
                                             │   + Grounding   │
                                             │   + Metadata    │
                                             └─────────────────┘
                                                      │
                        ┌─────────────────────────────┤
                        │                             │
                        ▼                             ▼
               ┌─────────────────┐          ┌──────────────────┐
               │ Markdown Output │          │   Chunk JSONs    │
               │  (S3 Storage)   │          │ + Bounding Boxes │
               └─────────────────┘          └──────────────────┘
                                                      │
                                                      ▼
                                             ┌─────────────────┐
                                             │     Bedrock     │
                                             │ Knowledge Base  │
                                             │   (Vector DB)   │
                                             └─────────────────┘
                                                      │
                                                      ▼
                                             ┌─────────────────┐
                                             │  Strands Agents │
                                             │     Chatbot     │
                                             │ + Visual Ground │
                                             └─────────────────┘
This repository is organized as a tutorial series on Document AI, progressing from basic OCR (Optical Character Recognition) to advanced Agentic Document Extraction (ADE) and RAG (Retrieval-Augmented Generation) pipelines.
The course is organized into several labs, each contained in its own subdirectory. The directories are numbered in step order (L2, L4, etc.), though the internal lab numbering differs slightly.
| Directory | Notebook File | Lab Title | Key Topics |
|---|---|---|---|
| L2 | L2.ipynb | Lab 1: Document Processing with OCR | • Basic OCR with Tesseract • Parsing sample documents • Using regex for extraction • Building a simple OCR agent • Limitations of basic OCR |
| L4 | L4.ipynb | Lab 2: Document Processing with PaddleOCR | • Advanced OCR with PaddleOCR (deep learning based) • Text detection vs. recognition • Layout detection • Handling tables and handwriting |
| L6 | L6.ipynb | Lab 3: Building Agentic Document Understanding | • LayoutReader for reading order • Vision-Language Models (VLMs) for charts/tables • Building custom tools (AnalyzeChart, AnalyzeTable) • Assembling a LangChain agent |
| L8 | L8.ipynb | Lab 4: Agentic Document Extraction (Part I) | • Introduction to LandingAI's ADE framework • Vision-first, data-centric, agentic approach • Extracting key-value pairs • Handling "difficult" documents (charts, handwritten forms) |
| L9 | L9.ipynb | Lab 4: Agentic Document Extraction (Part II) | • Processing multiple document types • Document categorization schemas • Validation logic for extractions • Building a full processing pipeline |
| L11 | L11.ipynb | Lab 5: Agentic Document Extraction for RAG | • RAG (Retrieval-Augmented Generation) with documents • Preprocessing, retrieval, generation phases • Vector database setup (ChromaDB) • Visual grounding in RAG |
- Notebooks (.ipynb): The core interactive lessons containing code, explanations, and exercises.
- helper.py: A shared utility file present in multiple directories (and the root), containing helper functions for image processing, visualization, and API interaction.
- rag_pipeline_aws/: A separate directory containing a production-oriented, cloud-based RAG implementation.
- Images/Assets: Each lab directory contains sample documents (invoice.png, receipt.jpg, apple_10k.pdf) used for testing the document processing pipelines.
Start with Lab 1 (L2/L2.ipynb) to understand the basics of OCR and agent construction before moving on to more advanced topics like PaddleOCR and agentic extraction.
The course is organized into progressive lessons, each building upon previous concepts:
| Lesson | Topic | Technologies | Difficulty |
|---|---|---|---|
| L1 | Introduction to OCR | Tesseract | ⭐ Beginner |
| L2 | Document Processing | Tesseract, PaddleOCR | ⭐⭐ Beginner |
| L3 | Layout Analysis | LayoutLM | ⭐⭐ Intermediate |
| L4 | Advanced OCR | PaddleOCR | ⭐⭐ Intermediate |
| L6 | Reading Order | LayoutReader | ⭐⭐⭐ Intermediate |
| L8 | Agentic Extraction | LandingAI ADE | ⭐⭐⭐ Advanced |
| L9 | Batch Processing | LandingAI ADE | ⭐⭐⭐ Advanced |
| L11 | RAG with ChromaDB | ChromaDB, LangChain | ⭐⭐⭐⭐ Advanced |
| Lab 6 | AWS RAG Pipeline | AWS Bedrock, Lambda, Strands | ⭐⭐⭐⭐⭐ Expert |
- Python: Version 3.10 (recommended)
- OS: Linux, macOS, or Windows (Linux x86_64 recommended for AWS Lambda)
- Memory: 8GB RAM minimum (16GB recommended)
- Storage: 5GB free space
- LandingAI Account (free tier available)
  - Sign up at LandingAI
  - Get your Vision Agent API key
- AWS Account (for Lab 6 only)
  - Required services: S3, Lambda, Bedrock, IAM
  - Estimated cost: ~$5-10/month for testing
- OpenAI Account (optional, for advanced features)
git clone https://github.com/Ahmed-El-Zainy/Document-AI-From-OCR-to-Agentic-Doc-Extraction.git
cd Document-AI-From-OCR-to-Agentic-Doc-Extraction

# Using venv
python3.10 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using conda
conda create -n docai python=3.10
conda activate docai

# Install core dependencies
pip install -r requirements.txt
# For AWS Lab 6 (optional)
pip install boto3 bedrock-agentcore strands-agents

# Ubuntu/Debian
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Windows
# Download from: https://github.com/UB-Mannheim/tesseract/wiki

pip install paddlepaddle==3.0.0 paddleocr

Create a .env file in the project root:
# Most used in the notebooks

# LandingAI configuration
VISION_AGENT_API_KEY=your_landingai_api_key_here
HF_TOKEN=your_huggingface_api_key   # chat models & embedding models
GROQ_API_KEY=your_groq_api_key      # VLM models
# AWS Configuration (for Lab 6)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-west-2
S3_BUCKET=your-bucket-name
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
BEDROCK_KB_ID=your_knowledge_base_id

Basic OCR with Tesseract:

import pytesseract
from PIL import Image
# Load and process image
image = Image.open("L2/invoice.png")
text = pytesseract.image_to_string(image)
print(text)

Deep learning OCR with PaddleOCR:

from paddleocr import PaddleOCR
# Initialize OCR
ocr = PaddleOCR(use_angle_cls=True, lang='en')
# Process document
result = ocr.ocr('L4/bank_statement.png', cls=True)
# Extract text
for line in result:
    for word_info in line:
        print(word_info[1][0])  # Extracted text

Agentic extraction with LandingAI ADE:

from landingai.ade import ADEClient
# Initialize ADE client
client = ADEClient(api_key="your_api_key")
# Parse document with visual grounding
response = client.parse(
    document_path="document.pdf",
    extract_tables=True,
    extract_figures=True
)
# Access structured output
markdown_content = response.markdown
groundings = response.grounding # Bounding box information
print(markdown_content)

Querying documents in a RAG pipeline:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Load vector database
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings()
)
# Query documents
results = vectorstore.similarity_search(
    "What are the company's revenue figures?",
    k=5
)
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")

Focus: Traditional OCR with Tesseract and PaddleOCR
📁 Directory: L2/

- L2.ipynb - Main tutorial notebook
- invoice.png, receipt.jpg, table.png - Sample documents
Key Concepts:
- Image preprocessing techniques (see the sketch after this list)
- Text extraction from various document types
- Handling tables and forms
- Comparing Tesseract vs. PaddleOCR performance
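A minimal preprocessing sketch, assuming OpenCV as the preprocessing library (the notebook may use different steps); grayscale conversion, denoising, and Otsu binarization often improve Tesseract's output on noisy scans:

import cv2
import numpy as np
import pytesseract

def preprocess_for_ocr(path: str) -> np.ndarray:
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # drop color information
    gray = cv2.medianBlur(gray, 3)                  # remove salt-and-pepper noise
    # Otsu's method picks the binarization threshold automatically
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

text = pytesseract.image_to_string(preprocess_for_ocr("L2/invoice.png"))
print(text)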
Use Cases:
- ✅ Simple invoices and receipts
- ✅ Clean, scanned documents
- ⚠️ Limited table structure recognition
- ❌ Complex layouts not well supported
Focus: Deep learning-based OCR with better accuracy
📁 Directory: L4/

- L4.ipynb - Advanced OCR techniques
- bank_statement.png, handwritten.jpg - Complex documents
Key Concepts:
- Neural network-based text detection
- Multi-language support
- Angle classification for rotated text
- Confidence scoring (see the sketch after this list)
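Confidence scores make it easy to drop unreliable detections. A short sketch, reusing the result layout from the quick-start example above (the 0.80 threshold is illustrative, not a value from the lesson):

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('L4/bank_statement.png', cls=True)

MIN_CONFIDENCE = 0.80  # illustrative cutoff, tune per document type
for line in result:
    for box, (text, score) in line:
        if score >= MIN_CONFIDENCE:
            print(f"{score:.2f}  {text}")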
Improvements over L2:
- ✅ Better handling of curved or rotated text
- ✅ Improved accuracy on low-quality scans
- ✅ Multi-language text recognition
- ✅ Handwriting recognition support
Focus: Understanding document structure and reading order
📁 Directory: L6/

- L6.ipynb - Layout understanding tutorial
- layoutreader/ - LayoutReader implementation
- architecture.png, report_layout.png - Visualization examples
Key Concepts:
- Document layout analysis (see the sketch after this list)
- Reading order determination
- Relationship between text blocks
- Visual structure recognition
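A minimal layout-analysis sketch using LayoutLMv3 from Hugging Face (an illustration, not necessarily the exact model or code used in L6.ipynb); with apply_ocr=True the processor runs Tesseract internally to obtain words and boxes:

from PIL import Image
from transformers import AutoProcessor, AutoModelForTokenClassification

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
model = AutoModelForTokenClassification.from_pretrained("microsoft/layoutlmv3-base")

image = Image.open("L6/report_layout.png").convert("RGB")
encoding = processor(image, return_tensors="pt")
outputs = model(**encoding)

# One logit vector per token; with a checkpoint fine-tuned on a layout dataset,
# these map to labels such as title, paragraph, or table cell.
print(outputs.logits.shape)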
Architecture:
Document Image
       │
       ▼
┌────────────────┐
│  Layout Model  │  (LayoutLM/LayoutLMv2)
└────────────────┘
       │
       ▼
┌────────────────┐
│    Bounding    │  (Text blocks, tables, figures)
│     Boxes      │
└────────────────┘
       │
       ▼
┌────────────────┐
│ Reading Order  │  (Sequence prediction)
│ Determination  │
└────────────────┘
       │
       ▼
Structured Output
Focus: Modern AI-powered document understanding with LandingAI ADE
📁 Directory: L8/

- L8.ipynb - ADE comprehensive tutorial
- helper.py - Visualization utilities
- utility_example/ - Advanced examples
- difficult_examples/ - Edge cases
Key Concepts:
- Agentic approach to document extraction
- Automatic chunk detection (text, tables, figures)
- Visual grounding with bounding boxes
- Markdown output with preserved structure
- Confidence scoring for extractions
Chunk Types (grouped programmatically in the sketch after this list):
- 📄 chunkText - Regular text paragraphs
- 📊 chunkTable - Structured tables
- 🖼️ chunkFigure - Images and diagrams
- 🏷️ chunkLogo - Company logos
- 📇 chunkCard - Business cards
- ✍️ chunkAttestation - Signatures
- 📱 chunkScanCode - QR/barcodes
- 📋 chunkForm - Form fields
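A hedged sketch of grouping parsed chunks by type. The attribute names used here (response.chunks, chunk.chunk_type, chunk.text) are assumptions; check the response object in L8.ipynb for the actual field names:

from collections import defaultdict

response = client.parse(document_path="document.pdf")

by_type = defaultdict(list)
for chunk in response.chunks:          # assumed attribute name
    by_type[chunk.chunk_type].append(chunk)

for chunk_type, chunks in by_type.items():
    print(f"{chunk_type}: {len(chunks)} chunk(s)")
    for chunk in chunks[:2]:           # preview the first two of each type
        print("  ", chunk.text[:80])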
Visualization Example:
from helper import draw_bounding_boxes
# Parse document
response = ade_client.parse("document.pdf")
# Draw bounding boxes on chunks
draw_bounding_boxes(response, "document.pdf")

Output: Color-coded bounding boxes showing:
- 🟢 Green: Text chunks
- 🔵 Blue: Tables
- 🟣 Purple: Marginalia
- 🟠 Orange: Cards
Focus: Processing multiple documents efficiently
📁 Directory: L9/

- L9.ipynb - Batch processing workflow
- input_folder/ - Sample documents for batch processing
- results/ - Processed outputs
- results_extracted/ - Extracted structured data
Key Concepts:
- Batch document processing
- Parallel processing strategies (see the sketch after this list)
- Error handling and logging
- Output organization
- Performance optimization
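One way to parallelize the sequential workflow below is a thread pool, sketched here under the assumption that ADEClient tolerates concurrent calls (verify before relying on it); the worker count is illustrative:

import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from landingai.ade import ADEClient

client = ADEClient(api_key=os.getenv("VISION_AGENT_API_KEY"))
output_dir = Path("results")

def process(doc_path: Path) -> str:
    response = client.parse(doc_path)
    (output_dir / f"{doc_path.stem}.md").write_text(response.markdown)
    return doc_path.name

docs = list(Path("input_folder").glob("*.pdf"))
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(process, p): p for p in docs}
    for future in as_completed(futures):
        try:
            print(f"✅ Processed: {future.result()}")
        except Exception as e:
            print(f"❌ Failed: {futures[future].name} - {e}")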
Workflow:
import os
from pathlib import Path
from landingai.ade import ADEClient
client = ADEClient(api_key=os.getenv("VISION_AGENT_API_KEY"))
input_dir = Path("input_folder")
output_dir = Path("results")
for doc_path in input_dir.glob("*.pdf"):
    try:
        response = client.parse(doc_path)
        # Save markdown
        (output_dir / f"{doc_path.stem}.md").write_text(response.markdown)
        # Save grounding data
        # ... (save JSON with bounding boxes)
        print(f"✅ Processed: {doc_path.name}")
    except Exception as e:
        print(f"❌ Failed: {doc_path.name} - {e}")

Focus: Building a Retrieval-Augmented Generation system
📁 Directory: L11/

- L11.ipynb - RAG implementation tutorial
- apple_10k.pdf - Sample financial document
- chroma_db/ - Vector database storage
- ade_outputs/ - Processed document chunks
Key Concepts:
- Document chunking strategies
- Vector embeddings
- Semantic search
- Context retrieval
- LLM integration for Q&A
RAG Pipeline Flow:
Document (PDF)
      │
      ▼
  ADE Parse ──────▶ Markdown + Grounding
      │
      ▼
  Chunking ──────▶ Semantic segments
      │
      ▼
 Embeddings ──────▶ Vector representations
      │
      ▼
  ChromaDB ──────▶ Vector storage
      │
      ▼
 User Query
      │
      ▼
 Similarity ──────▶ Retrieve relevant chunks
   Search
      │
      ▼
LLM + Context ────▶ Generate answer
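The example query flow below calls create_semantic_chunks, a helper defined in the notebook. If you are running the snippet standalone, a minimal stand-in that splits the ADE markdown on headings might look like this (a sketch, not the notebook's implementation):

import re
from langchain_core.documents import Document

def create_semantic_chunks(markdown: str) -> list[Document]:
    # Split before each markdown heading so every section becomes one chunk
    sections = re.split(r"\n(?=#{1,6} )", markdown)
    return [
        Document(page_content=s.strip(), metadata={"chunk_id": i})
        for i, s in enumerate(sections) if s.strip()
    ]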
Example Query Flow:
# 1. Load and parse document
response = ade_client.parse("apple_10k.pdf")
# 2. Create chunks
chunks = create_semantic_chunks(response.markdown)
# 3. Store in vector DB
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)
# 4. Query
query = "What was Apple's revenue in 2023?"
docs = vectorstore.similarity_search(query, k=5)
# 5. Generate answer with LLM
context = "\n\n".join([doc.page_content for doc in docs])
answer = llm.invoke(f"Context: {context}\n\nQuestion: {query}")

Lab 6 demonstrates a production-ready document intelligence system using AWS services.
📁 Directory: rag_pipeline_aws/
┌───────────────────────────────────────────────────────────────────┐
│                      AWS Cloud Infrastructure                     │
└───────────────────────────────────────────────────────────────────┘

┌──────────────┐
│ User Upload  │
│  PDF to S3   │
└──────┬───────┘
       │
       ▼
┌──────────────────────────────────────────────────────┐
│                 S3 Bucket Structure                  │
├──────────────────────────────────────────────────────┤
│  input/                                              │
│  └── medical/                                        │
│      └── research_papers.pdf                         │
│                                                      │
│  output/                                             │
│  ├── medical/               (Markdown)               │
│  ├── medical_grounding/     (Bounding boxes)         │
│  ├── medical_chunks/        (Chunk JSONs)            │
│  └── medical_chunk_images/  (Cropped images)         │
└──────────────────────────────────────────────────────┘
       │
       │ S3 Event Trigger
       ▼
┌────────────────────────────────────────────┐
│              Lambda Function               │
│            (ade_s3_handler.py)             │
├────────────────────────────────────────────┤
│  • Triggered on S3 upload                  │
│  • Calls LandingAI ADE API                 │
│  • Processes document                      │
│  • Creates chunk JSONs                     │
│  • Saves to S3 output/                     │
└────────────────────────────────────────────┘
       │
       ▼
┌────────────────────────────────────────────┐
│         AWS Bedrock Knowledge Base         │
├────────────────────────────────────────────┤
│  • Indexes chunk JSONs                     │
│  • Maintains metadata                      │
│  • Vector embeddings                       │
│  • Semantic search                         │
└────────────────────────────────────────────┘
       │
       ▼
┌────────────────────────────────────────────┐
│          Strands Agent Framework           │
├────────────────────────────────────────────┤
│  • Orchestrates conversation               │
│  • Queries Knowledge Base                  │
│  • Visual grounding tool                   │
│  • Bedrock Memory Service                  │
└────────────────────────────────────────────┘
       │
       ▼
┌────────────────────────────────────────────┐
│              User Interaction              │
│  • Ask questions about documents           │
│  • Get answers with source citations       │
│  • View highlighted document regions       │
└────────────────────────────────────────────┘
- Lab-6.ipynb - Main tutorial notebook
- ade_s3_handler.py - Lambda function for document processing
- lambda_helpers.py - Deployment utilities
- visual_grounding_helper.py - Chunk image extraction
- medical/ - Sample medical research papers
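For orientation, here is a hedged sketch of the handler's overall shape; the real logic lives in ade_s3_handler.py, and parse_with_ade is a placeholder for the LandingAI ADE call:

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Lambda can only write to /tmp
        local_path = f"/tmp/{key.rsplit('/', 1)[-1]}"
        s3.download_file(bucket, key, local_path)

        result = parse_with_ade(local_path)  # placeholder: call LandingAI ADE here

        # Mirror the input path under output/, as in the diagram above
        out_key = key.replace("input/", "output/", 1).rsplit(".", 1)[0]
        s3.put_object(Bucket=bucket, Key=f"{out_key}.md",
                      Body=result["markdown"].encode("utf-8"))
        s3.put_object(Bucket=bucket, Key=f"{out_key}_chunks.json",
                      Body=json.dumps(result["chunks"]).encode("utf-8"))

    return {"statusCode": 200}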
# 1. Configure AWS credentials
aws configure
# 2. Create S3 bucket
aws s3 mb s3://your-doc-bucket
aws s3api put-object --bucket your-doc-bucket --key input/
aws s3api put-object --bucket your-doc-bucket --key output/
# 3. Deploy Lambda (see Lab-6.ipynb for details)
# 4. Create Bedrock Knowledge Base
# 5. Upload documents and start chatting!

For detailed setup instructions, see rag_pipeline_aws/README.md
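Step 3's S3-to-Lambda wiring can also be done from boto3; a sketch with placeholder ARN and bucket names (the Lambda must first grant S3 invoke permission, e.g. via lambda add-permission):

import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="your-doc-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:doc-processor",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "input/"},
                {"Name": "suffix", "Value": ".pdf"},
            ]}},
        }]
    },
)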
The helper.py file provides essential utilities for document visualization and processing.
from helper import print_document
# Display PDF or image in notebook
print_document("document.pdf")
print_document("image.png")from helper import draw_bounding_boxes
# Draw color-coded bounding boxes
parse_response = ade_client.parse("document.pdf")
annotated_image = draw_bounding_boxes(parse_response, "document.pdf")

Color Scheme:
- 🟢 Green (40, 167, 69): Text chunks (chunkText)
- 🔵 Blue (0, 123, 255): Tables (chunkTable)
- 🟣 Purple (111, 66, 193): Marginalia (chunkMarginalia)
- 🟡 Magenta (255, 0, 255): Figures (chunkFigure)
- 🟢 Light Green (144, 238, 144): Logos (chunkLogo)
- 🟠 Orange (255, 165, 0): Cards (chunkCard)
- 🔵 Cyan (0, 255, 255): Attestations (chunkAttestation)
- 🟡 Yellow (255, 193, 7): Scan codes (chunkScanCode)
- 🔴 Red (220, 20, 60): Forms (chunkForm)
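A simplified sketch of how this kind of annotation can work with Pillow (the real draw_bounding_boxes in helper.py also handles PDF rendering and the full color scheme); the chunk dicts and pixel-space boxes here are illustrative:

from PIL import Image, ImageDraw

COLORS = {"chunkText": (40, 167, 69), "chunkTable": (0, 123, 255)}

def annotate(image_path: str, chunks: list[dict]) -> Image.Image:
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for chunk in chunks:
        # Each chunk is assumed to carry a (left, top, right, bottom) pixel box
        color = COLORS.get(chunk["type"], (128, 128, 128))
        draw.rectangle(chunk["box"], outline=color, width=3)
    return image

annotated = annotate("page1.png", [{"type": "chunkText", "box": (50, 60, 500, 120)}])
annotated.save("page1_annotated.png")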
document_ai_from_OCR_to_agentic_doc_extraction/
│
├── README.md                      # This comprehensive guide
├── requirements.txt               # Python dependencies
├── helper.py                      # Global utility functions
├── .env                           # Environment variables (not in git)
├── .gitignore                     # Git ignore rules
│
├── L2/                            # Lesson 2: Basic OCR
│   ├── L2.ipynb                   # Jupyter notebook
│   ├── l2_doc_processing.py       # Python utilities
│   ├── invoice.png                # Sample invoice
│   ├── receipt.jpg                # Sample receipt
│   ├── table.png                  # Sample table
│   └── requirements.txt           # Lesson-specific deps
│
├── L4/                            # Lesson 4: PaddleOCR
│   ├── L4.ipynb
│   ├── l4_doc_parsing_paddleocr.py
│   ├── bank_statement.png
│   ├── handwritten.jpg
│   └── article.jpg
│
├── L6/                            # Lesson 6: Layout Analysis
│   ├── L6.ipynb
│   ├── architecture.png
│   ├── report_layout.png
│   └── layoutreader/              # LayoutReader implementation
│       ├── README.md
│       ├── main.py
│       └── tools.py
│
├── L8/                            # Lesson 8: Agentic Extraction
│   ├── L8.ipynb
│   ├── helper.py
│   ├── difficult_examples/        # Complex document samples
│   └── utility_example/
│
├── L9/                            # Lesson 9: Batch Processing
│   ├── L9.ipynb
│   ├── helper.py
│   ├── input_folder/              # Documents to process
│   ├── results/                   # Markdown outputs
│   └── results_extracted/         # Structured extractions
│
├── L11/                           # Lesson 11: RAG Pipeline
│   ├── L11.ipynb
│   ├── helper.py
│   ├── apple_10k.pdf              # Sample financial document
│   ├── ade_outputs/
│   └── chroma_db/                 # Vector database
│
└── rag_pipeline_aws/              # Lab 6: AWS RAG System
    ├── Lab-6.ipynb
    ├── README.md                  # Detailed lab guide
    ├── ade_s3_handler.py          # Lambda function
    ├── lambda_helpers.py          # Deployment tools
    ├── visual_grounding_helper.py # Chunk extraction
    └── medical/                   # Sample medical PDFs
- Tesseract
  - Official Documentation
  - Technical Report
  - Best for: Clean, printed text
- PaddleOCR
  - GitHub Repository
  - Technical Report
  - Best for: Complex layouts, multilingual text, handwriting
- LayoutLM
  - Technical Report
  - Hugging Face Models
  - Capabilities: Document understanding with visual + text + layout features
- LayoutReader
  - Technical Report
  - Capabilities: Reading order prediction
- S3 - Documentation
- Lambda - Documentation
- IAM - Documentation
- Bedrock - Documentation
- boto3 - AWS SDK
- bedrock-agentcore - Documentation
- strands-agents - Guide
Error: TesseractNotFoundError
Solution:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install tesseract-ocr
# Verify installation
tesseract --version
# If needed, specify path
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'

Error: No module named 'paddle'
Solution:
# Uninstall any existing version
pip uninstall paddlepaddle paddlepaddle-gpu
# Install correct version
pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
# For GPU (CUDA 11.2)
pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu112/

Error: Authentication failed
Solution:
# Verify .env file exists
cat .env | grep VISION_AGENT_API_KEY
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
# Or set directly (not recommended for production)
import os
os.environ['VISION_AGENT_API_KEY'] = 'your_key_here'

Error: Task timed out after 3.00 seconds
Solution:
# Increase Lambda timeout
lambda_client.update_function_configuration(
    FunctionName='doc-processor',
    Timeout=900,       # 15 minutes
    MemorySize=1024    # 1 GB RAM
)

We welcome contributions! Here's how you can help:
- 🐛 Report Bugs - Use GitHub Issues
- ✨ Suggest Features - Propose new lessons or examples
- 📝 Improve Documentation - Fix typos, add clarifications
- 💻 Submit Code - Fork, create a feature branch, submit a PR
# Fork and clone
git clone https://github.com/YOUR_USERNAME/document_ai_from_OCR_to_agentic_doc_extraction.git
cd document_ai_from_OCR_to_agentic_doc_extraction
# Create branch
git checkout -b feature/your-feature-name
# Make changes and commit
git commit -m "Add: Brief description of changes"
# Push and create PR
git push origin feature/your-feature-name

This project is licensed under the MIT License.
- DeepLearning.AI for the course structure
- LandingAI for the ADE platform and Vision Agent
- AWS for cloud infrastructure support
- PaddlePaddle team for PaddleOCR
- Microsoft for LayoutLM research
- Google for Tesseract OCR
Last Updated: February 2026