Ahmed-El-Zainy/Document-AI-From-OCR-to-Agentic-Doc-Extraction
📄 Document AI: From OCR to Agentic Document Extraction


A comprehensive learning journey through modern document intelligence techniques, from traditional OCR to advanced agentic extraction systems




🎯 Overview

This repository contains a comprehensive course on Document AI and Intelligent Document Processing (IDP), covering the complete evolution from traditional OCR to modern agentic document extraction systems. Learn how to build production-ready document understanding pipelines using cutting-edge technologies.

What You'll Learn

  • πŸ” Traditional OCR: Understanding Tesseract and foundational OCR techniques
  • 🧠 Deep Learning OCR: PaddleOCR and neural network-based text recognition
  • πŸ“ Layout Analysis: LayoutLM and LayoutReader for document structure understanding
  • πŸ€– Agentic Extraction: LandingAI's ADE (Agentic Document Extraction)
  • ☁️ Cloud Deployment: Building RAG pipelines with AWS Bedrock and Lambda
  • πŸ’¬ Conversational AI: Creating document-based chatbots with Strands Agents

✨ Key Features

| Feature | Description |
| --- | --- |
| Multi-Modal Processing | Handle PDFs, images, tables, and complex layouts |
| Visual Grounding | Maintain bounding-box information for precise chunk extraction |
| Production-Ready | AWS Lambda integration for scalable document processing |
| RAG Pipeline | Complete Retrieval-Augmented Generation system |
| Interactive Learning | Jupyter notebooks with hands-on examples |
| Real-World Use Cases | Medical documents, invoices, receipts, forms, and more |

πŸ—οΈ Architecture

Overall System Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Document AI Pipeline                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚      Input Document Processing          β”‚
        β”‚  (PDF, Images, Scanned Documents)       β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                                           β”‚
        β–Ό                                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Traditional OCR β”‚                    β”‚   Deep Learning  β”‚
β”‚    (Tesseract)   β”‚                    β”‚   OCR (Paddle)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                                           β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚       Layout Understanding              β”‚
         β”‚    (LayoutLM, LayoutReader)             β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚      Agentic Document Extraction        β”‚
        β”‚           (LandingAI ADE)               β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                                           β”‚
        β–Ό                                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Structured     β”‚                    β”‚   RAG Pipeline   β”‚
β”‚     Output       β”‚                    β”‚   (AWS Bedrock)  β”‚
β”‚  (Markdown, JSON)β”‚                    β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                  β”‚
                                                  β–Ό
                                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                        β”‚   Chatbot Agent  β”‚
                                        β”‚ (Strands Agents) β”‚
                                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

AWS RAG Pipeline Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   S3 Bucket │────────▢│   Lambda     │────────▢│   LandingAI     β”‚
β”‚ (PDF Upload)β”‚         β”‚  Function    β”‚         β”‚      ADE        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                           β”‚
                                                           β”‚ Process
                                                           β–Ό
                                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                  β”‚  Extract Chunks β”‚
                                                  β”‚  + Grounding    β”‚
                                                  β”‚  + Metadata     β”‚
                                                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                           β”‚
                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄
                        β”‚                                  β”‚
                        β–Ό                                  β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ Markdown Output β”‚              β”‚  Chunk JSONs    β”‚
              β”‚  (S3 Storage)   β”‚              β”‚ + Bounding Boxesβ”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                           β”‚
                                                           β–Ό
                                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                  β”‚     Bedrock     β”‚
                                                  β”‚ Knowledge Base  β”‚
                                                  β”‚   (Vector DB)   β”‚
                                                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                           β”‚
                                                           β–Ό
                                                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                  β”‚ Strands Agents  β”‚
                                                  β”‚    Chatbot      β”‚
                                                  β”‚ + Visual Ground β”‚
                                                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Introduction

This repository contains a course/tutorial series focused on Document AI, progressing from basic OCR (Optical Character Recognition) to advanced Agentic Document Extraction (ADE) and RAG (Retrieval-Augmented Generation) pipelines.

Course Structure

The course is organized into several "Labs," each contained within its own subdirectory. The directory naming convention (L2, L4, etc.) suggests a step-by-step progression, though the internal Lab numbering differs slightly.

| Directory | Notebook File | Lab Title | Key Topics |
| --- | --- | --- | --- |
| L2 | L2.ipynb | Lab 1: Document Processing with OCR | Basic OCR with Tesseract; parsing sample documents; regex-based extraction; building a simple OCR agent; limitations of basic OCR |
| L4 | L4.ipynb | Lab 2: Document Processing with PaddleOCR | Advanced OCR with PaddleOCR (deep-learning based); text detection vs. recognition; layout detection; handling tables and handwriting |
| L6 | L6.ipynb | Lab 3: Building Agentic Document Understanding | LayoutReader for reading order; vision-language models (VLMs) for charts/tables; building custom tools (AnalyzeChart, AnalyzeTable); assembling a LangChain agent |
| L8 | L8.ipynb | Lab 4: Agentic Document Extraction (Part I) | Introduction to LandingAI's ADE framework; vision-first, data-centric, agentic approach; extracting key-value pairs; handling "difficult" documents (charts, handwritten forms) |
| L9 | L9.ipynb | Lab 4: Agentic Document Extraction (Part II) | Processing multiple document types; document categorization schemas; validation logic for extractions; building a full processing pipeline |
| L11 | L11.ipynb | Lab 5: Agentic Document Extraction for RAG | RAG (Retrieval-Augmented Generation) with documents; preprocessing, retrieval, and generation phases; vector database setup (ChromaDB); visual grounding in RAG |

Key Components

  • Notebooks (.ipynb): The core interactive lessons containing code, explanations, and exercises.

  • helper.py: A shared utility file present in multiple directories (and the root), likely containing helper functions for image processing, visualization, or API interaction.

  • rag_pipeline_aws/: A separate directory likely containing a more production-oriented, cloud-based RAG implementation (based on the name).
  • Images/Assets: Each Lab directory contains sample images (invoice.png, receipt.jpg, apple_10k.pdf) used for testing the document processing pipelines.

Getting Started

To begin, it is recommended to start with Lab 1 (L2/L2.ipynb) to understand the basics of OCR and agent construction before moving on to more advanced topics like PaddleOCR and agentic extraction.

📚 Course Structure

The course is organized into progressive lessons, each building upon previous concepts:

| Lesson | Topic | Technologies | Difficulty |
| --- | --- | --- | --- |
| L1 | Introduction to OCR | Tesseract | ⭐ Beginner |
| L2 | Document Processing | Tesseract, PaddleOCR | ⭐⭐ Beginner |
| L3 | Layout Analysis | LayoutLM | ⭐⭐ Intermediate |
| L4 | Advanced OCR | PaddleOCR | ⭐⭐ Intermediate |
| L6 | Reading Order | LayoutReader | ⭐⭐⭐ Intermediate |
| L8 | Agentic Extraction | LandingAI ADE | ⭐⭐⭐ Advanced |
| L9 | Batch Processing | LandingAI ADE | ⭐⭐⭐ Advanced |
| L11 | RAG with ChromaDB | ChromaDB, LangChain | ⭐⭐⭐⭐ Advanced |
| Lab 6 | AWS RAG Pipeline | AWS Bedrock, Lambda, Strands | ⭐⭐⭐⭐⭐ Expert |

🔧 Prerequisites

System Requirements

  • Python: Version 3.10 (recommended)
  • OS: Linux, macOS, or Windows (Linux x86_64 recommended for AWS Lambda)
  • Memory: 8GB RAM minimum (16GB recommended)
  • Storage: 5GB free space

Required Accounts

  1. LandingAI Account (Free tier available)

    • Sign up at LandingAI
    • Get your Vision Agent API key
  2. AWS Account (for Lab 6 only)

    • Required services: S3, Lambda, Bedrock, IAM
    • Estimated cost: ~$5-10/month for testing
  3. OpenAI Account (optional, for advanced features)


📦 Installation

Step 1: Clone the Repository

git clone https://github.com/Ahmed-El-Zainy/Document-AI-From-OCR-to-Agentic-Doc-Extraction.git
cd document_ai_from_OCR_to_agentic_doc_extraction

Step 2: Create Virtual Environment

# Using venv
python3.10 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using conda
conda create -n docai python=3.10.6
conda activate docai

Step 3: Install Dependencies

# Install core dependencies
pip install -r requirements.txt

# For AWS Lab 6 (optional)
pip install boto3 bedrock-agentcore strands-agents

Step 4: Install System Dependencies

For Tesseract OCR (L1, L2):

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows
# Download from: https://github.com/UB-Mannheim/tesseract/wiki

For PaddleOCR (L2, L4):

pip install paddlepaddle==3.0.0 paddleocr

Step 5: Configure Environment Variables

Create a .env file in the project root:

# Most used in the notebooks
#### LandingAI Configuration
VISION_AGENT_API_KEY=your_landingai_api_key_here
HF_TOKEN=your_huggingface_api_key    # chat models & embedding models
GROQ_API_KEY=your_groq_api_key       # VLM models


# AWS Configuration (for Lab 6)
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-west-2
S3_BUCKET=your-bucket-name
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
BEDROCK_KB_ID=your_knowledge_base_id

🚀 Quick Start

Example 1: Basic OCR with Tesseract (Lesson 2)

import pytesseract
from PIL import Image

# Load and process image
image = Image.open("L2/invoice.png")
text = pytesseract.image_to_string(image)
print(text)

Example 2: Advanced OCR with PaddleOCR (Lesson 4)

from paddleocr import PaddleOCR

# Initialize OCR
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Process document
result = ocr.ocr('L4/bank_statement.png', cls=True)

# Extract text
for line in result:
    for word_info in line:
        print(word_info[1][0])  # Extracted text

Example 3: Agentic Document Extraction (Lesson 8)

from landingai.ade import ADEClient

# Initialize ADE client
client = ADEClient(api_key="your_api_key")

# Parse document with visual grounding
response = client.parse(
    document_path="document.pdf",
    extract_tables=True,
    extract_figures=True
)

# Access structured output
markdown_content = response.markdown
groundings = response.grounding  # Bounding box information

print(markdown_content)

Example 4: RAG Pipeline Query (Lesson 11)

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Load vector database
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings()
)

# Query documents
results = vectorstore.similarity_search(
    "What are the company's revenue figures?",
    k=5
)

for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}\n")

📖 Lessons Overview

Lesson 2: Introduction to Document Processing

Focus: Traditional OCR with Tesseract and PaddleOCR

πŸ“ Directory: L2/

  • L2.ipynb - Main tutorial notebook
  • invoice.png, receipt.jpg, table.png - Sample documents

Key Concepts:

  • Image preprocessing techniques
  • Text extraction from various document types
  • Handling tables and forms
  • Comparing Tesseract vs. PaddleOCR performance
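As a concrete illustration of the preprocessing step: converting to grayscale, median-filtering speckle noise, and binarizing usually helps Tesseract on noisy scans. This sketch uses Pillow only; the function name and threshold value are illustrative, not taken from the course notebooks:

```python
from PIL import Image, ImageFilter, ImageOps

def preprocess_for_ocr(img: Image.Image, threshold: int = 150) -> Image.Image:
    """Grayscale -> denoise -> binarize, a common cleanup before Tesseract."""
    gray = ImageOps.grayscale(img)                    # drop color information
    gray = gray.filter(ImageFilter.MedianFilter(3))   # remove salt-and-pepper noise
    return gray.point(lambda p: 255 if p > threshold else 0)  # hard black/white
```

The cleaned image can then be passed straight to `pytesseract.image_to_string`.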

Use Cases:

  • ✅ Simple invoices and receipts
  • ✅ Clean, scanned documents
  • ⚠️ Limited table structure recognition
  • ❌ Complex layouts not well supported

Lesson 4: Advanced OCR with PaddleOCR

Focus: Deep learning-based OCR with better accuracy

πŸ“ Directory: L4/

  • L4.ipynb - Advanced OCR techniques
  • bank_statement.png, handwritten.jpg - Complex documents

Key Concepts:

  • Neural network-based text detection
  • Multi-language support
  • Angle classification for rotated text
  • Confidence scoring
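Because every recognized line carries a score, a small post-filter can discard low-confidence reads before any downstream parsing. A sketch assuming the classic `[box, (text, score)]` per-page layout returned by `ocr.ocr(...)`; the 0.80 cutoff is an arbitrary choice:

```python
def filter_by_confidence(page_result, min_score=0.80):
    """Keep only lines whose recognition score clears min_score.

    page_result: one page of PaddleOCR output, a list of [box, (text, score)].
    """
    kept = []
    for box, (text, score) in page_result:
        if score >= min_score:
            kept.append({"text": text, "score": score, "box": box})
    return kept
```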

Improvements over L2:

  • ✅ Better handling of curved or rotated text
  • ✅ Improved accuracy on low-quality scans
  • ✅ Multi-language text recognition
  • ✅ Handwriting recognition support

Lesson 6: Layout Analysis with LayoutReader

Focus: Understanding document structure and reading order

πŸ“ Directory: L6/

  • L6.ipynb - Layout understanding tutorial
  • layoutreader/ - LayoutReader implementation
  • architecture.png, report_layout.png - Visualization examples

Key Concepts:

  • Document layout analysis
  • Reading order determination
  • Relationship between text blocks
  • Visual structure recognition

Architecture:

Document Image
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Layout Model β”‚  (LayoutLM/LayoutLMv2)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Bounding   β”‚  (Text blocks, tables, figures)
β”‚    Boxes     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Reading Orderβ”‚  (Sequence prediction)
β”‚ Determinationβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
Structured Output
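For intuition, the reading-order step can be approximated by a purely geometric baseline: snap each block to a coarse row by its top edge, then sort left-to-right within rows. LayoutReader learns this ordering from data instead of hard-coding it; the sketch below (with a hypothetical `bbox` field and `row_tolerance` parameter) is only the heuristic it improves on:

```python
def reading_order(blocks, row_tolerance=10):
    """Sort layout blocks top-to-bottom, then left-to-right.

    blocks: dicts with "bbox" = (x0, y0, x1, y1) in pixel coordinates.
    """
    def sort_key(block):
        x0, y0, _, _ = block["bbox"]
        # Quantize y so blocks on roughly the same line compare by x.
        return (round(y0 / row_tolerance), x0)
    return sorted(blocks, key=sort_key)
```

Multi-column pages break this heuristic (it interleaves the columns), which is exactly the failure mode a learned reading-order model addresses.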

Lesson 8: Agentic Document Extraction

Focus: Modern AI-powered document understanding with LandingAI ADE

πŸ“ Directory: L8/

  • L8.ipynb - ADE comprehensive tutorial
  • helper.py - Visualization utilities
  • utility_example/ - Advanced examples
  • difficult_examples/ - Edge cases

Key Concepts:

  • Agentic approach to document extraction
  • Automatic chunk detection (text, tables, figures)
  • Visual grounding with bounding boxes
  • Markdown output with preserved structure
  • Confidence scoring for extractions

Chunk Types:

  • πŸ“ chunkText - Regular text paragraphs
  • πŸ“Š chunkTable - Structured tables
  • πŸ–ΌοΈ chunkFigure - Images and diagrams
  • 🏷️ chunkLogo - Company logos
  • πŸ“‡ chunkCard - Business cards
  • ✍️ chunkAttestation - Signatures
  • πŸ“± chunkScanCode - QR/Barcodes
  • πŸ“‹ chunkForm - Form fields
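When walking a parse result, it is often convenient to bucket chunks by these type labels first. A small sketch, assuming chunks arrive as plain dicts with a `type` key (adapt the key to the actual SDK response objects):

```python
from collections import defaultdict

def group_chunks_by_type(chunks):
    """Bucket chunks by their type label (chunkText, chunkTable, ...)."""
    groups = defaultdict(list)
    for chunk in chunks:
        groups[chunk.get("type", "unknown")].append(chunk)
    return dict(groups)
```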

Visualization Example:

from helper import draw_bounding_boxes

# Parse document
response = ade_client.parse("document.pdf")

# Draw bounding boxes on chunks
draw_bounding_boxes(response, "document.pdf")

Output: Color-coded bounding boxes showing:

  • 🟢 Green: Text chunks
  • 🔵 Blue: Tables
  • 🟣 Purple: Marginalia
  • 🟠 Orange: Cards

Lesson 9: Batch Processing with ADE

Focus: Processing multiple documents efficiently

πŸ“ Directory: L9/

  • L9.ipynb - Batch processing workflow
  • input_folder/ - Sample documents for batch processing
  • results/ - Processed outputs
  • results_extracted/ - Extracted structured data

Key Concepts:

  • Batch document processing
  • Parallel processing strategies
  • Error handling and logging
  • Output organization
  • Performance optimization

Workflow:

import os
from pathlib import Path
from landingai.ade import ADEClient

client = ADEClient(api_key=os.getenv("VISION_AGENT_API_KEY"))
input_dir = Path("input_folder")
output_dir = Path("results")

for doc_path in input_dir.glob("*.pdf"):
    try:
        response = client.parse(doc_path)

        # Save markdown
        (output_dir / f"{doc_path.stem}.md").write_text(response.markdown)

        # Save grounding data
        # ... (save JSON with bounding boxes)

        print(f"✅ Processed: {doc_path.name}")
    except Exception as e:
        print(f"❌ Failed: {doc_path.name} - {e}")
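Since ADE calls are network-bound, the serial loop above parallelizes well with a thread pool. A sketch where the hypothetical `process_fn` stands in for a wrapper around `client.parse` plus the save steps; failures are collected per file instead of aborting the batch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_batch(paths, process_fn, max_workers=4):
    """Run process_fn over many documents concurrently (thread pool).

    Returns (results, errors): per-path outputs and per-path exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_fn, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                errors[path] = exc   # keep going; report failures at the end
    return results, errors
```

Threads (not processes) fit here because the work is waiting on API responses, not CPU-bound parsing.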

Lesson 11: RAG with ChromaDB

Focus: Building a Retrieval-Augmented Generation system

πŸ“ Directory: L11/

  • L11.ipynb - RAG implementation tutorial
  • apple_10k.pdf - Sample financial document
  • chroma_db/ - Vector database storage
  • ade_outputs/ - Processed document chunks

Key Concepts:

  • Document chunking strategies
  • Vector embeddings
  • Semantic search
  • Context retrieval
  • LLM integration for Q&A

RAG Pipeline Flow:

Document (PDF)
      β”‚
      β–Ό
  ADE Parse  ──────> Markdown + Grounding
      β”‚
      β–Ό
  Chunking   ──────> Semantic segments
      β”‚
      β–Ό
  Embeddings ──────> Vector representations
      β”‚
      β–Ό
  ChromaDB   ──────> Vector storage
      β”‚
      β–Ό
  User Query
      β”‚
      β–Ό
 Similarity  ──────> Retrieve relevant chunks
   Search
      β”‚
      β–Ό
  LLM + Context ───> Generate answer
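One plausible implementation of a `create_semantic_chunks` helper for this pipeline splits the ADE markdown on headings, so each chunk stays topically coherent, and then caps chunk size. This is a sketch of the strategy, not the course's actual code:

```python
def create_semantic_chunks(markdown, max_chars=1500):
    """Split ADE markdown into heading-aligned chunks of bounded size."""
    sections, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))   # close the previous section
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for sec in sections:
        # Hard-split any section that exceeds the size cap.
        for i in range(0, len(sec), max_chars):
            chunks.append(sec[i:i + max_chars])
    return chunks
```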

Example Query Flow:

# 1. Load and parse document
response = ade_client.parse("apple_10k.pdf")

# 2. Create chunks
chunks = create_semantic_chunks(response.markdown)

# 3. Store in vector DB
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# 4. Query
query = "What was Apple's revenue in 2023?"
docs = vectorstore.similarity_search(query, k=5)

# 5. Generate answer with LLM
context = "\n\n".join([doc.page_content for doc in docs])
answer = llm.invoke(f"Context: {context}\n\nQuestion: {query}")

☁️ RAG Pipeline with AWS

Overview

Lab 6 demonstrates a production-ready document intelligence system using AWS services.

πŸ“ Directory: rag_pipeline_aws/

Architecture Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AWS Cloud Infrastructure                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Upload β”‚
β”‚  PDF to S3   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              S3 Bucket Structure                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  input/                                             β”‚
β”‚    └── medical/                                     β”‚
β”‚         └── research_papers.pdf                     β”‚
β”‚                                                     β”‚
β”‚  output/                                            β”‚
β”‚    β”œβ”€β”€ medical/                  (Markdown)         β”‚
β”‚    β”œβ”€β”€ medical_grounding/        (Bounding boxes)   β”‚
β”‚    β”œβ”€β”€ medical_chunks/           (Chunk JSONs)      β”‚
β”‚    └── medical_chunk_images/     (Cropped images)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”‚ S3 Event Trigger
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Lambda Function                β”‚
β”‚      (ade_s3_handler.py)                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  • Triggered on S3 upload                β”‚
β”‚  • Calls LandingAI ADE API               β”‚
β”‚  • Processes document                    β”‚
β”‚  • Creates chunk JSONs                   β”‚
β”‚  • Saves to S3 output/                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚       AWS Bedrock Knowledge Base         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  • Indexes chunk JSONs                   β”‚
β”‚  • Maintains metadata                    β”‚
β”‚  • Vector embeddings                     β”‚
β”‚  • Semantic search                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Strands Agent Framework          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  • Orchestrates conversation             β”‚
β”‚  • Queries Knowledge Base                β”‚
β”‚  • Visual grounding tool                 β”‚
β”‚  • Bedrock Memory Service                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          User Interaction                β”‚
β”‚  • Ask questions about documents         β”‚
β”‚  • Get answers with source citations     β”‚
β”‚  • View highlighted document regions     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Files

  • Lab-6.ipynb - Main tutorial notebook
  • ade_s3_handler.py - Lambda function for document processing
  • lambda_helpers.py - Deployment utilities
  • visual_grounding_helper.py - Chunk image extraction
  • medical/ - Sample medical research papers

Quick Setup

# 1. Configure AWS credentials
aws configure

# 2. Create S3 bucket
aws s3 mb s3://your-doc-bucket
aws s3api put-object --bucket your-doc-bucket --key input/
aws s3api put-object --bucket your-doc-bucket --key output/

# 3. Deploy Lambda (see Lab-6.ipynb for details)
# 4. Create Bedrock Knowledge Base
# 5. Upload documents and start chatting!
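Inside the Lambda function, the first task is extracting the uploaded object's location from the S3 event payload. A sketch of that piece (`parse_s3_event` is an illustrative helper name; the real ade_s3_handler.py would then download the file and call the ADE API):

```python
import urllib.parse

def parse_s3_event(event):
    """Pull (bucket, key) pairs out of an S3 put-event payload.

    Keys arrive URL-encoded in S3 notifications, so spaces come through
    as '+' and must be decoded before use.
    """
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```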

For detailed setup instructions, see rag_pipeline_aws/README.md


πŸ› οΈ Helper Utilities

The helper.py file provides essential utilities for document visualization and processing.

Key Functions

1. Document Display

from helper import print_document

# Display PDF or image in notebook
print_document("document.pdf")
print_document("image.png")

2. Bounding Box Visualization

from helper import draw_bounding_boxes

# Draw color-coded bounding boxes
parse_response = ade_client.parse("document.pdf")
annotated_image = draw_bounding_boxes(parse_response, "document.pdf")

Color Scheme:

  • 🟢 Green (40, 167, 69): Text chunks (chunkText)
  • 🔵 Blue (0, 123, 255): Tables (chunkTable)
  • 🟣 Purple (111, 66, 193): Marginalia (chunkMarginalia)
  • 🟣 Magenta (255, 0, 255): Figures (chunkFigure)
  • 🟢 Light Green (144, 238, 144): Logos (chunkLogo)
  • 🟠 Orange (255, 165, 0): Cards (chunkCard)
  • 🔵 Cyan (0, 255, 255): Attestations (chunkAttestation)
  • 🟡 Yellow (255, 193, 7): Scan codes (chunkScanCode)
  • 🔴 Red (220, 20, 60): Forms (chunkForm)
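In code, that legend maps naturally onto a lookup table keyed by chunk type; a sketch (the actual helper.py may structure this differently):

```python
# RGB colors per chunk type, matching the legend above.
CHUNK_COLORS = {
    "chunkText":        (40, 167, 69),    # green
    "chunkTable":       (0, 123, 255),    # blue
    "chunkMarginalia":  (111, 66, 193),   # purple
    "chunkFigure":      (255, 0, 255),    # magenta
    "chunkLogo":        (144, 238, 144),  # light green
    "chunkCard":        (255, 165, 0),    # orange
    "chunkAttestation": (0, 255, 255),    # cyan
    "chunkScanCode":    (255, 193, 7),    # yellow
    "chunkForm":        (220, 20, 60),    # red
}

def color_for(chunk_type):
    """RGB color for a chunk type; gray fallback for unknown labels."""
    return CHUNK_COLORS.get(chunk_type, (128, 128, 128))
```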

📂 Project Structure

document_ai_from_OCR_to_agentic_doc_extraction/
β”‚
β”œβ”€β”€ README.md                      # This comprehensive guide
β”œβ”€β”€ requirements.txt               # Python dependencies
β”œβ”€β”€ helper.py                      # Global utility functions
β”œβ”€β”€ .env                          # Environment variables (not in git)
β”œβ”€β”€ .gitignore                    # Git ignore rules
β”‚
β”œβ”€β”€ L2/                           # Lesson 2: Basic OCR
β”‚   β”œβ”€β”€ L2.ipynb                  # Jupyter notebook
β”‚   β”œβ”€β”€ l2_doc_processing.py      # Python utilities
β”‚   β”œβ”€β”€ invoice.png               # Sample invoice
β”‚   β”œβ”€β”€ receipt.jpg               # Sample receipt
β”‚   β”œβ”€β”€ table.png                 # Sample table
β”‚   └── requirements.txt          # Lesson-specific deps
β”‚
β”œβ”€β”€ L4/                           # Lesson 4: PaddleOCR
β”‚   β”œβ”€β”€ L4.ipynb
β”‚   β”œβ”€β”€ l4_doc_parsing_paddleocr.py
β”‚   β”œβ”€β”€ bank_statement.png
β”‚   β”œβ”€β”€ handwritten.jpg
β”‚   └── article.jpg
β”‚
β”œβ”€β”€ L6/                           # Lesson 6: Layout Analysis
β”‚   β”œβ”€β”€ L6.ipynb
β”‚   β”œβ”€β”€ architecture.png
β”‚   β”œβ”€β”€ report_layout.png
β”‚   └── layoutreader/             # LayoutReader implementation
β”‚       β”œβ”€β”€ README.md
β”‚       β”œβ”€β”€ main.py
β”‚       └── tools.py
β”‚
β”œβ”€β”€ L8/                           # Lesson 8: Agentic Extraction
β”‚   β”œβ”€β”€ L8.ipynb
β”‚   β”œβ”€β”€ helper.py
β”‚   β”œβ”€β”€ difficult_examples/       # Complex document samples
β”‚   └── utility_example/
β”‚
β”œβ”€β”€ L9/                           # Lesson 9: Batch Processing
β”‚   β”œβ”€β”€ L9.ipynb
β”‚   β”œβ”€β”€ helper.py
β”‚   β”œβ”€β”€ input_folder/             # Documents to process
β”‚   β”œβ”€β”€ results/                  # Markdown outputs
β”‚   └── results_extracted/        # Structured extractions
β”‚
β”œβ”€β”€ L11/                          # Lesson 11: RAG Pipeline
β”‚   β”œβ”€β”€ L11.ipynb
β”‚   β”œβ”€β”€ helper.py
β”‚   β”œβ”€β”€ apple_10k.pdf             # Sample financial document
β”‚   β”œβ”€β”€ ade_outputs/
β”‚   └── chroma_db/                # Vector database
β”‚
└── rag_pipeline_aws/             # Lab 6: AWS RAG System
    β”œβ”€β”€ Lab-6.ipynb
    β”œβ”€β”€ README.md                 # Detailed lab guide
    β”œβ”€β”€ ade_s3_handler.py         # Lambda function
    β”œβ”€β”€ lambda_helpers.py         # Deployment tools
    β”œβ”€β”€ visual_grounding_helper.py # Chunk extraction
    └── medical/                   # Sample medical PDFs

📚 Technical Resources

Core Technologies

  • OCR Engines: Tesseract, PaddleOCR
  • Layout Understanding: LayoutLM, LayoutReader
  • LandingAI Platform: Agentic Document Extraction (ADE), Vision Agent API
  • AWS Services: S3, Lambda, Bedrock Knowledge Bases, IAM
  • Python Libraries: pytesseract, Pillow, LangChain, ChromaDB, boto3, strands-agents

πŸ” Troubleshooting

Common Issues

1. Tesseract Not Found

Error: TesseractNotFoundError

Solution:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install tesseract-ocr

# Verify installation
tesseract --version

# If needed, specify path
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'

2. PaddlePaddle Installation Issues

Error: No module named 'paddle'

Solution:

# Uninstall any existing version
pip uninstall paddlepaddle paddlepaddle-gpu

# Install correct version
pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

# For GPU (CUDA 11.2)
pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu112/

3. LandingAI API Key Issues

Error: Authentication failed

Solution:

# Verify .env file exists
cat .env | grep VISION_AGENT_API_KEY

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Or set directly (not recommended for production)
import os
os.environ['VISION_AGENT_API_KEY'] = 'your_key_here'

4. AWS Lambda Timeout

Error: Task timed out after 3.00 seconds

Solution:

# Increase Lambda timeout
lambda_client.update_function_configuration(
    FunctionName='doc-processor',
    Timeout=900,  # 15 minutes
    MemorySize=1024  # 1GB RAM
)

🤝 Contributing

We welcome contributions! Here's how you can help:

Ways to Contribute

  1. πŸ› Report Bugs - Use GitHub Issues
  2. ✨ Suggest Features - Propose new lessons or examples
  3. πŸ“– Improve Documentation - Fix typos, add clarifications
  4. πŸ’» Submit Code - Fork, create feature branch, submit PR

Development Setup

# Fork and clone
git clone https://github.com/YOUR_USERNAME/document_ai_from_OCR_to_agentic_doc_extraction.git
cd document_ai_from_OCR_to_agentic_doc_extraction

# Create branch
git checkout -b feature/your-feature-name

# Make changes and commit
git commit -m "Add: Brief description of changes"

# Push and create PR
git push origin feature/your-feature-name

📄 License

This project is licensed under the MIT License.


πŸ™ Acknowledgments

  • DeepLearning.AI for the course structure
  • LandingAI for the ADE platform and Vision Agent
  • AWS for cloud infrastructure support
  • PaddlePaddle team for PaddleOCR
  • Microsoft for LayoutLM research
  • Google for Tesseract OCR



Last Updated: February 2026