Professional, production-ready Python scripts for running DeepSeek-OCR inference. This repository provides both single GPU and multi-GPU inference options to suit different hardware configurations and use cases.
Repository: https://github.com/connectaman/deepseek-ocr-multigpu-infer
Single GPU inference:
- 🎯 Single GPU: Optimized for single GPU setups
- ⚡ Fast Setup: Quick model loading and processing
- 🔧 Model Presets: Built-in presets for different model sizes
- 📝 Crop Mode: Optional crop mode for better performance
- 🔄 Multi-Process: Support for 1-2 processes per GPU for maximum utilization

Multi-GPU inference:
- 🚀 Multi-GPU Support: Automatically detects and utilizes all available CUDA GPUs
- 📁 Parallel Processing: Processes entire folders of images in parallel
- ⚖️ Load Balancing: Efficiently distributes work across GPUs
- 📊 Scalable: Throughput scales with the number of available GPUs
- 🔄 Multi-Process: Support for 1-2 processes per GPU for maximum utilization

Common features:
- 📁 Batch Processing: Processes entire folders of images
- 🔧 Configurable: Customizable prompts, image sizes, and processing parameters
- 📊 Progress Tracking: Real-time logging and progress monitoring
- 📈 Results Export: Excel export of processing results and statistics
- 🛡️ Error Handling: Robust error handling with detailed logging
- 📝 Professional Logging: Clean, informative logging without experimental metrics
- Python 3.8+
- CUDA-compatible GPU(s)
- NVIDIA drivers and CUDA toolkit
- GPU Memory: 8GB VRAM minimum
- CUDA Compute Capability: 7.0+ (RTX 20 series or newer)
- CUDA Version: 11.8 or higher
- Driver Version: 525.60.13 or newer
Recommended single-GPU cards:
- RTX 4090 (24GB) - Best for single GPU multi-process
- RTX 4080 (16GB) - Good for single GPU single process
- RTX 4070 (12GB) - Minimum for single GPU multi-process
- A100 (40GB) - Enterprise single GPU setups

Recommended multi-GPU setups:
- 2x RTX 4090 (24GB each) - Maximum performance
- 2x RTX 4080 (16GB each) - High performance
- 4x RTX 4070 (12GB each) - Cost-effective multi-GPU
| Instance Type | GPU | VRAM | Use Case | Performance |
|---|---|---|---|---|
| g5.xlarge | 1x NVIDIA A10G | 24GB | Single GPU testing | 1x baseline |
| g5.12xlarge | 4x NVIDIA A10G | 24GB each | Multi-GPU production | 3.5-4x speedup |
g5.xlarge
- GPU: 1x NVIDIA A10G
- VRAM: 24GB
- vCPUs: 4
- Memory: 16GB RAM
- Best For: Single GPU testing, development, small batch processing
- Approximate Cost: ~$1.10/hour
g5.12xlarge
- GPU: 4x NVIDIA A10G
- VRAM: 24GB per GPU (96GB total)
- vCPUs: 48
- Memory: 192GB RAM
- Best For: Multi-GPU production, large batch processing, maximum throughput
- Approximate Cost: ~$5.60/hour
Benchmark configuration:
- AWS Instance: g5.xlarge and g5.12xlarge
- Model: deepseek-ai/DeepSeek-OCR
- Image Size: 1024x1024
- Batch Size: 100 images
- Test Images: Mixed document types (PDFs, screenshots, handwritten notes)
| Approach | Instance | GPUs | Processes | Images/min | Speedup |
|---|---|---|---|---|---|
| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x |
| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x |
| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x |
| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x |
| Configuration | GPU Memory Usage | Peak Memory | Notes |
|---|---|---|---|
| Single Process | 8-10GB | 12GB | Stable memory usage |
| Multi Process (2x) | 6-8GB per process | 16GB | Shared model loading |
| Multi-GPU (4x) | 8-10GB per GPU | 12GB per GPU | Independent GPU memory |
1. Clone or download this repository
   ```bash
   git clone https://github.com/connectaman/deepseek-ocr-multigpu-infer.git
   cd deepseek-ocr-multigpu-infer
   ```
2. Create a virtual environment (recommended)
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```
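After installation, it is worth verifying that PyTorch can see your GPUs and that they meet the compute-capability requirement above. A quick sanity check, assuming PyTorch is installed as part of the requirements:

```python
import torch

# Confirm CUDA is available and inspect each visible device
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name} (compute capability {major}.{minor})")
```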
This repository supports 4 different inference approaches to maximize GPU utilization and processing speed:
1. Single GPU - Single Process
- Use Case: Basic single GPU setups, testing, or when you want simple processing
- Command:
  ```bash
  python deepseek_ocr_inference.py input_folder output_folder
  ```
- Processes: 1 process on 1 GPU
- Best For: Simple setups, testing, or when you have limited GPU memory

2. Single GPU - Multi Process
- Use Case: Maximum utilization of a single powerful GPU
- Command:
  ```bash
  python deepseek_ocr_inference.py input_folder output_folder --num-processes 2
  ```
- Processes: 2 processes on 1 GPU
- Best For: High-end single GPU setups (RTX 4090, A100, etc.)

3. Multi-GPU - Single Process
- Use Case: Multiple GPUs with standard utilization
- Command:
  ```bash
  python deepseek_ocr_multigpu_inference.py input_folder output_folder
  ```
- Processes: 1 process per GPU
- Best For: Multi-GPU setups with moderate processing needs

4. Multi-GPU - Multi Process
- Use Case: Maximum utilization across multiple GPUs
- Command:
  ```bash
  python deepseek_ocr_multigpu_inference.py input_folder output_folder --num-processes-per-gpu 2
  ```
- Processes: 2 processes per GPU (e.g., 4 processes on 2 GPUs)
- Best For: High-performance multi-GPU setups for maximum throughput
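For intuition, the multi-process approaches boil down to spawning one worker per (GPU, process) slot and splitting the image list evenly among the workers. The following is a simplified sketch of that dispatch pattern, not the repository's actual implementation; `run_ocr_on_images` is a hypothetical placeholder for the per-worker inference loop:

```python
import multiprocessing as mp

def worker(gpu_id, image_paths):
    """Hypothetical worker: bind this process to one GPU, then run OCR."""
    import torch
    if torch.cuda.is_available():
        torch.cuda.set_device(gpu_id)  # pin all CUDA work in this process to gpu_id
    # run_ocr_on_images(image_paths, gpu_id)  # placeholder for the real inference loop

def dispatch(images, num_gpus, procs_per_gpu=2):
    # One slot per (GPU, process) pair, e.g. 2 GPUs x 2 procs = 4 workers
    slots = [gpu for gpu in range(num_gpus) for _ in range(procs_per_gpu)]
    # Round-robin split: worker k gets images[k], images[k + len(slots)], ...
    chunks = [images[i::len(slots)] for i in range(len(slots))]
    ctx = mp.get_context("spawn")  # "spawn" is safer than "fork" with CUDA
    procs = [ctx.Process(target=worker, args=(gpu_id, chunk))
             for gpu_id, chunk in zip(slots, chunks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    dispatch([f"img_{i}.jpg" for i in range(100)], num_gpus=2, procs_per_gpu=2)
```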
Single GPU usage:

```bash
# Basic usage
python deepseek_ocr_inference.py input_folder output_folder

# Advanced usage with all options
python deepseek_ocr_inference.py ./images ./results \
    --prompt "Convert this document to markdown" \
    --base-size 1024 \
    --image-size 640 \
    --crop-mode \
    --gpu-id 0 \
    --num-processes 2 \
    --results-file my_results.xlsx

# Multi-process on a single GPU
python deepseek_ocr_inference.py ./images ./results --num-processes 2
```

Model size presets:

```bash
# Tiny model (fastest, least accurate)
python deepseek_ocr_inference.py input output --base-size 512 --image-size 512 --no-crop-mode

# Small model
python deepseek_ocr_inference.py input output --base-size 640 --image-size 640 --no-crop-mode

# Base model (default)
python deepseek_ocr_inference.py input output --base-size 1024 --image-size 1024 --no-crop-mode

# Large model (most accurate, slowest)
python deepseek_ocr_inference.py input output --base-size 1280 --image-size 1280 --no-crop-mode

# Gundam model (balanced)
python deepseek_ocr_inference.py input output --base-size 1024 --image-size 640 --crop-mode
```

Multi-GPU usage:

```bash
# Basic usage
python deepseek_ocr_multigpu_inference.py input_folder output_folder

# Advanced usage with all options
python deepseek_ocr_multigpu_inference.py ./images ./results \
    --prompt "Convert this document to markdown" \
    --base-size 1024 \
    --image-size 1280 \
    --num-processes-per-gpu 2 \
    --results-file multigpu_results.xlsx

# Two processes per GPU
python deepseek_ocr_multigpu_inference.py ./images ./results --num-processes-per-gpu 2
```

Supported image formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- WebP (.webp)
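The scripts scan the input folder for files with these extensions. As a rough sketch of that filtering (the actual logic lives in the scripts; `find_images` is illustrative), a case-insensitive scan could look like:

```python
from pathlib import Path

SUPPORTED_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}

def find_images(input_folder):
    """Collect supported images from the top level of input_folder."""
    folder = Path(input_folder)
    # Subdirectories are not searched (see Troubleshooting: "No Images Found")
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS)

for image in find_images("input_images"):
    print(image.name, "->", image.with_suffix(".md").name)
```

The last line also reflects the output naming convention: each image maps to a markdown file with the same stem (see Output below).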
To monitor GPU usage in real time, install and run nvitop:

```bash
pip install nvitop
nvitop
```

This will show real-time GPU utilization, memory usage, and temperature for all available GPUs.

*Example of nvitop showing GPU utilization during DeepSeek-OCR inference across multiple GPUs*
Single GPU quick start:
1. Prepare your images
   ```bash
   mkdir input_images
   # Copy your images to input_images/
   ```
2. Run single GPU inference
   ```bash
   python deepseek_ocr_inference.py input_images output_markdowns
   ```
3. Monitor progress
   - Watch the console output for real-time progress
   - Use `nvitop` in another terminal to monitor GPU usage
4. Check results
   - Markdown files will be saved in `output_markdowns/`
   - Processing results will be saved in `single_gpu_inference_results.xlsx`
Multi-GPU quick start:
1. Prepare your images
   ```bash
   mkdir input_images
   # Copy your images to input_images/
   ```
2. Run multi-GPU inference
   ```bash
   python deepseek_ocr_multigpu_inference.py input_images output_markdowns
   ```
3. Monitor progress
   - Watch the console output for real-time progress
   - Use `nvitop` in another terminal to monitor GPU usage across all GPUs
4. Check results
   - Markdown files will be saved in `output_markdowns/`
   - Processing results will be saved in `multigpu_inference_results.xlsx`
Each input image generates a corresponding markdown file:
```
input_images/
├── document1.jpg
├── document2.png
└── document3.tiff

output_markdowns/
├── document1.md
├── document2.md
└── document3.md
```
The Excel file contains processing metadata:
- `filename`: Original image filename
- `markdown_filename`: Generated markdown filename
- `gpu_id`: GPU that processed the image
- `gpu_name`: Name of the GPU used
- `status`: Processing status (success/error)
- `error`: Error message (if applicable)
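Since the export is a plain Excel sheet, you can analyze it with pandas (assuming pandas and an Excel engine such as openpyxl are installed; column names as listed above):

```python
import pandas as pd

# Load the results exported by the multi-GPU script
df = pd.read_excel("multigpu_inference_results.xlsx")

# Success/error breakdown and per-GPU workload
print(df["status"].value_counts())
print(df.groupby("gpu_name")["filename"].count())

# Show any failures together with their error messages
failed = df[df["status"] == "error"]
print(failed[["filename", "error"]])
```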
| Approach | Instance | GPUs | Processes | Images/min | Speedup | Cost/hr |
|---|---|---|---|---|---|---|
| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x | ~$1.10 |
| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x | ~$1.10 |
| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x | ~$5.60 |
| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x | ~$5.60 |
| Approach | GPU Setup | Processes | Use Case | Speed |
|---|---|---|---|---|
| Single GPU - Single Process | 1 GPU | 1 | Basic processing | 1x |
| Single GPU - Multi Process | 1 GPU | 2 | High-end single GPU | 1.5-1.8x |
| Multi-GPU - Single Process | 2+ GPUs | 1 per GPU | Standard multi-GPU | 2x (2 GPUs) |
| Multi-GPU - Multi Process | 2+ GPUs | 2 per GPU | Maximum throughput | 3-3.5x (2 GPUs) |
Memory requirements:
- Single Process: ~8-12GB GPU memory per process
- Multi Process: ~6-8GB GPU memory per process (due to shared model loading)
- Recommended: RTX 4090 (24GB) or A100 (40GB) for multi-process setups
When to use each approach:
- Single GPU - Single Process: Testing, development, or limited GPU memory
- Single GPU - Multi Process: High-end single GPU with plenty of memory
- Multi-GPU - Single Process: Multiple GPUs with standard processing needs
- Multi-GPU - Multi Process: Production environments requiring maximum throughput
Single GPU tips:
- Model Size: Choose the appropriate model size based on your accuracy vs. speed requirements
- Crop Mode: Enable crop mode for better performance on smaller images
- GPU Selection: Use `--gpu-id` to select the most powerful GPU if you have multiple
- Memory Management: Monitor GPU memory usage with `nvitop`
Multi-GPU tips:
- Load Balancing: The script automatically distributes work evenly across GPUs
- GPU Memory: Ensure sufficient GPU memory on all GPUs for your batch size
- Image Size: Larger images require more memory but may provide better OCR results
- Monitoring: Use `nvitop` to monitor GPU utilization across all GPUs
General tips:
- Batch Processing: Process images in batches to optimize memory usage
- Image Formats: Use compressed formats (JPEG) for faster loading
- Storage: Use SSD storage for faster image loading
Common issues:

1. CUDA Out of Memory
   - Reduce the `--image-size` parameter
   - Process fewer images simultaneously
   - Check available GPU memory with `nvidia-smi` (see the snippet after this list)
2. No Images Found
   - Verify the input folder path
   - Check the supported image formats
   - Ensure images are not in subdirectories
3. Model Loading Errors
   - Verify internet connection for the model download
   - Check the CUDA installation
   - Ensure sufficient disk space for the model cache
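As referenced above, you can also query free GPU memory from Python before launching a run. A minimal helper using PyTorch's `torch.cuda.mem_get_info`, which returns free and total bytes for a device:

```python
import torch

def report_gpu_memory():
    """Print free/total memory for every visible CUDA device."""
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")

if torch.cuda.is_available():
    report_gpu_memory()
```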
For detailed debugging, you can modify the logging level in the script:

```python
logging.basicConfig(level=logging.DEBUG, ...)
```

MIT License - see LICENSE file for details.
Aman Ulla
- 📫 Contact: connectamanulla@gmail.com
- 🌐 Portfolio: amanulla.in
- 🔗 GitHub • LinkedIn • Twitter
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Check the troubleshooting section above
- Review the console output for error messages
- Open an issue on the repository
Note: This script requires CUDA-compatible GPUs and the DeepSeek-OCR model. Make sure your system meets the hardware requirements before running.

