Professional, production-ready Python scripts for running DeepSeek-OCR inference. This repository provides both single GPU and multi-GPU inference options to suit different hardware configurations and use cases.
Repository: https://github.com/connectaman/deepseek-ocr-multigpu-infer
Single GPU inference:
- 🎯 Single GPU: Optimized for single GPU setups
- ⚡ Fast Setup: Quick model loading and processing
- 🔧 Model Presets: Built-in presets for different model sizes
- 📝 Crop Mode: Optional crop mode for better performance
- 🔄 Multi-Process: Support for 1-2 processes per GPU for maximum utilization

Multi-GPU inference:
- 🚀 Multi-GPU Support: Automatically detects and utilizes all available CUDA GPUs
- 📁 Parallel Processing: Processes entire folders of images in parallel
- ⚖️ Load Balancing: Efficiently distributes work across GPUs
- 📊 Scalable: Throughput scales with the number of available GPUs
- 🔄 Multi-Process: Support for 1-2 processes per GPU for maximum utilization

Common features:
- 📁 Batch Processing: Processes entire folders of images
- 🔧 Configurable: Customizable prompts, image sizes, and processing parameters
- 📊 Progress Tracking: Real-time logging and progress monitoring
- 📈 Results Export: Excel export of processing results and statistics
- 🛡️ Error Handling: Robust error handling with detailed logging
- 📝 Professional Logging: Clean, informative logging without experimental metrics
- Python 3.8+
- CUDA-compatible GPU(s)
- NVIDIA drivers and CUDA toolkit
- GPU Memory: 8GB VRAM minimum
- CUDA Compute Capability: 7.0+ (RTX 20 series or newer)
- CUDA Version: 11.8 or higher
- Driver Version: 525.60.13 or newer
Recommended single-GPU cards:
- RTX 4090 (24GB) - Best for single GPU multi-process
- RTX 4080 (16GB) - Good for single GPU single process
- RTX 4070 (12GB) - Minimum for single GPU multi-process
- A100 (40GB) - Enterprise single GPU setups

Recommended multi-GPU setups:
- 2x RTX 4090 (24GB each) - Maximum performance
- 2x RTX 4080 (16GB each) - High performance
- 4x RTX 4070 (12GB each) - Cost-effective multi-GPU
| Instance Type | GPU | VRAM | Use Case | Performance |
|---|---|---|---|---|
| g5.xlarge | 1x NVIDIA A10G | 24GB | Single GPU testing | 1x baseline |
| g5.12xlarge | 4x NVIDIA A10G | 24GB each | Multi-GPU production | 3.5-4x speedup |
g5.xlarge
- GPU: 1x NVIDIA A10G
- VRAM: 24GB
- vCPUs: 4
- Memory: 16GB RAM
- Best For: Single GPU testing, development, small batch processing
- Approximate Cost: ~$1.10/hour
g5.12xlarge
- GPU: 4x NVIDIA A10G
- VRAM: 24GB per GPU (96GB total)
- vCPUs: 48
- Memory: 192GB RAM
- Best For: Multi-GPU production, large batch processing, maximum throughput
- Approximate Cost: ~$5.60/hour
Benchmark configuration:
- AWS Instance: g5.xlarge and g5.12xlarge
- Model: deepseek-ai/DeepSeek-OCR
- Image Size: 1024x1024
- Batch Size: 100 images
- Test Images: Mixed document types (PDFs, screenshots, handwritten notes)
| Approach | Instance | GPUs | Processes | Images/min | Speedup |
|---|---|---|---|---|---|
| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x |
| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x |
| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x |
| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x |
| Configuration | GPU Memory Usage | Peak Memory | Notes |
|---|---|---|---|
| Single Process | 8-10GB | 12GB | Stable memory usage |
| Multi Process (2x) | 6-8GB per process | 16GB | Shared model loading |
| Multi-GPU (4x) | 8-10GB per GPU | 12GB per GPU | Independent GPU memory |
1. Clone or download this repository
   ```bash
   git clone https://github.com/connectaman/deepseek-ocr-multigpu-infer.git
   cd deepseek-ocr-multigpu-infer
   ```
2. Create a virtual environment (recommended)
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```
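After installation, it is worth verifying that PyTorch can see your GPUs and that they meet the compute-capability requirement above. A quick sanity check, assuming PyTorch is installed as part of the requirements:

```python
import torch

# Confirm CUDA is available and inspect each visible device
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name} (compute capability {major}.{minor})")
```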
This repository supports 4 different inference approaches to maximize GPU utilization and processing speed:
1. Single GPU - Single Process
- Use Case: Basic single GPU setups, testing, or when you want simple processing
- Command:
  ```bash
  python deepseek_ocr_inference.py input_folder output_folder
  ```
- Processes: 1 process on 1 GPU
- Best For: Simple setups, testing, or when you have limited GPU memory

2. Single GPU - Multi Process
- Use Case: Maximum utilization of a single powerful GPU
- Command:
  ```bash
  python deepseek_ocr_inference.py input_folder output_folder --num-processes 2
  ```
- Processes: 2 processes on 1 GPU
- Best For: High-end single GPU setups (RTX 4090, A100, etc.)

3. Multi-GPU - Single Process
- Use Case: Multiple GPUs with standard utilization
- Command:
  ```bash
  python deepseek_ocr_multigpu_inference.py input_folder output_folder
  ```
- Processes: 1 process per GPU
- Best For: Multi-GPU setups with moderate processing needs

4. Multi-GPU - Multi Process
- Use Case: Maximum utilization across multiple GPUs
- Command:
  ```bash
  python deepseek_ocr_multigpu_inference.py input_folder output_folder --num-processes-per-gpu 2
  ```
- Processes: 2 processes per GPU (e.g., 4 processes on 2 GPUs)
- Best For: High-performance multi-GPU setups for maximum throughput
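For intuition, the multi-process approaches boil down to spawning one worker per (GPU, process) slot and splitting the image list evenly among the workers. The following is a simplified sketch of that dispatch pattern, not the repository's actual implementation; `run_ocr_on_images` is a hypothetical placeholder for the per-worker inference loop:

```python
import multiprocessing as mp

def worker(gpu_id, image_paths):
    """Hypothetical worker: bind this process to one GPU, then run OCR."""
    import torch
    if torch.cuda.is_available():
        torch.cuda.set_device(gpu_id)  # pin all CUDA work in this process to gpu_id
    # run_ocr_on_images(image_paths, gpu_id)  # placeholder for the real inference loop

def dispatch(images, num_gpus, procs_per_gpu=2):
    # One slot per (GPU, process) pair, e.g. 2 GPUs x 2 procs = 4 workers
    slots = [gpu for gpu in range(num_gpus) for _ in range(procs_per_gpu)]
    # Round-robin split: worker k gets images[k], images[k + len(slots)], ...
    chunks = [images[i::len(slots)] for i in range(len(slots))]
    ctx = mp.get_context("spawn")  # "spawn" is safer than "fork" with CUDA
    procs = [ctx.Process(target=worker, args=(gpu_id, chunk))
             for gpu_id, chunk in zip(slots, chunks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    dispatch([f"img_{i}.jpg" for i in range(100)], num_gpus=2, procs_per_gpu=2)
```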
Single GPU usage:

```bash
# Basic usage
python deepseek_ocr_inference.py input_folder output_folder

# Advanced usage with all options
python deepseek_ocr_inference.py ./images ./results \
    --prompt "Convert this document to markdown" \
    --base-size 1024 \
    --image-size 640 \
    --crop-mode \
    --gpu-id 0 \
    --num-processes 2 \
    --results-file my_results.xlsx

# Multi-process on a single GPU
python deepseek_ocr_inference.py ./images ./results --num-processes 2
```

Model size presets:

```bash
# Tiny model (fastest, least accurate)
python deepseek_ocr_inference.py input output --base-size 512 --image-size 512 --no-crop-mode

# Small model
python deepseek_ocr_inference.py input output --base-size 640 --image-size 640 --no-crop-mode

# Base model (default)
python deepseek_ocr_inference.py input output --base-size 1024 --image-size 1024 --no-crop-mode

# Large model (most accurate, slowest)
python deepseek_ocr_inference.py input output --base-size 1280 --image-size 1280 --no-crop-mode

# Gundam model (balanced)
python deepseek_ocr_inference.py input output --base-size 1024 --image-size 640 --crop-mode
```

Multi-GPU usage:

```bash
# Basic usage
python deepseek_ocr_multigpu_inference.py input_folder output_folder

# Advanced usage with all options
python deepseek_ocr_multigpu_inference.py ./images ./results \
    --prompt "Convert this document to markdown" \
    --base-size 1024 \
    --image-size 1280 \
    --num-processes-per-gpu 2 \
    --results-file multigpu_results.xlsx

# Two processes per GPU
python deepseek_ocr_multigpu_inference.py ./images ./results --num-processes-per-gpu 2
```

Supported image formats:
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- WebP (.webp)
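The scripts scan the input folder for files with these extensions. As a rough sketch of that filtering (the actual logic lives in the scripts; `find_images` is illustrative), a case-insensitive scan could look like:

```python
from pathlib import Path

SUPPORTED_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}

def find_images(input_folder):
    """Collect supported images from the top level of input_folder."""
    folder = Path(input_folder)
    # Subdirectories are not searched (see Troubleshooting: "No Images Found")
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS)

for image in find_images("input_images"):
    print(image.name, "->", image.with_suffix(".md").name)
```

The last line also reflects the output naming convention: each image maps to a markdown file with the same stem (see Output below).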
To monitor GPU usage in real time, install and run nvitop:

```bash
pip install nvitop
nvitop
```

This will show real-time GPU utilization, memory usage, and temperature for all available GPUs.

*Example of nvitop showing GPU utilization during DeepSeek-OCR inference across multiple GPUs*
Single GPU quick start:
1. Prepare your images
   ```bash
   mkdir input_images
   # Copy your images to input_images/
   ```
2. Run single GPU inference
   ```bash
   python deepseek_ocr_inference.py input_images output_markdowns
   ```
3. Monitor progress
   - Watch the console output for real-time progress
   - Use `nvitop` in another terminal to monitor GPU usage
4. Check results
   - Markdown files will be saved in `output_markdowns/`
   - Processing results will be saved in `single_gpu_inference_results.xlsx`
Multi-GPU quick start:
1. Prepare your images
   ```bash
   mkdir input_images
   # Copy your images to input_images/
   ```
2. Run multi-GPU inference
   ```bash
   python deepseek_ocr_multigpu_inference.py input_images output_markdowns
   ```
3. Monitor progress
   - Watch the console output for real-time progress
   - Use `nvitop` in another terminal to monitor GPU usage across all GPUs
4. Check results
   - Markdown files will be saved in `output_markdowns/`
   - Processing results will be saved in `multigpu_inference_results.xlsx`
Each input image generates a corresponding markdown file:
```
input_images/
├── document1.jpg
├── document2.png
└── document3.tiff

output_markdowns/
├── document1.md
├── document2.md
└── document3.md
```
The Excel file contains processing metadata:
- `filename`: Original image filename
- `markdown_filename`: Generated markdown filename
- `gpu_id`: GPU that processed the image
- `gpu_name`: Name of the GPU used
- `status`: Processing status (success/error)
- `error`: Error message (if applicable)
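Since the export is a plain Excel sheet, you can analyze it with pandas (assuming pandas and an Excel engine such as openpyxl are installed; column names as listed above):

```python
import pandas as pd

# Load the results exported by the multi-GPU script
df = pd.read_excel("multigpu_inference_results.xlsx")

# Success/error breakdown and per-GPU workload
print(df["status"].value_counts())
print(df.groupby("gpu_name")["filename"].count())

# Show any failures together with their error messages
failed = df[df["status"] == "error"]
print(failed[["filename", "error"]])
```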
| Approach | Instance | GPUs | Processes | Images/min | Speedup | Cost/hr |
|---|---|---|---|---|---|---|
| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x | ~$1.10 |
| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x | ~$1.10 |
| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x | ~$5.60 |
| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x | ~$5.60 |
| Approach | GPU Setup | Processes | Use Case | Speed |
|---|---|---|---|---|
| Single GPU - Single Process | 1 GPU | 1 | Basic processing | 1x |
| Single GPU - Multi Process | 1 GPU | 2 | High-end single GPU | 1.5-1.8x |
| Multi-GPU - Single Process | 2+ GPUs | 1 per GPU | Standard multi-GPU | 2x (2 GPUs) |
| Multi-GPU - Multi Process | 2+ GPUs | 2 per GPU | Maximum throughput | 3-3.5x (2 GPUs) |
Memory requirements:
- Single Process: ~8-12GB GPU memory per process
- Multi Process: ~6-8GB GPU memory per process (due to shared model loading)
- Recommended: RTX 4090 (24GB) or A100 (40GB) for multi-process setups
When to use each approach:
- Single GPU - Single Process: Testing, development, or limited GPU memory
- Single GPU - Multi Process: High-end single GPU with plenty of memory
- Multi-GPU - Single Process: Multiple GPUs with standard processing needs
- Multi-GPU - Multi Process: Production environments requiring maximum throughput
Single GPU tips:
- Model Size: Choose the appropriate model size based on your accuracy vs. speed requirements
- Crop Mode: Enable crop mode for better performance on smaller images
- GPU Selection: Use `--gpu-id` to select the most powerful GPU if you have multiple
- Memory Management: Monitor GPU memory usage with `nvitop`
Multi-GPU tips:
- Load Balancing: The script automatically distributes work evenly across GPUs
- GPU Memory: Ensure sufficient GPU memory on all GPUs for your batch size
- Image Size: Larger images require more memory but may provide better OCR results
- Monitoring: Use `nvitop` to monitor GPU utilization across all GPUs
General tips:
- Batch Processing: Process images in batches to optimize memory usage
- Image Formats: Use compressed formats (JPEG) for faster loading
- Storage: Use SSD storage for faster image loading
Common issues:

1. CUDA Out of Memory
   - Reduce the `--image-size` parameter
   - Process fewer images simultaneously
   - Check available GPU memory with `nvidia-smi` (see the snippet after this list)
2. No Images Found
   - Verify the input folder path
   - Check the supported image formats
   - Ensure images are not in subdirectories
3. Model Loading Errors
   - Verify internet connection for the model download
   - Check the CUDA installation
   - Ensure sufficient disk space for the model cache
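As referenced above, you can also query free GPU memory from Python before launching a run. A minimal helper using PyTorch's `torch.cuda.mem_get_info`, which returns free and total bytes for a device:

```python
import torch

def report_gpu_memory():
    """Print free/total memory for every visible CUDA device."""
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")

if torch.cuda.is_available():
    report_gpu_memory()
```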
For detailed debugging, you can modify the logging level in the script:

```python
logging.basicConfig(level=logging.DEBUG, ...)
```

MIT License - see LICENSE file for details.
Aman Ulla
- 📫 Contact: connectamanulla@gmail.com
- 🌐 Portfolio: amanulla.in
- 🔗 GitHub • LinkedIn • Twitter
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Check the troubleshooting section above
- Review the console output for error messages
- Open an issue on the repository
Note: This script requires CUDA-compatible GPUs and the DeepSeek-OCR model. Make sure your system meets the hardware requirements before running.

