Solve sudoku puzzles from photos using computer vision and deep learning. Takes a photo of a sudoku puzzle, detects the grid, recognizes digits with a CNN, validates constraints, and solves it with a C backtracking solver.
sample_1.jpg
+-------+-------+-------+         +-------+-------+-------+
| . . 3 | . 2 . | 6 . . |         | 4 8 3 | 9 2 1 | 6 5 7 |
| 9 . . | 3 . 5 | . . 1 |  solve  | 9 6 7 | 3 4 5 | 8 2 1 |
| . . 1 | 8 . 6 | 4 . . |  ---->  | 2 5 1 | 8 7 6 | 4 9 3 |
+-------+-------+-------+         +-------+-------+-------+
| . . 8 | 1 . 2 | 9 . . |         | 5 4 8 | 1 3 2 | 9 7 6 |
| 7 . . | . . . | . . 8 |         | 7 2 9 | 5 6 4 | 1 3 8 |
| . . 6 | 7 . 8 | 2 . . |         | 1 3 6 | 7 9 8 | 2 4 5 |
+-------+-------+-------+         +-------+-------+-------+
| . . 2 | 6 . 9 | 5 . . |         | 3 7 2 | 6 8 9 | 5 1 4 |
| 8 . . | 2 . 3 | . . 9 |         | 8 1 4 | 2 5 3 | 7 6 9 |
| . . 5 | . 1 . | 3 . . |         | 6 9 5 | 4 1 7 | 3 8 2 |
+-------+-------+-------+         +-------+-------+-------+
     recognized grid                      solution
         Photo
           |
           v
+----------------------+     +----------------------+     +----------------------+
| Grid Detection       | --> | Digit Recognition    | --> | Validation           |
|                      |     |                      |     |                      |
| - Contour analysis   |     | - EmptyClassifier    |     | - Row/col/box        |
| - Hough lines        |     |   (binary, 20K)      |     |   constraint check   |
| - Harris + RANSAC    |     | - DigitCNNv3         |     | - Beam search        |
| - Rotation correct.  |     |   (residual, 280K)   |     |   conflict resolver  |
| - Quality scoring    |     | - Top-k alternatives |     | - Constraint prop.   |
+----------------------+     +----------------------+     +----------------------+
                                                                     |
                                                                     v
                                                          +----------------------+
                                                          | C Solver             |
                                                          |                      |
                                                          | - Backtracking       |
                                                          | - ~20us/solve        |
                                                          | - WASM for web       |
                                                          +----------------------+
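
The row/col/box constraint check in the Validation stage can be pictured with a short sketch. This is an illustration only, not the code in `validator.py`: the function name `find_conflicts` and the convention that `0` marks an empty cell are assumptions made for the example.

```python
from collections import defaultdict


def find_conflicts(grid):
    """Return the set of (row, col) cells that share a digit with another
    cell in the same row, column, or 3x3 box.

    `grid` is a 9x9 list of lists; 0 marks an empty cell (assumed convention).
    """
    # Map (unit kind, unit index, digit) -> cells holding that digit.
    seen = defaultdict(list)
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d == 0:
                continue
            seen[("row", r, d)].append((r, c))
            seen[("col", c, d)].append((r, c))
            seen[("box", (r // 3) * 3 + c // 3, d)].append((r, c))

    conflicts = set()
    for cells in seen.values():
        if len(cells) > 1:  # same digit twice in one unit
            conflicts.update(cells)
    return conflicts
```

An empty result means the recognized grid is at least self-consistent. A non-empty result points at cells that were probably misread, which is where a resolver working from the classifier's top-k alternatives would plausibly start.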
# Install
pip install -e .
# Solve a puzzle from a photo
sudoku-vision solve photo.jpg
# JSON output for scripting
sudoku-vision solve photo.jpg --output json
# Use as a library
python -c "from sudoku_vision import solve_from_image; print(solve_from_image('photo.jpg'))"

Benchmarked on models/digit_cnn_v3_final.pt with 162 real cell samples:

| Metric | Result | Target |
|---|---|---|
| Overall digit accuracy | 89.5% | >= 90% |
| Empty cell recall | 93.0% | >= 95% |
| ECE (calibration) | 0.065 | < 0.10 |
| Grid detection rate | 100% | > 85% |
| E2E solve rate | 80% | >= 80% |
| Pipeline latency | ~140ms | < 5s |
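
The ECE row refers to expected calibration error. As a rough sketch of how such a number is commonly computed, the function below uses 10 equal-width confidence bins. The binning scheme and the name `expected_calibration_error` are assumptions; the benchmark's actual settings are not stated here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence|.

    `confidences` are the predicted-class probabilities in [0, 1];
    `correct` holds one bool per prediction.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model scores 0. With only 162 samples, many bins will be sparse, so the estimate itself is noisy.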
Per-digit accuracy varies with sample size. Digits with fewer than 10 real samples (2, 3, 5, 7, 8, 9) show higher variance. Collecting more labeled data would improve these numbers.
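
To get a feel for how wide the uncertainty is at these sample sizes, a Wilson score interval is one standard option. The sketch below is not part of the package, and the counts in the note after it are hypothetical, chosen only to illustrate the effect.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

For a hypothetical digit with 4 correct predictions out of 6, `wilson_interval(4, 6)` spans roughly 0.30 to 0.90, so a single extra correct or incorrect sample moves the reported accuracy a lot.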
sudoku-vision/
  sudoku_vision/              # Python package
    __init__.py               # Public API: solve_from_image()
    cli.py                    # CLI: solve, detect, benchmark
    config.py                 # Centralized configuration
    cv/                       # Computer vision
      preprocess_v2.py        # Multi-strategy preprocessing (CLAHE, shadow removal)
      grid_v2.py              # Multi-method grid detection
      grid_quality.py         # Grid quality scoring
      extract.py              # Cell extraction (81 cells from warped grid)
    ml/                       # Machine learning
      model_v3.py             # DigitCNNv3 (residual + SE blocks), EmptyClassifier
      classifier.py           # Two-stage classifier (empty + digit)
      preprocessing.py        # CellPreprocessor (consistent train/inference)
      datasets.py             # SyntheticDataset, RealDataset
      train_v2.py             # Training with augmentation, mixup, label smoothing
      train_progressive.py    # 3-phase progressive training
      export.py               # ONNX export
      convert_coreml.py       # CoreML export (iOS)
    pipeline/                 # End-to-end orchestration
      run_v2.py               # Main pipeline
      validator.py            # Constraint validation
      conflict_resolver.py    # Beam search correction
      constraint_resolver.py  # Sudoku constraint propagation
  solver/                     # C sudoku solver
    src/main.c                # CLI + benchmark
    src/sudoku.c              # Backtracking solver
    src/wasm_api.c            # WebAssembly bindings
  tests/                      # Test suite
  tools/                      # Data labeling and extraction utilities
  models/                     # Trained models (gitignored)
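
The beam search correction listed under `pipeline/` can be sketched in a few lines. The following is a guess at the general idea, not the contents of `conflict_resolver.py`: the names `resolve` and `_clashes`, the input shape (a dict of per-cell top-k `(digit, probability)` pairs), and the log-probability scoring are all assumptions.

```python
import math


def _clashes(assignment, cell, digit):
    """True if `digit` at `cell` duplicates an already assigned digit
    in the same row, column, or 3x3 box."""
    r, c = cell
    for (r2, c2), d2 in assignment.items():
        if d2 != digit:
            continue
        same_box = (r // 3, c // 3) == (r2 // 3, c2 // 3)
        if r == r2 or c == c2 or same_box:
            return True
    return False


def resolve(candidates, beam_width=5):
    """Pick one digit per cell so that no unit has duplicates, preferring
    high-probability readings. Returns a {cell: digit} dict, or None if
    every beam dies out.

    `candidates` maps (row, col) -> list of (digit, probability) pairs.
    """
    beams = [(0.0, {})]  # (sum of log-probabilities, partial assignment)
    for cell, options in candidates.items():
        expanded = []
        for score, assignment in beams:
            for digit, prob in options:
                if prob <= 0 or _clashes(assignment, cell, digit):
                    continue
                expanded.append(
                    (score + math.log(prob), {**assignment, cell: digit})
                )
        # Keep only the best few consistent partial assignments.
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:beam_width]
        if not beams:
            return None
    return beams[0][1]
```

When two cells in the same row are both read as 3, the search keeps the more confident 3 and falls back to the second-best reading for the other cell, rather than rejecting the whole grid.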
- Python 3.10+ -- package, CLI, pipeline orchestration
- PyTorch -- CNN training and inference (DigitCNNv3, EmptyClassifier)
- OpenCV -- image preprocessing, grid detection, perspective transform
- C99 -- backtracking solver (~20us/solve, 1400 solves/sec)
- ONNX -- model export for web inference
- CoreML -- model export for iOS inference
- WebAssembly -- solver compiled to WASM for browser use
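
For readers unfamiliar with the technique behind the C99 solver, here is plain backtracking in Python. It is meant as an illustration of the algorithm only; the implementation in `solver/src/sudoku.c` may order cells differently or add optimizations, and the names `solve` and `can_place` are made up for this example.

```python
def can_place(grid, r, c, d):
    """True if digit `d` does not yet appear in row r, column c, or the box."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(
        grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3)
    )


def solve(grid):
    """Fill `grid` (9x9 list of lists, 0 = empty) in place.

    Returns True if a solution was found, False otherwise.
    """
    for r in range(9):
        for c in range(9):
            if grid[r][c] != 0:
                continue
            for d in range(1, 10):
                if can_place(grid, r, c, d):
                    grid[r][c] = d
                    if solve(grid):
                        return True
                    grid[r][c] = 0  # undo and try the next digit
            return False  # no digit fits this cell: backtrack
    return True  # no empty cells remain
```

Running `solve` on the recognized grid shown at the top of this README fills it in place with the solution shown next to it.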
# Setup
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
make solver
# Run tests (skips tests requiring model files)
make test
# Run all tests including model benchmarks
make test-all
# Lint and format
make lint
make format
# Export models
make export-onnx
# Run accuracy benchmarks
make benchmark

# Progressive training: MNIST -> synthetic -> real fine-tuning
python -m sudoku_vision.ml.train_progressive --output-dir models/
# Train empty cell classifier
python -m sudoku_vision.ml.train_empty --data-dir data/real
# Calibrate model confidence (temperature scaling)
python -m sudoku_vision.ml.calibrate --model models/digit_cnn_v3_final.pt

- Per-digit accuracy on real data is limited by the small labeled dataset (162 samples). Digits 2 and 5 are below 70% accuracy because each has only 5-6 real samples.
- Empty cell recall (93%) is below the 95% target. The EmptyClassifier would benefit from more diverse empty cell examples.
- Heavily skewed or low-contrast photos may fail grid detection.
- The solver assumes a valid sudoku with a unique solution.
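
The unique-solution assumption can be checked before solving by counting solutions and stopping at the second one. The sketch below is a hypothetical helper, not something the package is known to provide. It assumes the given digits are already free of row/col/box conflicts and that `0` marks an empty cell.

```python
def _fits(grid, r, c, d):
    """True if digit `d` is absent from row r, column c, and the 3x3 box."""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return (
        all(grid[r][j] != d for j in range(9))
        and all(grid[i][c] != d for i in range(9))
        and all(
            grid[i][j] != d
            for i in range(br, br + 3)
            for j in range(bc, bc + 3)
        )
    )


def count_solutions(grid, limit=2):
    """Count completions of `grid`, stopping early once `limit` is reached.

    The grid is modified during the search but restored before returning.
    """
    for r in range(9):
        for c in range(9):
            if grid[r][c] != 0:
                continue
            total = 0
            for d in range(1, 10):
                if _fits(grid, r, c, d):
                    grid[r][c] = d
                    total += count_solutions(grid, limit - total)
                    grid[r][c] = 0
                    if total >= limit:
                        break
            return total
    return 1  # grid is full: exactly one completion (itself)
```

A return value of 1 means the puzzle is well-posed. A value of 0 or 2 suggests a misread digit or a missed given, so the photo is worth re-checking before trusting any solution.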
MIT