Sudoku Vision

Solve sudoku puzzles from photos using computer vision and deep learning. The pipeline takes a photo of a puzzle, detects the grid, recognizes the digits with a CNN, validates the result against the sudoku constraints, and solves it with a backtracking solver written in C.

                        sample_1.jpg
    +-------+-------+-------+         +-------+-------+-------+
    | . . 3 | . 2 . | 6 . . |         | 4 8 3 | 9 2 1 | 6 5 7 |
    | 9 . . | 3 . 5 | . . 1 |  solve  | 9 6 7 | 3 4 5 | 8 2 1 |
    | . . 1 | 8 . 6 | 4 . . |  ---->  | 2 5 1 | 8 7 6 | 4 9 3 |
    +-------+-------+-------+         +-------+-------+-------+
    | . . 8 | 1 . 2 | 9 . . |         | 5 4 8 | 1 3 2 | 9 7 6 |
    | 7 . . | . . . | . . 8 |         | 7 2 9 | 5 6 4 | 1 3 8 |
    | . . 6 | 7 . 8 | 2 . . |         | 1 3 6 | 7 9 8 | 2 4 5 |
    +-------+-------+-------+         +-------+-------+-------+
    | . . 2 | 6 . 9 | 5 . . |         | 3 7 2 | 6 8 9 | 5 1 4 |
    | 8 . . | 2 . 3 | . . 9 |         | 8 1 4 | 2 5 3 | 7 6 9 |
    | . . 5 | . 1 . | 3 . . |         | 6 9 5 | 4 1 7 | 3 8 2 |
    +-------+-------+-------+         +-------+-------+-------+
         recognized grid                        solution

Architecture

  Photo
    |
    v
+----------------------+     +----------------------+     +----------------------+
|    Grid Detection    | --> |  Digit Recognition   | --> |      Validation      |
|                      |     |                      |     |                      |
| - Contour analysis   |     | - EmptyClassifier    |     | - Row/col/box        |
| - Hough lines        |     |   (binary, 20K)      |     |   constraint check   |
| - Harris + RANSAC    |     | - DigitCNNv3         |     | - Beam search        |
| - Rotation correct.  |     |   (residual, 280K)   |     |   conflict resolver  |
| - Quality scoring    |     | - Top-k alternatives |     | - Constraint prop.   |
+----------------------+     +----------------------+     +----------------------+
                                                                     |
                                                                     v
                                                          +----------------------+
                                                          |       C Solver       |
                                                          |                      |
                                                          | - Backtracking       |
                                                          | - ~20us/solve        |
                                                          | - WASM for web       |
                                                          +----------------------+
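
The row/column/box check in the Validation stage can be illustrated with a small standalone sketch. It is not the code in pipeline/validator.py: the names find_conflicts, parse_grid and SAMPLE_PUZZLE are hypothetical, and the grid is the recognized sample_1.jpg grid shown above.

```python
from collections import defaultdict

# Recognized grid from sample_1.jpg ('.' marks an empty cell).
SAMPLE_PUZZLE = [
    "..3.2.6..", "9..3.5..1", "..18.64..",
    "..81.29..", "7.......8", "..67.82..",
    "..26.95..", "8..2.3..9", "..5.1.3..",
]


def parse_grid(rows):
    """Turn nine 9-character strings into a 9x9 list of ints (0 = empty)."""
    return [[0 if ch == "." else int(ch) for ch in row] for row in rows]


def find_conflicts(grid):
    """Return sorted (row, col) positions of digits repeated in a row, column, or 3x3 box."""
    seen = defaultdict(list)  # (unit kind, unit index, digit) -> cell positions
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d == 0:
                continue  # empty cells never conflict
            seen[("row", r, d)].append((r, c))
            seen[("col", c, d)].append((r, c))
            seen[("box", (r // 3) * 3 + c // 3, d)].append((r, c))
    conflicts = set()
    for cells in seen.values():
        if len(cells) > 1:
            conflicts.update(cells)
    return sorted(conflicts)


if __name__ == "__main__":
    grid = parse_grid(SAMPLE_PUZZLE)
    print(find_conflicts(grid))  # no conflicts in the sample grid
    grid[0][0] = 3               # simulate a misread digit
    print(find_conflicts(grid))  # the two cells holding the duplicated 3
```

A non-empty result points at the cells a corrector would revisit, for example by trying the classifier's next-best alternatives for those cells.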

Quick Start

# Install
pip install -e .

# Solve a puzzle from a photo
sudoku-vision solve photo.jpg

# JSON output for scripting
sudoku-vision solve photo.jpg --output json

# Use as a library
python -c "from sudoku_vision import solve_from_image; print(solve_from_image('photo.jpg'))"

Model Performance

Benchmarked on models/digit_cnn_v3_final.pt with 162 real cell samples:

Metric                   Result   Target
Overall digit accuracy   89.5%    >= 90%
Empty cell recall        93.0%    >= 95%
ECE (calibration)        0.065    < 0.10
Grid detection rate      100%     > 85%
E2E solve rate           80%      >= 80%
Pipeline latency         ~140ms   < 5s

Per-digit accuracy varies with sample size. Digits with fewer than 10 real samples (2, 3, 5, 7, 8, 9) show higher variance. Collecting more labeled data would improve these numbers.
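
For readers unfamiliar with the ECE row, the sketch below shows one common way to compute expected calibration error from top-1 confidences. The function name and the choice of 10 equal-width bins are assumptions; the benchmark in this repo may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean gap between accuracy and mean confidence.

    confidences: top-1 probabilities in [0, 1]
    correct:     booleans, True where the top-1 prediction was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 95% confidence, three of them right: gap of 0.20.
    print(expected_calibration_error([0.95] * 4, [True, True, True, False]))
```

A lower value means the reported confidences track the observed accuracy more closely.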

Project Structure

sudoku-vision/
  sudoku_vision/           # Python package
    __init__.py            #   Public API: solve_from_image()
    cli.py                 #   CLI: solve, detect, benchmark
    config.py              #   Centralized configuration
    cv/                    #   Computer vision
      preprocess_v2.py     #     Multi-strategy preprocessing (CLAHE, shadow removal)
      grid_v2.py           #     Multi-method grid detection
      grid_quality.py      #     Grid quality scoring
      extract.py           #     Cell extraction (81 cells from warped grid)
    ml/                    #   Machine learning
      model_v3.py          #     DigitCNNv3 (residual + SE blocks), EmptyClassifier
      classifier.py        #     Two-stage classifier (empty + digit)
      preprocessing.py     #     CellPreprocessor (consistent train/inference)
      datasets.py          #     SyntheticDataset, RealDataset
      train_v2.py          #     Training with augmentation, mixup, label smoothing
      train_progressive.py #     3-phase progressive training
      export.py            #     ONNX export
      convert_coreml.py    #     CoreML export (iOS)
    pipeline/              #   End-to-end orchestration
      run_v2.py            #     Main pipeline
      validator.py         #     Constraint validation
      conflict_resolver.py #     Beam search correction
      constraint_resolver.py # Sudoku constraint propagation
  solver/                  # C sudoku solver
    src/main.c             #   CLI + benchmark
    src/sudoku.c           #   Backtracking solver
    src/wasm_api.c         #   WebAssembly bindings
  tests/                   # Test suite
  tools/                   # Data labeling and extraction utilities
  models/                  # Trained models (gitignored)
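
As an illustration of the cell extraction step (81 cells from the warped grid), here is a dependency-free sketch. It is not the code in cv/extract.py: split_into_cells and its margin parameter are hypothetical, and the image is a plain list of pixel rows, where the real pipeline would presumably work on OpenCV arrays.

```python
def split_into_cells(warped, margin=0):
    """Split a square, top-down grid image into 81 crops in row-major order.

    warped: list of pixel rows; the side length must be a multiple of 9.
    margin: pixels trimmed from every cell edge, e.g. to drop grid lines.
    """
    side = len(warped)
    if side == 0 or side % 9 or any(len(row) != side for row in warped):
        raise ValueError("expected a square image whose side is a multiple of 9")
    cell = side // 9
    size = cell - 2 * margin
    if size <= 0:
        raise ValueError("margin leaves no pixels in each cell")
    cells = []
    for r in range(9):
        for c in range(9):
            top = r * cell + margin
            left = c * cell + margin
            cells.append([row[left:left + size] for row in warped[top:top + size]])
    return cells


if __name__ == "__main__":
    # 18x18 test image in which every pixel stores the index of its cell.
    image = [[(y // 2) * 9 + x // 2 for x in range(18)] for y in range(18)]
    cells = split_into_cells(image)
    print(len(cells), cells[40])
```

Trimming a small margin is one simple way to keep grid lines out of the crops handed to the classifier.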

Tech Stack

Python 3.10+ -- package, CLI, pipeline orchestration
PyTorch -- CNN training and inference (DigitCNNv3, EmptyClassifier)
OpenCV -- image preprocessing, grid detection, perspective transform
C99 -- backtracking solver (~20us/solve, 1400 solves/sec)
ONNX -- model export for web inference
CoreML -- model export for iOS inference
WebAssembly -- solver compiled to WASM for browser use
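
The solver itself is written in C99. The Python sketch below only illustrates plain backtracking, the technique named above, and makes no claim about how solver/src/sudoku.c is written. The names solve, fits and PUZZLE are hypothetical; PUZZLE is the sample_1.jpg grid.

```python
PUZZLE = [
    "..3.2.6..", "9..3.5..1", "..18.64..",
    "..81.29..", "7.......8", "..67.82..",
    "..26.95..", "8..2.3..9", "..5.1.3..",
]


def fits(grid, r, c, d):
    """True if digit d can go at (r, c) without repeating in its row, column, or box."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))


def solve(grid):
    """Fill grid (9x9 list of lists, 0 = empty) in place; return True on success."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] != 0:
                continue
            for d in range(1, 10):
                if fits(grid, r, c, d):
                    grid[r][c] = d      # tentative placement
                    if solve(grid):
                        return True
                    grid[r][c] = 0      # undo and try the next digit
            return False                # no digit fits here: backtrack
    return True                         # no empty cell left


if __name__ == "__main__":
    grid = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
    if solve(grid):
        for row in grid:
            print(" ".join(map(str, row)))
```

This version rescans for the first empty cell on every call, which keeps it short; a tuned implementation would track candidates per cell instead.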

Development

# Setup
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
make solver

# Run tests (skips tests requiring model files)
make test

# Run all tests including model benchmarks
make test-all

# Lint and format
make lint
make format

# Export models
make export-onnx

# Run accuracy benchmarks
make benchmark

Training

# Progressive training: MNIST -> synthetic -> real fine-tuning
python -m sudoku_vision.ml.train_progressive --output-dir models/

# Train empty cell classifier
python -m sudoku_vision.ml.train_empty --data-dir data/real

# Calibrate model confidence (temperature scaling)
python -m sudoku_vision.ml.calibrate --model models/digit_cnn_v3_final.pt
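
Temperature scaling divides the model's logits by a single scalar T chosen on held-out data, so that confidences better match accuracy. The sketch below is a minimal illustration, not the calibrate module: the function names are hypothetical, and the grid search stands in for the gradient-based fit a real implementation would more likely use.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[label])
    return total / len(labels)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest validation NLL (grid search)."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


if __name__ == "__main__":
    # Toy validation set: the model is ~88% confident but right only 3 times in 4.
    logits = [[2.0, 0.0]] * 4
    labels = [0, 0, 0, 1]
    t = fit_temperature(logits, labels)
    print(t, softmax([2.0, 0.0], t))
```

A fitted T above 1 softens overconfident predictions; the predicted class never changes, because dividing by a positive scalar preserves the ordering of the logits.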

Known Limitations

Per-digit accuracy on real data is limited by the small labeled dataset (162 samples). Digits 2 and 5 are below 70% accuracy because each has only 5-6 real samples.
Empty cell recall (93%) is below the 95% target. The EmptyClassifier would benefit from more diverse empty cell examples.
Grid detection may fail on heavily skewed or low-contrast photos.
The solver assumes a valid sudoku with a unique solution.
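
To see why 5-6 samples per digit give noisy accuracy figures, a confidence interval helps. The sketch below computes a Wilson score interval for a binomial proportion; the function name and the example count (4 correct out of 6) are hypothetical, not measured results.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    low, high = wilson_interval(4, 6)  # hypothetical: 4 of 6 samples correct
    print(f"{low:.2f} to {high:.2f}")
```

With only six samples the interval spans well over half of the 0-1 range, so a single misread can move a per-digit figure by many points.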

License

MIT
