codedupes

codedupes detects duplicate and potentially unused Python code with:

Traditional AST/token matching (exact + Jaccard near-duplicate)
Semantic matching with model-profile embeddings (default gte-modernbert-base)
Heuristic unused-code detection
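As a rough illustration of what Jaccard near-duplicate matching means for the traditional token path, the sketch below compares the token sets of two snippets using only the standard library. It is not codedupes' actual implementation: the helper names `token_set` and `jaccard` are hypothetical, and the real tool may normalize tokens, use AST features, or apply thresholds differently.

```python
import io
import tokenize

# Token types that carry no structural meaning for this comparison.
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def token_set(source: str) -> set[str]:
    """Return the set of significant token strings in a Python snippet."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.string for tok in tokens if tok.type not in _SKIP}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty snippets are treated as identical
    return len(a & b) / len(a | b)


first = "def add(a, b):\n    return a + b\n"
second = "def add(x, b):\n    return x + b\n"

# The snippets differ only in one parameter name, so the score is high
# but below 1.0 (exact match).
score = jaccard(token_set(first), token_set(second))
print(f"{score:.3f}")
```

A score of 1.0 corresponds to identical token sets; a near-duplicate is any pair whose score clears a chosen threshold.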

Install

pip install "codedupes @ git+https://github.com/pszemraj/codedupes.git"

Optional GPU extras:

pip install "codedupes[gpu] @ git+https://github.com/pszemraj/codedupes.git"

Requires Python 3.11+. Details are in docs/install.md.

Quick Start

codedupes check ./src
codedupes search ./src "normalize request payload"
codedupes info

codedupes check defaults to a hybrid-first report:

one combined duplicate list (Hybrid Duplicates)
likely dead code (potentially_unused)

Use --show-all to include raw traditional + raw semantic duplicate lists.

Documentation

Primary docs live under docs/:

docs/index.md: documentation map and ownership
docs/cli.md: commands, flags, and defaults
docs/model-profiles.md: semantic model aliases, profile defaults, and task behavior
docs/analysis-defaults.md: analysis-behavior defaults and heuristics
docs/output.md: JSON schemas and exit codes
docs/usage.md: practical workflows and tuning examples
docs/python-api.md: programmatic API usage
docs/hybrid-tuning.md: hybrid gate tuning workflow

Notes and limits

Call graph and unused detection are heuristic and conservative by default.
Semantic model-profile defaults and task behavior are defined in docs/model-profiles.md.
Analysis defaults (semantic candidate scope, tiny-traditional filtering, hybrid gates) are defined in docs/analysis-defaults.md.
Semantic analysis may download model weights on first use.
Extraction skips common artifact/cache directories by default (__pycache__, .venv, etc.).
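To show why unused detection is described above as heuristic, here is a minimal single-file sketch using the standard `ast` module. It is an assumption-laden illustration, not codedupes' call-graph logic: the function name `potentially_unused` is hypothetical, and the approach only compares defined function names against names referenced anywhere in the same source.

```python
import ast


def potentially_unused(source: str) -> list[str]:
    """List functions that are defined but never referenced by name."""
    tree = ast.parse(source)

    # Every function or async function defined anywhere in the module.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

    # Every bare name or attribute name that appears in the module.
    referenced: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)

    # Dunder methods are invoked implicitly, so they are never reported.
    return sorted(
        name for name in defined - referenced if not name.startswith("__")
    )


sample = """
def used():
    return 1

def orphan():
    return 2

print(used())
"""

result = potentially_unused(sample)
print(result)
```

A check like this cannot see dynamic dispatch (`getattr`, decorator-based registries, entry points) or callers in other files, which is why results of this kind are best read as candidates for review rather than confirmed dead code.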
