@@ -6,20 +6,20 @@ CodeClone is an **AST + CFG-based code clone detector** focused on architectural
 not textual similarity.
 
 Contributions are welcome — especially those that improve **signal quality**, **CFG semantics**,
-and **real-world usability**.
+and **real-world CI usability**.
 
 ---
 
 ## Project Philosophy
 
-Before contributing, please understand the core principles of the project:
+Core principles:
 
 - **Low noise over high recall**
 - **Structural and control-flow similarity**, not semantic equivalence
 - **Deterministic and explainable behavior**
-- Optimized for **CI usage and architectural analysis**
+- Optimized for **CI usage** and architectural analysis
 
-If a change increases false positives or reduces explainability,
+If a change increases false positives, reduces determinism, or weakens explainability,
 it is unlikely to be accepted.
 
 ---
@@ -42,14 +42,16 @@ We especially welcome contributions in the following areas:
 
 Please use the appropriate **GitHub Issue Template**.
 
-When reporting bugs related to clone detection, include:
+When reporting issues related to clone detection, include:
 
-- minimal reproducible code snippets;
-- the Python version used;
+- minimal reproducible code snippets (preferred over screenshots);
+- the CodeClone version;
+- the Python version (`python_tag`, e.g. `cp313`);
 - whether the issue is primarily:
-  - AST-related,
-  - CFG-related,
-  - reporting / UI-related.
+  - AST-related,
+  - CFG-related,
+  - normalization-related,
+  - reporting / UI-related.
 
 Screenshots alone are usually insufficient for analysis.
 
@@ -73,12 +75,13 @@ Well-argued false-positive reports are valuable and appreciated.
 
 CFG behavior in CodeClone is intentionally conservative in the 1.x series.
 
-If proposing changes to CFG semantics, please include:
+If proposing changes to CFG semantics, include:
 
 - a description of the current behavior;
 - the proposed new behavior;
-- the expected impact on clone detection quality;
-- concrete code examples.
+- the expected impact on clone detection quality (noise/recall);
+- concrete code examples;
+- a note on determinism implications.
 
 Such changes often require design-level discussion and may be staged across versions.
 
@@ -87,21 +90,42 @@ Such changes often require design-level discussion and may be staged across vers
 ## Security & Safety Expectations
 
 - Assume **untrusted input** (paths and source code).
-- Add **negative tests** for any normalization or CFG change.
-- Changes must preserve determinism and avoid new false positives.
+- Prefer **fail-closed in gating modes** and **fail-open in normal modes** only when explicitly intended.
+- Add **negative tests** for any normalization/CFG change.
+- Changes must preserve determinism and avoid introducing new false positives.
 
 ---
 
 ## Baseline & CI
 
-- Baselines are **versioned**. Regenerate with `codeclone . --update-baseline`
-  when `fingerprint_version` changes.
-- Baseline regeneration is not required for UI/report/CLI/cache/performance-only changes
+### Baseline contract (v1)
+
+- The baseline schema is versioned (`meta.schema_version`).
+- Compatibility/trust gates include `schema_version`, `fingerprint_version`, `python_tag`,
+  and `meta.generator.name`.
+- Integrity is tamper-evident via `meta.payload_sha256` over canonical payload:
+  `clones.functions`, `clones.blocks`, `meta.fingerprint_version`, `meta.python_tag`.
+  (`created_at` and `meta.generator.version` are informational only.)
+
+### When baseline regeneration is required
+
+- Regenerate baseline with `codeclone . --update-baseline` **only when `fingerprint_version` changes**.
+- Regeneration is **not** required for UI/report/CLI/cache/performance-only changes
   if `fingerprint_version` is unchanged.
-- Baseline v1 is tamper-evident (`meta.generator`, `meta.payload_sha256`).
-- Baseline verification is pinned to `python_tag` (for example `cp313`).
-- In `--ci` (or explicit `--fail-on-new`), untrusted baseline states fail fast. Outside gating
-  mode, baseline is ignored with warning and comparison proceeds against an empty baseline.
+
+### Gating behavior
+
+- In `--ci` (or explicit gating flags), **untrusted baseline states fail fast** as a contract error (exit 2).
+- Outside gating mode, an untrusted/missing baseline is ignored with a warning and comparison proceeds
+  against an empty baseline.
+
+### Exit codes contract
+
+- **0** — success
+- **2** — contract error (e.g., missing/untrusted baseline in gating, invalid output path/extension, incompatible
+  versions)
+- **3** — gating failure (new clones detected, `--fail-threshold` exceeded)
+- **5** — internal error (unexpected exception; please report)
 
 ---
 
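The baseline contract added in the hunk above describes `meta.payload_sha256` as a hash over a canonical payload made of `clones.functions`, `clones.blocks`, `meta.fingerprint_version`, and `meta.python_tag`. The following is a minimal Python sketch of what such a tamper-evidence check could look like. The function names (`canonical_payload`, `payload_sha256`, `is_intact`), the assumption that clone entries are lists of strings, and the canonical form (JSON with sorted keys and compact separators) are illustrative assumptions; the diff does not show how CodeClone actually serializes the payload.

```python
import hashlib
import hmac
import json
from typing import Any


def canonical_payload(baseline: dict[str, Any]) -> bytes:
    """Serialize only the fields the contract lists as hashed.

    Assumption: clone entries are lists of strings, and the canonical form
    is JSON with sorted keys and compact separators.
    """
    payload = {
        "functions": sorted(baseline["clones"]["functions"]),
        "blocks": sorted(baseline["clones"]["blocks"]),
        "fingerprint_version": baseline["meta"]["fingerprint_version"],
        "python_tag": baseline["meta"]["python_tag"],
    }
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def payload_sha256(baseline: dict[str, Any]) -> str:
    """Hex digest of the canonical payload."""
    return hashlib.sha256(canonical_payload(baseline)).hexdigest()


def is_intact(baseline: dict[str, Any]) -> bool:
    """True if the stored hash matches the recomputed one."""
    stored = baseline["meta"].get("payload_sha256")
    if not isinstance(stored, str):
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(stored, payload_sha256(baseline))


# Hypothetical baseline used only for illustration.
example = {
    "meta": {
        "fingerprint_version": "1",
        "python_tag": "cp313",
        "created_at": "2025-01-01T00:00:00Z",
        "generator": {"name": "codeclone", "version": "1.0.0"},
    },
    "clones": {"functions": ["fn-a", "fn-b"], "blocks": ["blk-a"]},
}
example["meta"]["payload_sha256"] = payload_sha256(example)
```

Under these assumptions, editing `created_at` or `meta.generator.version` leaves `is_intact(example)` true, because those fields are informational only, while adding or removing a clone entry makes it false.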
@@ -110,9 +134,7 @@ Such changes often require design-level discussion and may be staged across vers
 ```bash
 git clone https://github.com/orenlab/codeclone.git
 cd codeclone
-python -m venv .venv
-source .venv/bin/activate
-pip install -e .[dev]
+uv sync --all-extras --dev
 ```
 
 Run tests:
@@ -133,8 +155,9 @@ uv run ruff format .
 
 ## Code Style
 
-- Python 3.10+
+- Python **3.10–3.14**
 - Type annotations are required
+- `Any` should be minimized; prefer precise types and small typed helpers
 - `mypy` must pass
 - `ruff check` must pass
 - Code must be formatted with `ruff format`
@@ -147,11 +170,11 @@ uv run ruff format .
 CodeClone follows **semantic versioning**:
 
 - **MAJOR**: fundamental detection model changes
-- **MINOR**: new detection capabilities (for example, CFG improvements)
+- **MINOR**: new detection capabilities (e.g., new detectors or major CFG/normalization behavior shifts)
 - **PATCH**: bug fixes, performance improvements, and UI/UX polish
 
-Baselines are versioned. Any change to detection behavior must include documentation
-and tests, and may require baseline regeneration.
+Any change that affects detection behavior must include documentation and tests,
+and may require a `fingerprint_version` bump (and thus baseline regeneration).
 
 ---
 
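The exit codes contract introduced earlier in this diff (0, 2, 3, 5) lends itself to a small wrapper in CI scripts. The sketch below shows one way a pipeline step might interpret the return code. The `ExitCode` enum, `describe`, and `run_gate` are hypothetical helpers written for illustration, not part of CodeClone; only the numeric codes, their meanings, and the `codeclone . --ci` invocation come from the diff.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the contract (names are illustrative)."""

    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


_MESSAGES = {
    ExitCode.SUCCESS: "success",
    ExitCode.CONTRACT_ERROR: "contract error (for example, missing or untrusted baseline in gating)",
    ExitCode.GATING_FAILURE: "gating failure (new clones detected or threshold exceeded)",
    ExitCode.INTERNAL_ERROR: "internal error (unexpected exception; please report)",
}


def describe(returncode: int) -> str:
    """Map a process return code to a human-readable description."""
    try:
        return _MESSAGES[ExitCode(returncode)]
    except ValueError:
        # Codes outside the documented contract are reported, not guessed at.
        return f"unknown exit code {returncode} (not part of the documented contract)"


def run_gate(path: str = ".") -> int:
    """Run the gate in CI mode and print what the result means."""
    result = subprocess.run(["codeclone", path, "--ci"], check=False)
    print(f"codeclone exited with {result.returncode}: {describe(result.returncode)}")
    return result.returncode
```

Keeping contract errors (2) distinct from gating failures (3) lets a pipeline tell "the gate could not be evaluated" apart from "the gate was evaluated and failed".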