Commit e85311f

chore(docs): update docs
1 parent 6c42cd1 commit e85311f

2 files changed

Lines changed: 60 additions & 35 deletions

CONTRIBUTING.md

Lines changed: 52 additions & 29 deletions
@@ -6,20 +6,20 @@ CodeClone is an **AST + CFG-based code clone detector** focused on architectural
 not textual similarity.
 
 Contributions are welcome — especially those that improve **signal quality**, **CFG semantics**,
-and **real-world usability**.
+and **real-world CI usability**.
 
 ---
 
 ## Project Philosophy
 
-Before contributing, please understand the core principles of the project:
+Core principles:
 
 - **Low noise over high recall**
 - **Structural and control-flow similarity**, not semantic equivalence
 - **Deterministic and explainable behavior**
-- Optimized for **CI usage and architectural analysis**
+- Optimized for **CI usage** and architectural analysis
 
-If a change increases false positives or reduces explainability,
+If a change increases false positives, reduces determinism, or weakens explainability,
 it is unlikely to be accepted.
 
 ---
@@ -42,14 +42,16 @@ We especially welcome contributions in the following areas:
 
 Please use the appropriate **GitHub Issue Template**.
 
-When reporting bugs related to clone detection, include:
+When reporting issues related to clone detection, include:
 
-- minimal reproducible code snippets;
-- the Python version used;
+- minimal reproducible code snippets (preferred over screenshots);
+- the CodeClone version;
+- the Python version (`python_tag`, e.g. `cp313`);
 - whether the issue is primarily:
-- AST-related,
-- CFG-related,
-- reporting / UI-related.
+- AST-related,
+- CFG-related,
+- normalization-related,
+- reporting / UI-related.
 
 Screenshots alone are usually insufficient for analysis.
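
For reporters unsure what a `python_tag` such as `cp313` looks like on their machine, the following is a minimal sketch that builds a similar tag from the running interpreter. The helper name and the prefix table are assumptions made for illustration; how CodeClone itself derives `python_tag` is not shown in this diff.

```python
import sys


def python_tag() -> str:
    """Build a short interpreter tag such as 'cp313'.

    Assumption: the tag is an implementation prefix followed by the
    major and minor version digits. This mirrors the example in the
    text, not necessarily CodeClone's own logic.
    """
    # Hypothetical prefix table; unknown implementations keep their full name.
    prefixes = {"cpython": "cp", "pypy": "pp"}
    implementation = sys.implementation.name
    prefix = prefixes.get(implementation, implementation)
    return f"{prefix}{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    print(python_tag())
```

Pasting the printed value into an issue, next to the CodeClone version, covers the two version items in the list above.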
 
@@ -73,12 +75,13 @@ Well-argued false-positive reports are valuable and appreciated.
 
 CFG behavior in CodeClone is intentionally conservative in the 1.x series.
 
-If proposing changes to CFG semantics, please include:
+If proposing changes to CFG semantics, include:
 
 - a description of the current behavior;
 - the proposed new behavior;
-- the expected impact on clone detection quality;
-- concrete code examples.
+- the expected impact on clone detection quality (noise/recall);
+- concrete code examples;
+- a note on determinism implications.
 
 Such changes often require design-level discussion and may be staged across versions.
 
@@ -87,21 +90,42 @@ Such changes often require design-level discussion and may be staged across vers
 ## Security & Safety Expectations
 
 - Assume **untrusted input** (paths and source code).
-- Add **negative tests** for any normalization or CFG change.
-- Changes must preserve determinism and avoid new false positives.
+- Prefer **fail-closed in gating modes** and **fail-open in normal modes** only when explicitly intended.
+- Add **negative tests** for any normalization/CFG change.
+- Changes must preserve determinism and avoid introducing new false positives.
 
 ---
 
 ## Baseline & CI
 
-- Baselines are **versioned**. Regenerate with `codeclone . --update-baseline`
-when `fingerprint_version` changes.
-- Baseline regeneration is not required for UI/report/CLI/cache/performance-only changes
+### Baseline contract (v1)
+
+- The baseline schema is versioned (`meta.schema_version`).
+- Compatibility/trust gates include `schema_version`, `fingerprint_version`, `python_tag`,
+and `meta.generator.name`.
+- Integrity is tamper-evident via `meta.payload_sha256` over canonical payload:
+`clones.functions`, `clones.blocks`, `meta.fingerprint_version`, `meta.python_tag`.
+(`created_at` and `meta.generator.version` are informational only.)
+
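
The integrity rule in the baseline contract can be illustrated with a small sketch: hash only the four canonical fields and leave the informational ones out. The JSON layout, the canonical encoding (sorted keys, compact separators), and the function names are assumptions made for illustration; the diff names the hashed fields but not the serialization.

```python
import hashlib
import json
from typing import Any


def canonical_payload(baseline: dict[str, Any]) -> dict[str, Any]:
    """Pick only the fields the contract lists as covered by the hash."""
    return {
        "functions": baseline["clones"]["functions"],
        "blocks": baseline["clones"]["blocks"],
        "fingerprint_version": baseline["meta"]["fingerprint_version"],
        "python_tag": baseline["meta"]["python_tag"],
    }


def payload_sha256(baseline: dict[str, Any]) -> str:
    """Hash the canonical payload.

    Assumption: canonical form is JSON with sorted keys and compact
    separators; the real encoding may differ.
    """
    encoded = json.dumps(
        canonical_payload(baseline), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def is_intact(baseline: dict[str, Any]) -> bool:
    """Compare the stored digest with a freshly computed one."""
    return baseline["meta"].get("payload_sha256") == payload_sha256(baseline)


if __name__ == "__main__":
    # Hypothetical baseline document used only for this demonstration.
    demo = {
        "created_at": "2025-01-01T00:00:00Z",
        "meta": {
            "schema_version": 1,
            "fingerprint_version": "1",
            "python_tag": "cp313",
            "generator": {"name": "codeclone", "version": "0.0.0"},
        },
        "clones": {"functions": ["abc"], "blocks": []},
    }
    demo["meta"]["payload_sha256"] = payload_sha256(demo)
    print(is_intact(demo))
```

Because `created_at` and `meta.generator.version` never enter `canonical_payload`, changing them leaves the digest unchanged, which matches their "informational only" status.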
+### When baseline regeneration is required
+
+- Regenerate baseline with `codeclone . --update-baseline` **only when `fingerprint_version` changes**.
+- Regeneration is **not** required for UI/report/CLI/cache/performance-only changes
 if `fingerprint_version` is unchanged.
-- Baseline v1 is tamper-evident (`meta.generator`, `meta.payload_sha256`).
-- Baseline verification is pinned to `python_tag` (for example `cp313`).
-- In `--ci` (or explicit `--fail-on-new`), untrusted baseline states fail fast. Outside gating
-mode, baseline is ignored with warning and comparison proceeds against an empty baseline.
+
+### Gating behavior
+
+- In `--ci` (or explicit gating flags), **untrusted baseline states fail fast** as a contract error (exit 2).
+- Outside gating mode, an untrusted/missing baseline is ignored with a warning and comparison proceeds
+against an empty baseline.
+
+### Exit codes contract
+
+- **0** — success
+- **2** — contract error (e.g., missing/untrusted baseline in gating, invalid output path/extension, incompatible
+versions)
+- **3** — gating failure (new clones detected, `--fail-threshold` exceeded)
+- **5** — internal error (unexpected exception; please report)
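
A CI wrapper can rely on these codes to tell a setup problem apart from a real gating failure. The sketch below maps the documented values to an enum; the `ExitCode`, `describe`, and `run_gate` names are assumptions made for illustration, and only the numeric values and their meanings come from the list above.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Documented exit codes; the member names are illustrative."""

    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


_MESSAGES = {
    ExitCode.SUCCESS: "success",
    ExitCode.CONTRACT_ERROR: "contract error: check baseline, output path, versions",
    ExitCode.GATING_FAILURE: "gating failure: new clones or threshold exceeded",
    ExitCode.INTERNAL_ERROR: "internal error: please report",
}


def describe(returncode: int) -> str:
    """Translate a process return code into a short message."""
    try:
        return _MESSAGES[ExitCode(returncode)]
    except ValueError:
        return f"unknown exit code {returncode}"


def run_gate(argv: list[str]) -> str:
    """Run a command (for example ["codeclone", ".", "--ci"]) and describe the result."""
    completed = subprocess.run(argv, check=False)
    return describe(completed.returncode)
```

Treating code 2 as a pipeline configuration problem and code 3 as a reviewable finding keeps the two failure classes separate in CI logs.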
 
 ---
 
@@ -110,9 +134,7 @@ Such changes often require design-level discussion and may be staged across vers
 ```bash
 git clone https://github.com/orenlab/codeclone.git
 cd codeclone
-python -m venv .venv
-source .venv/bin/activate
-pip install -e .[dev]
+uv sync --all-extras --dev
 ```
 
 Run tests:
@@ -133,8 +155,9 @@ uv run ruff format .
 
 ## Code Style
 
-- Python 3.10+
+- Python **3.10–3.14**
 - Type annotations are required
+- `Any` should be minimized; prefer precise types and small typed helpers
 - `mypy` must pass
 - `ruff check` must pass
 - Code must be formatted with `ruff format`
@@ -147,11 +170,11 @@ uv run ruff format .
 CodeClone follows **semantic versioning**:
 
 - **MAJOR**: fundamental detection model changes
-- **MINOR**: new detection capabilities (for example, CFG improvements)
+- **MINOR**: new detection capabilities (e.g., new detectors or major CFG/normalization behavior shifts)
 - **PATCH**: bug fixes, performance improvements, and UI/UX polish
 
-Baselines are versioned. Any change to detection behavior must include documentation
-and tests, and may require baseline regeneration.
+Any change that affects detection behavior must include documentation and tests,
+and may require a `fingerprint_version` bump (and thus baseline regeneration).
 
 ---
 
SECURITY.md

Lines changed: 8 additions & 6 deletions
@@ -3,7 +3,7 @@
 ## Supported Versions
 
 CodeClone is a static analysis tool and does not execute analyzed code at runtime.
-Nevertheless, security and robustness are treated as firstclass concerns.
+Nevertheless, security and robustness are treated as first-class concerns.
 
 The following versions currently receive security updates:
 
@@ -42,12 +42,14 @@ Additional safeguards:
 - Report explainability fields are generated in Python core; UI is rendering-only and does not infer semantics.
 - Scanner traversal is root-confined and prevents symlink-based path escape.
 - Baseline files are schema/type validated with size limits and tamper-evident integrity fields
-(`generator`, `payload_sha256` for v1 baseline contract).
+(`meta.generator`, `meta.payload_sha256` for v1 baseline contract).
 - Baseline integrity is tamper-evident (audit signal), not tamper-proof cryptographic signing.
 An actor who can rewrite baseline content and recompute `payload_sha256` can still alter it.
-- Baseline hash excludes non-semantic metadata (`created_at`, `generator.version`) and
-covers canonical payload (`functions`, `blocks`, `python_tag`,
-`fingerprint_version`, `schema_version`).
+- Baseline hash excludes non-semantic metadata (`created_at`, `meta.generator.version`) and
+covers canonical payload (`clones.functions`, `clones.blocks`, `meta.python_tag`,
+`meta.fingerprint_version`).
+- `meta.schema_version` and `meta.generator.name` are validated as compatibility/trust gates and are
+intentionally excluded from `payload_sha256`.
 - In `--ci` (or explicit `--fail-on-new`), untrusted baseline states fail fast; otherwise baseline is ignored
 with explicit warning and comparison proceeds against an empty baseline.
 - Cache files are HMAC-signed (constant-time comparison), size-limited, and ignored on mismatch.
@@ -97,4 +99,4 @@ requests.
 
 ---
 
-Thank you for helping keep CodeClone secure, reliable, and trustworthy.
+Thank you for helping keep CodeClone secure, reliable, and trustworthy.
