Commit 8124008

Merge pull request #3 from orenlab/feat/1.3.0
Feat/1.3.0
2 parents 91de5ec + b5401e3 commit 8124008

49 files changed

Lines changed: 9786 additions & 1366 deletions

.github/workflows/tests.yml

Lines changed: 3 additions & 1 deletion
@@ -38,6 +38,8 @@ jobs:
 run: uv sync --all-extras --dev

 - name: Run tests
+# Smoke CLI tests intentionally disable subprocess coverage collection
+# to avoid runner-specific flakiness while keeping parent-process coverage strict.
 run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98

 - name: Verify baseline exists
@@ -46,7 +48,7 @@ jobs:

 - name: Check for new clones vs baseline
 if: ${{ matrix.python-version == '3.13' }}
-run: uv run codeclone . --fail-on-new --no-progress
+run: uv run codeclone . --ci

 lint:
 runs-on: ubuntu-latest

.pre-commit-config.yaml

Lines changed: 27 additions & 5 deletions
@@ -1,9 +1,31 @@
 repos:
-- repo: local
+- repo: local
 hooks:
-- id: codeclone
+- id: ruff-check
+name: Ruff (lint)
+entry: ruff check .
+language: system
+pass_filenames: false
+types: [ python ]
+
+- id: ruff-format
+name: Ruff (format)
+entry: ruff format .
+language: system
+pass_filenames: false
+types: [ python ]
+
+- id: mypy
+name: Mypy
+entry: mypy .
+language: system
+pass_filenames: false
+types: [ python ]
+
+- id: codeclone
 name: CodeClone
 entry: codeclone
-language: python
-args: [".", "--fail-on-new"]
-types: [python]
+language: system
+pass_filenames: false
+args: [ ".", "--ci" ]
+types: [ python ]

CHANGELOG.md

Lines changed: 74 additions & 0 deletions
@@ -1,5 +1,79 @@
 # Changelog

+## [1.3.0] - 2026-02-08
+
+### Overview
+
+This release improves detection precision, determinism, and auditability, adds
+segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache
+contracts for CI usage.
+
+**Breaking (CI):** baseline contract checks are stricter. Legacy or mismatched baselines
+must be regenerated.
+
+### Detection Engine
+
+- Safe normalization upgrades: local logical equivalence, proven-domain commutative
+canonicalization, and preserved symbolic call targets.
+- Internal CFG metadata markers were moved to the `__CC_META__::...` namespace and emitted
+as synthetic AST names to prevent collisions with user string literals.
+- CFG precision upgrades: short-circuit micro-CFG, selective `try/except` raise-linking,
+loop `break`/`continue` jump semantics, `for/while ... else`, and ordered `match`/`except`.
+- Deterministic traversal and ordering improvements for stable clone grouping/report output.
+- Segment-level internal detection added with strict candidate->hash confirmation; remains
+report-only (not part of baseline/CI fail criteria).
+- Segment report noise reduction: overlapping windows are merged and boilerplate-only groups
+are suppressed using deterministic AST criteria.
+
+### Baseline & CI
+
+- Baseline format is versioned (`baseline_version`, `schema_version`) and legacy baselines
+fail fast with regeneration guidance.
+- Added tamper-evident baseline integrity for v1.3+ (`generator`, `payload_sha256`).
+- Added configurable size guards: `--max-baseline-size-mb`, `--max-cache-size-mb`.
+- Behavioral hardening: in normal mode, untrusted baseline states are ignored with warning
+and compared as empty; in `--fail-on-new` / `--ci`, they fail fast with deterministic exit codes.
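Review note: the tamper-evident baseline and the two trust policies in the bullets above can be illustrated roughly as follows. This is a minimal sketch, not CodeClone's actual implementation. The baseline layout (a `payload` object stored next to `payload_sha256`), the canonical JSON form, the `clones` key, the function names, and the exit code `2` are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def payload_digest(payload: dict) -> str:
    """Hash a canonical JSON form so key order and spacing do not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_trusted(baseline: dict) -> bool:
    """Return True only when the stored digest matches the recomputed one."""
    payload = baseline.get("payload")        # hypothetical field name
    stored = baseline.get("payload_sha256")  # field name from the changelog
    if not isinstance(payload, dict) or not isinstance(stored, str):
        return False  # legacy or malformed baseline
    # Compare as bytes in constant time.
    return hmac.compare_digest(
        stored.encode("utf-8"), payload_digest(payload).encode("utf-8")
    )


def known_clones(baseline: dict, gating: bool) -> set:
    """Fail fast when gating (--fail-on-new / --ci); otherwise compare as empty."""
    if is_trusted(baseline):
        return set(baseline["payload"].get("clones", []))  # hypothetical key
    if gating:
        raise SystemExit(2)  # hypothetical exit code
    print("warning: untrusted baseline ignored; comparing against an empty baseline")
    return set()
```

Hashing a canonical serialization, rather than the raw file bytes, keeps the digest stable if the file is re-indented or its keys are reordered. A plain digest like this only detects accidental or naive edits, because anyone who can rewrite the payload can also recompute the hash.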
+
+Update baseline after upgrade:
+
+```bash
+codeclone . --update-baseline
+```
+
+### CLI & Reports
+
+- Added `--version`, `--cache-path` (legacy alias: `--cache-dir`), and `--ci` preset.
+- Added strict output extension validation for `--html/.html`, `--json/.json`, `--text/.txt`.
+- Summary output was redesigned for deterministic, cache-aware metrics across standard and CI modes.
+- User-facing CLI messages were centralized in `codeclone/ui_messages.py`.
+- HTML/TXT/JSON reports now include consistent provenance metadata (baseline/cache status fields).
+- Clone group/report ordering is deterministic and aligned across HTML/TXT/JSON outputs.
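Review note: the strict output extension validation listed above could look something like the sketch below. Only the flag-to-extension pairs come from the changelog. The function name, the error type, and the case-insensitive comparison are assumptions, and the real CLI may behave differently.

```python
from pathlib import Path

# Flag-to-extension pairs as listed in the changelog entry.
EXPECTED_SUFFIX = {"--html": ".html", "--json": ".json", "--text": ".txt"}


def validate_output_path(flag: str, value: str) -> Path:
    """Reject an output path whose extension does not match its flag."""
    path = Path(value)
    expected = EXPECTED_SUFFIX[flag]
    # Assumption: the comparison ignores case, so "REPORT.HTML" is accepted.
    if path.suffix.lower() != expected:
        raise ValueError(f"{flag} expects a {expected} file, got {value!r}")
    return path
```

Validating before any analysis runs means a mistyped report path fails immediately instead of after a long scan.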
+
+### HTML UI
+
+- Refreshed layout with improved navigation and dashboard widgets.
+- Added command palette and keyboard shortcuts.
+- Replaced emoji icons with inline SVG icons.
+- Hardened escaping (text + attribute context) and snippet fallback behavior.
+
+### Cache & Security
+
+- Cache default moved to `<root>/.cache/codeclone/cache.json` with legacy path warning.
+- Cache schema was extended to include segment data (`CACHE_VERSION=1.1`).
+- Cache integrity uses constant-time signature checks and deep schema validation.
+- Invalid/oversized cache is ignored deterministically and rebuilt from source.
+- Added security regressions for traversal safety, report escaping, baseline/cache integrity,
+and deterministic report ordering across formats.
+- Fixed POSIX parser CPU guard to avoid lowering `RLIMIT_CPU` hard limit.
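Review note: the `RLIMIT_CPU` fix in the last bullet can be illustrated with the standard `resource` module. This is a sketch of the general technique and not the project's code; `clamp_cpu_limit` and `apply_cpu_guard` are hypothetical names. An unprivileged process cannot raise a hard limit once it has been lowered, so a guard that only needs a timeout should adjust the soft limit alone.

```python
import resource  # POSIX only


def clamp_cpu_limit(desired: int, hard: int) -> tuple[int, int]:
    """Compute new (soft, hard) CPU limits while leaving the hard limit unchanged."""
    if hard == resource.RLIM_INFINITY:
        return desired, hard
    # The soft limit may never exceed the hard limit.
    return min(desired, hard), hard


def apply_cpu_guard(seconds: int) -> None:
    """Set only the soft RLIMIT_CPU for the current process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, clamp_cpu_limit(seconds, hard))
```

Keeping the limit computation in a pure function makes it testable without changing the limits of the test process.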
+
+### Documentation & Packaging
+
+- Updated README and docs (`architecture`, `cfg`, `SECURITY`, `CONTRIBUTING`) to reflect
+current contracts and behaviors.
+- Removed an invalid PyPI classifier from package metadata.
+
+---
+
 ## [1.2.1] - 2026-02-02

 ### Overview

CONTRIBUTING.md

Lines changed: 27 additions & 4 deletions
@@ -30,6 +30,7 @@ We especially welcome contributions in the following areas:

 - Control Flow Graph (CFG) construction and semantics
 - AST normalization improvements
+- Segment-level clone detection and reporting
 - False-positive reduction
 - HTML report UX improvements
 - Performance optimizations
@@ -83,6 +84,25 @@ Such changes often require design-level discussion and may be staged across vers

 ---

+## Security & Safety Expectations
+
+- Assume **untrusted input** (paths and source code).
+- Add **negative tests** for any normalization or CFG change.
+- Changes must preserve determinism and avoid new false positives.
+
+---
+
+## Baseline & CI
+
+- Baselines are **versioned**. Regenerate with `codeclone . --update-baseline`
+when detection logic or CodeClone version changes.
+- Baselines in 1.3+ are tamper-evident (`generator`, `payload_sha256`).
+- Baseline verification must use the same Python `major.minor` version.
+- In `--fail-on-new` / `--ci`, untrusted baseline states fail fast. Outside gating
+mode, baseline is ignored with warning and comparison proceeds against an empty baseline.
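Review note: the "same Python `major.minor`" rule above amounts to comparing a recorded interpreter tag with the running one. The sketch below assumes the baseline records a tag such as `"3.13"`; where and how CodeClone actually stores it is not shown in this diff, and both function names are hypothetical.

```python
import sys


def python_tag(version_info=sys.version_info) -> str:
    """Return an interpreter's 'major.minor' tag, for example '3.13'."""
    return f"{version_info[0]}.{version_info[1]}"


def baseline_matches_python(recorded: str, version_info=sys.version_info) -> bool:
    """Check a recorded tag (hypothetical baseline field) against an interpreter."""
    return recorded == python_tag(version_info)
```

This lines up with the workflow change in `.github/workflows/tests.yml`, where the clone check runs only on the `3.13` matrix entry.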
+
+---
+
 ## Development Setup

 ```bash
@@ -96,15 +116,15 @@ pip install -e .[dev]
 Run tests:

 ```bash
-pytest
+uv run pytest
 ```

 Static checks:

 ```bash
-mypy
-ruff check .
-ruff format .
+uv run mypy .
+uv run ruff check .
+uv run ruff format .
 ```

 ---
@@ -128,6 +148,9 @@ CodeClone follows **semantic versioning**:
 - **MINOR**: new detection capabilities (for example, CFG improvements)
 - **PATCH**: bug fixes, performance improvements, and UI/UX polish

+Baselines are versioned. Any change to detection behavior must include documentation
+and tests, and may require baseline regeneration.
+
 ---

 ## License
