Merged
28 commits
- 6df8420 chore(UI): update html report (orenlab, Feb 5, 2026)
- 65c2128 chore(UI): update html report (orenlab, Feb 5, 2026)
- d66a3b7 chore(UI): update html report (orenlab, Feb 5, 2026)
- c5aea24 feat(core): improves clone-detection precision and explainability wit… (orenlab, Feb 5, 2026)
- b868229 feat(core): Added --ci preset (--fail-on-new --no-color --quiet). Imp… (orenlab, Feb 5, 2026)
- 3f68840 fix(tests): test_cli_inprocess.py for the --ci case, I added the crea… (orenlab, Feb 5, 2026)
- b1a0f6a fix(core): Moved the default cache to the project cache (orenlab, Feb 5, 2026)
- 42d4ac5 fix(core): Add segment report noise suppression + CLI UX flags (orenlab, Feb 5, 2026)
- 21b5d87 fix(test): fix code clone! (orenlab, Feb 5, 2026)
- be62027 chore(deps): added in --dev pre-commit and update .pre-commit-config.… (orenlab, Feb 5, 2026)
- f4ea882 chore(deps): added in --dev pre-commit and update .pre-commit-config.… (orenlab, Feb 5, 2026)
- facbc29 chore(docs): update docs (orenlab, Feb 5, 2026)
- b2fd74f chore(docs): update docs (orenlab, Feb 6, 2026)
- c0187f2 fix(ui): prevent long clone paths from breaking report cards (orenlab, Feb 6, 2026)
- e33a6c0 feat(report): add baseline provenance metadata to html/text/json and … (orenlab, Feb 6, 2026)
- 96e80a0 fix(ui): prevent long clone paths from breaking report cards (orenlab, Feb 6, 2026)
- 2d194a0 fix(ui): prevent long clone paths from breaking report cards (orenlab, Feb 6, 2026)
- e99dc57 fix(core): fix detection integrity (orenlab, Feb 6, 2026)
- f2cb95e fix(core): fix detection integrity (orenlab, Feb 6, 2026)
- e658f0b refactor(cli): unify summary into one cache-aware table and centraliz… (orenlab, Feb 6, 2026)
- eefa10d feat(changelog): polish CHANGELOG.md (orenlab, Feb 6, 2026)
- 03fcb6f fix(clone): remove internal test clone groups with shared helpers and… (orenlab, Feb 8, 2026)
- 985e5c2 feat(baseline): tamper-evident baseline contract (orenlab, Feb 8, 2026)
- f7cd74d fix(tests): stabilize CLI smoke subprocess runs under pytest-cov and … (orenlab, Feb 8, 2026)
- 0cab63b fix(core): avoid lowering RLIMIT_CPU hard limit in parser guard to pr… (orenlab, Feb 8, 2026)
- 2df02fd fix(cli): treat untrusted baseline as empty outside gating mode and f… (orenlab, Feb 8, 2026)
- f91ab18 docs(release): align 1.3.0 docs with baseline gating contract, segmen… (orenlab, Feb 8, 2026)
- b5401e3 chore(docs): normalize README headings/lists and clean markdown artif… (orenlab, Feb 8, 2026)
4 changes: 3 additions & 1 deletion .github/workflows/tests.yml
```diff
@@ -38,6 +38,8 @@ jobs:
         run: uv sync --all-extras --dev
 
       - name: Run tests
+        # Smoke CLI tests intentionally disable subprocess coverage collection
+        # to avoid runner-specific flakiness while keeping parent-process coverage strict.
         run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98
 
       - name: Verify baseline exists
@@ -46,7 +48,7 @@ jobs:
 
       - name: Check for new clones vs baseline
         if: ${{ matrix.python-version == '3.13' }}
-        run: uv run codeclone . --fail-on-new --no-progress
+        run: uv run codeclone . --ci
 
   lint:
     runs-on: ubuntu-latest
```
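The workflow now calls `codeclone . --ci`. According to the b868229 commit message, `--ci` is a preset for `--fail-on-new --no-color --quiet`. The minimal argparse sketch below shows one way such a preset can be expanded after parsing. The parser layout and `parse_args` helper are hypothetical and are not CodeClone's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names come from the commit message.
    parser = argparse.ArgumentParser(prog="codeclone")
    parser.add_argument("root", nargs="?", default=".")
    parser.add_argument("--fail-on-new", action="store_true")
    parser.add_argument("--no-color", action="store_true")
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument(
        "--ci", action="store_true", help="preset: --fail-on-new --no-color --quiet"
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.ci:
        # Expand the preset after parsing, so the individual flags keep working on their own.
        args.fail_on_new = True
        args.no_color = True
        args.quiet = True
    return args
```

With this approach, `parse_args([".", "--ci"])` sets all three underlying options, and a plain `parse_args(["."])` leaves them off. Expanding the preset after parsing keeps a single source of truth for what "CI mode" means.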
32 changes: 27 additions & 5 deletions .pre-commit-config.yaml
```diff
@@ -1,9 +1,31 @@
 repos:
-  - repo: local
+  - repo: local
     hooks:
-      - id: codeclone
+      - id: ruff-check
+        name: Ruff (lint)
+        entry: ruff check .
+        language: system
+        pass_filenames: false
+        types: [ python ]
+
+      - id: ruff-format
+        name: Ruff (format)
+        entry: ruff format .
+        language: system
+        pass_filenames: false
+        types: [ python ]
+
+      - id: mypy
+        name: Mypy
+        entry: mypy .
+        language: system
+        pass_filenames: false
+        types: [ python ]
+
+      - id: codeclone
         name: CodeClone
         entry: codeclone
-        language: python
-        args: [".", "--fail-on-new"]
-        types: [python]
+        language: system
+        pass_filenames: false
+        args: [ ".", "--ci" ]
+        types: [ python ]
```
74 changes: 74 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,79 @@
# Changelog

## [1.3.0] - 2026-02-08

### Overview

This release improves detection precision, determinism, and auditability, adds
segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache
contracts for CI usage.

**Breaking (CI):** baseline contract checks are stricter. Legacy or mismatched baselines
must be regenerated.

### Detection Engine

- Safe normalization upgrades: local logical equivalence, proven-domain commutative
canonicalization, and preserved symbolic call targets.
- Internal CFG metadata markers were moved to the `__CC_META__::...` namespace and emitted
as synthetic AST names to prevent collisions with user string literals.
- CFG precision upgrades: short-circuit micro-CFG, selective `try/except` raise-linking,
loop `break`/`continue` jump semantics, `for/while ... else`, and ordered `match`/`except`.
- Deterministic traversal and ordering improvements for stable clone grouping/report output.
- Segment-level internal detection added with strict candidate->hash confirmation; remains
report-only (not part of baseline/CI fail criteria).
- Segment report noise reduction: overlapping windows are merged and boilerplate-only groups
are suppressed using deterministic AST criteria.
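
Merging overlapping segment windows is essentially interval merging per file. The sketch below makes several assumptions. `Window` and `merge_overlapping` are hypothetical names, a window is treated as an inclusive line range, and only windows that actually overlap are merged (adjacent ones are not). Sorting first makes the result independent of the order in which windows were found, which matches the determinism goal stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical segment window: an inclusive line range within one file."""

    path: str
    start: int
    end: int


def merge_overlapping(windows: list[Window]) -> list[Window]:
    """Merge windows that overlap within the same file, in a deterministic order."""
    merged: list[Window] = []
    # Sort by (path, start, end) so the output does not depend on discovery order.
    for w in sorted(windows, key=lambda item: (item.path, item.start, item.end)):
        prev = merged[-1] if merged else None
        if prev is not None and prev.path == w.path and w.start <= prev.end:
            # Overlap: extend the previous window instead of reporting both.
            merged[-1] = Window(w.path, prev.start, max(prev.end, w.end))
        else:
            merged.append(w)
    return merged


windows = [Window("b.py", 2, 7), Window("a.py", 4, 9), Window("a.py", 20, 25), Window("a.py", 1, 6)]
print(merge_overlapping(windows))
# [Window(path='a.py', start=1, end=9), Window(path='a.py', start=20, end=25), Window(path='b.py', start=2, end=7)]
```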

### Baseline & CI

- Baseline format is versioned (`baseline_version`, `schema_version`) and legacy baselines
fail fast with regeneration guidance.
- Added tamper-evident baseline integrity for v1.3+ (`generator`, `payload_sha256`).
- Added configurable size guards: `--max-baseline-size-mb`, `--max-cache-size-mb`.
- Behavioral hardening: in normal mode, untrusted baseline states are ignored with warning
and compared as empty; in `--fail-on-new` / `--ci`, they fail fast with deterministic exit codes.
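
The two baseline rules above can be combined into one small decision:

- Verify a payload digest.
- If verification fails, fail fast in gating mode.
- Otherwise, warn and compare against an empty baseline.

The sketch below is illustrative only. `payload_sha256` comes from the changelog, but several details are assumptions rather than CodeClone's real baseline format:

- the `payload` and `groups` fields;
- the canonical JSON encoding used for hashing;
- the status names;
- `BaselineError`.

```python
import hashlib
import hmac
import json
from enum import Enum


class BaselineStatus(Enum):
    OK = "ok"
    INVALID = "invalid"  # malformed or missing integrity fields
    TAMPERED = "tampered"  # digest does not match the payload


class BaselineError(Exception):
    """Raised in gating mode when the baseline cannot be trusted."""


def payload_digest(payload: dict) -> str:
    # Assumption: the digest covers a canonical JSON encoding of the payload
    # (sorted keys, compact separators) so it is stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_baseline(doc: dict) -> BaselineStatus:
    expected = doc.get("payload_sha256")
    payload = doc.get("payload")
    if not isinstance(expected, str) or not isinstance(payload, dict):
        return BaselineStatus.INVALID
    actual = payload_digest(payload)
    # Compare as bytes so non-ASCII input cannot raise; compare_digest does not
    # short-circuit on the first differing character.
    if not hmac.compare_digest(expected.encode("utf-8"), actual.encode("ascii")):
        return BaselineStatus.TAMPERED
    return BaselineStatus.OK


def known_groups(doc: dict, gating: bool) -> set[str]:
    """Return baseline clone-group keys, or handle an untrusted baseline."""
    status = check_baseline(doc)
    if status is BaselineStatus.OK:
        return set(doc["payload"].get("groups", []))
    if gating:
        # --fail-on-new / --ci: fail fast rather than gate against bad data.
        raise BaselineError(f"untrusted baseline: {status.value}")
    # Normal mode: warn and compare against an empty baseline.
    print(f"warning: baseline {status.value}; comparing against an empty baseline")
    return set()
```

In gating mode a tampered baseline raises instead of silently reporting "no new clones". The real CLI maps this case to deterministic exit codes that this sketch does not attempt to reproduce.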

Update baseline after upgrade:

```bash
codeclone . --update-baseline
```

### CLI & Reports

- Added `--version`, `--cache-path` (legacy alias: `--cache-dir`), and `--ci` preset.
- Added strict output extension validation for `--html/.html`, `--json/.json`, `--text/.txt`.
- Summary output was redesigned for deterministic, cache-aware metrics across standard and CI modes.
- User-facing CLI messages were centralized in `codeclone/ui_messages.py`.
- HTML/TXT/JSON reports now include consistent provenance metadata (baseline/cache status fields).
- Clone group/report ordering is deterministic and aligned across HTML/TXT/JSON outputs.
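
The strict output-extension rule pairs each report flag with one required suffix. Below is a minimal sketch of such a check. `validate_report_path` and `REPORT_SUFFIXES` are hypothetical names. The case-sensitive comparison is an assumption about how strict the real validation is.

```python
from pathlib import Path

# Flag-to-extension pairs as listed in the changelog entry.
REPORT_SUFFIXES = {"--html": ".html", "--json": ".json", "--text": ".txt"}


def validate_report_path(flag: str, value: str) -> Path:
    """Hypothetical helper: reject a report path whose extension does not match its flag."""
    expected = REPORT_SUFFIXES[flag]
    path = Path(value)
    # Exact, case-sensitive suffix comparison (assumption).
    if path.suffix != expected:
        raise ValueError(f"{flag} expects a {expected} file, got {value!r}")
    return path
```

For example, `validate_report_path("--json", "report.txt")` raises `ValueError`, while `validate_report_path("--html", "report.html")` returns the path unchanged.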

### HTML UI

- Refreshed layout with improved navigation and dashboard widgets.
- Added command palette and keyboard shortcuts.
- Replaced emoji icons with inline SVG icons.
- Hardened escaping (text + attribute context) and snippet fallback behavior.

### Cache & Security

- Cache default moved to `<root>/.cache/codeclone/cache.json` with legacy path warning.
- Cache schema was extended to include segment data (`CACHE_VERSION=1.1`).
- Cache integrity uses constant-time signature checks and deep schema validation.
- Invalid/oversized cache is ignored deterministically and rebuilt from source.
- Added security regressions for traversal safety, report escaping, baseline/cache integrity,
and deterministic report ordering across formats.
- Fixed POSIX parser CPU guard to avoid lowering `RLIMIT_CPU` hard limit.
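
Some background on the `RLIMIT_CPU` fix. `RLIMIT_CPU` limits a process's total CPU time in seconds; exceeding the soft limit delivers `SIGXCPU`. An unprivileged process can lower its hard limit but cannot raise it again. A guard that passes the existing hard limit through unchanged avoids that trap. The sketch below uses the standard `resource` module, which exists on POSIX only, not on Windows. The helper names are hypothetical, and this is not CodeClone's parser guard.

```python
import resource


def cpu_guard_limits(seconds: int, current: tuple[int, int]) -> tuple[int, int]:
    """Return (soft, hard) for RLIMIT_CPU: cap the soft limit, keep the hard limit."""
    _soft, hard = current
    new_soft = seconds
    if hard != resource.RLIM_INFINITY:
        # setrlimit rejects a soft limit above the hard limit.
        new_soft = min(new_soft, hard)
    # The hard limit is passed through unchanged: once lowered, an
    # unprivileged process cannot raise it again.
    return new_soft, hard


def apply_cpu_guard(seconds: int) -> None:
    current = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, cpu_guard_limits(seconds, current))
```

The limit computation is kept as a pure function, which makes it testable without actually constraining the current process.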

### Documentation & Packaging

- Updated README and docs (`architecture`, `cfg`, `SECURITY`, `CONTRIBUTING`) to reflect
current contracts and behaviors.
- Removed an invalid PyPI classifier from package metadata.

---

## [1.2.1] - 2026-02-02

### Overview
31 changes: 27 additions & 4 deletions CONTRIBUTING.md
```diff
@@ -30,6 +30,7 @@ We especially welcome contributions in the following areas:
 
 - Control Flow Graph (CFG) construction and semantics
 - AST normalization improvements
+- Segment-level clone detection and reporting
 - False-positive reduction
 - HTML report UX improvements
 - Performance optimizations
```

````diff
@@ -83,6 +84,25 @@ Such changes often require design-level discussion and may be staged across vers
 
 ---
 
+## Security & Safety Expectations
+
+- Assume **untrusted input** (paths and source code).
+- Add **negative tests** for any normalization or CFG change.
+- Changes must preserve determinism and avoid new false positives.
+
+---
+
+## Baseline & CI
+
+- Baselines are **versioned**. Regenerate with `codeclone . --update-baseline`
+when detection logic or CodeClone version changes.
+- Baselines in 1.3+ are tamper-evident (`generator`, `payload_sha256`).
+- Baseline verification must use the same Python `major.minor` version.
+- In `--fail-on-new` / `--ci`, untrusted baseline states fail fast. Outside gating
+mode, baseline is ignored with warning and comparison proceeds against an empty baseline.
+
+---
+
 ## Development Setup
 
 ```bash
````

````diff
@@ -96,15 +116,15 @@ pip install -e .[dev]
 Run tests:
 
 ```bash
-pytest
+uv run pytest
 ```
 
 Static checks:
 
 ```bash
-mypy
-ruff check .
-ruff format .
+uv run mypy .
+uv run ruff check .
+uv run ruff format .
 ```
 
 ---
````

```diff
@@ -128,6 +148,9 @@ CodeClone follows **semantic versioning**:
 - **MINOR**: new detection capabilities (for example, CFG improvements)
 - **PATCH**: bug fixes, performance improvements, and UI/UX polish
 
+Baselines are versioned. Any change to detection behavior must include documentation
+and tests, and may require baseline regeneration.
+
 ---
 
 ## License
```
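The CONTRIBUTING rule that baseline verification must run on the same Python `major.minor` version can be expressed as a simple string comparison. The requirement is likely motivated by the structure of Python's `ast` output changing between minor releases, so normalized fingerprints from one interpreter may not match another. This is a minimal sketch. `python_tag` and `check_python_tag` are hypothetical helpers, and the sketch assumes the baseline stores the generating interpreter as a `major.minor` string.

```python
import sys


def python_tag() -> str:
    """Return the running interpreter version as 'major.minor', e.g. '3.13'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def check_python_tag(baseline_tag: str, current: str = "") -> None:
    """Hypothetical check: fail if the baseline came from a different major.minor."""
    current = current or python_tag()
    if baseline_tag != current:
        raise RuntimeError(
            f"baseline was generated with Python {baseline_tag}, this run uses "
            f"Python {current}; regenerate it with: codeclone . --update-baseline"
        )
```

This is consistent with the workflow change above, which runs the baseline gate only on the single `3.13` matrix entry.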