# Changelog

## [1.2.1] - 2026-02-02

### Overview

This release focuses on security hardening, robustness, and long-term maintainability.
No breaking API changes were introduced.

The goal of this release is to provide users with a safe, deterministic, and CI-friendly
tool suitable for security-sensitive and large-scale environments.

### Security & Robustness

- **Path Traversal Protection**
  Implemented strict path validation to prevent scanning outside the project root or
  accessing sensitive system directories, including macOS `/private` paths.

- **Cache Integrity Protection**
  Added HMAC-SHA256 signing for cache files to prevent cache poisoning and detect tampering.

- **Parser Safety Limits**
  Introduced AST parsing time limits to mitigate risks from pathological or adversarial inputs.

- **Resource Exhaustion Protection**
  Enforced a 10 MB per-file size limit and a cap on the number of files per scan to prevent
  excessive memory or CPU usage.

- **Structured Error Handling**
  Introduced a dedicated exception hierarchy (`ParseError`, `CacheError`, etc.) and replaced
  broad exception handling with graceful, user-friendly failure reporting.

### Performance Improvements

- **Optimized AST Normalization**
  Replaced expensive `deepcopy` operations with in-place AST normalization, significantly
  reducing CPU and memory overhead.

- **Improved Memory Efficiency**
  Added an LRU cache for file reading and optimized string concatenation during fingerprint
  generation.

- **HTML Report Memory Bounds**
  HTML reports now read only the required line ranges instead of entire files, reducing peak
  memory usage on large codebases.

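Here is a minimal sketch of in-place normalization built on the standard `ast.NodeTransformer`. The `InPlaceNormalizer` class, its placeholder strings, and `fingerprint()` are hypothetical; the changelog does not describe codeclone's actual normalization rules. Mutating nodes directly is safe here only because `fingerprint()` parses its own tree that nothing else references, which is the condition that makes dropping `deepcopy` possible.

```python
import ast
import hashlib


class InPlaceNormalizer(ast.NodeTransformer):
    """Hypothetical normalizer that rewrites names and literals by mutating nodes."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = "_FUNC_"
        self.generic_visit(node)  # then normalize arguments and body, also in place
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = "_VAR_"
        self.generic_visit(node)  # annotations may contain names too
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = "_VAR_"
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        node.value = "_CONST_"
        return node


def fingerprint(source: str) -> str:
    # The tree is created here and never shared, so mutating it needs no deepcopy.
    tree = ast.parse(source)
    InPlaceNormalizer().visit(tree)
    # ast.dump omits line/column attributes by default, so layout does not matter.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
```

Two functions that differ only in names and literal values then produce the same fingerprint, while a change in operators or structure produces a different one.
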
### Architecture & Maintainability

- **Strict Type Safety**
  Migrated all `Optional[...]` annotations to Python 3.10+ `X | None` syntax and achieved full
  `mypy --strict` compliance.

- **Modular CFG Design**
  Split CFG data structures and builder logic into separate modules (`cfg_model.py` and
  `cfg.py`) for improved clarity and extensibility.

- **Template Extraction**
  Extracted HTML templates into a dedicated `templates.py` module.

- Added a `py.typed` marker for downstream type checkers.
- Added `__slots__` to performance-critical classes to reduce per-object memory overhead.

### CLI & User Experience

- Added a sequential execution fallback when process pools are unavailable (for example, in
  restricted or sandboxed environments).
- Added clear, user-visible warnings when cache validation fails, instead of silently
  ignoring corrupted state.
- Hardened the HTML report template to safely embed JavaScript template literals and aligned it
  with linting requirements.

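A minimal sketch of such a fallback, assuming work is dispatched through `concurrent.futures.ProcessPoolExecutor`; the helper name `map_with_fallback`, the `executor_factory` parameter, and the exact set of caught exceptions are assumptions rather than codeclone's implementation. In restricted environments, creating a process pool can fail with an `OSError` (for example when POSIX semaphores are unavailable) or a `NotImplementedError`, and a pool whose workers die raises `BrokenProcessPool`.

```python
import warnings
from collections.abc import Callable, Iterable
from concurrent.futures import Executor, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_with_fallback(
    func: Callable[[T], R],
    items: Iterable[T],
    executor_factory: Callable[[], Executor] = ProcessPoolExecutor,
) -> list[R]:
    """Run `func` over `items` in a process pool, or sequentially if pools are unavailable."""
    work = list(items)  # materialize so the fallback can reprocess every item
    try:
        with executor_factory() as pool:
            return list(pool.map(func, work))
    except (OSError, NotImplementedError, BrokenProcessPool) as exc:
        # Caveat: an OSError raised by `func` itself also lands here, and the
        # whole batch is then re-run sequentially.
        warnings.warn(
            f"process pool unavailable ({exc!r}); falling back to sequential execution",
            RuntimeWarning,
            stacklevel=2,
        )
        return [func(item) for item in work]
```

Injecting the executor factory keeps the fallback path testable: passing a factory that raises `PermissionError` exercises the sequential branch without needing a real sandbox.
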
### Testing & Quality

- Expanded unit and integration test coverage across the CLI, CFG construction, cache
  handling, scanner, and HTML reporting paths.
- Added security regression tests for dot-dot (`..`) traversal and symlinked sensitive
  directories.
- Tightened cache mismatch assertions to verify full state reset.
- Achieved and enforced 98%+ line coverage, with coverage configuration added to
  `pyproject.toml`.
- Added a GitHub Actions workflow with a Python 3.10–3.14 test matrix, including `ruff` and
  `mypy` checks.
- CI baseline enforcement now runs on a single pinned Python version to avoid AST dump
  differences across interpreter versions.

### Python Version Consistency for Baseline Checks

Due to inherent differences in Python’s AST between interpreter versions, baseline
generation and verification must be performed using the same Python version.

The baseline file now stores the Python version (`major.minor`) used during generation.
When running with `--fail-on-new`, codeclone verifies that the current interpreter version
matches the baseline and exits with code 2 if they differ.

This design ensures deterministic and reproducible clone detection results while preserving
support for Python 3.10–3.14 across the test matrix.

### Fixed

- **CFG Exception Handling**
  Fixed incorrect control-flow linking for `try`/`except` blocks.

- **Pattern Matching Support**
  Added missing structural handling for `match`/`case` statements in the CFG.

- **Block Detection Scaling**
  Made `MIN_LINE_DISTANCE` scale with block size to improve clone detection accuracy
  across differently sized functions.

---

## [1.2.0] - 2026-02-02

### BREAKING CHANGES

- **CLI Arguments**
  Renamed output flags for brevity and consistency:
  - `--json-out` → `--json`
  - `--text-out` → `--text`
  - `--html-out` → `--html`
  - `--cache` → `--cache-dir`

- **Baseline Behavior**
  - The default baseline file location changed from
    `~/.config/codeclone/baseline.json` to `./codeclone.baseline.json`.
  - The CLI now warns if a baseline file is expected but missing (unless
    `--update-baseline` is used).

### Added

- **Detection Engine**
  - Deep CFG analysis for `try`/`except`/`finally`, `with`/`async with`, and
    `match`/`case` (Python 3.10+) statements.
  - Normalization for augmented assignments: `x += 1` is now detected as a clone of
    `x = x + 1`.

- **Rich Output**
  - Color-coded status messages.
  - Progress indicators for long-running tasks.
  - Formatted summary tables.

- **CI/CD Improvements**
  - Clearer argument grouping in `--help` output.

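A minimal sketch of augmented-assignment normalization with `ast.NodeTransformer`, rewriting `x += 1` into the same tree that `x = x + 1` parses to. The class and function names are hypothetical, and only plain-name targets are rewritten; attribute and subscript targets are left unchanged. The two forms are not always semantically identical (for a list, `a += b` extends it in place while `a = a + b` builds a new list), which is acceptable for clone detection but would not be for a code transformation tool.

```python
import ast


class AugAssignExpander(ast.NodeTransformer):
    """Hypothetical sketch: rewrite `name op= value` as `name = name op value`."""

    def visit_AugAssign(self, node: ast.AugAssign) -> ast.AST:
        self.generic_visit(node)
        if not isinstance(node.target, ast.Name):
            return node  # attribute/subscript targets are left as-is in this sketch
        # The right-hand side needs a *load* of the same name; the target keeps Store.
        current = ast.Name(id=node.target.id, ctx=ast.Load())
        assign = ast.Assign(
            targets=[node.target],
            value=ast.BinOp(left=current, op=node.op, right=node.value),
            type_comment=None,
        )
        # Copy the original position, then fill in positions for the new child nodes.
        return ast.fix_missing_locations(ast.copy_location(assign, node))


def normalized_dump(source: str) -> str:
    """Parse `source`, expand augmented assignments, and dump the resulting tree."""
    return ast.dump(AugAssignExpander().visit(ast.parse(source)))
```
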
### Improved

- **Baseline**
  - Safer JSON loading with error handling for corrupted files.
  - Improved typing and a cleaner construction API.

- **Cache**
  - Graceful recovery from corrupted cache files.
  - Updated typing to modern Python standards.

- **Typing**
  - General typing improvements across reporting and normalization modules.

---

## [1.1.0] - 2026-01-19

### Added

- Control Flow Graph (CFG v1) for structural clone detection.
- Deterministic CFG-based function fingerprints.
- Interactive HTML report with syntax highlighting.
- Block-level clone visualization.

### Changed

- Function clone detection is now based on the CFG instead of the pure AST.
- Improved robustness against refactoring and control-flow changes.

### Documentation

- Added `docs/cfg.md` with CFG semantics and limitations.
- Added `docs/architecture.md` describing system design.

---

## [1.0.0] - 2026-01-17

### Initial release

- AST-based function clone detection.
- Block-level clone detection (Type-3-lite).
- Baseline workflow for CI.
- JSON and text reports.