
Commit 8b34447

IronAdamant and claude committed
v0.6.0: Pluggable extractors, batch SQL, cross-platform locks, shared reads
Four architectural improvements:

- Pluggable AST extractors: register_extractor() lets users override built-in regex extractors with tree-sitter/LSP backends. Zero new dependencies.
- Batch SQL queries: 5 new get_*_batch() methods eliminate N+1 pattern in risk_map computation (~5 queries instead of N*5).
- Process-level read locks: all read tools acquire shared process lock for safe concurrent multi-process reads.
- Cross-platform ProcessLock: fcntl.flock on Unix, LockFileEx via ctypes on Windows. Both support shared and exclusive locks.

18 new tests (540 total), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a6e6152 commit 8b34447

15 files changed

Lines changed: 667 additions & 59 deletions

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
```diff
@@ -5,6 +5,22 @@ All notable changes to Chisel are documented in this file.
 Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 This project uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2026-03-22
+
+### Added
+
+- **Pluggable AST extractors**: `register_extractor(language, fn)` lets users override built-in regex extractors with tree-sitter, LSP, or other backends. `unregister_extractor()` reverts to built-in. `get_registered_extractors()` for introspection. Custom extractors checked before built-ins in `extract_code_units()`. Zero new dependencies.
+- **Batch SQL queries**: 5 new `get_*_batch()` methods in `storage.py` for edges, code units, co-changes, churn stats, and blame. `_chunked()` helper splits large batches to stay under SQLite's variable limit.
+- **Process-level read locks**: All read tool methods in `engine.py` now acquire `_process_lock.shared()` + `lock.read_lock()`. Write tools (`record_result`, `analyze`, `update`) acquire `_process_lock.exclusive()` + `lock.write_lock()`. Concurrent reads from multiple processes are now safe.
+- **Cross-platform ProcessLock**: `project.py` uses `fcntl.flock` on Unix and `LockFileEx`/`UnlockFileEx` via ctypes on Windows. Both support shared and exclusive locks.
+- 18 new tests: extractor registry (6), batch queries (7), process lock (3), engine lock wiring (2)
+
+### Changed
+
+- `impact.get_risk_map()` rewritten to use batch queries — computes all risk scores in ~5 queries instead of N*5 (eliminates N+1 pattern)
+- `ProcessLock._acquire()` takes `exclusive: bool` instead of a platform-specific lock type constant
+- 540 tests pass (up from 522)
+
 ## [0.5.4] - 2026-03-22
 
 ### Fixed
```
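The batch methods replace one `SELECT` per file with one `SELECT ... WHERE file_path IN (...)` per chunk of paths. A minimal sketch of that pattern with the standard-library `sqlite3` module follows; the table `churn_stats`, its columns, and the functions `chunked` / `get_churn_stats_batch` are hypothetical stand-ins, not Chisel's actual schema or signatures. The chunk size of 900 follows this commit's development log; 999 is the default bound-parameter limit for SQLite builds before 3.32.0 (newer builds default higher), so 900 is a conservative choice.

```python
import sqlite3
from itertools import islice


def chunked(items, size=900):
    """Yield successive lists of at most *size* items (hypothetical stand-in for ``_chunked()``)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def get_churn_stats_batch(conn, file_paths, chunk_size=900):
    """Fetch churn rows for many files with one query per chunk instead of one per file.

    Returns a dict mapping file_path -> churn_score; files without a row are absent.
    """
    results = {}
    for chunk in chunked(file_paths, chunk_size):
        # One "?" placeholder per value keeps the query parameterized.
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT file_path, churn_score FROM churn_stats WHERE file_path IN ({placeholders})",
            chunk,
        )
        results.update(rows)  # each row is a (file_path, churn_score) tuple
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE churn_stats (file_path TEXT PRIMARY KEY, churn_score REAL)")
    conn.executemany(
        "INSERT INTO churn_stats VALUES (?, ?)",
        [(f"src/mod_{i}.py", i / 10) for i in range(2000)],
    )
    stats = get_churn_stats_batch(conn, [f"src/mod_{i}.py" for i in range(2000)])
    print(len(stats))  # 2000 rows fetched with 3 queries (chunks of 900, 900, 200)
```

Building only the placeholder list with an f-string, while passing the values as parameters, keeps the query safe from injection even though its text varies with chunk length.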

CLAUDE.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -31,13 +31,16 @@ chisel/
 - **Blame caching**: Cached by file content hash, invalidated on change.
 - **Incremental updates**: File content hashes tracked in `file_hashes` table.
 - **Persistent connection**: Storage uses a single SQLite connection (`check_same_thread=False`) with RWLock for thread safety.
-- **Multi-agent safety**: `project.py` provides: (1) `detect_project_root()` canonicalizes via git common dir so worktrees share identity, (2) `normalize_path()` ensures consistent relative paths, (3) `resolve_storage_dir()` defaults to project-local `.chisel/` (priority: explicit > env > project-local > ~/.chisel/), (4) `ProcessLock` uses `fcntl.flock` for cross-process write coordination.
+- **Multi-agent safety**: `project.py` provides: (1) `detect_project_root()` canonicalizes via git common dir so worktrees share identity, (2) `normalize_path()` ensures consistent relative paths, (3) `resolve_storage_dir()` defaults to project-local `.chisel/` (priority: explicit > env > project-local > ~/.chisel/), (4) `ProcessLock` for cross-process coordination — shared locks for reads, exclusive for writes. Cross-platform: `fcntl.flock` on Unix, `LockFileEx` on Windows.
 - **SQLite concurrency**: 30s `busy_timeout` + exponential-backoff retry on `_execute` for cross-process SQLITE_BUSY.
 - **Ownership vs Reviewers**: `ownership` = blame-based (who wrote the code, `role: "original_author"`). `who_reviews` = commit-activity-based (who maintains it, `role: "suggested_reviewer"`).
 - **Shared constants**: `_SKIP_DIRS` and `_EXTENSION_MAP` live in `ast_utils.py`. `_CODE_EXTENSIONS` in `engine.py` is derived from `_EXTENSION_MAP`.
 - **Shared dispatch**: `dispatch_tool()` in `mcp_server.py` is used by both HTTP and stdio servers. Tool schemas and dispatch tables live in `schemas.py`.
 - **Edge weighting**: Test edges carry a weight (0.4-1.0) based on file proximity. Python import-path matching (`from myapp.utils import foo` → `myapp/utils.py:foo`) takes priority over name-only matching. `_compute_proximity_weight()` and `_matches_import_path()` in `test_mapper.py`.
 - **AST regex improvements**: C#/Java support nested generics `<A<B>>` and annotations/attributes `@Override`/`[Test]`. Kotlin supports extension functions `fun String.foo()`. C++ supports template functions and destructors `~Foo()`. Swift supports `@objc`-style attributes. Dart supports factory constructors and getters/setters.
+- **Pluggable extractors**: `register_extractor(lang, fn)` in `ast_utils.py` lets users override built-in regex extractors with tree-sitter or LSP-backed ones. `_custom_extractors` checked before `_EXTRACTORS` in `extract_code_units()`. Zero-dep — the registry is just hooks.
+- **Batch SQL queries**: `storage.py` provides `get_*_batch()` methods for edges, code units, co-changes, churn, and blame. `impact.get_risk_map()` uses these to compute all risk scores in ~5 queries total instead of N*5. `_chunked()` helper splits large batches to stay under SQLite's variable limit.
+- **Process-level read locks**: All read tool methods in `engine.py` acquire `_process_lock.shared()` (outer) + `lock.read_lock()` (inner). Writes acquire `_process_lock.exclusive()` + `lock.write_lock()`. This allows concurrent reads from multiple processes while blocking during writes.
 
 ## Dev Commands
 
```
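The "~5 queries instead of N*5" figure can be made concrete with a little arithmetic. The sketch below assumes five per-file lookups (edges, code units, co-changes, churn, blame), as the notes list, and that each batch method issues one query per 900-item chunk, as the development log states for `_chunked()`. It is a simplification: it ignores any extra query `get_risk_map()` may need to enumerate files, and lookups keyed by something other than file path may chunk differently. Under these assumptions "~5" holds up to 900 files and grows by five per additional chunk.

```python
import math

# The five lookups named in this commit's notes (assumed one query each per chunk).
LOOKUPS = ("edges", "code units", "co-changes", "churn stats", "blame")


def queries_n_plus_one(n_files: int) -> int:
    """Old pattern: one query per lookup for every file (N * 5)."""
    return n_files * len(LOOKUPS)


def queries_batched(n_files: int, chunk_size: int = 900) -> int:
    """Batched pattern: one query per lookup per chunk of file paths."""
    return len(LOOKUPS) * math.ceil(n_files / chunk_size)


for n in (50, 900, 2000):
    print(n, queries_n_plus_one(n), queries_batched(n))
# 50 250 5
# 900 4500 5
# 2000 10000 15
```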
COMPLETE_PROJECT_DOCUMENTATION.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -2,7 +2,7 @@
 
 Test impact analysis and code intelligence for LLM agents. Zero external dependencies.
 
-**Version:** 0.5.4
+**Version:** 0.6.0
 **PyPI:** `chisel-test-impact`
 **License:** MIT
 **Python:** >= 3.9
@@ -35,7 +35,7 @@ Test impact analysis and code intelligence for LLM agents. Zero external depende
 | `chisel/metrics.py` | Pure computation: churn scoring, ownership aggregation, co-change detection | `collections`, `datetime`, `itertools` | [glossary: churn score](wiki-local/glossary.md) |
 | `chisel/test_mapper.py` | Test file discovery, framework detection (pytest/Jest/Go/Rust/Playwright), dependency extraction, test edge building | `ast`, `os`, `re`, `pathlib`, `chisel.ast_utils`, `chisel.project` | [glossary: test edge](wiki-local/glossary.md) |
 | `chisel/impact.py` | Impact analysis, risk scoring, stale test detection, ownership queries, reviewer suggestions | `collections`, `datetime`, `chisel.metrics`, `chisel.storage` (via constructor injection) | [glossary: risk score](wiki-local/glossary.md) |
-| `chisel/project.py` | Multi-agent safety: project root detection (worktree-aware), path normalization, storage dir resolution, cross-process file lock (ProcessLock) | `fcntl`, `os`, `subprocess`, `contextlib` | -- |
+| `chisel/project.py` | Multi-agent safety: project root detection (worktree-aware), path normalization, storage dir resolution, cross-platform file lock (ProcessLock) | `os`, `subprocess`, `sys`, `contextlib`; Unix: `fcntl`; Windows: `ctypes`, `msvcrt` | -- |
 | `chisel/engine.py` | Orchestrator -- owns Storage, GitAnalyzer, TestMapper, ImpactAnalyzer, RWLock, ProcessLock; exposes `tool_*()` methods for all 15 MCP tools | `os`, `chisel.ast_utils`, `chisel.git_analyzer`, `chisel.impact`, `chisel.project`, `chisel.rwlock`, `chisel.storage`, `chisel.test_mapper` | [spec-project](wiki-local/spec-project.md) |
 | `chisel/cli.py` | argparse CLI with 17 subcommands, dispatch table, output formatting | `argparse`, `json`, `os`, `chisel.engine` | [spec-project: CLI](wiki-local/spec-project.md) |
 | `chisel/mcp_server.py` | HTTP MCP server (GET /tools, /health; POST /call), ThreadedHTTPServer, tool schemas and dispatch table | `json`, `logging`, `threading`, `http.server`, `socketserver`, `chisel.engine` | [spec-project: MCP tools](wiki-local/spec-project.md) |
```

LLM_Development.md

Lines changed: 37 additions & 0 deletions
```diff
@@ -4,6 +4,43 @@ Chronological record of development activity on the Chisel project.
 
 ---
 
+## v0.6.0 -- 2026-03-22 -- Pluggable Extractors, Batch Queries, Cross-Platform Locks
+
+### Summary
+Four architectural improvements: pluggable AST extraction for tree-sitter/LSP integration, batch SQL to eliminate N+1 in risk_map, process-level shared locks for concurrent reads, cross-platform ProcessLock (Windows support via LockFileEx).
+
+### Pluggable AST Extraction (ast_utils.py)
+- `register_extractor(language, fn)` stores custom extractors in `_custom_extractors` dict
+- `extract_code_units()` checks custom extractors first, falls back to built-in regex
+- `unregister_extractor(language)` reverts to built-in (raises KeyError if not registered)
+- `get_registered_extractors()` returns shallow copy for introspection
+- Zero new dependencies — registry is just callable hooks
+
+### Batch SQL Queries (storage.py, impact.py)
+- 5 new batch methods: `get_edges_for_code_batch`, `get_code_units_by_files_batch`, `get_co_changes_batch`, `get_churn_stats_batch`, `get_blame_batch`
+- `_chunked()` helper splits lists into chunks of 900 to stay under SQLite's 999-variable limit
+- `impact.get_risk_map()` rewritten to use batch queries — ~5 total queries instead of N*5
+- `compute_risk_score()` unchanged for single-file use
+
+### Process-Level Read Locks (engine.py)
+- All 12 read tool methods now acquire `_process_lock.shared()` (outer) + `lock.read_lock()` (inner)
+- `tool_record_result` now acquires `_process_lock.exclusive()` + `lock.write_lock()`
+- `analyze()` and `update()` already used exclusive locks — no change
+- Lock nesting order: process lock (outer) → RWLock (inner) — always consistent
+
+### Cross-Platform ProcessLock (project.py)
+- Module-level `_IS_WINDOWS = sys.platform == "win32"` for platform detection
+- Unix: `fcntl.flock` (unchanged behavior)
+- Windows: `ctypes` calls to `kernel32.LockFileEx`/`UnlockFileEx` — supports both shared and exclusive locks
+- `_flock(fd, exclusive)` and `_funlock(fd)` are platform-neutral module functions
+- `ProcessLock._acquire(exclusive: bool)` replaces platform-specific lock type constants
+
+### Tests
+- 18 new tests: extractor registry (6), batch queries (7), cross-platform lock (3), engine lock wiring (2)
+- 540 tests total, all passing
+
+---
+
 ## v0.5.4 -- 2026-03-22 -- Codebase Audit: Simplify, Modernize, Harden
 
 ### Summary
```

chisel/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-__version__ = "0.5.4"
+__version__ = "0.6.0"
```

chisel/ast_utils.py

Lines changed: 47 additions & 3 deletions
```diff
@@ -543,6 +543,48 @@ def _name_kind(m):
 ]
 
 
+# ---------------------------------------------------------------------------
+# Custom extractor registry (plugin system)
+# ---------------------------------------------------------------------------
+
+_custom_extractors: dict[str, object] = {}
+
+
+def register_extractor(language, extractor):
+    """Register a custom code unit extractor for a language.
+
+    Custom extractors override the built-in regex-based ones, allowing
+    tree-sitter, LSP, or other backends without adding dependencies to
+    Chisel itself.
+
+    Args:
+        language: Language string (e.g. "python", "rust"). Must match a
+            key in ``_EXTENSION_MAP`` or a custom extension mapping.
+        extractor: Callable with signature
+            ``(file_path: str, content: str) -> list[CodeUnit]``.
+
+    Raises:
+        TypeError: If *extractor* is not callable.
+    """
+    if not callable(extractor):
+        raise TypeError(f"extractor must be callable, got {type(extractor).__name__}")
+    _custom_extractors[language] = extractor
+
+
+def unregister_extractor(language):
+    """Remove a custom extractor, reverting to the built-in one.
+
+    Raises:
+        KeyError: If no custom extractor is registered for *language*.
+    """
+    del _custom_extractors[language]
+
+
+def get_registered_extractors():
+    """Return a shallow copy of the custom extractor registry."""
+    return dict(_custom_extractors)
+
+
 # ---------------------------------------------------------------------------
 # Dispatcher
 # ---------------------------------------------------------------------------
@@ -568,10 +610,12 @@ def _name_kind(m):
 def extract_code_units(file_path: str, content: str) -> list[CodeUnit]:
     """Extract code units from *content* using the appropriate language parser.
 
-    Dispatches to a language-specific extractor based on the file extension.
+    Custom extractors registered via :func:`register_extractor` take
+    priority over built-in ones. Dispatches based on the file extension.
     Returns an empty list for unsupported languages.
     """
     lang = detect_language(file_path)
-    if lang not in _EXTRACTORS:
+    extractor = _custom_extractors.get(lang) or _EXTRACTORS.get(lang)
+    if extractor is None:
         return []
-    return _EXTRACTORS[lang](file_path, content)
+    return extractor(file_path, content)
```
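A minimal, self-contained sketch of the registry and dispatch logic in this diff, with a custom extractor built on Python's standard-library `ast` module standing in for a tree-sitter or LSP backend. The `CodeUnit` fields, the simplified `detect_language()`, the placeholder built-in extractor, and `ast_python_extractor` are hypothetical; Chisel's real versions are not shown in this commit.

```python
import ast
import os
from dataclasses import dataclass


@dataclass
class CodeUnit:
    # Hypothetical fields; Chisel's actual CodeUnit is not shown in this diff.
    file_path: str
    name: str
    kind: str
    line_start: int
    line_end: int


def detect_language(file_path):
    # Simplified stand-in for Chisel's extension lookup.
    return {".py": "python", ".rs": "rust"}.get(os.path.splitext(file_path)[1])


def _builtin_python(file_path, content):
    return []  # placeholder for the built-in regex extractor


_EXTRACTORS = {"python": _builtin_python}
_custom_extractors = {}


def register_extractor(language, extractor):
    if not callable(extractor):
        raise TypeError(f"extractor must be callable, got {type(extractor).__name__}")
    _custom_extractors[language] = extractor


def unregister_extractor(language):
    del _custom_extractors[language]  # KeyError if nothing was registered


def extract_code_units(file_path, content):
    lang = detect_language(file_path)
    # Custom extractors take priority over built-ins, as in the diff above.
    extractor = _custom_extractors.get(lang) or _EXTRACTORS.get(lang)
    if extractor is None:
        return []
    return extractor(file_path, content)


def ast_python_extractor(file_path, content):
    """Collect functions and classes (including nested ones) via the stdlib ``ast`` parser."""
    units = []
    for node in ast.walk(ast.parse(content)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        units.append(CodeUnit(file_path, node.name, kind, node.lineno, node.end_lineno))
    return units
```

Because `_custom_extractors` is a module-level dict, a registration affects every caller in the process; tests that register an extractor should call `unregister_extractor()` in teardown to restore the built-in.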

chisel/engine.py

Lines changed: 47 additions & 35 deletions
```diff
@@ -151,54 +151,62 @@ def tool_analyze(self, directory=None, force=False):
 
     def tool_impact(self, files, functions=None):
         """MCP tool: get impacted tests for changed files."""
-        with self.lock.read_lock():
-            return self.impact.get_impacted_tests(files, functions)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.get_impacted_tests(files, functions)
 
     def tool_suggest_tests(self, file_path):
         """MCP tool: suggest tests for a file."""
-        with self.lock.read_lock():
-            return self.impact.suggest_tests(file_path)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.suggest_tests(file_path)
 
     def tool_churn(self, file_path, unit_name=None):
         """MCP tool: get churn stats. Always returns a list."""
-        with self.lock.read_lock():
-            stat = self.storage.get_churn_stat(file_path, unit_name)
-            if stat:
-                return [stat]
-            # Only fall back to all stats when no specific unit was requested
-            if unit_name is None:
-                return self.storage.get_all_churn_stats(file_path)
-            return []
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                stat = self.storage.get_churn_stat(file_path, unit_name)
+                if stat:
+                    return [stat]
+                if unit_name is None:
+                    return self.storage.get_all_churn_stats(file_path)
+                return []
 
     def tool_ownership(self, file_path):
         """MCP tool: get blame-based code ownership."""
-        with self.lock.read_lock():
-            return self.impact.get_ownership(file_path)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.get_ownership(file_path)
 
     def tool_coupling(self, file_path, min_count=3):
         """MCP tool: get co-change coupling partners."""
-        with self.lock.read_lock():
-            return self.storage.get_co_changes(file_path, min_count)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.storage.get_co_changes(file_path, min_count)
 
     def tool_risk_map(self, directory=None):
         """MCP tool: risk scores for all files."""
-        with self.lock.read_lock():
-            return self.impact.get_risk_map(directory)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.get_risk_map(directory)
 
     def tool_stale_tests(self):
         """MCP tool: detect stale tests."""
-        with self.lock.read_lock():
-            return self.impact.detect_stale_tests()
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.detect_stale_tests()
 
     def tool_history(self, file_path):
         """MCP tool: commit history for a file."""
-        with self.lock.read_lock():
-            return self.storage.get_commits_for_file(file_path)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.storage.get_commits_for_file(file_path)
 
     def tool_who_reviews(self, file_path):
         """MCP tool: suggest reviewers based on recent commit activity."""
-        with self.lock.read_lock():
-            return self.impact.suggest_reviewers(file_path)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.suggest_reviewers(file_path)
 
     def tool_diff_impact(self, ref=None):
         """MCP tool: auto-detect changes from git diff and return impacted tests.
@@ -217,30 +225,34 @@ def tool_diff_impact(self, ref=None):
                 functions.extend(self.git.get_changed_functions(fp, ref))
             except RuntimeError:
                 pass
-        with self.lock.read_lock():
-            return self.impact.get_impacted_tests(
-                changed_files, functions or None,
-            )
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.get_impacted_tests(
+                    changed_files, functions or None,
+                )
 
     def tool_update(self):
         """MCP tool: incremental re-analysis of changed files."""
         return self.update()
 
     def tool_test_gaps(self, file_path=None, directory=None, exclude_tests=True):
         """MCP tool: find code units with no test coverage."""
-        with self.lock.read_lock():
-            return self.impact.get_test_gaps(file_path, directory, exclude_tests)
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.impact.get_test_gaps(file_path, directory, exclude_tests)
 
     def tool_record_result(self, test_id, passed, duration_ms=None):
         """MCP tool: record a test result (pass/fail) for future prioritization."""
-        with self.lock.write_lock():
-            self.storage.record_test_result(test_id, passed, duration_ms)
-            return {"test_id": test_id, "passed": passed, "recorded": True}
+        with self._process_lock.exclusive():
+            with self.lock.write_lock():
+                self.storage.record_test_result(test_id, passed, duration_ms)
+                return {"test_id": test_id, "passed": passed, "recorded": True}
 
     def tool_stats(self):
         """MCP tool: get summary counts for the Chisel database."""
-        with self.lock.read_lock():
-            return self.storage.get_stats()
+        with self._process_lock.shared():
+            with self.lock.read_lock():
+                return self.storage.get_stats()
 
     # ------------------------------------------------------------------ #
     # Shared internal helpers
```
