All notable changes to this project will be documented in this file.

Bug Fixes (identified by Gemini Deep Think v6 — clean benchmark prompt, zero historical bias)
- Fixed `output_folder` auto-ignore silently excluding all user content in the folder; it now only ignores `*.md` context files within it
- Fixed the `diff_context_lines` config being ignored in auto-diff mode: the value was never passed through `compare_with` to `diff_file_contents`
- Fixed double file I/O in `process_file`: `seek(0)` was followed by `fs::read_to_string`, which opens a new file descriptor and makes the seek pointless. The already-open handle is now reused
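
To illustrate the `process_file` fix, here is a minimal sketch built around a hypothetical `read_after_sniff` helper (not the project's actual code). It sniffs the first bytes of a file, then rewinds the same `File` handle with `seek(SeekFrom::Start(0))` and reads it to the end, so no second file descriptor is opened.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: sniff the start of a file, then read the whole file
/// through the same handle instead of reopening it with fs::read_to_string.
fn read_after_sniff(path: &Path, sniff_len: usize) -> io::Result<(Vec<u8>, String)> {
    let mut file = File::open(path)?;

    // Read up to `sniff_len` bytes for binary/encoding detection.
    let mut sniff = vec![0u8; sniff_len];
    let n = file.read(&mut sniff)?;
    sniff.truncate(n);

    // Rewind the already-open handle and read everything from it.
    file.seek(SeekFrom::Start(0))?;
    let mut content = String::new();
    file.read_to_string(&mut content)?;
    Ok((sniff, content))
}
```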

Security Hardening
- `install.sh` now defaults to `~/.local/bin` (user-local, no sudo required) instead of `/usr/local/bin`
- Supports the `CONTEXT_BUILDER_INSTALL_DIR` env var for custom install paths
- SKILL.md rewritten: `cargo install` promoted as the primary method (fully verified via crates.io); `-y` flag guidance softened to require explicit path scoping

Documentation
- Updated SKILL.md for v0.8.1+ with a Security & Path Scoping section
- Documented Tree-Sitter CLI flags (`--signatures`, `--structure`, `--visibility`, `--truncate`)
- Added AST signatures and API surface review recipes

Test Coverage
- Extended unit test coverage across `config.rs`, `file_utils.rs`, `state.rs`, `markdown.rs`, and `lib.rs`
- Added tests for file relevance categories, lock files, various source extensions, encoding handling, auto-diff workflows, and config hash consistency

Bug Fixes (identified by Gemini Deep Think v6 code review — 11 confirmed bugs, 0 false positives)
- Fixed cache hash desync: `cache.rs` was missing 4 fields (`signatures`, `structure`, `truncate`, `visibility`), causing stale cache hits when toggling tree-sitter flags
- Fixed JavaScript arrow function body leak: `statement_block` is a child of `arrow_function`, not `variable_declarator`, causing full function bodies to leak into signature output
- Fixed TypeScript arrow function handling (same root cause as JavaScript)
- Fixed Python decorator erasure: intercepting `decorated_definition` nodes now preserves `@decorator` lines in signatures
- Fixed Python `is_method` for decorated methods: an iterative 4-level parent walk replaces a fragile 2-level check
- Fixed Rust tuple struct erasure: added `ordered_field_declaration_list` to body kinds
- Fixed C/C++ header file prototype extraction: added `declaration` node matching for `.h` files
- Fixed dropped C++ class inheritance: applied byte-slicing to preserve `template<>` and `: public Base`
- Fixed JS/TS exported arrow functions being invisible: added `lexical_declaration` to export signature extraction
- Added `.jsx` extension support for JavaScript
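
The cache hash desync comes down to one rule: every option that changes the output has to feed the cache key. The sketch below shows that rule with a hypothetical `OutputOptions` struct (field names taken from the fix above, types assumed) and a hypothetical `cache_key` function. FNV-1a is a stand-in hash picked only because it is stable across runs; the project's real hashing may differ.

```rust
/// Hypothetical subset of the options that change generated output.
/// Field names mirror the changelog; the types are assumptions.
#[derive(Debug, Clone)]
struct OutputOptions {
    signatures: bool,
    structure: bool,
    truncate: Option<String>,
    visibility: Option<String>,
}

/// 64-bit FNV-1a: dependency-free and stable across runs.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Every output-affecting field feeds the key; omitting one would let a
/// stale cache entry be reused when only that flag changes.
fn cache_key(opts: &OutputOptions) -> u64 {
    let canonical = format!(
        "signatures={}\0structure={}\0truncate={:?}\0visibility={:?}",
        opts.signatures, opts.structure, opts.truncate, opts.visibility
    );
    fnv1a64(canonical.as_bytes())
}
```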

Dependency Updates
- Updated `tree-sitter` core: 0.24 → 0.26
- Updated `tree-sitter-rust`: 0.23 → 0.24
- Updated `tree-sitter-javascript`: 0.23 → 0.25
- Updated `tree-sitter-python`: 0.23 → 0.25
- Updated `tree-sitter-go`: 0.23 → 0.25
- Updated `tree-sitter-c`: 0.23 → 0.24

Tree-Sitter AST Integration (feature-gated)
- New `--signatures` flag: replaces full file content with extracted function/class signatures, dramatically reducing token usage (~4K vs ~15K tokens per file)
- New `--structure` flag: appends a structural summary to each file (e.g., "6 functions, 2 structs, 1 impl block")
- New `--truncate smart` mode: prefers AST-boundary truncation when content needs truncating
- Supports 8 languages: Rust, JavaScript, TypeScript, Python, Go, Java, C, C++
- Install with: `cargo install context-builder --features tree-sitter-all`
- Individual language features are available (e.g., `--features tree-sitter-rust`)
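
A sketch of what boundary-preferring truncation for `--truncate smart` could look like, assuming the caller already has the end byte offsets of top-level AST nodes (a tree-sitter walk could supply them; no parser is used here). The `smart_truncate` name and the byte-based limit are assumptions for illustration, since the project itself works with token budgets.

```rust
/// Hypothetical sketch: cut at the last AST node boundary that fits the
/// limit; fall back to a plain cut on a valid UTF-8 char boundary.
fn smart_truncate<'a>(content: &'a str, boundaries: &[usize], max_bytes: usize) -> &'a str {
    if content.len() <= max_bytes {
        return content;
    }
    // Prefer the largest node end offset that still fits.
    if let Some(&cut) = boundaries
        .iter()
        .filter(|&&b| b > 0 && b <= max_bytes && content.is_char_boundary(b))
        .max()
    {
        return &content[..cut];
    }
    // Fallback: back off until the cut lands on a char boundary.
    let mut cut = max_bytes;
    while !content.is_char_boundary(cut) {
        cut -= 1;
    }
    &content[..cut]
}
```

With three 10-byte items and a 25-byte limit, the sketch keeps the first two whole items instead of splitting the third.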

Dependency Updates
- Updated `tree-sitter` core: 0.22 → 0.24
- Updated all grammar crates: 0.21 → 0.23
- Migrated from deprecated `language()` functions to the `LANGUAGE` constants API

Bug Fixes
- Fixed config hash mismatch: the cache now includes the `auto_diff` and `diff_context_lines` fields, preventing stale cache hits when toggling these options
- Fixed silent config parse failure: a `context-builder.toml` with invalid TOML syntax now prints a warning instead of silently falling back to defaults
- Fixed smart truncation unconditionally cutting 50% of file content; it now only activates with an explicit token budget
- Fixed Windows path separators in the determinism test causing CI failure

CI & Quality
- Added Coveralls code coverage integration via `cargo-tarpaulin`
- All 188+ tests passing across Ubuntu, macOS, and Windows

Bug Fixes (identified by Gemini Deep Think multi-round code review)
- Fixed content hash using absolute OS paths; paths are now normalized to relative Unix-style form for cross-platform determinism
- Fixed hash collision risk: added a null byte delimiter between path and content in the content hash
- Fixed `strip_prefix('+')` leaving an extra space in `diff_only` mode, corrupting indentation
- Fixed the auto_diff path bypassing the `--max-tokens` budget entirely
- Fixed `src/tests/` files being misclassified as source code instead of tests
- Fixed `sorted_paths` missing a cwd fallback, silently dropping files when cwd ≠ base_path
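
For the `diff_only` indentation fixes, the point is to remove exactly the diff marker and nothing else, which neither `trim_start()` nor a bare `strip_prefix('+')` did. This minimal sketch assumes added lines are rendered as `+ ` followed by the original text; `strip_added_marker` is a hypothetical name.

```rust
/// Hypothetical sketch: recover the original text of an added diff line,
/// assuming the "+ " marker format. Only the marker is removed, so the
/// line's own leading whitespace (Python/YAML indentation) survives.
fn strip_added_marker(line: &str) -> Option<&str> {
    line.strip_prefix("+ ")
        // Tolerate an empty added line rendered as a bare "+".
        .or_else(|| line.strip_prefix('+'))
}
```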

Auto-Ignore Common Directories
- 19 heavy directories (`node_modules`, `dist`, `build`, `__pycache__`, `.venv`, `vendor`, etc.) are now excluded by default
- Prevents million-line outputs when processing projects without a `.git` directory
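
A sketch of a component-based check, using only the six directory names listed above (the full default set of 19 is not reproduced) and a hypothetical `is_auto_ignored` function. The project builds on the `ignore` crate, so treat this as an approximation of the idea rather than its implementation.

```rust
use std::path::{Component, Path};

/// Partial list taken from the changelog; the real default set is larger.
const AUTO_IGNORED_DIRS: &[&str] = &["node_modules", "dist", "build", "__pycache__", ".venv", "vendor"];

/// Returns true if any normal component of `path` matches an auto-ignored name.
fn is_auto_ignored(path: &Path) -> bool {
    path.components().any(|c| match c {
        Component::Normal(name) => name
            .to_str()
            .is_some_and(|n| AUTO_IGNORED_DIRS.contains(&n)),
        _ => false,
    })
}
```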

Context Window Warnings
- Shows an estimated token count after every run
- Warns when output exceeds 128K tokens, with actionable CLI suggestions
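
The warning reduces to a threshold check. In this sketch the hypothetical `context_warning` reads "128K" as 128,000 tokens (an assumption) and suggests flags documented elsewhere in this changelog; the message wording is illustrative only.

```rust
/// Assumed threshold: "128K" read as 128,000 tokens.
const CONTEXT_WARN_TOKENS: usize = 128_000;

/// Hypothetical post-run check: returns a warning only above the threshold.
fn context_warning(estimated_tokens: usize) -> Option<String> {
    if estimated_tokens <= CONTEXT_WARN_TOKENS {
        return None;
    }
    Some(format!(
        "warning: ~{} tokens exceeds the {}-token threshold; \
         consider --max-tokens N, --signatures, or --diff-only",
        estimated_tokens, CONTEXT_WARN_TOKENS
    ))
}
```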

Deterministic Output
- Replaced the volatile timestamp (`Processed at: <timestamp>`) with a content hash (`Content hash: <hex>`) in the Markdown header
- Identical project states now produce byte-for-byte identical output files, enabling LLM prompt caching
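
One way to build a header hash with these properties is sketched below: paths are normalized to relative, `/`-separated form, entries are sorted, and a NUL byte separates each path from its content (this sketch also adds one between entries). The names `content_hash` and `unix_relative`, and the choice of FNV-1a, are assumptions for illustration; the project's actual hash function is not stated here.

```rust
use std::path::{Component, Path};

/// 64-bit FNV-1a step; unlike std's DefaultHasher, its output is stable
/// across runs, platforms, and Rust versions.
fn fnv1a64_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Path relative to `root`, joined with '/' on every platform.
/// Non-normal components (root, prefix, "." and "..") are dropped.
fn unix_relative(path: &Path, root: &Path) -> String {
    let rel = path.strip_prefix(root).unwrap_or(path);
    rel.components()
        .filter_map(|c| match c {
            Component::Normal(s) => Some(s.to_string_lossy().into_owned()),
            _ => None,
        })
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical sketch: hash (path, content) pairs in sorted order with a
/// NUL delimiter, so ("ab", "c") and ("a", "bc") hash differently.
fn content_hash(root: &Path, files: &[(&Path, &str)]) -> String {
    let mut entries: Vec<(String, &str)> = files
        .iter()
        .map(|(p, c)| (unix_relative(p, root), *c))
        .collect();
    entries.sort();
    let mut hash = 0xcbf2_9ce4_8422_2325u64;
    for (path, content) in &entries {
        hash = fnv1a64_update(hash, path.as_bytes());
        hash = fnv1a64_update(hash, &[0]); // path/content delimiter
        hash = fnv1a64_update(hash, content.as_bytes());
        hash = fnv1a64_update(hash, &[0]); // entry delimiter
    }
    format!("{hash:016x}")
}
```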

Context Budgeting (`--max-tokens N`)
- New CLI argument `--max-tokens` and `context-builder.toml` config option to cap the output token budget
- Files are processed until the budget is exhausted, with a `<truncated>` marker appended
- Prevents API errors from excessively large contexts and reduces costs
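
A minimal sketch of the budgeting loop. It assumes a crude 4-bytes-per-token estimate in place of the tiktoken-rs counting the project uses, a hypothetical `render_with_budget` function, and an illustrative section format.

```rust
/// Crude stand-in for a real tokenizer: about 4 bytes per token (assumption).
fn estimate_tokens(text: &str) -> usize {
    text.len().div_ceil(4)
}

/// Hypothetical sketch: append files in order until the next one would
/// exceed the budget, then append a `<truncated>` marker and stop.
fn render_with_budget(files: &[(&str, &str)], max_tokens: usize) -> String {
    let mut out = String::new();
    let mut used = 0;
    for (path, content) in files {
        let section = format!("{path}\n{content}\n"); // illustrative format
        let cost = estimate_tokens(&section);
        if used + cost > max_tokens {
            out.push_str("<truncated>\n");
            break;
        }
        used += cost;
        out.push_str(&section);
    }
    out
}
```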

Relevance-Based File Ordering
- Files are now sorted by relevance category: config files (0) → source code (1) → tests (2) → docs/other (3)
- Within each category, files remain alphabetically sorted
- Helps LLMs prioritize core logic and configuration over supporting files
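
The ordering can be expressed as a sort on a `(category, path)` key. The `relevance` classifier below is hypothetical and deliberately coarse (its extension and directory rules are assumptions); only the 0 to 3 category scheme comes from this changelog.

```rust
use std::path::Path;

/// Hypothetical classifier: 0 = config, 1 = source, 2 = tests, 3 = docs/other.
fn relevance(path: &Path) -> u8 {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
    let in_tests = path.components().any(|c| c.as_os_str() == "tests");
    if matches!(ext, "toml" | "json" | "yaml" | "yml") {
        0
    } else if in_tests || name.starts_with("test_") {
        2
    } else if matches!(ext, "rs" | "py" | "js" | "ts" | "go" | "java" | "c" | "cpp" | "h") {
        1
    } else {
        3
    }
}

/// Sort by category first, then alphabetically by path within a category.
fn sort_by_relevance(paths: &mut [&Path]) {
    paths.sort_by(|a, b| relevance(a).cmp(&relevance(b)).then_with(|| a.cmp(b)));
}
```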

Bug Fixes (identified by Gemini Deep Think code review)
- Fixed TOCTOU race in cache writes: `File::create` was truncating before acquiring the lock, risking data loss for concurrent readers
- Fixed indentation destruction in `diff_only` mode: `trim_start()` was stripping all leading whitespace from added files, corrupting Python/YAML
- Fixed UTF-8 boundary corruption: the 8KB sniff buffer could split multi-byte characters, misclassifying valid UTF-8 files as binary
- Fixed CLI flags being silently overwritten: config file values were unconditionally overriding CLI arguments post-resolution
- Removed a duplicate file seek block (copy-paste error)
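
The UTF-8 boundary fix hinges on telling an incomplete trailing sequence apart from genuinely invalid bytes. A minimal sketch using the standard library's `std::str::from_utf8` and `Utf8Error::error_len`, with a hypothetical `sniff_is_utf8` helper:

```rust
/// Hypothetical sketch: does a sniff buffer (e.g., the first 8 KB of a file)
/// look like UTF-8? A character cut off by the buffer end is accepted.
fn sniff_is_utf8(buf: &[u8]) -> bool {
    match std::str::from_utf8(buf) {
        Ok(_) => true,
        // error_len() == None means the input ended mid-character: the
        // buffer boundary split a multi-byte sequence, not an invalid byte.
        Err(e) => e.error_len().is_none(),
    }
}
```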

Smart Defaults
- Auto-exclude output files: the tool now automatically excludes its own generated output file, output folder, and `.context-builder/cache` directory from context collection without requiring manual `--ignore` flags
- Timestamped output glob patterns (e.g., `docs/context_*.md`) are auto-excluded when `timestamped_output` is enabled
- Large-file detection: warns about files exceeding 100 KB with a sorted top-5 list and total context size summary
- Improved project name detection: canonicalizes relative paths (like `.`) to resolve the actual directory name instead of showing "unknown"
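
A sketch of the large-file summary, with a hypothetical `large_file_report` function that takes `(path, size)` pairs. Reading "100 KB" as 100 * 1024 bytes is an assumption, as is breaking size ties by path.

```rust
/// Assumed reading of "100 KB".
const LARGE_FILE_BYTES: u64 = 100 * 1024;

/// Hypothetical sketch: the five largest files over the threshold (biggest
/// first) plus the total size of all collected files.
fn large_file_report<'a>(files: &[(&'a str, u64)]) -> (Vec<(&'a str, u64)>, u64) {
    let total: u64 = files.iter().map(|&(_, size)| size).sum();
    let mut large: Vec<(&str, u64)> = files
        .iter()
        .copied()
        .filter(|&(_, size)| size > LARGE_FILE_BYTES)
        .collect();
    // Largest first; ties broken by path so the report is deterministic.
    large.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    large.truncate(5);
    (large, total)
}
```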

Testing & Stability
- Added `#[serial]` annotations to integration tests that mutate the CWD, fixing intermittent test failures in parallel execution
- All 146 tests pass consistently with `--test-threads=1`

Dependencies
- Updated `criterion` to 0.8.2
- Updated `tiktoken-rs` to 0.9.1
- Updated `toml` to 1.0.1
- Enhanced the `--init` command to detect major file types in the current directory and suggest appropriate filters instead of using generic defaults
- Fixed file type detection to respect `.gitignore` patterns and common ignore directories (`target`, `node_modules`, etc.)
- Added the `--init` command to create a new `context-builder.toml` configuration file in the current directory with sensible defaults

BREAKING CHANGES
- Cache file locations changed to project-specific paths to prevent collisions

Critical Bug Fixes
- Fixed inverted ignore logic: corrected a critical bug where ignore patterns were treated as include patterns, so files/directories meant to be ignored were explicitly included instead
- Fixed cache read panics: Improved error handling for corrupted cache files to prevent application crashes
- Fixed potential panics in path manipulation: Added safe handling for edge case filenames without extensions or stems

Major Improvements
- Deterministic Output: Files are now sorted consistently, ensuring identical output for the same input across multiple runs
- Robust Caching Architecture: Complete rewrite of the caching system with:
  - Project-specific cache keys based on an absolute path hash to prevent collisions
  - JSON-based structured caching replacing fragile markdown parsing
  - File locking with the `fs2` crate for thread-safe concurrent access
  - Configuration changes now properly invalidate the cache
- Enhanced Auto-Diff System:
  - Structured state representation before markdown generation
  - Eliminated fragile text parsing (the `extract_file_contents` and `strip_line_number` functions)
  - Cache structured data (JSON) instead of markdown for reliability
- Thread Safety: Removed all `unsafe` blocks; explicit configuration passing now replaces environment variables

Performance Optimizations
- Custom Ignores: Now uses `ignore::overrides::OverrideBuilder` with glob pattern support for better performance
- Parallel Processing: Improved error handling to collect all errors and continue processing other files
- Directory Traversal: Lets the `ignore` crate optimize directory traversal instead of custom logic

Bug Fixes
- Fixed non-deterministic output order that caused inconsistent LLM context generation
- Removed incorrect triple-backtick filtering in diff logic that was corrupting file content
- Fixed cache corruption issues in concurrent access scenarios
- Improved error recovery for partial failures and corrupted cache
- Fixed inconsistent file tree visualization between auto-diff and standard modes

Testing & Quality
- Added a comprehensive integration test suite with tests covering:
  - Determinism verification
  - Auto-diff workflows
  - Cache collision prevention
  - Configuration change detection
  - Error recovery scenarios
- Fixed test race conditions by running tests serially in CI (`--test-threads=1`)
- Added `pretty_assertions` for better test output
- Fixed all clippy warnings and enforced `-D warnings` in CI

Dependencies
- Added `fs2` for file locking
- Added `serde_json` for the structured cache format
- Added `serial_test` for test serialization
- Added `pretty_assertions` for enhanced test output
- Added `encoding_rs` for enhanced encoding detection and transcoding

Migration
- Automatic detection and cleanup of old markdown-based cache files (`last_canonical.md`, etc.)
- The first run after upgrading clears the old cache format to prevent conflicts
- CLI interface remains fully backward compatible

Code Quality & Maintenance
- Fixed all clippy warnings including type complexity, collapsible if statements, and redundant closures
- Updated CI workflow to prevent race conditions in tests
- Improved binary file detection with better encoding strategy handling
- Enhanced error handling for edge cases and file system operations

Added
- Token count mode (`--token-count`) now provides accurate token counts using the `tiktoken-rs` library.
- Configuration file support (`context-builder.toml`) for project-specific settings.
- Timestamped output versions.
- `auto_diff` feature to automatically generate a diff from the latest output.
- `diff_only` mode (`--diff-only` / `diff_only = true`) to output only the change summary and modified file diffs (no full file bodies) for lower token usage.

Removed
- Deprecated, unpublished `standalone_snapshot` option (replaced by `diff_only`).

Changed
- Parallel processing is now enabled by default via the `parallel` feature (uses `rayon`) for significant speedups on large projects.
  - To build/run sequentially, disable default features:
    - CLI/build: `cargo build --no-default-features` or `cargo run --no-default-features`
    - As a dependency: `default-features = false`
- Updated Rust edition to 2024.

Benchmarks
- Benchmarks run silent by default by setting `CB_SILENT=1` at startup to avoid skewing timings with console I/O.
  - Override with `CB_SILENT=0` if you want to see output during benches.
- Added line numbers support
- Improved file tree visualization
- Enhanced error handling
- Better CLI argument validation
- Initial release
  - Basic directory processing
  - File filtering and ignoring
  - Markdown output generation