Changelog

All notable changes to this project will be documented in this file.

v0.8.3

  • Bug Fixes (identified by Gemini Deep Think v6 — clean benchmark prompt, zero historical bias)

    • Fixed output_folder auto-ignore silently excluding all user content in the folder — now only ignores *.md context files within it
    • Fixed diff_context_lines config being ignored in auto-diff mode — the value was never passed through compare_with to diff_file_contents
    • Fixed double file I/O in process_file: seek(0) was followed by fs::read_to_string, which opens a new file descriptor and wastes the seek. Now reuses the already-open handle
  • Security Hardening

    • install.sh now defaults to ~/.local/bin (user-local, no sudo required) instead of /usr/local/bin
    • Supports CONTEXT_BUILDER_INSTALL_DIR env var for custom install paths
    • SKILL.md rewritten: cargo install promoted as primary method (fully verified via crates.io), -y flag guidance softened to require explicit path scoping
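
The double file I/O fix in this release can be illustrated with a minimal sketch. This is not the project's actual process_file code: the function name read_text_once and the NUL-byte binary check are assumptions made for illustration. It only shows reading the full content through the handle that was already opened for sniffing, instead of reopening the path.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Reads a text file through a single handle: sniff the first bytes,
/// rewind, then read the full content from the same descriptor.
/// Returns Ok(None) when the sniffed bytes contain a NUL byte
/// (a hypothetical, simplified stand-in for binary detection).
fn read_text_once(path: &Path) -> io::Result<Option<String>> {
    let mut file = File::open(path)?;

    // Sniff up to 8 KB to decide whether the file looks binary.
    let mut sniff = [0u8; 8192];
    let n = file.read(&mut sniff)?;
    if sniff[..n].contains(&0) {
        return Ok(None);
    }

    // Rewind and keep using the handle that is already open, rather than
    // calling fs::read_to_string(path), which would open the file again.
    file.seek(SeekFrom::Start(0))?;
    let mut content = String::new();
    file.read_to_string(&mut content)?;
    Ok(Some(content))
}
```

With this shape the seek is no longer wasted: the rewind is what lets the second read start from the beginning of the same descriptor.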

v0.8.2

  • Documentation

    • Updated SKILL.md for v0.8.1+ with Security & Path Scoping section
    • Documented Tree-Sitter CLI flags (--signatures, --structure, --visibility, --truncate)
    • Added AST signatures and API surface review recipes
  • Test Coverage

    • Extended unit test coverage across config.rs, file_utils.rs, state.rs, markdown.rs, and lib.rs
    • Added tests for file relevance categories, lock files, various source extensions, encoding handling, auto-diff workflows, and config hash consistency

v0.8.1

  • Bug Fixes (identified by Gemini Deep Think v6 code review — 11 confirmed bugs, 0 false positives)

    • Fixed cache hash desync — cache.rs was missing 4 fields (signatures, structure, truncate, visibility), causing stale cache hits when toggling tree-sitter flags
    • Fixed JavaScript arrow function body leak — statement_block is a child of arrow_function, not variable_declarator, causing full function bodies to leak into signature output
    • Fixed TypeScript arrow function handling — same root cause as JavaScript
    • Fixed Python decorator erasure — intercepting decorated_definition nodes now preserves @decorator lines in signatures
    • Fixed Python is_method for decorated methods — iterative 4-level parent walk replaces fragile 2-level check
    • Fixed Rust tuple struct erasure — added ordered_field_declaration_list to body kinds
    • Fixed C/C++ header file prototype extraction — added declaration node matching for .h files
    • Fixed C++ class inheritance dropped — applied byte-slicing to preserve template<> and : public Base
    • Fixed JS/TS exported arrow functions invisible — added lexical_declaration to export signature extraction
    • Added .jsx extension support for JavaScript
  • Dependency Updates

    • Updated tree-sitter core: 0.24 → 0.26
    • Updated tree-sitter-rust: 0.23 → 0.24
    • Updated tree-sitter-javascript: 0.23 → 0.25
    • Updated tree-sitter-python: 0.23 → 0.25
    • Updated tree-sitter-go: 0.23 → 0.25
    • Updated tree-sitter-c: 0.23 → 0.24
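
The cache hash desync fixed in this release is the kind of bug that appears when a hash is assembled field by field and new options are forgotten. A minimal sketch of one way to avoid it, assuming a hypothetical CacheKeyConfig struct (the field types and the use of DefaultHasher are illustrative, not the project's cache.rs): deriving Hash on the struct means every field takes part in the hash automatically.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the options that influence generated output.
/// Deriving Hash means a newly added field cannot be left out by accident.
#[derive(Hash, Clone, Default)]
struct CacheKeyConfig {
    auto_diff: bool,
    diff_context_lines: usize,
    signatures: bool,
    structure: bool,
    truncate: String,
    visibility: String,
}

/// Hashes every field of the config into a single cache key.
/// Note: DefaultHasher output is not guaranteed stable across Rust
/// releases, so a persisted cache would want a fixed algorithm instead.
fn config_hash(cfg: &CacheKeyConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    cfg.hash(&mut hasher);
    hasher.finish()
}
```

Toggling any flag, for example setting signatures to true, changes the key, so a cache entry produced under the old options is not reused.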

v0.8.0

  • Tree-Sitter AST Integration (feature-gated)

    • New --signatures flag: Replaces full file content with extracted function/class signatures — dramatically reduces token usage (~4K vs ~15K tokens per file)
    • New --structure flag: Appends a structural summary to each file (e.g., "6 functions, 2 structs, 1 impl block")
    • New --truncate smart mode: Prefers AST-boundary truncation when content needs truncating
    • Supports 8 languages: Rust, JavaScript, TypeScript, Python, Go, Java, C, C++
    • Install with: cargo install context-builder --features tree-sitter-all
    • Individual language features available (e.g., --features tree-sitter-rust)
  • Dependency Updates

    • Updated tree-sitter core: 0.22 → 0.24
    • Updated all grammar crates: 0.21 → 0.23
    • Migrated from deprecated language() functions to LANGUAGE constants API
  • Bug Fixes

    • Fixed config hash mismatch — cache now includes auto_diff and diff_context_lines fields, preventing stale cache hits when toggling these options
    • Fixed silent config parse failure — context-builder.toml with invalid TOML syntax now prints a warning instead of silently falling back to defaults
    • Fixed smart truncation unconditionally cutting 50% of file content — now only activates with explicit token budget
    • Fixed Windows path separators in determinism test causing CI failure
  • CI & Quality

    • Added Coveralls code coverage integration via cargo-tarpaulin
    • All 188+ tests passing across Ubuntu, macOS, and Windows

v0.7.1

  • Bug Fixes (identified by Gemini Deep Think multi-round code review)

    • Fixed content hash using absolute OS paths — now normalized to relative unix-style for cross-platform determinism
    • Fixed hash collision risk — added null byte delimiter between path and content in content hash
    • Fixed strip_prefix('+') leaving extra space in diff_only mode, corrupting indentation
    • Fixed auto_diff path bypassing --max-tokens budget entirely
    • Fixed src/tests/ files misclassified as source code instead of tests
    • Fixed sorted_paths missing cwd fallback, silently dropping files when cwd ≠ base_path
  • Auto-Ignore Common Directories

    • 19 heavy directories (node_modules, dist, build, __pycache__, .venv, vendor, etc.) are now excluded by default
    • Prevents million-line outputs when processing projects without a .git directory
  • Context Window Warnings

    • Shows estimated token count after every run
    • Warns when output exceeds 128K tokens with actionable CLI suggestions
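
The two content hash fixes in this release (relative unix-style paths and a null byte delimiter) can be sketched as follows. The helper names normalize_rel_path and content_hash, and the choice of DefaultHasher, are assumptions for illustration rather than the project's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::path::Path;

/// Converts a path relative to the project root into a unix-style string,
/// so the same project hashes identically on Windows, macOS, and Linux.
fn normalize_rel_path(path: &Path) -> String {
    path.components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect::<Vec<_>>()
        .join("/")
}

/// Hashes (relative path, content) pairs in the order given.
/// The single 0 byte between path and content keeps ("ab", "c") and
/// ("a", "bc") from feeding the same byte stream to the hasher.
fn content_hash(files: &[(String, Vec<u8>)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (rel_path, content) in files {
        hasher.write(rel_path.as_bytes());
        hasher.write(&[0]); // delimiter between path and content
        hasher.write(content);
    }
    hasher.finish()
}
```

Without the delimiter, both pairs in the comment above would contribute the bytes "abc", which is the collision risk the fix describes.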

v0.7.0

  • Deterministic Output

    • Replaced volatile timestamp (Processed at: <timestamp>) with a content hash (Content hash: <hex>) in the Markdown header
    • Identical project states now produce byte-for-byte identical output files, enabling LLM prompt caching
  • Context Budgeting (--max-tokens N)

    • New CLI argument --max-tokens and context-builder.toml config option to cap the output token budget
    • Files are processed until the budget is exhausted, with a <truncated> marker appended
    • Prevents API errors from excessively large contexts and reduces costs
  • Relevance-Based File Ordering

    • Files are now sorted by relevance category: config files (0) → source code (1) → tests (2) → docs/other (3)
    • Within each category, files remain alphabetically sorted
    • Helps LLMs prioritize core logic and configuration over supporting files
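
The relevance-based ordering described above amounts to a two-level sort: category first, then path. A minimal sketch follows; the classification rules in relevance_category are simplified assumptions (the real rules cover many more extensions and file names), and only the category numbers 0 to 3 come from this changelog.

```rust
/// Assigns a relevance category; lower values sort first.
/// 0 = config, 1 = source, 2 = tests, 3 = docs/other.
/// The matching rules here are deliberately simplified assumptions.
fn relevance_category(path: &str) -> u8 {
    let file_name = path.rsplit('/').next().unwrap_or(path);
    let is_test = path.starts_with("tests/") || path.contains("/tests/");
    if file_name.ends_with(".toml") || file_name.ends_with(".json") {
        0
    } else if is_test {
        2
    } else if file_name.ends_with(".rs") {
        1
    } else {
        3
    }
}

/// Sorts by category, then alphabetically within each category.
fn sort_by_relevance(paths: &mut [String]) {
    paths.sort_by(|a, b| {
        relevance_category(a)
            .cmp(&relevance_category(b))
            .then_with(|| a.cmp(b))
    });
}
```

Because the tie-breaker is a plain string comparison, the result stays deterministic for a given set of paths, which fits the deterministic output goal of this release.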

v0.6.1

  • Bug Fixes (identified by Gemini Deep Think code review)
    • Fixed TOCTOU race in cache writes: File::create was truncating before acquiring lock, risking data loss for concurrent readers
    • Fixed indentation destruction in diff_only mode: trim_start() was stripping all leading whitespace from added files, corrupting Python/YAML
    • Fixed UTF-8 boundary corruption: 8KB sniff buffer could split multi-byte characters, misclassifying valid UTF-8 files as binary
    • Fixed CLI flags silently overwritten: config file values were unconditionally overriding CLI arguments post-resolution
    • Removed duplicate file seek block (copy-paste error)
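
The UTF-8 boundary fix above concerns a fixed-size sniff buffer that ends in the middle of a multi-byte character. One way to tolerate that, sketched here with the standard library only (the function name sniff_is_utf8 is hypothetical and the project may handle this differently), is to treat an "unexpected end of input" error as acceptable while still rejecting genuinely invalid bytes.

```rust
/// Returns true when a prefix of a file looks like valid UTF-8, tolerating
/// a multi-byte character that was cut off at the end of the buffer.
/// Intended for prefixes of larger files; a complete file that ends in a
/// partial character would also pass this check.
fn sniff_is_utf8(buf: &[u8]) -> bool {
    match std::str::from_utf8(buf) {
        Ok(_) => true,
        // error_len() returns None when the input ended in the middle of
        // a character, which is expected for a fixed-size sniff buffer.
        // Some(_) means an invalid byte sequence was found.
        Err(e) => e.error_len().is_none(),
    }
}
```

For example, the first two bytes of "héllo" (the letter h plus the first byte of é) are accepted, while a buffer starting with 0xFF is rejected.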

v0.6.0

  • Smart Defaults

    • Auto-exclude output files: the tool now automatically excludes its own generated output file, output folder, and .context-builder/ cache directory from context collection without requiring manual --ignore flags
    • Timestamped output glob patterns (e.g., docs/context_*.md) are auto-excluded when timestamped_output is enabled
    • Large-file detection: warns about files exceeding 100 KB with a sorted top-5 list and total context size summary
    • Improved project name detection: canonicalizes relative paths (like .) to resolve the actual directory name instead of showing "unknown"
  • Testing & Stability

    • Added #[serial] annotations to integration tests that mutate CWD, fixing intermittent test failures in parallel execution
    • All 146 tests pass consistently with --test-threads=1
  • Dependencies

    • Updated criterion to 0.8.2
    • Updated tiktoken-rs to 0.9.1
    • Updated toml to 1.0.1

v0.5.2

  • Enhanced --init command to detect major file types in the current directory and suggest appropriate filters instead of using generic defaults
  • Fixed file type detection to respect .gitignore patterns and common ignore directories (target, node_modules, etc.)

v0.5.1

  • Added --init command to create a new context-builder.toml configuration file in the current directory with sensible defaults

v0.5.0

  • BREAKING CHANGES

    • Cache file locations changed to project-specific paths to prevent collisions
  • Critical Bug Fixes

    • Fixed inverted ignore logic: Corrected critical bug where ignore patterns were being treated as include patterns, causing files/directories meant to be ignored to be explicitly included instead
    • Fixed cache read panics: Improved error handling for corrupted cache files to prevent application crashes
    • Fixed potential panics in path manipulation: Added safe handling for edge case filenames without extensions or stems
  • Major Improvements

    • Deterministic Output: Files are now sorted consistently, ensuring identical output for the same input across multiple runs
    • Robust Caching Architecture: Complete rewrite of caching system with:
      • Project-specific cache keys based on absolute path hash to prevent collisions
      • JSON-based structured caching replacing fragile markdown parsing
      • File locking with fs2 crate for thread-safe concurrent access
      • Configuration changes now properly invalidate cache
    • Enhanced Auto-Diff System:
      • Structured state representation before markdown generation
      • Eliminated fragile text parsing with extract_file_contents and strip_line_number functions
      • Cache structured data (JSON) instead of markdown for reliability
    • Thread Safety: Removed all unsafe blocks; explicit configuration passing replaces environment variables
  • Performance Optimizations

    • Custom Ignores: Now uses ignore::overrides::OverrideBuilder with glob pattern support for better performance
    • Parallel Processing: Improved error handling to collect all errors and continue processing other files
    • Directory Traversal: Let ignore crate optimize directory traversal instead of custom logic
  • Bug Fixes

    • Fixed non-deterministic output order that caused inconsistent LLM context generation
    • Removed incorrect triple-backtick filtering in diff logic that was corrupting file content
    • Fixed cache corruption issues in concurrent access scenarios
    • Improved error recovery for partial failures and corrupted cache
    • Fixed inconsistent file tree visualization between auto-diff and standard modes
  • Testing & Quality

    • Added comprehensive integration test suite with tests covering:
      • Determinism verification
      • Auto-diff workflows
      • Cache collision prevention
      • Configuration change detection
      • Error recovery scenarios
    • Fixed test race conditions by running tests serially in CI (--test-threads=1)
    • Added pretty_assertions for better test output
    • Fixed all clippy warnings and enforced -D warnings in CI
  • Dependencies

    • Added fs2 for file locking
    • Added serde_json for structured cache format
    • Added serial_test for test serialization
    • Added pretty_assertions for enhanced test output
    • Added encoding_rs for enhanced encoding detection and transcoding
  • Migration

    • Automatic detection and cleanup of old markdown-based cache files (last_canonical.md, etc.)
    • First run after upgrade will clear old cache format to prevent conflicts
    • CLI interface remains fully backward compatible
  • Code Quality & Maintenance

    • Fixed all clippy warnings including type complexity, collapsible if statements, and redundant closures
    • Updated CI workflow to prevent race conditions in tests
    • Improved binary file detection with better encoding strategy handling
    • Enhanced error handling for edge cases and file system operations
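
The "collect all errors and continue" behaviour listed under Performance Optimizations can be sketched in sequential form. The function read_all is a hypothetical illustration, not the project's code; a parallel version built on rayon would follow the same idea of keeping successes and failures in separate collections instead of aborting on the first error.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Reads every path, returning the successful reads and the failures
/// separately so that one unreadable file does not abort the whole run.
fn read_all(paths: &[PathBuf]) -> (Vec<(PathBuf, String)>, Vec<(PathBuf, io::Error)>) {
    let mut contents = Vec::new();
    let mut errors = Vec::new();
    for path in paths {
        match fs::read_to_string(path) {
            Ok(text) => contents.push((path.clone(), text)),
            // Record the failure and keep going with the remaining files.
            Err(err) => errors.push((path.clone(), err)),
        }
    }
    (contents, errors)
}
```

The caller can then report every failed path at the end of the run while still emitting output for the files that were read successfully.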

v0.4.0

  • Added

    • Token count mode (--token-count) now provides accurate token counts using the tiktoken-rs library.

    • Configuration file support (context-builder.toml) for project-specific settings.

    • Timestamped output versions.

    • auto_diff feature to automatically generate a diff from the latest output.

    • diff_only mode (--diff-only / diff_only = true) to output only the change summary and modified file diffs (no full file bodies) for lower token usage.

  • Removed

    • Deprecated, unpublished standalone_snapshot option (replaced by diff_only).

v0.3.0

  • Changed

    • Parallel processing is now enabled by default via the parallel feature (uses rayon) for significant speedups on large projects.
      • To build/run sequentially, disable default features:
        • CLI/build: cargo build --no-default-features or cargo run --no-default-features
        • As a dependency: default-features = false
    • Updated Rust edition to 2024.
  • Benchmarks

    • Benchmarks run silent by default by setting CB_SILENT=1 at startup to avoid skewing timings with console I/O.
      • Override with CB_SILENT=0 if you want to see output during benches.

v0.2.0

  • Added line numbers support
  • Improved file tree visualization
  • Enhanced error handling
  • Better CLI argument validation

v0.1.0

  • Initial release
  • Basic directory processing
  • File filtering and ignoring
  • Markdown output generation