# Better Context: AI Agent Codebase Intelligence CLI

Better Context transforms unstructured codebases into structured, AI-consumable context using graph theory and fast primitives. It provides data primitives that AI agents can query on demand, along with deep analysis tools for dependency graphs, centrality metrics, and token-optimized context selection.
```bash
# Run directly with uv (no install required)
uvx better-context overview
uvx better-context tree --depth 2

# Or install from PyPI
pip install better-context
better-context overview

# Deep analysis (requires indexing first)
better-context scan
better-context stats
better-context focus src/main.py
```

The fast primitives (`overview`, `tree`, `scripts`, `entries`, `file`, `deps`) return structured data in ~50-200ms without requiring a full codebase scan. For deeper analysis like PageRank centrality and dependency graphs, run `scan` first.
Traditional approaches to giving AI agents codebase context fall short:
| Approach | Problem |
|---|---|
| Dump entire files | Overwhelms context windows |
| Grep-based discovery | Misses relationships |
| Flat documentation | Lacks navigation structure |
Better Context solves this with a two-tier approach:
| Primitive | Purpose | Target Time |
|---|---|---|
| `overview` | Project type, framework, package manager | ~100ms |
| `tree` | Directory structure with file counts | ~50ms |
| `scripts` | Available commands from package files | ~50ms |
| `entries` | Entry point detection (CLI, main, server) | ~50ms |
| `file <path>` | Single file metadata, chunks, imports, exports | ~200ms |
| `deps <path>` | Dependencies and dependents for a file | ~100ms |
- PageRank centrality for mathematical file importance ranking
- Dependency graph analysis with cycle detection
- Coupling metrics (Ca/Ce/I/A/D) for architectural health
- Architecture layer detection with violation reporting
- Call graph analysis at the function level
- Token budget optimization for precise context selection
- Focus mode for ego-centric context around a specific file
- Semantic anchors for refactor-stable code references
- Primal Scan: Fast discovery of project structure and capabilities
- On-Demand Parsing: Parse only what's needed, when it's needed
- Graph Analysis: Build dependency graphs for deep understanding
- Context Optimization: Fit the most relevant code into limited token windows
- Format Flexibility: Output as JSON (for tools), Markdown (for LLMs), or Human (for people)
No installation required! Just use uvx to run directly:
```bash
# Run any command directly
uvx better-context overview
uvx better-context tree --depth 2
uvx better-context scan

# Run from git before PyPI release
uvx --from "git+https://github.com/better-context/better-context.git" better-context overview

# With optional dependencies (tree-sitter for enhanced parsing)
uvx --from "better-context[full]" better-context overview
```

Install from PyPI with pip:

```bash
pip install better-context
```

Or add it to a uv-managed project:

```bash
uv add better-context
```

```bash
# Full installation with tree-sitter, rich CLI, typer
pip install "better-context[full]"
```

For development, install from source:

```bash
git clone https://github.com/better-context/better-context
cd better-context
pip install -e ".[dev]"
```

These commands return structured data without requiring a manifest. Default output is JSON; use `--format human` or `--format markdown` for alternative formats.
| Command | Description |
|---|---|
| `better-context overview` | Project metadata (language, framework, package manager) |
| `better-context tree` | Directory structure with file counts |
| `better-context scripts` | Runnable scripts from package files (npm, poetry, make) |
| `better-context entries` | Entry points (CLI commands, main scripts, servers) |
| `better-context file <path>` | File metadata, chunks, imports, and exports |
| `better-context deps <path>` | Dependencies and dependents for a file |
These commands require running `better-context scan` first to build the manifest.

| Command | Description |
|---|---|
| `better-context scan [path]` | Index codebase and generate manifest |
| `better-context stats` | Codebase statistics with PageRank centrality |
| `better-context graph` | Export dependency graph (Mermaid, DOT, JSON) |
| `better-context focus <file>` | Ego-centric context centered on a file |
| `better-context optimize` | Select optimal context within token budget |
| `better-context verify` | Check if manifest is stale |
| `better-context clean` | Remove generated files and caches |
```bash
# Fast primitives (no scan required)
better-context overview                    # Project metadata as JSON
better-context overview --format human     # Human-readable output
better-context tree --depth 3              # Directory tree, 3 levels deep
better-context scripts --format markdown   # Scripts as markdown table
better-context entries                     # Find entry points
better-context file src/auth/jwt.py        # Analyze single file
better-context file src/api/routes.ts --format human

# Deep analysis (run scan first)
better-context scan                        # Index codebase
better-context scan --out manifest.json    # Custom output path
better-context stats                       # PageRank-ranked files
better-context stats --json                # Stats as JSON

# Dependency graph export
better-context graph -f mermaid > deps.md  # Mermaid diagram
better-context graph -f dot > deps.dot     # Graphviz DOT
better-context graph -f json > deps.json   # JSON for custom tools

# Focus mode
better-context focus src/auth/jwt.py             # Context around a file
better-context focus src/auth/jwt.py --depth 2   # Limit exploration depth
better-context focus src/auth/jwt.py --json      # Output as JSON

# Token budget optimization
better-context optimize --budget 8000            # Select best context
better-context optimize -b 4000 -k auth user     # Boost relevance by keywords
better-context optimize -b 8000 --task "fix auth bug"

# Maintenance
better-context verify              # Check if manifest is stale
better-context clean               # Remove all generated files
better-context clean --cache-only  # Keep manifest, remove cache
```

| Option | Description |
|---|---|
| `--root PATH` | Project root directory (default: current) |
| `--config PATH` | Path to `.ctx.json` config file |
| `-v, --verbose` | Increase verbosity (`-v`, `-vv`, `-vvv`) |
| `--no-color` | Disable colored output |
| `--version` | Show version |
Create a `.ctx.json` file in your project root to customize behavior:

```json
{
  "max_file_size_kb": 500,
  "chunk_max_lines": 150,
  "chunk_min_lines": 10,
  "pagerank_damping": 0.85,
  "pagerank_iterations": 20,
  "output_dir": ".better-context",
  "generate_agents_md": true,
  "language_overrides": {
    ".h": "cpp",
    ".m": "objc"
  }
}
```

| Option | Default | Description |
|---|---|---|
| `max_file_size_kb` | 500 | Skip files larger than this |
| `chunk_max_lines` | 150 | Maximum lines per code chunk |
| `chunk_min_lines` | 10 | Minimum lines to form a chunk |
| `pagerank_damping` | 0.85 | PageRank damping factor (0-1) |
| `pagerank_iterations` | 20 | PageRank convergence iterations |
| `output_dir` | `.better-context` | Directory for manifest output |
| `generate_agents_md` | true | Whether to generate AGENTS.md |
| `language_overrides` | `{}` | Map extensions to languages |
Create a `.ctxignore` file (gitignore-like syntax) to exclude files:

```gitignore
# Dependencies (ignored by default, but you can customize)
node_modules/
vendor/

# Large generated files
*.bundle.js
*.min.js
*.map

# Project-specific exclusions
legacy/
docs/generated/

# But include important fixtures
!fixtures/critical/
```

These patterns are always ignored (you don't need to specify them):

- Version control: `.git/`, `.svn/`, `.hg/`
- Dependencies: `node_modules/`, `vendor/`, `venv/`, `__pycache__/`
- Build outputs: `dist/`, `build/`, `target/`, `.next/`
- IDE files: `.idea/`, `.vscode/`, `*.swp`
- Lock files: `package-lock.json`, `yarn.lock`, `poetry.lock`
- Our output: `.better-context/`
```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Scanner   │───▶│   Parser    │───▶│    Graph    │───▶│  Generator  │
│             │    │             │    │   Analysis  │    │             │
│ • Walk tree │    │ • Chunks    │    │ • PageRank  │    │ • Templates │
│ • Binary    │    │ • Imports   │    │ • Cycles    │    │ • AGENTS.md │
│   detect    │    │ • Exports   │    │ • Layers    │    │   hierarchy │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
```
The scanner walks your codebase and discovers files:
- Detects binary files via extension check (O(1)) and null-byte detection
- Applies ignore patterns (.ctxignore + defaults)
- Computes content hashes for caching
- Detects programming language from extensions
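The combined extension-plus-null-byte check can be sketched in a few lines. This is an illustration only; `looks_binary` and `BINARY_EXTENSIONS` are hypothetical names, not the scanner's real API:

```python
import os

# Known-binary extensions give an O(1) answer; otherwise sniff the first
# 8 KB of the file for a null byte, which text files essentially never contain.
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".zip", ".exe", ".so", ".pyc"}

def looks_binary(path: str, sample_size: int = 8192) -> bool:
    ext = os.path.splitext(path)[1].lower()
    if ext in BINARY_EXTENSIONS:
        return True
    try:
        with open(path, "rb") as f:
            return b"\x00" in f.read(sample_size)
    except OSError:
        return True  # unreadable files are skipped, like binaries
```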
Each file is parsed using language-specific adapters:
- Regex mode (zero dependencies): Pattern matching for function/class boundaries
- AST mode (with tree-sitter): Full syntax tree parsing for accuracy
Currently supported languages:
- Python (.py, .pyi, .pyw)
- TypeScript (.ts, .tsx)
- JavaScript (.js, .jsx, .mjs, .cjs)
- Go (.go) - coming soon
Build and analyze the dependency graph:
- Dependency Graph: Directed graph where edges represent imports
- PageRank Centrality: Ranks files by structural importance
- Files imported by many important files rank higher
- Based on Google's original algorithm (damping factor 0.85)
- Cycle Detection: Tarjan's SCC algorithm finds circular dependencies
- Topological Layers: Kahn's algorithm assigns files to dependency layers
Generate hierarchical AGENTS.md files:
```
project/
├── AGENTS.md          # Project overview, architecture
├── src/
│   ├── AGENTS.md      # src/ module overview
│   └── api/
│       └── AGENTS.md  # API module detail
```
Each AGENTS.md contains:
- Purpose: What this module does
- Key Files: Ranked by centrality with descriptions
- Public API: Exported symbols with signatures
- Dependencies: Internal and external imports
- Circular Dependencies: Warnings if detected
- Navigation: Links to parent/child modules
```markdown
# my-project

> Auto-generated context for AI agents. Last updated: 2026-01-24T10:30:00Z

## 📋 Purpose

A Python project with 42 files.

## 🔑 Key Files (by Centrality)

| File | Score | Why It Matters |
|------|-------|----------------|
| `src/core/utils.py` | 0.1523 | 15 exports - 8 dependents |
| `src/api/routes.py` | 0.0891 | 6 exports - 5 dependents |
| `src/models/user.py` | 0.0654 | type definitions |

## ⚠️ Circular Dependencies

The following cycles were detected:

- auth.py → session.py → user.py → auth.py

## 🧭 Navigation

- **Source code?** Start with: [`./src/AGENTS.md`](./src/AGENTS.md)
- **Tests?** Start with: [`./tests/AGENTS.md`](./tests/AGENTS.md)
```

| Language | Extensions | Import Parsing | Export Parsing |
|---|---|---|---|
| Python | .py, .pyi, .pyw | ✅ | ✅ |
| TypeScript | .ts, .tsx | ✅ | ✅ |
| JavaScript | .js, .jsx, .mjs, .cjs | ✅ | ✅ |
| Go | .go | 🚧 (coming soon) | 🚧 |
Files are ranked using the PageRank algorithm:
```
PR(f) = (1-d)/N + d × Σ PR(g)/L(g)    for all g importing f
```

Where:

- `d` = damping factor (0.85)
- `N` = total files
- `L(g)` = number of files that `g` imports
Intuition: A file is important if:
- Many files import it (direct importance)
- Important files import it (transitive importance)
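The iteration behind this ranking can be sketched directly from the formula above. This is an illustrative implementation, not the library's code; `pagerank` here is a hypothetical helper that takes a map from each file to the files it imports:

```python
# Minimal power-iteration PageRank over an import graph. Rank flows from
# importer g to imported f, each importer splitting its rank across its
# L(g) import targets, per the formula above.
def pagerank(imports: dict, d: float = 0.85, iterations: int = 20) -> dict:
    files = list(imports)
    n = len(files)
    pr = {f: 1.0 / n for f in files}
    for _ in range(iterations):
        nxt = {f: (1.0 - d) / n for f in files}  # baseline (1-d)/N term
        for g, targets in imports.items():
            if targets:
                share = d * pr[g] / len(targets)  # d × PR(g)/L(g)
                for f in targets:
                    nxt[f] += share
        pr = nxt
    return pr

ranks = pagerank({
    "utils.py": [],
    "api.py": ["utils.py"],
    "models.py": ["utils.py"],
})
# utils.py ranks highest: both other files import it
```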
Circular dependencies are detected using Tarjan's strongly connected components algorithm:
- O(V + E) complexity
- Finds all cycles, not just one
- Reports suggested break points (the edge from the most-imported file)
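A compact sketch of the technique (not the library's implementation): Tarjan assigns each node a discovery index and a low-link value during DFS, and every strongly connected component with more than one member is a cycle.

```python
# Recursive Tarjan SCC. `graph` maps each file to the files it imports.
def tarjan_sccs(graph: dict) -> list:
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs, counter = [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

cycles = [s for s in tarjan_sccs({
    "auth.py": ["session.py"],
    "session.py": ["user.py"],
    "user.py": ["auth.py"],
    "utils.py": [],
}) if len(s) > 1]
# one cycle, matching the auth.py → session.py → user.py example above
```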
Files are assigned to layers for bottom-up understanding:
- Layer 0: Files with no imports (foundations)
- Layer N: Files that only import from layers 0..N-1
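The layering rule above amounts to repeatedly peeling off files whose imports are all resolved, in the style of Kahn's algorithm. A minimal sketch (`assign_layers` is a hypothetical name, not the real API):

```python
# Assign each file a layer: strip files with no unresolved imports,
# give them the current layer number, and repeat one level up.
def assign_layers(imports: dict) -> dict:
    remaining = {f: set(deps) for f, deps in imports.items()}
    layers, level = {}, 0
    while remaining:
        ready = [f for f, deps in remaining.items() if not deps]
        if not ready:  # leftover files participate in cycles; stop here
            break
        for f in ready:
            layers[f] = level
            del remaining[f]
        for deps in remaining.values():
            deps.difference_update(ready)
        level += 1
    return layers

layers = assign_layers({
    "utils.py": [],
    "models.py": ["utils.py"],
    "api.py": ["models.py", "utils.py"],
})
# utils.py is a layer-0 foundation; api.py sits on top at layer 2
```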
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run performance tests (opt-in)
BC_PERF=1 pytest -m perf

# Run tests with coverage
pytest --cov=src/better_context

# Type checking
mypy src/

# Linting
ruff check src/

# Format code
ruff format src/
```

```
src/better_context/
├── cli.py                # CLI entry point and command handlers
├── config.py             # Configuration loader (.ctx.json)
├── ignore.py             # .ctxignore pattern matching
├── scanner.py            # File discovery and binary detection
├── manifest.py           # Manifest JSON schema
├── graph.py              # Dependency graph construction
├── centrality.py         # PageRank and cycle detection
├── resolution.py         # Import resolution
├── optimizer.py          # Token budget optimizer
├── focus.py              # Ego-centric context generation
├── semantic_anchor.py    # Content-addressable chunk IDs
├── staleness.py          # Manifest freshness detection
├── tree.py               # Directory tree builder
├── visualize.py          # Graph export (Mermaid, DOT, JSON)
├── errors.py             # Error handling
├── chunker.py            # Code chunking
├── cache.py              # Incremental parse caching
├── callgraph.py          # Function-level call graph analysis
├── coupling.py           # Coupling metrics (Ca/Ce/I/A/D)
├── architecture.py       # Architecture layer detection
├── orchestrator.py       # High-level analysis coordination
├── primitives/           # Fast data primitives
│   ├── overview.py       # Project metadata extraction
│   ├── tree.py           # Directory structure
│   ├── scripts.py        # Script extraction
│   ├── entries.py        # Entry point detection
│   ├── file_info.py      # Single file analysis
│   ├── deps.py           # Dependency lookup
│   └── formatters.py     # Output formatters (JSON, human, markdown)
└── languages/            # Language adapters
    ├── base.py           # Adapter interface
    ├── python.py         # Python adapter
    ├── typescript.py     # TypeScript/JavaScript adapter
    └── go.py             # Go adapter
```
- Fast Primitives: Sub-200ms queries for project metadata, tree, scripts, entries, file info, and dependencies
- Bridge File Detection: Betweenness centrality identifies critical connector files
- Auto-Generated Architecture Diagrams: Mermaid diagrams from dependency graph
- Focus Mode: Ego-centric context centered on a specific file
- Token Budget Optimizer: Greedy and knapsack algorithms for budget-constrained selection
- Semantic Anchors: Content-addressable chunk IDs that survive refactoring
- Context Staleness Detection: Hash-based verification of manifest freshness
- Coupling Metrics: Ca/Ce/I/A/D metrics for architectural health analysis
- Architecture Layer Detection: Automatic classification into presentation/application/domain/infrastructure layers
- Call Graph Analysis: Function-level call tracking and hot path detection
- Incremental Caching: Hash-based parse cache for fast subsequent scans
- MCP Server Mode: Run as a Model Context Protocol server for IDE integration
Focus Mode generates ego-centric context centered on a specific file. Instead of analyzing the entire codebase, it radiates outward from a focal file to find the most relevant context.
- Bidirectional BFS: Explores both dependencies (what the file imports) and dependents (what imports the file)
- Distance-Weighted Scoring: Files are scored by `centrality × (decay ^ distance)`
- Categorization: Automatically identifies related tests, type definitions, and shared modules
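The scoring described above can be sketched as a breadth-first walk followed by decay weighting. This is illustrative only; `focus_scores` is not the real API, and `neighbors` is assumed to hold both edge directions:

```python
from collections import deque

# BFS outward from the focal file over both dependencies and dependents,
# then score each reached file as centrality * decay**distance.
def focus_scores(neighbors: dict, centrality: dict, focal: str,
                 depth: int = 3, decay: float = 0.8) -> dict:
    dist = {focal: 0}
    queue = deque([focal])
    while queue:
        f = queue.popleft()
        if dist[f] >= depth:
            continue
        for g in neighbors.get(f, []):
            if g not in dist:
                dist[g] = dist[f] + 1
                queue.append(g)
    return {f: centrality.get(f, 0.0) * decay ** d
            for f, d in dist.items() if f != focal}

scores = focus_scores(
    {"jwt.py": ["session.py"],
     "session.py": ["jwt.py", "user.py"],
     "user.py": ["session.py"]},
    {"session.py": 0.5, "user.py": 0.4},
    focal="jwt.py",
)
# session.py: 0.5 * 0.8 = 0.4; user.py: 0.4 * 0.8**2 = 0.256
```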
```bash
# Generate focused context for a file
better-context focus src/auth/jwt.py

# Limit exploration depth (default: 3)
better-context focus src/auth/jwt.py --depth 2

# Adjust score decay (default: 0.8)
better-context focus src/auth/jwt.py --decay 0.5

# Output as JSON for programmatic use
better-context focus src/auth/jwt.py --json

# Save to file
better-context focus src/auth/jwt.py -o focus-context.md
```

Focus Mode generates a focused AGENTS.md containing:
- Summary: Neighborhood size, depth explored, dependency counts
- Direct Dependencies: Files the focal file imports (ranked by relevance)
- Direct Dependents: Files that import the focal file
- Extended Neighborhood: Files 2+ hops away
- Related Tests: Test files in the neighborhood
- Shared Types: Type definition files
- Suggested Reading Order: Optimal sequence for understanding the code
The Token Budget Optimizer selects the mathematically optimal subset of code chunks that fit within a token budget, maximizing value using constrained optimization.
- PageRank Weighting: Chunks are scored by their file's PageRank centrality
- Relevance Scoring: Optional keyword/task matching boosts relevant chunks
- Diversity Penalty: Penalizes selecting similar chunks to encourage variety
- Greedy/Knapsack Selection: Efficient algorithms for budget-constrained selection
```
Maximize:    Σ(PageRank × relevance × diversity) / tokens_used
Subject to:  tokens_used ≤ budget
```
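The greedy variant can be sketched as a value-density sort: rank chunks by score per token, then take chunks while the budget allows. This approximates the objective above and omits the diversity penalty; `select_chunks` and its tuple layout are hypothetical, not the library's exact scorer:

```python
# chunks: list of (name, token_count, score) where score already folds in
# PageRank and relevance. Greedy selection by value density.
def select_chunks(chunks: list, budget: int):
    ranked = sorted(chunks, key=lambda c: c[2] / c[1], reverse=True)
    chosen, used = [], 0
    for name, tokens, score in ranked:
        if used + tokens <= budget:
            chosen.append(name)
            used += tokens
    return chosen, used

chosen, used = select_chunks(
    [("auth.login", 1200, 0.9), ("utils.helpers", 3000, 0.6),
     ("models.user", 800, 0.5), ("legacy.misc", 5000, 0.2)],
    budget=4000,
)
# densest chunks that fit: auth.login and models.user (2000 tokens used)
```

The `-a knapsack` option trades this fast approximation for a dynamic-programming solution that is optimal under the same scoring.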
```bash
# Select optimal context within 8000 token budget
better-context optimize --budget 8000

# Boost chunks matching specific keywords
better-context optimize -b 4000 -k auth user session

# Optimize for a specific task
better-context optimize -b 8000 --task "implement user authentication"

# Use knapsack algorithm for true optimality
better-context optimize -b 4000 -a knapsack

# Output as JSON
better-context optimize -b 8000 --json
```

| Option | Description |
|---|---|
| `--budget, -b` | Token budget (default: 8000) |
| `--keywords, -k` | Keywords to boost relevance |
| `--task, -t` | Task description for relevance scoring |
| `--algorithm, -a` | `greedy` (default) or `knapsack` |
| `--diversity` | Diversity penalty factor 0-1 (default: 0.3) |
| `--json` | Output as JSON |
| `--output, -o` | Output file path |
The optimizer outputs a ranked list of chunks with:
- File path and chunk name
- Token count and efficiency score
- PageRank and relevance scores
- Budget utilization summary
Semantic Anchors provide content-addressable chunk IDs that survive refactoring. Instead of file:line-based references that break when code moves, semantic anchors are derived from `hash(normalized_AST)`.
- AST Normalization: Code is normalized by removing comments, whitespace, and string contents
- Content Hashing: A SHA-256 hash of the normalized code produces a stable 16-character ID
- Anchor Mapping: The system tracks anchor → location mappings
- Move Detection: When code moves, the same anchor maps to the new location
- Durable Agent Memory: References to code remain valid across refactoring
- Stable Context Links: Links between context and code survive file reorganization
- Change Detection: Different anchors indicate semantic changes (not just whitespace)
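The idea can be approximated in a few lines. This is a deliberately naive sketch: the real normalization is AST-based and also strips string contents, which this comment-and-whitespace version does not; `naive_anchor` is not the real API:

```python
import hashlib

# Strip comments and surrounding whitespace, then hash what remains to a
# stable 16-character ID, mirroring the SHA-256 scheme described above.
def naive_anchor(code: str) -> str:
    lines = []
    for line in code.splitlines():
        line = line.split("#", 1)[0].strip()  # crude comment removal
        if line:
            lines.append(line)
    normalized = "\n".join(lines)
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

a = naive_anchor("def hello(name):\n    return f'Hello, {name}!'\n")
b = naive_anchor("def hello(name):  # greet\n    return f'Hello, {name}!'\n")
# adding a comment does not change the anchor
```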
```python
# Same anchor regardless of location
def hello(name: str) -> str:
    return f"Hello, {name}!"

# semantic_anchor: "a3f2e8c9b1d4a5f7"
# This stays the same even if the function moves to a different file
```

```python
from better_context import compute_semantic_anchor, AnchorMapping

# Compute anchor for a chunk
anchor = compute_semantic_anchor(
    source=source_code,
    start_line=10,
    end_line=20,
    language="python",
    name="hello",
    chunk_type="function",
)

# Track anchor locations
mapping = AnchorMapping()
update_anchor_mapping(mapping, anchor, "src/utils.py", 10, "src/utils.py:10:function:hello")

# Resolve anchor to current location
path, line = resolve_anchor(mapping, anchor)
```

Better Context calculates Robert C. Martin's package coupling metrics to evaluate module stability and architectural health:
| Metric | Name | Description |
|---|---|---|
| Ca | Afferent Coupling | Number of modules that depend ON this module |
| Ce | Efferent Coupling | Number of modules this module depends ON |
| I | Instability | Ce / (Ca + Ce); 0 = stable, 1 = unstable |
| A | Abstractness | Abstract definitions / total definitions |
| D | Distance | \|A + I - 1\|; 0 = ideal (on the main sequence) |
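The I and D columns follow directly from the definitions above. A minimal worked example (function names are illustrative, not the library's API):

```python
# Instability: fraction of this module's coupling that points outward.
def instability(ca: int, ce: int) -> float:
    return ce / (ca + ce) if (ca + ce) else 0.0

# Distance from the main sequence A + I = 1.
def main_sequence_distance(abstractness: float, i: float) -> float:
    return abs(abstractness + i - 1.0)

# A concrete module with many dependents: very stable (I near 0) but
# fully concrete (A = 0), landing it deep in the "zone of pain".
i = instability(ca=9, ce=1)           # 1 / (9 + 1) = 0.1
d = main_sequence_distance(0.0, i)    # |0.0 + 0.1 - 1| = 0.9
```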
Modules are classified into architectural zones:
- Main Sequence (D ≈ 0): Healthy balance of stability and abstractness
- Zone of Pain (I ≈ 0, A ≈ 0): Stable but concrete — hard to extend
- Zone of Uselessness (I ≈ 1, A ≈ 1): Unstable and abstract — likely unused
```python
from better_context import calculate_all_coupling_metrics, generate_zone_report

# Calculate metrics for all files
metrics = calculate_all_coupling_metrics(graph, file_entries)

# Generate zone classification report
report = generate_zone_report(metrics)
print(f"Files on main sequence: {len(report.on_main_sequence)}")
print(f"Files in zone of pain: {len(report.zone_of_pain)}")
```

Better Context automatically classifies files into architectural layers using directory naming patterns, import direction analysis, and export type analysis:
| Layer | Description | Examples |
|---|---|---|
| Presentation | UI components, views, pages | components/, pages/, views/ |
| Application | Use cases, handlers, controllers | handlers/, controllers/, usecases/ |
| Domain | Business logic, models, entities | models/, domain/, entities/ |
| Infrastructure | Database, external APIs, adapters | db/, adapters/, repositories/ |
| Shared | Cross-cutting utilities, types | utils/, types/, helpers/ |
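The directory-pattern part of this heuristic can be sketched as a simple lookup over path segments. Illustrative only: the real classifier also weighs import direction and export types, which this sketch omits, and `guess_layer` is a hypothetical name:

```python
# Directory names associated with each layer, matching the table above.
LAYER_PATTERNS = {
    "presentation": ("components", "pages", "views"),
    "application": ("handlers", "controllers", "usecases"),
    "domain": ("models", "domain", "entities"),
    "infrastructure": ("db", "adapters", "repositories"),
    "shared": ("utils", "types", "helpers"),
}

def guess_layer(path: str) -> str:
    parts = path.replace("\\", "/").lower().split("/")
    for layer, names in LAYER_PATTERNS.items():
        if any(part in names for part in parts):
            return layer
    return "unknown"

layer = guess_layer("src/adapters/postgres.py")  # "infrastructure"
```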
The tool detects when lower layers import from higher layers (e.g., infrastructure importing from presentation), which violates clean architecture principles.
```python
from better_context import analyze_architecture

# Analyze architecture and detect violations
report = analyze_architecture(graph, file_entries)
for violation in report.violations:
    print(f"{violation.source_path} ({violation.source_layer}) "
          f"imports {violation.target_path} ({violation.target_layer})")
```

Better Context builds function-level call graphs showing which functions call which other functions, enabling deeper code-flow understanding beyond file-level imports.
- Call Site Extraction: Identifies function calls within function bodies
- Symbol Resolution: Resolves call targets to specific chunk IDs
- Forward/Reverse Indices: Quick lookup of callers and callees
- Hot Path Detection: Identifies frequently-called functions
- Impact Analysis: Determines what's affected by changing a function
```python
from better_context import build_call_graph, get_callers, get_callees

# Build call graph from manifest
call_graph = build_call_graph(manifest)

# Find all functions that call a specific function
callers = get_callers(call_graph, "src/auth.py:validate_token")

# Find all functions called by a specific function
callees = get_callees(call_graph, "src/api/routes.py:handle_request")
```

Run `better-context scan` first to index the codebase. The `stats`, `graph`, `focus`, `optimize`, `verify`, and `deps` commands require an existing manifest.
Check your `.ctxignore` patterns and ensure the directory contains supported file types (Python, TypeScript, JavaScript, Go).
This is informational — circular dependencies are reported but don't prevent analysis. Consider refactoring to break the cycle at the suggested point.
Increase `max_file_size_kb` in `.ctx.json` or add the file to `.ctxignore`.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Run tests (`pytest`)
- Submit a pull request
MIT License - see LICENSE for details.