Comprehensive guide for deploying Thread Flow in CLI/local environments with PostgreSQL backend and parallel processing.
- Prerequisites
- Local Development Setup
- PostgreSQL Backend Configuration
- Parallel Processing Setup
- Production CLI Deployment
- Environment Variables
- Verification
- Next Steps
- Operating System: Linux, macOS, or Windows with WSL2
- Rust: 1.75.0 or later (edition 2024)
- CPU: Multi-core recommended for parallel processing (2+ cores)
- Memory: 4GB minimum, 8GB+ recommended for large codebases
- Disk: 500MB+ for Thread binaries and dependencies
```shell
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# PostgreSQL 14+ (for persistent caching)
# Ubuntu/Debian
sudo apt install postgresql postgresql-contrib

# macOS
brew install postgresql@14

# Verify installations
rustc --version  # Should be 1.75.0+
psql --version   # Should be 14+
```

Recommended development tools:

- mise: Development environment manager (`curl https://mise.run | sh`)
- cargo-nextest: Fast test runner (`cargo install cargo-nextest`)
- cargo-watch: Auto-rebuild on changes (`cargo install cargo-watch`)
```shell
# Clone repository
git clone https://github.com/your-org/thread.git
cd thread

# Install development tools (if using mise)
mise run install-tools

# Build with all features (CLI configuration)
cargo build --workspace --all-features --release

# Verify build
./target/release/thread --version
```

Thread Flow CLI builds use these default features:
```toml
# Cargo.toml - CLI configuration
[features]
default = ["recoco-minimal", "parallel"]

# PostgreSQL backend support
recoco-postgres = ["recoco-minimal", "recoco/target-postgres"]

# Parallel processing (Rayon)
parallel = ["dep:rayon"]

# Query result caching (optional but recommended)
caching = ["dep:moka"]
```

Recommended CLI Build:

```shell
# Full-featured CLI with PostgreSQL, parallelism, and caching
cargo build --release --features "recoco-postgres,parallel,caching"
```

```text
thread/
├── crates/flow/        # Thread Flow library
├── target/release/     # Compiled binaries
│   └── thread          # Main CLI binary
├── data/               # Analysis results (create this)
└── .env                # Environment configuration (create this)
```
Create required directories:

```shell
mkdir -p data
touch .env
```

```shell
# Start PostgreSQL service
# Linux
sudo systemctl start postgresql
sudo systemctl enable postgresql

# macOS
brew services start postgresql@14

# Create database and user
sudo -u postgres psql
```

```sql
-- Inside psql:
CREATE DATABASE thread_cache;
CREATE USER thread_user WITH ENCRYPTED PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE thread_cache TO thread_user;
\q
```

Thread Flow uses ReCoco's PostgreSQL target, which auto-creates its tables. The schema includes:
```sql
-- Content-addressed cache table (auto-created by ReCoco)
CREATE TABLE IF NOT EXISTS code_symbols (
    content_hash TEXT PRIMARY KEY,       -- Blake3 fingerprint
    file_path TEXT NOT NULL,
    language TEXT,
    symbols JSONB,                       -- Extracted symbol data
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for fast lookups
CREATE INDEX idx_symbols_file_path ON code_symbols(file_path);
CREATE INDEX idx_symbols_language ON code_symbols(language);
CREATE INDEX idx_symbols_created ON code_symbols(created_at);
```

No manual schema creation is needed; ReCoco handles this automatically.
Create `.env` in the project root:

```shell
# .env - PostgreSQL connection
DATABASE_URL=postgresql://thread_user:your_secure_password@localhost:5432/thread_cache

# Optional: Connection pool settings
DB_POOL_SIZE=10
DB_CONNECTION_TIMEOUT=30
```

```shell
# Test connection
psql -U thread_user -d thread_cache -h localhost

# Inside psql - verify tables exist after first run
\dt
\d code_symbols
\q
```

Thread Flow uses Rayon for CPU-bound parallel processing in CLI environments.
Default Behavior (automatic):
- Rayon detects available CPU cores
- Spawns worker threads = num_cores
- Distributes file processing across threads
Manual Thread Control (optional):
```shell
# Set RAYON_NUM_THREADS environment variable
export RAYON_NUM_THREADS=4  # Use 4 cores

# Or in .env file
echo "RAYON_NUM_THREADS=4" >> .env
```

| CPU Cores | 100 Files | 1000 Files | 10,000 Files |
|---|---|---|---|
| 1 core | ~1.6s | ~16s | ~160s |
| 2 cores | ~0.8s | ~8s | ~80s |
| 4 cores | ~0.4s | ~4s | ~40s |
| 8 cores | ~0.2s | ~2s | ~20s |
Speedup: near-linear with core count (2-8x typical)
Recommended Settings:
```shell
# CPU-bound workloads (parsing, AST analysis)
# Use all physical cores
RAYON_NUM_THREADS=$(nproc)              # Linux
RAYON_NUM_THREADS=$(sysctl -n hw.ncpu)  # macOS

# I/O-bound workloads (file reading, database queries)
# Use 2x physical cores
RAYON_NUM_THREADS=$(($(nproc) * 2))     # Linux

# Mixed workloads (default)
# Let Rayon auto-detect
unset RAYON_NUM_THREADS
```

```shell
# Check the feature pulls in the dependency
cargo tree --features parallel | grep rayon
# Expected output:
# └── rayon v1.10.0

# Run with parallel logging
RUST_LOG=thread_flow=debug cargo run --release --features parallel

# Look for log entries indicating parallel execution:
# [DEBUG thread_flow::batch] Processing 100 files across 4 threads
```
```shell
# Release build with full optimizations
cargo build \
    --release \
    --features "recoco-postgres,parallel,caching" \
    --workspace

# Binary location
ls -lh target/release/thread
# Should be ~15-25MB

# Optional: Strip debug symbols for smaller binary
strip target/release/thread
# Should reduce to ~10-15MB
```

```shell
# Copy binary to system path
sudo cp target/release/thread /usr/local/bin/

# Verify installation
thread --version
thread --help
```

Create production config file:
```shell
# /etc/thread/config.env
DATABASE_URL=postgresql://thread_user:strong_password@db.production.com:5432/thread_cache
RAYON_NUM_THREADS=8
RUST_LOG=thread_flow=info

# Cache configuration
THREAD_CACHE_MAX_CAPACITY=100000  # 100k entries
THREAD_CACHE_TTL_SECONDS=3600     # 1 hour

# Performance tuning
THREAD_BATCH_SIZE=100             # Files per batch
```

```ini
# /etc/systemd/system/thread-analyzer.service
[Unit]
Description=Thread Code Analyzer Service
After=network.target postgresql.service

[Service]
Type=simple
User=thread
Group=thread
EnvironmentFile=/etc/thread/config.env
ExecStart=/usr/local/bin/thread analyze --watch /var/projects
Restart=on-failure
RestartSec=10

# Resource limits (systemd comments must be on their own line)
MemoryLimit=4G
# 4 cores max
CPUQuota=400%

[Install]
WantedBy=multi-user.target
```

Enable and start:

```shell
sudo systemctl daemon-reload
sudo systemctl enable thread-analyzer
sudo systemctl start thread-analyzer
sudo systemctl status thread-analyzer
```
```dockerfile
# Dockerfile - Production CLI
FROM rust:1.75-slim AS builder
WORKDIR /build
COPY . .

# Build with production features
RUN cargo build --release \
    --features "recoco-postgres,parallel,caching" \
    --workspace

FROM debian:bookworm-slim

# Install PostgreSQL client libraries
RUN apt-get update && apt-get install -y \
    libpq5 \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /build/target/release/thread /usr/local/bin/

# Create non-root user
RUN useradd -m -u 1001 thread
USER thread

ENTRYPOINT ["thread"]
```

Build and run:
```shell
# Build image
docker build -t thread-cli:latest .

# Run with PostgreSQL connection
docker run --rm \
    -e DATABASE_URL=postgresql://thread_user:pass@host.docker.internal:5432/thread_cache \
    -e RAYON_NUM_THREADS=4 \
    -v $(pwd)/data:/data \
    thread-cli:latest analyze /data
```

| Variable | Purpose | Default | Example |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | None (required) | `postgresql://user:pass@localhost/thread` |
| `RAYON_NUM_THREADS` | Parallel processing thread count | Auto-detect | `4` |
| `RUST_LOG` | Logging level | `info` | `thread_flow=debug` |
| Variable | Purpose | Default | Example |
|---|---|---|---|
| `THREAD_CACHE_MAX_CAPACITY` | Max cache entries | `10000` | `100000` |
| `THREAD_CACHE_TTL_SECONDS` | Cache entry lifetime | `300` (5 min) | `3600` (1 hour) |
| Variable | Purpose | Default | Example |
|---|---|---|---|
| `THREAD_BATCH_SIZE` | Files per batch | `100` | `500` |
| `DB_POOL_SIZE` | PostgreSQL connection pool size | `10` | `20` |
| `DB_CONNECTION_TIMEOUT` | Database connection timeout (sec) | `30` | `60` |
```shell
# Production CLI Configuration

# PostgreSQL backend
DATABASE_URL=postgresql://thread_user:secure_password@localhost:5432/thread_cache
DB_POOL_SIZE=20
DB_CONNECTION_TIMEOUT=60

# Parallel processing
RAYON_NUM_THREADS=8

# Caching (100k entries, 1 hour TTL)
THREAD_CACHE_MAX_CAPACITY=100000
THREAD_CACHE_TTL_SECONDS=3600

# Performance
THREAD_BATCH_SIZE=500

# Logging
RUST_LOG=thread_flow=info,thread_services=info
```
```shell
# Verify binary works
thread --version
# Expected: thread 0.1.0

# Verify PostgreSQL connection
thread db-check
# Expected: ✅ PostgreSQL connection successful

# Verify parallel processing
thread system-info
# Expected:
#   CPU Cores: 8
#   Rayon Threads: 8
#   Parallel Processing: Enabled

# Verify cache configuration
thread cache-stats
# Expected:
#   Cache Capacity: 100,000 entries
#   Cache TTL: 3600 seconds
#   Current Entries: 0
```
```shell
# Analyze small test project
thread analyze ./test-project
# Expected output:
#   Analyzing 10 files across 4 threads...
#   Blake3 fingerprinting: 10 files in 4.25µs
#   Cache hits: 0 (0.0%)
#   Parsing: 10 files in 1.47ms
#   Extracting symbols: 150 symbols found
#   PostgreSQL export: 10 records inserted
#   Total time: 15.2ms

# Second run (cache hit)
thread analyze ./test-project
# Expected output:
#   Analyzing 10 files across 4 threads...
#   Blake3 fingerprinting: 10 files in 4.25µs
#   Cache hits: 10 (100.0%)  ← All files cached!
#   Total time: 0.5ms        ← 30x faster!
```
```shell
# Query cached data
psql -U thread_user -d thread_cache -c "
SELECT
    content_hash,
    file_path,
    language,
    jsonb_array_length(symbols) AS symbol_count
FROM code_symbols
LIMIT 5;
"
# Expected output:
#  content_hash | file_path   | language | symbol_count
# --------------+-------------+----------+--------------
#  abc123...    | src/main.rs | rust     | 15
#  def456...    | src/lib.rs  | rust     | 42
```
```shell
# Run official benchmarks
cargo bench --features "parallel,caching"

# Expected results:
#   fingerprint_benchmark   425 ns per file (Blake3)
#   parse_benchmark         147 µs per file (tree-sitter)
#   cache_hit_benchmark     <1 µs (memory lookup)
#
# Speedup:
#   Fingerprint vs Parse: 346x faster
#   Cache vs Parse:       147,000x faster
```

- Set up monitoring → See `docs/operations/PERFORMANCE_TUNING.md`
- Configure alerts → Database connection failures, cache misses >10%
- Enable backups → Regular PostgreSQL backups for cache data
- Load testing → Test with production-scale codebases
- Install cargo-watch → Auto-rebuild on code changes

  ```shell
  cargo install cargo-watch
  cargo watch -x "run --features parallel"
  ```

- Enable debug logging → Detailed execution traces

  ```shell
  RUST_LOG=thread_flow=trace cargo run
  ```

- Profile performance → Identify bottlenecks

  ```shell
  cargo build --release --features parallel
  perf record ./target/release/thread analyze large-project/
  perf report
  ```

- Edge Deployment: `crates/cloudflare/docs/EDGE_DEPLOYMENT.md` (segregated - see `crates/cloudflare/`)
- Performance Tuning: `docs/operations/PERFORMANCE_TUNING.md`
- Troubleshooting: `docs/operations/TROUBLESHOOTING.md`
- Architecture Overview: `docs/architecture/THREAD_FLOW_ARCHITECTURE.md`
Before deploying Thread Flow CLI to production:

- PostgreSQL 14+ installed and configured
- Database user and permissions created
- Environment variables configured in `.env` or the systemd service
- Binary built with `--release --features "recoco-postgres,parallel,caching"`
- Health checks passing (`thread --version`, `thread db-check`)
- Test analysis run successful, with cache hits on the second run
- Logging configured (`RUST_LOG` appropriate for the environment)
- Resource limits set (systemd `MemoryLimit`, `CPUQuota`)
- Backup strategy in place for PostgreSQL cache data
- Monitoring and alerting configured

- Deployment Target: CLI/local environments with PostgreSQL backend
- Concurrency Model: Rayon (multi-threaded parallelism)
- Storage Backend: PostgreSQL (persistent caching)
- Performance: 2-8x speedup on multi-core; 99.7% cache cost reduction