
Commit 46ef618

script3rclaudehappy-otter committed
Add comprehensive benchmark suite
- Add fixture_generator module for synthetic crypto code generation
- Add component_bench for isolated component benchmarks (parsing, anchor hint, detection)
- Add file_size_bench for varying file size tests
- Add scale_bench for varying file count and crypto density tests
- Add thread_scaling_bench for parallel scaling analysis
- Add memory_bench for memory usage profiling (extended mode)
- Add large_fixture_bench for 5K+ file tests (extended mode)
- Rewrite BENCHMARK.md with quick start guide and comprehensive docs
- Support normal mode (~5-8 min) and extended mode (~30 min)

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
1 parent 680d03c commit 46ef618

File tree

9 files changed: +1709 −58 lines


BENCHMARK.md

Lines changed: 109 additions & 58 deletions
````diff
@@ -1,83 +1,134 @@
 # Benchmarking Cipherscope
 
-This document explains how to read the micro-benchmark results and what they do (and do not) measure.
+This document explains how to run and interpret the benchmark suite.
 
-## What the micro-benchmark measures
+## Quick Start
 
-The benchmark is a full end-to-end scan using the compiled `cipherscope` binary. Each iteration:
-- Walks the roots and discovers files (respecting ignore rules).
-- Runs a fast regex anchor hint to skip files with no matching library/API patterns.
-- Parses files into ASTs.
-- Finds library anchors and algorithm hits.
-- Writes JSONL output to a temp file.
+```bash
+# Run all benchmarks (fast mode, ~5-8 minutes)
+cargo bench
 
-This is an integrated measurement of scanner performance, not a unit benchmark of a single stage.
+# Run extended benchmarks (~30 minutes)
+CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench
 
-## Datasets used
-
-The current benchmark runs two small fixed datasets:
-- `fixtures`: `fixtures/` only (26 files).
-- `repo_mix`: `fixtures/` + `src/` + `tests/` (30 files).
-
-These datasets are intentionally small and fast to run. They are useful for regression tracking but not
-representative of large codebases.
+# Run a specific benchmark
+cargo bench --bench component_bench
+cargo bench --bench scale_bench
+```
 
-## Threading variants
+## Benchmark Modes
+
+### Normal Mode (Default)
+Runs essential benchmarks with minimal variants. Completes in **~5-8 minutes**.
+
+### Extended Mode
+Set `CIPHERSCOPE_BENCH_EXTENDED=1` to enable:
+- More file size variants (1KB-1MB)
+- More file count variants (100-10K)
+- More thread counts (1,2,4,8,16,32)
+- Memory profiling benchmarks
+- Large fixture benchmarks (5K+ files)
+
+## Benchmark Summary
+
+| Benchmark | Normal Mode | Extended Mode |
+|-----------|-------------|---------------|
+| `scan_bench` | 4 variants (~1 min) | Same |
+| `component_bench` | 8 variants (~1.5 min) | 15 variants (~3 min) |
+| `file_size_bench` | 3 sizes (~1 min) | 5 sizes (~2 min) |
+| `scale_bench` | 3 file counts (~1.5 min) | 6 counts + density (~5 min) |
+| `thread_scaling_bench` | 3 thread counts (~1 min) | 7 thread counts (~3 min) |
+| `memory_bench` | Skipped | 3 variants (~3 min) |
+| `large_fixture_bench` | Skipped | 5K files + nested (~5 min) |
+
````
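The mode switch added here is a single environment check. A minimal sketch of how a bench might gate its variant lists on it — the `extended_mode` helper and the exact variant lists below are illustrative assumptions, not the suite's actual code:

```rust
use std::env;

// Illustrative sketch: treat any non-empty CIPHERSCOPE_BENCH_EXTENDED
// value other than "0" as a request for extended mode.
fn extended_mode(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

fn main() {
    let raw = env::var("CIPHERSCOPE_BENCH_EXTENDED").ok();
    let extended = extended_mode(raw.as_deref());
    // Mirror the summary table: more variants when extended mode is on.
    let thread_counts: &[usize] = if extended {
        &[1, 2, 4, 8, 16, 32]
    } else {
        &[1, 4, 8]
    };
    println!("extended={extended}, thread variants: {thread_counts:?}");
}
```

Taking the value as a parameter (rather than reading the environment inside the helper) keeps the check easy to unit-test.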
````diff
+## Benchmark Details
+
+### scan_bench
+Basic end-to-end scan benchmark using the existing fixtures.
+```bash
+cargo bench --bench scan_bench
+```
 
-Each dataset is benchmarked with:
-- `1` thread.
-- `num_cpus::get()` threads (full CPU on the current machine).
+### component_bench
+Isolates individual scanner components:
+- `parsing` - Tree-sitter AST parsing
+- `anchor_hint` - Fast regex pre-filter
+- `library_anchors` - Library detection
+- `algorithm_detection` - Pattern matching
+- `full_pipeline` - Complete scan pipeline
+- `language_detection` - File extension mapping
+- `pattern_loading` - PatternSet initialization
+
+```bash
+cargo bench --bench component_bench
+```
 
````
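Of the components above, `language_detection` is the simplest to picture: a file-extension-to-language lookup. A minimal sketch — the extension table here is an illustrative subset, not the scanner's actual mapping:

```rust
/// Map a file path's extension to a language name.
/// Illustrative subset only; the real scanner covers more languages.
fn detect_language(path: &str) -> Option<&'static str> {
    // rsplit('.') yields the extension first; a path with no dot
    // yields the whole name, which falls through to None below.
    let ext = path.rsplit('.').next()?;
    match ext {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "go" => Some("go"),
        "java" => Some("java"),
        "c" | "h" => Some("c"),
        _ => None,
    }
}

fn main() {
    for p in ["src/main.rs", "crypto/aes.py", "README"] {
        println!("{p}: {:?}", detect_language(p));
    }
}
```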
````diff
-This shows scaling behavior on the same workload.
+### file_size_bench
+Tests performance with different file sizes (1KB, 10KB, 100KB, etc.).
+```bash
+cargo bench --bench file_size_bench
+```
 
-## Interpreting numbers
+### scale_bench
+Tests performance with different file counts (100, 500, 1000, etc.).
+```bash
+cargo bench --bench scale_bench
+```
 
-Criterion reports a time range per benchmark, e.g.:
+### thread_scaling_bench
+Measures parallel scaling efficiency.
+```bash
+cargo bench --bench thread_scaling_bench
 ```
-scan/fixtures/1 time: [209.72 ms 210.81 ms 211.61 ms]
+
+### memory_bench (Extended Only)
+Profiles memory usage during scans.
+```bash
+CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench --bench memory_bench
 ```
 
-This range represents the typical runtime distribution (low/median/high) across samples.
-For quick intuition, you can estimate throughput:
-- `files/sec ≈ file_count / median_time_seconds`
+### large_fixture_bench (Extended Only)
+Tests with large synthetic fixtures (5K+ files).
+```bash
+CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench --bench large_fixture_bench
+```
 
-Example:
-- 26 files / 0.210 s ≈ 124 files/sec.
+## Interpreting Results
 
-## Methodology summary
+Criterion reports time as a range:
+```
+parsing/lang/python time:   [1.98 ms 2.00 ms 2.01 ms]
+                    thrpt:  [4.76 MiB/s 4.79 MiB/s 4.83 MiB/s]
+```
 
-The benchmark:
-- Uses `cargo bench --bench scan_bench`.
-- Warms up for ~3 seconds.
-- Collects 10 samples over ~10 seconds per case.
-- Shells out to the compiled binary and writes JSONL to a temp file.
+- First line: timing (low / median / high)
+- Second line: throughput (if configured)
 
-This keeps the timing focused on real scanning work while avoiding stdout overhead.
+### Throughput Metrics
+- `Elements/s`: Files scanned per second
+- `MiB/s`: Data processed per second
 
````
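The old document's rule of thumb still applies for quick intuition: `files/sec ≈ file_count / median_time_seconds`. As a tiny sketch (`files_per_sec` is a hypothetical helper, not part of the suite):

```rust
/// Estimate scan throughput from a Criterion median time.
fn files_per_sec(file_count: u64, median_secs: f64) -> f64 {
    file_count as f64 / median_secs
}

fn main() {
    // e.g. a 26-file dataset with a 210 ms median: about 124 files/sec
    println!("{:.0} files/sec", files_per_sec(26, 0.210));
}
```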
````diff
-## Large-scale benchmark
+## Environment Variables
 
-For a more realistic scan, the `scan_large_bench` benchmark targets a folder
-containing multiple large repositories. It is opt-in and can be run with:
-```
-CIPHERSCOPE_BENCH_FIXTURE=/path/to/fixture cargo bench --bench scan_large_bench
-```
+| Variable | Description |
+|----------|-------------|
+| `CIPHERSCOPE_BENCH_EXTENDED` | Enable extended benchmarks |
+| `CIPHERSCOPE_BENCH_FIXTURE` | Custom fixture path for scan_large_bench |
+| `CIPHERSCOPE_BENCH_THREADS` | Custom thread counts (comma-separated) |
 
-If `CIPHERSCOPE_BENCH_FIXTURE` is not set, the benchmark defaults to
-`../cipherscope-paper/fixture` relative to the `cipherscope` repo. The large
-benchmark uses fewer samples and a longer measurement window to accommodate
-large repos.
+## Tips
 
-## Limitations and caveats
+1. Close other applications to reduce noise
+2. Run multiple times to verify consistency
+3. Results are saved to `target/criterion/`
+4. HTML reports: `target/criterion/<name>/report/index.html`
 
-- Results are machine- and filesystem-dependent.
-- Small datasets can exaggerate overhead and reduce signal.
-- OS caching can make repeated scans faster than cold-cache runs.
-- The output writing cost is included (to a temp file).
+## Comparing Baselines
 
-## When to extend the benchmark
+```bash
+# Save baseline
+cargo bench -- --save-baseline before
 
-For larger or more realistic measurements, consider:
-- Adding a larger repo checkout as an additional dataset.
-- Reporting total bytes scanned to compute MB/sec.
-- Running explicit cold-cache tests.
-- Adding a "no-output" mode for pure scanning cost.
+# Make changes, then compare
+cargo bench -- --baseline before
+```
````
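A comma-separated `CIPHERSCOPE_BENCH_THREADS` value needs a little parsing tolerance for whitespace and bad entries. A plausible sketch — the `thread_counts` helper and its fallback behavior are assumptions, not the suite's actual code:

```rust
/// Parse a comma-separated list like "1,4,16" into thread counts,
/// trimming whitespace and skipping empty or non-numeric entries.
/// Falls back to `default` when nothing usable is provided.
fn thread_counts(raw: Option<&str>, default: &[usize]) -> Vec<usize> {
    let parsed: Vec<usize> = raw
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .filter_map(|s| s.parse().ok())
        .collect();
    if parsed.is_empty() {
        default.to_vec()
    } else {
        parsed
    }
}

fn main() {
    let raw = std::env::var("CIPHERSCOPE_BENCH_THREADS").ok();
    println!("{:?}", thread_counts(raw.as_deref(), &[1, 4, 8]));
}
```

Falling back to a default rather than erroring keeps `cargo bench` usable when the variable is unset or mistyped.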

Cargo.toml

Lines changed: 24 additions & 0 deletions
````diff
@@ -61,3 +61,27 @@ harness = false
 [[bench]]
 name = "scan_large_bench"
 harness = false
+
+[[bench]]
+name = "file_size_bench"
+harness = false
+
+[[bench]]
+name = "scale_bench"
+harness = false
+
+[[bench]]
+name = "thread_scaling_bench"
+harness = false
+
+[[bench]]
+name = "component_bench"
+harness = false
+
+[[bench]]
+name = "memory_bench"
+harness = false
+
+[[bench]]
+name = "large_fixture_bench"
+harness = false
````
