# Benchmarking Cipherscope

This document explains how to run and interpret the benchmark suite.

## Quick Start

```bash
# Run all benchmarks (fast mode, ~5-8 minutes)
cargo bench

# Run extended benchmarks (~30 minutes)
CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench

# Run a specific benchmark
cargo bench --bench component_bench
cargo bench --bench scale_bench
```

## Benchmark Modes

### Normal Mode (Default)
Runs essential benchmarks with minimal variants. Completes in **~5-8 minutes**.

### Extended Mode
Set `CIPHERSCOPE_BENCH_EXTENDED=1` to enable:
- More file size variants (1KB-1MB)
- More file count variants (100-10K)
- More thread counts (1, 2, 4, 8, 16, 32)
- Memory profiling benchmarks
- Large fixture benchmarks (5K+ files)

## Benchmark Summary

| Benchmark | Normal Mode | Extended Mode |
|-----------|-------------|---------------|
| `scan_bench` | 4 variants (~1 min) | Same |
| `component_bench` | 8 variants (~1.5 min) | 15 variants (~3 min) |
| `file_size_bench` | 3 sizes (~1 min) | 5 sizes (~2 min) |
| `scale_bench` | 3 file counts (~1.5 min) | 6 counts + density (~5 min) |
| `thread_scaling_bench` | 3 thread counts (~1 min) | 7 thread counts (~3 min) |
| `memory_bench` | Skipped | 3 variants (~3 min) |
| `large_fixture_bench` | Skipped | 5K files + nested (~5 min) |

## Benchmark Details

### scan_bench
Basic end-to-end scan benchmark using the existing fixtures.
```bash
cargo bench --bench scan_bench
```

### component_bench
Isolates individual scanner components:
- `parsing` - Tree-sitter AST parsing
- `anchor_hint` - Fast regex pre-filter
- `library_anchors` - Library detection
- `algorithm_detection` - Pattern matching
- `full_pipeline` - Complete scan pipeline
- `language_detection` - File extension mapping
- `pattern_loading` - PatternSet initialization

```bash
cargo bench --bench component_bench
```

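To make the component list concrete, here is a minimal sketch of what the `language_detection` stage does (mapping file extensions to languages). This is illustrative only: the scanner's real table is richer, and the function name and language strings here are assumptions, not Cipherscope's actual API.

```rust
use std::path::Path;

// Toy extension-to-language mapping of the kind the `language_detection`
// benchmark exercises. Names and supported languages are illustrative.
fn detect_language(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "py" => Some("python"),
        "rs" => Some("rust"),
        "go" => Some("go"),
        "java" => Some("java"),
        _ => None, // unknown extensions are skipped by the scanner
    }
}

fn main() {
    assert_eq!(detect_language(Path::new("crypto.py")), Some("python"));
    assert_eq!(detect_language(Path::new("README")), None);
    println!("ok");
}
```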
### file_size_bench
Tests performance with different file sizes (1KB, 10KB, 100KB, etc.).
```bash
cargo bench --bench file_size_bench
```

### scale_bench
Tests performance with different file counts (100, 500, 1000, etc.).
```bash
cargo bench --bench scale_bench
```

### thread_scaling_bench
Measures parallel scaling efficiency.
```bash
cargo bench --bench thread_scaling_bench
```

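A common way to read thread-scaling results is parallel efficiency: single-thread time divided by (thread count × multi-thread time). The sketch below uses invented timings, not measured Cipherscope numbers.

```rust
// Parallel efficiency: 1.0 means perfect linear scaling; lower values mean
// the extra threads are partly wasted on coordination or contention.
fn efficiency(t1_ms: f64, threads: u32, tn_ms: f64) -> f64 {
    t1_ms / (threads as f64 * tn_ms)
}

fn main() {
    // e.g. 200 ms on 1 thread vs 60 ms on 4 threads -> ~0.83 efficiency
    let e = efficiency(200.0, 4, 60.0);
    println!("efficiency: {e:.2}");
}
```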
### memory_bench (Extended Only)
Profiles memory usage during scans.
```bash
CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench --bench memory_bench
```

### large_fixture_bench (Extended Only)
Tests with large synthetic fixtures (5K+ files).
```bash
CIPHERSCOPE_BENCH_EXTENDED=1 cargo bench --bench large_fixture_bench
```

## Interpreting Results

Criterion reports time as a range:
```
parsing/lang/python time:  [1.98 ms 2.00 ms 2.01 ms]
                    thrpt: [4.76 MiB/s 4.79 MiB/s 4.83 MiB/s]
```

- First line: timing, shown as `[lower bound  estimate  upper bound]` of the confidence interval
- Second line: throughput over the same interval (if configured)

### Throughput Metrics
- `Elements/s`: Files scanned per second
- `MiB/s`: Data processed per second

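The throughput line is just the input size divided by the measured time. The sketch below reproduces the sample figures above; the ~9.8 KiB input size is an assumption chosen to match that output, not a real fixture size.

```rust
// Convert a measured time and input size into the MiB/s figure Criterion
// prints. Input size here is assumed, chosen to match the sample output.
fn mib_per_sec(bytes: f64, secs: f64) -> f64 {
    bytes / (1024.0 * 1024.0) / secs
}

fn main() {
    let bytes = 10_045.0;      // assumed input size (~9.8 KiB)
    let median_secs = 2.00e-3; // 2.00 ms point estimate from the sample
    println!("{:.2} MiB/s", mib_per_sec(bytes, median_secs)); // prints "4.79 MiB/s"
}
```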
## Environment Variables

| Variable | Description |
|----------|-------------|
| `CIPHERSCOPE_BENCH_EXTENDED` | Enable extended benchmarks |
| `CIPHERSCOPE_BENCH_FIXTURE` | Custom fixture path for `scan_large_bench` |
| `CIPHERSCOPE_BENCH_THREADS` | Custom thread counts (comma-separated) |

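As a sketch of how a comma-separated `CIPHERSCOPE_BENCH_THREADS` value could be consumed, assuming the harness simply splits on commas; the bench harness's real parsing, error handling, and default thread counts may differ.

```rust
use std::env;

// Hypothetical parsing of CIPHERSCOPE_BENCH_THREADS ("1,2,4,8" style).
// Malformed entries are silently dropped; defaults below are an assumption.
fn parse_thread_counts(spec: &str) -> Vec<usize> {
    spec.split(',')
        .filter_map(|n| n.trim().parse().ok())
        .collect()
}

fn main() {
    let counts = env::var("CIPHERSCOPE_BENCH_THREADS")
        .map(|s| parse_thread_counts(&s))
        .unwrap_or_else(|_| vec![1, 4]); // fallback when the variable is unset
    println!("thread counts: {counts:?}");
}
```

An invocation would then look like `CIPHERSCOPE_BENCH_THREADS=1,2,4,8 cargo bench --bench thread_scaling_bench` (assumed usage).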
## Tips

1. Close other applications to reduce noise
2. Run multiple times to verify consistency
3. Results are saved to `target/criterion/`
4. HTML reports: `target/criterion/<name>/report/index.html`

## Comparing Baselines

```bash
# Save baseline
cargo bench -- --save-baseline before

# Make changes, then compare
cargo bench -- --baseline before
```