This guide explains how to use `comprehensive_bench.py` to produce every evaluation metric the project requires.

| Evaluation item | Supported | Tool |
|---|---|---|
| Datasets | | |
| SIFT1M (128D) | ✅ | comprehensive_bench.py |
| GIST1M (960D) | ✅ | comprehensive_bench.py |
| Accuracy | | |
| Recall@1 | ✅ | comprehensive_bench.py |
| Recall@10 | ✅ | comprehensive_bench.py |
| Recall@100 | ✅ | comprehensive_bench.py |
| Performance | | |
| QPS | ✅ | comprehensive_bench.py |
| p50 latency | ✅ | comprehensive_bench.py |
| p95 latency | ✅ | comprehensive_bench.py |
| Index cost | | |
| Index build time | ✅ | comprehensive_bench.py |
| bytes/vector | ✅ | comprehensive_bench.py |
| Visualization | | |
| Recall-QPS curve | ✅ | plot_tradeoff.py |
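
To make the accuracy rows concrete, here is a minimal sketch of one common definition of Recall@k: the fraction of the true top-k neighbours that appear in the returned top-k, averaged over queries. The function name `recall_at_k` and the toy IDs are illustrative assumptions; the exact definition used inside `comprehensive_bench.py` is not shown in this guide and may differ.

```python
def recall_at_k(retrieved, groundtruth, k):
    """Average fraction of the true top-k neighbours found in the returned top-k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(retrieved) != len(groundtruth):
        raise ValueError("retrieved and groundtruth must cover the same queries")
    total = 0.0
    for found, truth in zip(retrieved, groundtruth):
        true_ids = set(truth[:k])
        hits = len(true_ids.intersection(found[:k]))
        total += hits / len(true_ids)
    return total / len(retrieved)


# Toy data (made-up IDs): two queries, three returned neighbours each.
retrieved = [[7, 3, 9], [4, 1, 8]]
groundtruth = [[7, 9, 2], [5, 4, 1]]

print(recall_at_k(retrieved, groundtruth, 1))
print(recall_at_k(retrieved, groundtruth, 3))
```

With this definition, Recall@1 only rewards returning the single nearest neighbour first, while larger k tolerates ordering differences within the top-k set.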

```bash
# Create the data directory
mkdir -p data

# Download SIFT1M
cd data
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
tar -xzvf sift.tar.gz

# Download GIST1M
wget ftp://ftp.irisa.fr/local/texmex/corpus/gist.tar.gz
tar -xzvf gist.tar.gz
cd ..
```

```bash
# Install the Python dependencies
pip install psutil matplotlib numpy
```

```bash
# Set environment variables
export LD_LIBRARY_PATH=extern/faiss/build/install/lib:$LD_LIBRARY_PATH

# SIFT1M benchmark
python3 benchmark/comprehensive_bench.py \
    --base data/sift/sift_base.fvecs \
    --query data/sift/sift_query.fvecs \
    --groundtruth data/sift/sift_groundtruth.ivecs \
    --nlist 1024 \
    --nprobe-list "1,2,4,8,16,32,64,128,256" \
    --k-list "1,10,100" \
    --index-file sift_index.bin \
    --output-dir benchmark_results

# GIST1M benchmark (optional; note that it takes considerably longer)
python3 benchmark/comprehensive_bench.py \
    --base data/gist/gist_base.fvecs \
    --query data/gist/gist_query.fvecs \
    --groundtruth data/gist/gist_groundtruth.ivecs \
    --nlist 1024 \
    --nprobe-list "1,4,16,64,256,512" \
    --k-list "1,10,100" \
    --index-file gist_index.bin \
    --output-dir benchmark_results
```

```bash
# Generate the Recall-QPS curves
python3 benchmark/plot_tradeoff.py benchmark_results/*.json
```

Output files:

- `recall_qps_tradeoff.png` - Recall vs. QPS curves (3 subplots, one each for k=1, 10, 100)
- `latency_distribution.png` - latency analysis plot
- `benchmark_report.txt` - text report
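
For reference, the `.fvecs` and `.ivecs` files passed to `--base`, `--query`, and `--groundtruth` use the TEXMEX layout: each record is a 4-byte little-endian integer giving the dimension, followed by that many 4-byte components (float32 for `.fvecs`, int32 for `.ivecs`). The sketch below reads that layout with the standard library only. The helper names `read_vecs` and `write_vecs` are hypothetical, and this is not necessarily how `comprehensive_bench.py` loads its data.

```python
import os
import struct
import tempfile


def read_vecs(path, component_format):
    """Read a TEXMEX file: pass 'f' for .fvecs (float32) or 'i' for .ivecs (int32)."""
    vectors = []
    with open(path, "rb") as handle:
        while True:
            header = handle.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            if dim <= 0:
                raise ValueError(f"invalid dimension {dim}")
            payload = handle.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector record")
            vectors.append(list(struct.unpack(f"<{dim}{component_format}", payload)))
    return vectors


def write_vecs(path, vectors, component_format):
    """Write vectors in the same layout (used here only to build a tiny demo file)."""
    with open(path, "wb") as handle:
        for vec in vectors:
            handle.write(struct.pack("<i", len(vec)))
            handle.write(struct.pack(f"<{len(vec)}{component_format}", *vec))


# Round-trip a tiny made-up file instead of the real 1M-vector dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fvecs")
    write_vecs(demo_path, [[1.0, 2.5], [-0.5, 4.0]], "f")
    demo = read_vecs(demo_path, "f")

print(demo)
```

Plain Python lists are fine for inspecting a few records; for the full datasets a NumPy-based reader would be far faster and more memory-efficient.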

```text
======================================================================
Testing: nlist=1024, nprobe=16
======================================================================
Measuring batch QPS (k=100)...
  QPS (batch): 2450.32
  Latency - Mean: 0.408 ms
  Latency - p50: 0.385 ms
  Latency - p95: 0.612 ms
  Latency - p99: 0.758 ms
Computing Recall@k...
  Recall@1: 84.52%
  Recall@10: 95.28%
  Recall@100: 99.15%
```
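
The latency and QPS figures in this output can be derived from per-query timings. The following is a minimal sketch using the nearest-rank percentile method and assuming queries run one after another. The helper names and the sample timings are made up, and the benchmark script may use a different percentile method or measure batch QPS differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sequential_qps(latencies_ms):
    """Queries per second, assuming queries are executed strictly one at a time."""
    return 1000.0 * len(latencies_ms) / sum(latencies_ms)


# Made-up per-query latencies in milliseconds.
latencies_ms = [0.30, 0.35, 0.38, 0.39, 0.40, 0.41, 0.45, 0.52, 0.61, 0.76]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
qps = sequential_qps(latencies_ms)
print(f"p50={p50:.3f} ms  p95={p95:.3f} ms  QPS={qps:.1f}")
```

Under the sequential assumption, QPS is simply the reciprocal of the mean latency, which matches the sample output (0.408 ms mean, about 2450 QPS).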

```text
======================================================================
SUMMARY: Recall-QPS Trade-off
======================================================================
nprobe      QPS  p50(ms)  p95(ms)    R@1   R@10  R@100
----------------------------------------------------------------------
     1  12450.3    0.080    0.125  32.15  42.58  58.23
     2   8920.5    0.112    0.185  52.34  65.87  78.45
     4   5630.2    0.178    0.295  68.92  82.15  89.67
     8   3580.1    0.279    0.448  79.45  91.23  95.82
    16   2450.3    0.408    0.612  84.52  95.28  99.15
    32   1680.5    0.595    0.891  87.89  97.45  99.32
✅ Target achieved: Recall@10 = 95.28% >= 95%
Best config: nprobe=16, QPS=2450.3
```
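
The "Best config" line follows a simple rule: among the configurations that meet the Recall@10 target, take the one with the highest QPS. A minimal sketch of that rule, using the rows from the sample summary, is shown here. The function name `best_config` and the dictionary keys are illustrative assumptions, not the script's actual internals.

```python
def best_config(rows, target_recall_at_10=95.0):
    """Return the highest-QPS row whose Recall@10 (in percent) meets the target, or None."""
    qualifying = [row for row in rows if row["recall_at_10"] >= target_recall_at_10]
    if not qualifying:
        return None
    return max(qualifying, key=lambda row: row["qps"])


# nprobe, QPS, and Recall@10 (percent) copied from the sample summary.
rows = [
    {"nprobe": 1, "qps": 12450.3, "recall_at_10": 42.58},
    {"nprobe": 2, "qps": 8920.5, "recall_at_10": 65.87},
    {"nprobe": 4, "qps": 5630.2, "recall_at_10": 82.15},
    {"nprobe": 8, "qps": 3580.1, "recall_at_10": 91.23},
    {"nprobe": 16, "qps": 2450.3, "recall_at_10": 95.28},
    {"nprobe": 32, "qps": 1680.5, "recall_at_10": 97.45},
]

best = best_config(rows)
print(best)
```

Because QPS falls as nprobe grows, the winner is normally the smallest nprobe that clears the target; if no row qualifies, the function returns `None` and a wider nprobe sweep is needed.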

```json
{
  "metadata": {
    "dataset": "sift",
    "n_base": 1000000,
    "n_queries": 10000,
    "dimension": 128,
    "nlist": 1024,
    "nprobe_list": [1, 2, 4, 8, 16, 32],
    "k_values": [1, 10, 100],
    "build_time_sec": 45.234,
    "bytes_per_vector": 8.5,
    "timestamp": "20251106_150000"
  },
  "results": [
    {
      "nlist": 1024,
      "nprobe": 16,
      "qps_batch": 2450.32,
      "latency_mean_ms": 0.408,
      "latency_p50_ms": 0.385,
      "latency_p95_ms": 0.612,
      "latency_p99_ms": 0.758,
      "recall@1": 0.8452,
      "recall@10": 0.9528,
      "recall@100": 0.9915,
      "memory_mb": 128.5,
      "build_time_sec": 45.234,
      "bytes_per_vector": 8.5
    }
  ]
}
```

```bash
export LD_LIBRARY_PATH=extern/faiss/build/install/lib:$LD_LIBRARY_PATH

# 1. Benchmark the OpenMP build
# "Testing OpenMP version..."
git checkout feature/openMP
make clean && make
python3 benchmark/comprehensive_bench.py \
    --base data/sift/sift_base.fvecs \
    --query data/sift/sift_query.fvecs \
    --groundtruth data/sift/sift_groundtruth.ivecs \
    --nlist 1024 \
    --nprobe-list "1,4,8,16,32,64" \
    --k-list "1,10,100" \
    --index-file sift_openmp.bin \
    --output-dir results_openmp

# 2. Generate the comparison plots.
#    Pass only the JSON files you want to compare, or clear out older JSON results first.
#    results_baseline/ must already contain the JSON from a baseline run.
python3 benchmark/plot_tradeoff.py \
    results_baseline/sift*.json \
    results_openmp/sift*.json
# "Done! Check recall_qps_tradeoff.png and benchmark_report.txt"
```

According to the literature, the expected values for SIFT1M are:

| nlist | nprobe | Expected Recall@10 | Expected QPS (baseline) |
|---|---|---|---|
| 1024 | 16 | ~95% | ~2000 |
| 1024 | 32 | ~97% | ~1200 |
| 2048 | 32 | ~96% | ~1500 |
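
One way to use these reference values is to compare them against the `results` entries of a benchmark JSON file. The sketch below assumes the field names from the sample JSON in this guide (`nlist`, `nprobe`, `recall@10`). The tolerance of 0.02 and the function name `check_against_expected` are arbitrary illustrative choices, since the expected values are only approximate.

```python
import json

# Expected Recall@10 per (nlist, nprobe), copied from the SIFT1M reference table.
EXPECTED_RECALL_AT_10 = {(1024, 16): 0.95, (1024, 32): 0.97, (2048, 32): 0.96}


def check_against_expected(results, tolerance=0.02):
    """Return (config, measured, expected, ok) for every result listed in the table."""
    report = []
    for entry in results:
        key = (entry["nlist"], entry["nprobe"])
        if key not in EXPECTED_RECALL_AT_10:
            continue  # no reference value for this configuration
        measured = entry["recall@10"]
        expected = EXPECTED_RECALL_AT_10[key]
        report.append((key, measured, expected, measured >= expected - tolerance))
    return report


# Inline stand-in for a results file; a real run would use json.load() on the file.
sample = json.loads(
    '{"results": ['
    '{"nlist": 1024, "nprobe": 16, "recall@10": 0.9528},'
    '{"nlist": 1024, "nprobe": 8, "recall@10": 0.9123}'
    "]}"
)

report = check_against_expected(sample["results"])
for config, measured, expected, ok in report:
    print(config, f"measured={measured:.4f}", f"expected~{expected:.2f}", "OK" if ok else "LOW")
```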

| Metric | Baseline | OpenMP (8 cores) | Speedup |
|---|---|---|---|
| QPS | 2000 | 8000-12000 | 4-6x |
| p95 latency | 0.5 ms | 0.15 ms | 3-4x |
| Build time | 45 s | 45 s | 1x (not optimized) |
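
The speedup column is a ratio of the two runs: optimized over baseline for throughput, and baseline over optimized for latency, so a value above 1 always means "better". A minimal sketch follows, reusing the JSON field names `qps_batch` and `latency_p95_ms` from the sample results. The function name `speedup` is hypothetical, and the numbers are the expected values from the table rather than measurements.

```python
def speedup(baseline, optimized):
    """Return speedup factors; values above 1.0 mean the optimized run is better."""
    return {
        "qps": optimized["qps_batch"] / baseline["qps_batch"],
        "p95_latency": baseline["latency_p95_ms"] / optimized["latency_p95_ms"],
    }


# Expected (not measured) values taken from the table; 8000 is the low end of the QPS range.
baseline = {"qps_batch": 2000.0, "latency_p95_ms": 0.5}
openmp = {"qps_batch": 8000.0, "latency_p95_ms": 0.15}

ratios = speedup(baseline, openmp)
print(f"QPS speedup: {ratios['qps']:.1f}x")
print(f"p95 latency speedup: {ratios['p95_latency']:.1f}x")
```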
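
Symptom: Recall@10 stays below 95% even with nprobe=256.

Fix:

```bash
# Increase nlist
python3 benchmark/comprehensive_bench.py ... --nlist 2048
```

Symptom: the OpenMP build reports the same QPS as the baseline.

Check:

```bash
# Confirm that the OpenMP compile flag is present
grep fopenmp Makefile

# Set the number of threads used at run time
export OMP_NUM_THREADS=8
```

Symptom: `MemoryError`, or the process is killed.

Fix: use an index file so the index is not rebuilt on every run.

```bash
# Build the index once and save it
python3 benchmark/comprehensive_bench.py ... --index-file sift.bin

# Later runs reuse the saved index (the build step is skipped)
python3 benchmark/comprehensive_bench.py ... --index-file sift.bin
```

```bash
# Fine-grained sweep (to locate the exact point where Recall@10 reaches 95%)
--nprobe-list "10,12,14,16,18,20,22,24"

# Coarse sweep (quick exploration)
--nprobe-list "1,8,64,512"
```

```bash
# Compare different nlist values
for nlist in 512 1024 2048; do
    python3 benchmark/comprehensive_bench.py \
        ... \
        --nlist "$nlist" \
        --output-dir "results_nlist${nlist}"
done

# Plot all runs together for comparison
python3 benchmark/plot_tradeoff.py results_nlist*/*.json
```

- `comprehensive_bench.py --help` - full parameter reference
- `plot_tradeoff.py --help` - plotting tool usage
- `ivf-bench.py` - simplified benchmark (kept for backward compatibility)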
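
Before finishing the evaluation, confirm that:
- The SIFT1M dataset has been downloaded
- The GIST1M dataset has been downloaded (optional)
- psutil, matplotlib, and numpy are installed
- The Recall-QPS curve plot has been generated
- Recall@10 ≥ 95% is reached at a reasonable QPS
- The OpenMP speedup has been recorded
- All JSON result files have been saved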