
feat(migrate): [7/7] Performance benchmarks#566

Closed
nkanu17 wants to merge 10 commits into feat/migrate-cli-docs from feat/migrate-benchmarks

Conversation

Collaborator

@nkanu17 nkanu17 commented Apr 1, 2026

Summary

Adds a benchmarking suite for the Index Migrator, measuring throughput, latency, memory usage, and index size across different migration scenarios (field changes, vector algorithm changes, quantization, key prefix changes, and mixed operations).

Usage

# Run the benchmarks (requires a running Redis instance)
python tests/benchmarks/migration_benchmark.py

# Run the real-world benchmark with larger datasets
python tests/benchmarks/index_migrator_real_benchmark.py

# Generate benchmark reports
rvl migrate apply --plan migration_plan.yaml --benchmark-out benchmark_report.yaml

What is included

  • Benchmark runner (tests/benchmarks/migration_benchmark.py): Comprehensive benchmark suite that measures:
    • Keys-per-second throughput for each migration step
    • End-to-end migration latency
    • Memory usage before and after migration
    • Index size changes
    • Downtime duration
  • Real-world benchmark (tests/benchmarks/index_migrator_real_benchmark.py): Benchmark with realistic data volumes and schema complexity.
  • Benchmark utilities: Report generation and comparison helpers.
  • Conftest updates (tests/benchmarks/conftest.py): Pytest fixtures for benchmark Redis instances and data setup.
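The keys-per-second metric above amounts to a thin timing wrapper around each migration step. A minimal sketch (illustrative names, not the actual runner API in tests/benchmarks/migration_benchmark.py):

```python
import time

def measure_throughput(migrate_step, num_keys):
    """Time one migration step and report keys/sec.

    `migrate_step` is any callable that processes `num_keys` keys; the
    real benchmark wraps each migrator step in something like this.
    """
    start = time.perf_counter()
    migrate_step()
    elapsed = time.perf_counter() - start
    return {
        "elapsed_s": elapsed,
        # Guard against a zero-duration measurement on very fast steps.
        "keys_per_sec": num_keys / elapsed if elapsed > 0 else float("inf"),
    }
```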

Details

Benchmark scenarios covered:

  • Simple field add/remove: Measures overhead of non-vector schema changes
  • Vector algorithm change: FLAT to HNSW, HNSW to SVS-VAMANA
  • Quantization: float32 to float16, float32 to int8
  • Key prefix change: Measures bulk key rename throughput
  • Mixed operations: Combined field changes + algorithm change + quantization
  • Scaling: 1K, 10K, 100K, 1M key counts (configurable)

Results are written as YAML reports for comparison across runs.
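Cross-run comparison can then be a simple diff over two loaded reports. A hedged sketch (plain dicts stand in for parsed YAML; the key names, the higher-is-better assumption, and the 10% threshold are all illustrative):

```python
def compare_reports(baseline, candidate, threshold=0.10):
    """Flag metrics that regressed by more than `threshold`.

    Treats every metric as higher-is-better (throughput-style); a real
    report format would tag a direction per metric so latency drops are
    not counted as regressions.
    """
    regressions = {}
    for metric, base_value in baseline.get("metrics", {}).items():
        new_value = candidate.get("metrics", {}).get(metric)
        if new_value is None or base_value == 0:
            continue  # metric missing in candidate, or baseline unusable
        change = (new_value - base_value) / base_value
        if change < -threshold:
            regressions[metric] = {"baseline": base_value, "candidate": new_value}
    return regressions
```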

PR Stack

PR Branch Description
#560 feat/migrate-core Core planner, validator, sync executor, and basic CLI
#561 feat/migrate-executor Quantization reliability layer
#562 feat/migrate-async Async planner, executor, and validator
#563 feat/migrate-batch Batch migration with multi-index support
#564 feat/migrate-wizard Interactive migration wizard
#565 feat/migrate-cli-docs Full CLI commands and documentation
#566 (this) feat/migrate-benchmarks Performance benchmarks

Copilot AI review requested due to automatic review settings April 1, 2026 22:39

jit-ci bot commented Apr 1, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

Copilot AI left a comment


Pull request overview

Adds benchmark and visualization tooling under tests/benchmarks/ to measure migrator and retrieval performance (throughput/latency/memory) across multiple index configurations and dataset scales, plus scripts to render result charts.

Changes:

  • Add an end-to-end migration timing benchmark (sync + async) with index-size scaling and memory reporting.
  • Add retrieval benchmarks comparing FLAT vs HNSW and FP32 vs FP16, including overlap/recall-style metrics.
  • Add matplotlib-based result visualization for benchmark JSON outputs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

File Description
tests/benchmarks/visualize_results.py New charting script to generate memory/precision/QPS/latency/recall plots from benchmark JSON results.
tests/benchmarks/retrieval_benchmark.py New retrieval benchmark runner to build/query multiple index configs and output structured metrics.
tests/benchmarks/migration_benchmark.py New migration benchmark runner to populate synthetic corpora and time migrations (sync/async), including timing breakdowns.
tests/benchmarks/index_migrator_real_benchmark.py New “real dataset” benchmark (AG News + sentence-transformers) to measure migration + query quality/memory on realistic text data.


@nkanu17
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.



@nkanu17
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99c262d1de


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
nkanu17 added a commit that referenced this pull request Apr 2, 2026
Scoped cleanup to current database only to avoid affecting
other databases on shared Redis instances.
@nkanu17 nkanu17 force-pushed the feat/migrate-cli-docs branch from 751a0e5 to 49ce997 Compare April 2, 2026 00:30
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback
  method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
@nkanu17 nkanu17 force-pushed the feat/migrate-benchmarks branch from 99c262d to 26ce024 Compare April 2, 2026 00:30
@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26ce0246e7


nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
@nkanu17 nkanu17 force-pushed the feat/migrate-cli-docs branch from 49ce997 to a1d54c9 Compare April 2, 2026 03:58
nkanu17 added a commit that referenced this pull request Apr 2, 2026
Scoped cleanup to current database only to avoid affecting
other databases on shared Redis instances.
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback
  method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
Copilot AI review requested due to automatic review settings April 2, 2026 03:58
@nkanu17 nkanu17 force-pushed the feat/migrate-benchmarks branch from 07a1312 to 00a11fb Compare April 2, 2026 15:09
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- Skip CSV header row if present (detect non-digit first column)
- Handle short CSV rows gracefully
- Switch flushall to flushdb to avoid wiping other databases
nkanu17 added a commit that referenced this pull request Apr 2, 2026
- plan_path pointed to deleted TemporaryDirectory, removed from results dict
@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00a11fb6d6


ground_truth_results,
result["result_sets"],
k_values=valid_k_values,
ground_truth_depth=top_k,


P2 Badge Compute recall against full ground-truth depth

The recall curve is currently computed with ground_truth_depth=top_k, which is 10 by default, even though queries are fetched up to recall_k_max (default 100). This makes recall@20/50/100 measure recovery of only the top-10 neighbors, so curves saturate early and can materially overstate ANN quality at larger K values. In practical runs with default flags, this misreports the intended recall-vs-K behavior and can lead to incorrect benchmarking conclusions.

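The effect is easy to demonstrate with a minimal recall@k helper (hypothetical code, not the PR's compute_recall): truncated ground truth makes recall at larger K saturate.

```python
def recall_at_k(ground_truth, retrieved, k):
    """Fraction of the exact top-k neighbors present in the first k results.

    `ground_truth` must be at least k deep; if it is truncated (e.g. at
    top_k=10), recall@20/50/100 only measures recovery of those few ids
    and saturates early -- the bug described above.
    """
    if k <= 0 or not ground_truth:
        return 0.0
    true_set = set(ground_truth[:k])
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in true_set)
    return hits / min(k, len(true_set))
```

With full ground truth [1, 2, 3, 4, 5] and results [1, 2, 9, 4, 8], recall@5 is 0.6; truncate the ground truth to [1, 2] and the same results score a misleading 1.0.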


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


continue
if len(records) >= required_docs:
break
label, title, description = row


CSV row unpacking crashes on more than three columns

Low Severity

The guard on line 150 only skips rows with fewer than 3 columns (len(row) < 3), but label, title, description = row on line 157 will raise ValueError: too many values to unpack if a CSV row produces more than 3 fields. This can happen with non-standard AG News CSV files that have extra columns or malformed quoting. Using row[0], row[1], row[2] or starred unpacking would be more robust.
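A defensive variant of the row handling (an illustrative helper, not the PR's loader) sidesteps both the short-row and extra-column cases by indexing instead of strict three-way unpacking:

```python
import csv
import io

def parse_ag_news_rows(raw_csv, required_cols=3):
    """Yield (label, title, description) tuples from AG News CSV text.

    Indexing `row[0:3]` tolerates trailing extra columns, where strict
    `label, title, description = row` raises ValueError; rows with too
    few fields are skipped instead of crashing.
    """
    for row in csv.reader(io.StringIO(raw_csv)):
        if len(row) < required_cols:
            continue  # short/malformed row: skip
        label, title, description = row[0], row[1], row[2]
        yield label, title, description
```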


nkanu17 added 10 commits April 2, 2026 11:29
…hput, latency, memory measurement, and visualization

Benchmark scripts for measuring migration performance across different
index sizes and configurations. Includes real-world AG News dataset
benchmarks, synthetic end-to-end timing benchmarks with memory tracking,
retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and
matplotlib visualization tooling for benchmark results.
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
Scoped cleanup to current database only to avoid affecting
other databases on shared Redis instances.
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback
  method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
- Skip CSV header row if present (detect non-digit first column)
- Handle short CSV rows gracefully
- Switch flushall to flushdb to avoid wiping other databases
- plan_path pointed to deleted TemporaryDirectory, removed from results dict
@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

@nkanu17
Collaborator Author

nkanu17 commented Apr 2, 2026

Closing in favor of restructured PR stack (Option A ordering).

@nkanu17 nkanu17 closed this Apr 2, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78ca86567e


Comment on lines +477 to +480
# Use the larger of top_k and recall_k_max for querying
# so we have enough results for recall curve computation
effective_top_k = max(top_k, recall_k_max)



P1 Badge Measure latency/QPS with requested top_k

benchmark_scale inflates top_k to max(top_k, recall_k_max) and uses that value for all timed queries, so with defaults (top_k=10, recall_k_max=100) the reported p50/p95/QPS are actually for top-100 retrieval, not top-10. This skews the main throughput/latency metrics and makes comparisons across runs sensitive to recall settings rather than the requested query depth.

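One way to avoid the skew (a sketch assuming a `run_query(q, k)` search call; not the PR's actual fix) is to time queries at the requested top_k and fetch the deeper recall_k_max result sets untimed:

```python
import time

def benchmark_queries(run_query, queries, top_k=10, recall_k_max=100):
    """Time only top_k retrievals; fetch deeper results untimed for recall.

    Splitting the two keeps p50/QPS tied to the requested query depth
    instead of the recall-curve depth.
    """
    latencies, deep_results = [], []
    for q in queries:
        start = time.perf_counter()
        run_query(q, top_k)  # timed at the requested depth
        latencies.append(time.perf_counter() - start)
        deep_results.append(run_query(q, recall_k_max))  # untimed, recall only
    latencies.sort()
    return {
        "p50_s": latencies[len(latencies) // 2],
        "qps": len(queries) / sum(latencies),
        "deep_results": deep_results,
    }
```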

{
"doc_id": f"ag-news-{len(records)}",
"text": text,
"label": AG_NEWS_LABELS[int(label) - 1],


P2 Badge Accept both 0-based and 1-based CSV labels

load_ag_news_records_from_csv always subtracts 1 from the CSV label before indexing AG_NEWS_LABELS, which crashes for common AG News exports that already use 0-3 labels (e.g., Hugging Face-style dumps). Because --dataset-csv is documented as a generic local AG News input path, this makes valid inputs fail with KeyError before benchmarking starts.

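A per-file heuristic (an illustrative fix, not the PR's code) can accept both conventions: a 0 anywhere implies 0-based labels, a 4 implies the original 1-based export, and an ambiguous file falls back to 1-based.

```python
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def normalize_labels(raw_labels, labels=AG_NEWS_LABELS):
    """Map integer CSV labels to class names, accepting 0-3 or 1-4 ids.

    Detection is per file, since an individual value like 3 is ambiguous
    between the two conventions.
    """
    ids = [int(x) for x in raw_labels]
    if any(i == 0 for i in ids):
        offset = 0  # Hugging Face-style 0-based dump
    elif any(i == len(labels) for i in ids):
        offset = 1  # original 1-based AG News export
    else:
        offset = 1  # ambiguous: assume the original 1-based format
    return [labels[i - offset] for i in ids]
```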
