feat(migrate): [7/7] Performance benchmarks #566
nkanu17 wants to merge 10 commits into feat/migrate-cli-docs
Conversation
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR. (Security scan by Jit)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99c262d1de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
Adds benchmark and visualization tooling under tests/benchmarks/ to measure migrator and retrieval performance (throughput/latency/memory) across multiple index configurations and dataset scales, plus scripts to render result charts.
Changes:
- Add an end-to-end migration timing benchmark (sync + async) with index-size scaling and memory reporting.
- Add retrieval benchmarks comparing FLAT vs HNSW and FP32 vs FP16, including overlap/recall-style metrics.
- Add matplotlib-based result visualization for benchmark JSON outputs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/benchmarks/visualize_results.py | New charting script to generate memory/precision/QPS/latency/recall plots from benchmark JSON results. |
| tests/benchmarks/retrieval_benchmark.py | New retrieval benchmark runner to build/query multiple index configs and output structured metrics. |
| tests/benchmarks/migration_benchmark.py | New migration benchmark runner to populate synthetic corpora and time migrations (sync/async), including timing breakdowns. |
| tests/benchmarks/index_migrator_real_benchmark.py | New “real dataset” benchmark (AG News + sentence-transformers) to measure migration + query quality/memory on realistic text data. |
@codex review
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
@codex review
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
- Fix SentenceTransformer local_files_only fallback for first run
- Remove unused imports (string, struct)
- Fix dead store enum_s (prefix with _)
- Guard ZeroDivisionError in chart_memory_savings
- Fix compute_recall to return 0.0 for empty ground truth
- Wrap logger handler in try/finally to prevent leak on exception
Scoped cleanup to current database only to avoid affecting other databases on shared Redis instances.
Force-pushed 751a0e5 to 49ce997
- Remove leftover numbered list fragment in migrate-indexes.md (#565 #20)
- Extract duplicated progress_callback into shared _make_progress_callback method in migrate.py (#565 #21)
- Add rvl migrate section to CLI reference docs (#565 #19)
- Add try/except for matplotlib import in visualize_results.py (#566 #14)
Force-pushed 99c262d to 26ce024
@codex review
💡 Codex Review
Reviewed commit: 26ce0246e7
Force-pushed 49ce997 to a1d54c9
- Clean up source index on failure in benchmark finally block
- Include queried_top_k in latency metrics for clarity when effective_top_k differs
Force-pushed 07a1312 to 00a11fb
- Skip CSV header row if present (detect non-digit first column)
- Handle short CSV rows gracefully
- Switch flushall to flushdb to avoid wiping other databases
- plan_path pointed to a deleted TemporaryDirectory; removed it from the results dict
@codex review
💡 Codex Review
Reviewed commit: 00a11fb6d6
```python
ground_truth_results,
result["result_sets"],
k_values=valid_k_values,
ground_truth_depth=top_k,
```
Compute recall against full ground-truth depth
The recall curve is currently computed with ground_truth_depth=top_k, which is 10 by default, even though queries are fetched up to recall_k_max (default 100). This makes recall@20/50/100 measure recovery of only the top-10 neighbors, so curves saturate early and can materially overstate ANN quality at larger K values. In practical runs with default flags, this misreports the intended recall-vs-K behavior and can lead to incorrect benchmarking conclusions.
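The fix the reviewer describes can be sketched as follows. `compute_recall_curve` is a hypothetical helper (not the PR's actual function), shown only to illustrate computing recall@K against ground truth fetched to at least `max(k_values)` depth, so recall@100 compares against the true top-100 rather than a truncated top-10 list:

```python
from typing import Dict, List, Sequence


def compute_recall_curve(
    ground_truth: Sequence[Sequence[str]],
    results: Sequence[Sequence[str]],
    k_values: Sequence[int],
) -> Dict[int, float]:
    """Recall@K averaged over queries.

    ground_truth[i] must contain at least max(k_values) exact neighbors
    for query i, so each recall@K compares against the true top-K rather
    than a list truncated at the default top_k.
    """
    curve: Dict[int, float] = {}
    for k in k_values:
        per_query: List[float] = []
        for truth, found in zip(ground_truth, results):
            true_top_k = set(truth[:k])
            if not true_top_k:
                per_query.append(0.0)  # empty ground truth -> recall 0.0
                continue
            hits = len(true_top_k & set(found[:k]))
            per_query.append(hits / len(true_top_k))
        curve[k] = sum(per_query) / len(per_query) if per_query else 0.0
    return curve
```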
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
```python
continue
if len(records) >= required_docs:
    break
label, title, description = row
```
CSV row unpacking crashes on more than three columns
Low Severity
The guard on line 150 only skips rows with fewer than 3 columns (len(row) < 3), but label, title, description = row on line 157 will raise ValueError: too many values to unpack if a CSV row produces more than 3 fields. This can happen with non-standard AG News CSV files that have extra columns or malformed quoting. Using row[0], row[1], row[2] or starred unpacking would be more robust.
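A sketch of the suggested hardening, assuming the row layout described above (label, title, description); `parse_ag_news_rows` and its record shape are illustrative, not the PR's actual code. Starred unpacking tolerates extra columns, the length guard skips short rows, and a non-digit first column is treated as a header:

```python
import csv
from typing import Dict, Iterable, List


def parse_ag_news_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse AG News CSV lines, tolerating headers, short rows, and extras."""
    records: List[Dict[str, str]] = []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # short or malformed row: skip instead of crashing
        if not row[0].strip().isdigit():
            continue  # header row: first column is not a numeric label
        label, title, description, *_extra = row  # ignore extra columns
        records.append({"label": label, "title": title, "text": description})
    return records
```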
Throughput, latency, memory measurement, and visualization: benchmark scripts for measuring migration performance across different index sizes and configurations. Includes real-world AG News dataset benchmarks, synthetic end-to-end timing benchmarks with memory tracking, retrieval quality comparisons (FLAT vs HNSW, FP32 vs FP16), and matplotlib visualization tooling for benchmark results.
Force-pushed e851d24 to 5d52713
Force-pushed 00a11fb to 78ca865
@codex review
Closing in favor of restructured PR stack (Option A ordering).
💡 Codex Review
Reviewed commit: 78ca86567e
```python
# Use the larger of top_k and recall_k_max for querying
# so we have enough results for recall curve computation
effective_top_k = max(top_k, recall_k_max)
```
Measure latency/QPS with requested top_k
benchmark_scale inflates top_k to max(top_k, recall_k_max) and uses that value for all timed queries, so with defaults (top_k=10, recall_k_max=100) the reported p50/p95/QPS are actually for top-100 retrieval, not top-10. This skews the main throughput/latency metrics and makes comparisons across runs sensitive to recall settings rather than the requested query depth.
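One way to address this, sketched with hypothetical names (`timed_latency_metrics`, `run_benchmark`, `search_fn` are not the PR's identifiers): time queries at the requested `top_k`, then run a separate untimed pass at `recall_k_max` to gather recall-curve data:

```python
import statistics
import time


def timed_latency_metrics(search_fn, queries, top_k=10):
    """Time each query at the *requested* top_k so p50/p95/QPS reflect
    the depth the user asked for, not the recall-curve depth."""
    latencies_ms = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q, top_k)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    latencies_ms.sort()
    return {
        "queried_top_k": top_k,
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[int(0.95 * (len(latencies_ms) - 1))],
        "qps": len(queries) / elapsed if elapsed > 0 else 0.0,
    }


def run_benchmark(search_fn, queries, top_k=10, recall_k_max=100):
    # 1) Timed pass at the requested depth: drives p50/p95/QPS.
    metrics = timed_latency_metrics(search_fn, queries, top_k=top_k)
    # 2) Separate untimed pass at recall_k_max: feeds the recall curve only.
    recall_results = [search_fn(q, recall_k_max) for q in queries]
    return metrics, recall_results
```

This keeps throughput/latency numbers comparable across runs regardless of the recall settings chosen.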
```python
{
    "doc_id": f"ag-news-{len(records)}",
    "text": text,
    "label": AG_NEWS_LABELS[int(label) - 1],
```
Accept both 0-based and 1-based CSV labels
load_ag_news_records_from_csv always subtracts 1 from the CSV label before indexing AG_NEWS_LABELS, which crashes for common AG News exports that already use 0-3 labels (e.g., Hugging Face-style dumps). Because --dataset-csv is documented as a generic local AG News input path, this makes valid inputs fail with KeyError before benchmarking starts.
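A minimal sketch of the suggested normalization; `map_ag_news_labels` is hypothetical, and the scheme inference (a 0 anywhere in the column means 0-based, otherwise assume the original 1-based export) is one possible heuristic, not the PR's implementation:

```python
from typing import Iterable, List

# Standard AG News class names, in canonical order.
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def map_ag_news_labels(raw_labels: Iterable[str]) -> List[str]:
    """Map CSV label values to class names, accepting 1-4 (original
    AG News CSVs) or 0-3 (Hugging Face-style dumps).

    The scheme is inferred from the whole column: if a 0 appears
    anywhere, the file is 0-based; otherwise assume 1-based. Columns
    containing only 1-3 are ambiguous and default to 1-based.
    """
    values = [int(v) for v in raw_labels]
    offset = 0 if 0 in values else 1
    return [AG_NEWS_LABELS[v - offset] for v in values]
```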


Summary
Adds a benchmarking suite for the Index Migrator, measuring throughput, latency, memory usage, and index size across different migration scenarios (field changes, vector algorithm changes, quantization, key prefix changes, and mixed operations).
Usage
What is included
- tests/benchmarks/migration_benchmark.py: Comprehensive benchmark suite that measures:
- tests/benchmarks/index_migrator_real_benchmark.py: Benchmark with realistic data volumes and schema complexity.
- tests/benchmarks/conftest.py: Pytest fixtures for benchmark Redis instances and data setup.

Details
Benchmark scenarios covered:
Results are written as YAML reports for comparison across runs.
PR Stack
1. feat/migrate-core
2. feat/migrate-executor
3. feat/migrate-async
4. feat/migrate-batch
5. feat/migrate-wizard
6. feat/migrate-cli-docs
7. feat/migrate-benchmarks