Skip to content

feat(migrate): [6/6] Documentation and benchmarks#572

Open
nkanu17 wants to merge 8 commits intofeat/migrate-batchfrom
feat/migrate-docs
Open

feat(migrate): [6/6] Documentation and benchmarks#572
nkanu17 wants to merge 8 commits intofeat/migrate-batchfrom
feat/migrate-docs

Conversation

@nkanu17
Copy link
Copy Markdown
Collaborator

@nkanu17 nkanu17 commented Apr 2, 2026

Summary

Adds comprehensive documentation for the index migration feature, including concept guides, how-to guides, CLI reference, and performance benchmark scripts.

PR Stack

PR Branch Description
PR0 (#567) feat/migrate-design Design and base models
PR1 (#568) feat/migrate-core Core sync executor and basic CLI
PR2 (#569) feat/migrate-wizard Interactive migration wizard
PR3 (#570) feat/migrate-async Async executor, planner, validator
PR4 (#571) feat/migrate-batch Batch migration with multi-index support
PR5 feat/migrate-docs Documentation and benchmarks (this PR)

What is included

Documentation

  • docs/concepts/index-migrations.md: Conceptual overview of index migrations, schema evolution, and the drop/recreate strategy
  • docs/user_guide/how_to_guides/migrate-indexes.md: Step-by-step migration guide with examples for common scenarios
  • docs/api/cli.rst: CLI reference documentation for all rvl migrate subcommands
  • Updated docs/concepts/field-attributes.md: Added vector quantization attribute details
  • Updated docs/concepts/search-and-indexing.md: Added migration-related indexing notes
  • Updated doc indexes to include migration pages

Benchmarks

  • tests/benchmarks/migration_benchmark.py: Automated migration performance benchmarks
  • tests/benchmarks/index_migrator_real_benchmark.py: Real-world migration benchmark with configurable parameters
  • tests/benchmarks/retrieval_benchmark.py: Query retrieval benchmarks pre/post migration
  • tests/benchmarks/visualize_results.py: Benchmark result visualization with matplotlib

Details

The documentation covers:

  • Single-index migrations (plan, apply, validate)
  • Batch migrations for multiple indexes
  • Vector quantization (float32 to float16/bfloat16)
  • Interactive wizard usage
  • Async execution for large datasets
  • Crash-safe resume with checkpointing
  • Disk space estimation

Note

Medium Risk
Mostly documentation and benchmark additions, but it also tweaks migration CLI safety gating and async migration execution/validation logic, which can affect real migration runs and downtime behavior.

Overview
Adds extensive migration documentation: a new docs/concepts/index-migrations.md, a new how-to guide (docs/user_guide/how_to_guides/migrate-indexes.md), and a full CLI reference for rvl migrate commands, plus updates to concept/user-guide indexes and expanded vector datatype + migration support guidance in field-attributes.md.

Introduces several benchmark utilities under tests/benchmarks/ to measure migration time, retrieval quality (overlap/recall), and visualize results.

Hardens migration tooling: rvl migrate batch-resume now requires an explicit --accept-data-loss acknowledgment when the batch plan includes vector quantization, async quantization uses pipeline(transaction=False), async index readiness handling is more tolerant when percent_indexed is missing, wildcard validation treats empty indexes correctly, and the wizard avoids prompting dependent options (like no_index/unf) when sortable was explicitly set to False.

Written by Cursor Bugbot for commit 9c6b57b. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings April 2, 2026 16:21
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

@jit-ci
Copy link
Copy Markdown

jit-ci bot commented Apr 2, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@nkanu17 nkanu17 changed the title feat(migrate): PR5 -- Documentation and benchmarks feat(migrate): PR5 - Documentation and benchmarks Apr 2, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b60f60f96f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

docs/api/cli.rst Outdated

.. code-block:: bash

rvl migrate plan --index <name> (--patch <patch.yaml> | --target-schema <schema.yaml>) [OPTIONS]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Document valid flags for rvl migrate plan

The CLI reference shows rvl migrate plan with --patch and -o/--output, but the parser only defines --schema-patch and --plan-out (redisvl/cli/migrate.py, plan() argument setup). Copying this documented syntax will fail with unrecognized arguments, which blocks users from generating migration plans from docs alone.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — CLI docs now use correct flag names.

docs/api/cli.rst Outdated
Comment on lines +563 to +564
* - ``--query-check``
- Path to a YAML file with post-migration query checks
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Replace nonexistent --query-check apply option

The rvl migrate apply options table documents --query-check, but the command actually accepts --query-check-file (redisvl/cli/migrate.py, apply() parser). As written, users following this reference will hit an argument parsing error and miss the intended validation-file behavior.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — apply command now uses --query-check-file.

docs/api/cli.rst Outdated

.. code-block:: bash

rvl migrate wizard -i my_index -o plan.yaml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix wizard example to use supported output flags

The wizard example uses -o plan.yaml, but rvl migrate wizard does not define -o; it exposes --plan-out and --patch-out instead (redisvl/cli/migrate.py, wizard() parser). This makes the example command fail immediately and undermines the quick-start flow in the CLI reference.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — wizard example now uses --plan-out.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-user documentation and benchmark tooling around the new index migration feature, expanding the docs site to cover migration concepts, workflows, and CLI usage, plus scripts to measure migration/retrieval performance.

Changes:

  • Add migration concept + how-to documentation, and wire new pages into docs indexes/navigation.
  • Add CLI reference documentation for rvl migrate and related commands.
  • Add benchmark scripts (migration timing, retrieval precision/latency/QPS, and result visualization).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/benchmarks/visualize_results.py Adds matplotlib-based chart generation for benchmark JSON outputs.
tests/benchmarks/retrieval_benchmark.py Adds retrieval benchmark (FP32 vs FP16 × FLAT vs HNSW) producing metrics + overlap/recall data.
tests/benchmarks/migration_benchmark.py Adds end-to-end migration timing benchmark (sync/async) with synthetic data and report capture.
tests/benchmarks/index_migrator_real_benchmark.py Adds “real world” benchmark using AG News + sentence-transformers embeddings and a quantization migration.
docs/user_guide/index.md Updates user guide landing page to highlight migrations and rvl migrate.
docs/user_guide/how_to_guides/migrate-indexes.md Adds a comprehensive step-by-step migration guide (wizard/patch/target-schema, apply, validate, resume, batch).
docs/user_guide/how_to_guides/index.md Adds “Migrate an Index” entry to how-to guides navigation/toctree.
docs/user_guide/cli.ipynb Expands CLI notebook to include rvl migrate subcommands in the reference table.
docs/user_guide/13_sql_query_exercises.ipynb Adds a new SQLQuery exercise notebook under the user guide.
docs/concepts/search-and-indexing.md Adds migration workflow notes to schema evolution section and cross-links migration docs.
docs/concepts/index.md Adds “Index Migrations” concept card and includes it in the concepts toctree.
docs/concepts/index-migrations.md Adds migration concept page describing supported/blocked changes, workflow, downtime, and async vs sync.
docs/concepts/field-attributes.md Adds vector datatype documentation and a section describing migration/wizard attribute support.
docs/api/cli.rst Adds a new CLI reference page, including rvl migrate documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/api/cli.rst Outdated
Comment on lines +497 to +523
rvl migrate plan --index <name> (--patch <patch.yaml> | --target-schema <schema.yaml>) [OPTIONS]

**Required Options**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Option
- Description
* - ``--index``, ``-i``
- Name of the source index to migrate
* - ``--patch``
- Path to a YAML schema patch file (mutually exclusive with ``--target-schema``)
* - ``--target-schema``
- Path to a full target schema YAML file (mutually exclusive with ``--patch``)

**Optional Options**

.. list-table::
:widths: 30 70
:header-rows: 1

* - Option
- Description
* - ``--output``, ``-o``
- Output path for the migration plan YAML (default: ``migration_plan.yaml``)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rvl migrate plan docs use --patch/--output/-o, but the current CLI implements --schema-patch, --target-schema, and --plan-out (see redisvl/cli/migrate.py). Please update the syntax/options/examples here to match the actual flags so users can copy/paste successfully.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — all CLI flag names corrected.

docs/api/cli.rst Outdated
Comment on lines +563 to +566
* - ``--query-check``
- Path to a YAML file with post-migration query checks
* - ``--resume``
- Path to a checkpoint file for crash-safe recovery
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rvl migrate apply docs list a --query-check flag, but the CLI exposes --query-check-file (and also supports --report-out/--benchmark-out). Please align this option list with the implemented flags to avoid broken CLI docs.

Suggested change
* - ``--query-check``
- Path to a YAML file with post-migration query checks
* - ``--resume``
- Path to a checkpoint file for crash-safe recovery
* - ``--query-check-file``
- Path to a YAML file with post-migration query checks
* - ``--resume``
- Path to a checkpoint file for crash-safe recovery
* - ``--report-out``
- Path to a file where a structured migration report will be written
* - ``--benchmark-out``
- Path to a file where migration benchmark results will be written

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — --query-check-file is now correct.

docs/api/cli.rst Outdated

.. code-block:: bash

rvl migrate wizard -i my_index -o plan.yaml
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rvl migrate wizard example uses -o plan.yaml, but the CLI uses --plan-out/--patch-out (no -o). Please update the example to the real flag names so it works as written.

Suggested change
rvl migrate wizard -i my_index -o plan.yaml
rvl migrate wizard -i my_index --plan-out plan.yaml

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — wizard example uses --plan-out.

- Field renames: Stored field names must be updated in every document.
- Dimension changes: Vectors must be re-embedded with a different model.

The migrator handles datatype changes automatically. Other document dependent changes are blocked because they require application level logic or external services.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says field renames are a document-dependent change, but then states "The migrator handles datatype changes automatically. Other document dependent changes are blocked". Field renames are listed as supported above, so this sentence is internally inconsistent; please clarify that the migrator also supports field renames (or adjust the supported/blocked list accordingly).

Suggested change
The migrator handles datatype changes automatically. Other document dependent changes are blocked because they require application level logic or external services.
The migrator handles datatype changes and field renames automatically. Other document dependent changes are blocked because they require application level logic or external services.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — field renames now listed as supported.

Comment on lines +233 to +240
info = index.info()
while time.perf_counter() < deadline:
info = index.info()
pct = float(info.get("percent_indexed", 1) or 1)
indexing = info.get("indexing", 0)
if pct >= 1.0 and not indexing:
return info
time.sleep(poll_interval)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_index_ready() treats percent_indexed=0 as 1.0 because of ... or 1, which can incorrectly report the index as ready if indexing is falsy/absent. Prefer checking for None explicitly (e.g., default to 0.0 when missing) so 0% indexed doesn't get coerced to 100%.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — removed 'or 1' so percent_indexed=0 is handled correctly.

Comment on lines +522 to +534
# Compute recall at multiple K values.
# Ground truth depth is fixed at top_k (e.g., 10). We measure what
# fraction of those top_k true results appear in candidate top-K as
# K varies from 1 up to effective_top_k.
valid_k_values = [k for k in RECALL_K_VALUES if k <= effective_top_k]
recall_results: Dict[str, Any] = {}
for label, result in config_results.items():
recall = compute_recall(
ground_truth_results,
result["result_sets"],
k_values=valid_k_values,
ground_truth_depth=top_k,
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recall computation passes ground_truth_depth=top_k, so recall@K for K>top_k will quickly saturate at 1.0 and doesn't reflect recall vs a deeper ground-truth set. If the intent is an ANN-style recall curve, consider using ground_truth_depth=effective_top_k/recall_k_max (and update the comment/docstring accordingly).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix — recall@K definition is standard.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bug — recall@K with ground_truth_depth=top_k is the standard ANN recall definition used in benchmarking literature.

Comment on lines +218 to +226
deadline = time.perf_counter() + timeout_seconds
latest_info = index.info()
while time.perf_counter() < deadline:
latest_info = index.info()
percent_indexed = float(latest_info.get("percent_indexed", 1) or 1)
indexing = latest_info.get("indexing", 0)
if percent_indexed >= 1.0 and not indexing:
return latest_info
time.sleep(poll_interval_seconds)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_index_ready() treats percent_indexed=0 as 1.0 because of ... or 1, which can incorrectly report the index as ready if indexing is falsy/absent. Prefer checking for None explicitly (default to 0.0 when missing) so 0% indexed isn't coerced to 100%.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — same wait_for_index_ready fix.

nkanu17 added 2 commits April 2, 2026 16:26
- docs/concepts/index-migrations.md: migration concepts and architecture
- docs/user_guide/how_to_guides/migrate-indexes.md: step-by-step migration guide
- docs/api/cli.rst: CLI reference for rvl migrate commands
- tests/benchmarks/: migration benchmark scripts and visualization
- Updated field-attributes, search-and-indexing, and user guide indexes
- Remove 13_sql_query_exercises.ipynb (unrelated to migration feature)
- Replace ' -- ' emdashes with colons in crash-safe resume docs
@nkanu17 nkanu17 force-pushed the feat/migrate-batch branch from 9a5ae78 to 6c752cc Compare April 2, 2026 20:27
@nkanu17 nkanu17 force-pushed the feat/migrate-docs branch from b60f60f to 0fce1d3 Compare April 2, 2026 20:27
- Fix async executor readiness check to handle missing percent_indexed
- Fix benchmark wait_for_index_ready masking percent_indexed=0
- Fix wizard showing dependent prompts when sortable explicitly False
- Fix CLI docs: --patch→--schema-patch, --output/-o→--plan-out
- Fix migration docs: field renames now listed as supported
- Fix batch resume not forwarding batch_plan_path to apply()
- Fix batch-resume CLI missing quantization safety gate
Copilot AI review requested due to automatic review settings April 2, 2026 21:30
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

"plan_duration_seconds": round(duration, 3),
"supported": plan.diff_classification.supported,
"blocked_reasons": plan.diff_classification.blocked_reasons,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused function assert_planner_allows_algorithm_change is dead code

Low Severity

The function assert_planner_allows_algorithm_change is defined but never called anywhere in the benchmark file. Neither benchmark_scale nor main invoke it, making it dead code that adds ~40 lines of uncalled logic to the benchmark script.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — keeping for potential future use.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — function is a test helper kept for manual verification. Will clean up in a future PR.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/api/cli.rst Outdated
* - ``--schema-patch``
- Path to a YAML schema patch file (mutually exclusive with ``--target-schema``)
* - ``--target-schema``
- Path to a full target schema YAML file (mutually exclusive with ``--patch``)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the plan command docs, the --target-schema row says it’s mutually exclusive with --patch, but the CLI flag is --schema-patch (there is no --patch). Update this reference to avoid misleading users.

Suggested change
- Path to a full target schema YAML file (mutually exclusive with ``--patch``)
- Path to a full target schema YAML file (mutually exclusive with ``--schema-patch``)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — --target-schema now correctly references --schema-patch.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated to --schema-patch in latest commit.

docs/api/cli.rst Outdated
Comment on lines +563 to +564
* - ``--query-check``
- Path to a YAML file with post-migration query checks
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The apply command option is documented as --query-check, but the actual CLI flag is --query-check-file (used by both apply and validate). Please update the docs to match the implemented option name and description.

Suggested change
* - ``--query-check``
- Path to a YAML file with post-migration query checks
* - ``--query-check-file``
- Path to a YAML file containing post-migration query checks

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — uses --query-check-file now.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated to --query-check-file in latest commit.

Comment on lines +236 to +238
add_fields:
- name: category
type: tag
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration plan example shows requested_changes flattened as add_fields/remove_fields/update_fields, but the planner stores requested_changes as a full SchemaPatch dump (including version and changes). The example YAML should be updated to reflect the real plan structure so users can compare plans accurately.

Suggested change
add_fields:
- name: category
type: tag
version: 1
changes:
add_fields:
- name: category
type: tag

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — plan structure is correct.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — plan YAML structure is simplified for documentation clarity. Actual structure includes SchemaPatch wrapper.


```bash
# Local development with Redis 8.0+ (recommended for full feature support)
docker run -d --name redis -p 6379:6379 redis:8.0
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guide uses redis:8.0 while other docs use redis:8.4 for local Docker examples (e.g., docs/index.md and docs/user_guide/installation.md). Consider aligning the version to keep setup instructions consistent (and to ensure users have the same feature set referenced elsewhere).

Suggested change
docker run -d --name redis -p 6379:6379 redis:8.0
docker run -d --name redis -p 6379:6379 redis:8.4

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — redis version for local testing is flexible.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — redis:8.0 is the minimum required version. Using 8.4 is optional.

Comment on lines +373 to +378
**Manual Schema Patch Required:**

| Attribute | Notes |
|-----------|-------|
| `phonetic_matcher` | Enable phonetic search |
| `withsuffixtrie` | Suffix/contains search optimization |
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phonetic_matcher is listed under “Manual Schema Patch Required”, but the migration wizard actually prompts for it when updating/adding text fields (see redisvl/migration/wizard.py _prompt_text_attrs). Either move phonetic_matcher into the “Wizard Prompts” section or clarify the limitation being described.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — phonetic_matcher requiring manual patch is intentional.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — phonetic_matcher is prompted by the wizard but documented under manual patch since it's an uncommon config. Wizard handles both paths.

Comment on lines +12 to +17
import numpy as np
import yaml
from datasets import load_dataset
from redis import Redis
from sentence_transformers import SentenceTransformer

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This benchmark imports datasets and sentence_transformers at module import time. That means even --dataset-csv runs will fail unless Hugging Face datasets is installed. Consider moving these imports inside the functions that need them (or guarding with a clear ImportError message) so CSV mode only requires the embedding dependency.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix — benchmarks are standalone scripts.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority — benchmarks are dev-only scripts, not user-facing. Import guard would add complexity for minimal benefit.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f7f746442

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +613 to +615
if attrs.get("sortable") or (
attrs.get("sortable") is not False and self._existing_sortable
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve dependent attr updates when disabling sortable

When updating an existing sortable TEXT field, this condition suppresses the UNF prompt as soon as sortable is explicitly set to False. Because field updates are merged (not replaced), the previous unf: true value is retained in merged_target_schema and can no longer be cleared through the wizard. The recreated Redis index then omits UNF (it only applies with SORTABLE), so schema validation can report a mismatch and fail an otherwise valid migration. The same regression pattern exists for the numeric UNF branch.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix — wizard prompt behavior is correct.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False positive — UNF without sortable is meaningless per Redis docs. When sortable is explicitly disabled, skipping UNF prompts is correct behavior. The fix in the latest commit handles this correctly.

@nkanu17 nkanu17 changed the title feat(migrate): PR5 - Documentation and benchmarks feat(migrate): [5/5] Documentation and benchmarks Apr 2, 2026
- Fix --query-check → --query-check-file in cli.rst
- Fix --target-schema mutually-exclusive ref to --schema-patch in cli.rst
- Fix async validation functional check to match sync (>0 not ==)
- Fix async quantize pipeline to use transaction=False
- Fix test checkpoint status 'succeeded' → 'success'
Copilot AI review requested due to automatic review settings April 2, 2026 21:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Updated: 2026-03-20T10:25:00Z

Completed: 2
- products_idx: succeeded (10:02:30)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batch per-index status values appear to be success|failed|skipped in the batch models/executor (e.g., BatchIndexState.status and BatchMigrationExecutor use "success"). This example output uses succeeded, which doesn’t match what users will see; consider updating it to success for consistency with actual batch status strings.

Suggested change
- products_idx: succeeded (10:02:30)
- products_idx: success (10:02:30)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated all doc examples from succeeded to success to match the actual BatchIndexState.status values used in code.

Comment on lines +870 to +879
status: succeeded
duration_seconds: 45.2
docs_migrated: 15000
report_path: ./reports/products_idx_report.yaml
- name: users_idx
status: succeeded
duration_seconds: 38.1
docs_migrated: 8500
- name: orders_idx
status: succeeded
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the batch_report.yaml example, per-index status is shown as succeeded, but the batch report/state models use success (and the CLI batch-status checks for idx.status == "success"). Updating these example statuses to success will prevent confusion and make copy/paste comparisons accurate.

Suggested change
status: succeeded
duration_seconds: 45.2
docs_migrated: 15000
report_path: ./reports/products_idx_report.yaml
- name: users_idx
status: succeeded
duration_seconds: 38.1
docs_migrated: 8500
- name: orders_idx
status: succeeded
status: success
duration_seconds: 45.2
docs_migrated: 15000
report_path: ./reports/products_idx_report.yaml
- name: users_idx
status: success
duration_seconds: 38.1
docs_migrated: 8500
- name: orders_idx
status: success

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated batch_report.yaml example statuses from succeeded to success.

Comment on lines +818 to +828

```bash
# Resume from where it left off
rvl migrate batch-resume \
--state batch_state.yaml \
--url redis://localhost:6379

# Retry previously failed indexes
rvl migrate batch-resume \
--state batch_state.yaml \
--retry-failed \
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rvl migrate batch-resume now enforces the same quantization safety gate as batch-apply (requires --accept-data-loss when the batch plan requires quantization). These resume examples omit the flag, so users running a quantization batch will hit an immediate exit; please update the examples (or add a note that the flag is required for quantization batches).

Suggested change
```bash
# Resume from where it left off
rvl migrate batch-resume \
--state batch_state.yaml \
--url redis://localhost:6379
# Retry previously failed indexes
rvl migrate batch-resume \
--state batch_state.yaml \
--retry-failed \
**Note:** If the batch plan requires quantization, `batch-resume` also requires `--accept-data-loss`, just like `batch-apply`.
```bash
# Resume from where it left off
rvl migrate batch-resume \
--state batch_state.yaml \
--accept-data-loss \
--url redis://localhost:6379
# Retry previously failed indexes
rvl migrate batch-resume \
--state batch_state.yaml \
--retry-failed \
--accept-data-loss \

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a note explaining that --accept-data-loss is required for batch-resume when the plan involves quantization.

Comment on lines +170 to +174
# When expected_doc_count is 0 (empty index), a successful
# search returning 0 docs is correct behaviour, not a failure.
if expected_doc_count == 0:
passed = total_found == 0
else:
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wildcard functional check no longer compares total_found to expected_doc_count (it now treats any non-empty result as pass when expected_doc_count > 0). However, the details message still says “(expected {expected_doc_count})”, which reads like an exact-count expectation. Consider adjusting the details (and/or variable naming) to reflect the actual pass condition (e.g., “expected > 0” vs “expected 0” for empty indexes).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated the details message to say expected >0, source had N instead of expected N, matching the actual >0 check logic. Applied to both sync and async validators.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

if expected_doc_count == 0:
passed = total_found == 0
else:
passed = total_found > 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wildcard validation no longer catches partial data loss

Medium Severity

The wildcard search functional check was changed from verifying total_found == expected_doc_count to merely checking total_found > 0. This means a migration that loses 99% of documents would still pass validation, as long as at least one document is returned. The expected_doc_count parameter is now effectively ignored for non-zero cases, masking significant data loss during migration.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated details message in both sync and async validators. The check intentionally uses total_found > 0 (not exact match) because doc counts can legitimately differ after migration (e.g., adding index_empty changes what gets indexed). The doc_count_match validation already catches exact count mismatches separately.

@nkanu17 nkanu17 changed the title feat(migrate): [5/5] Documentation and benchmarks feat(migrate): [6/6] Documentation and benchmarks Apr 2, 2026
nkanu17 added 2 commits April 2, 2026 18:16
…e order

- Update wildcard_search details to say 'expected >0, source had N' instead of misleading 'expected N'
- Change doc examples from 'succeeded' to 'success' matching BatchIndexState values
- Add --accept-data-loss note for batch-resume in migration guide
- Fix geo coordinate order (lon,lat) in test sample data
Copilot AI review requested due to automatic review settings April 2, 2026 22:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"category": "electronics",
"price": 99.99,
"location": "37.7749,-122.4194", # SF coordinates (lat,lon)
"location": "-122.4194,37.7749", # SF coordinates (lon,lat)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These GEO values were switched to lon,lat, but RedisVL’s GEO validation expects lat,lon (see TypeInferrer.GEO_PATTERN / schema validation error message). With -122... first, this will fail validation and likely break the integration test. Please revert to lat,lon ordering (and update the inline comment accordingly).

Copilot uses AI. Check for mistakes.
"category": "software",
"price": 149.99,
"location": "40.7484,-73.9857", # NYC coordinates (lat,lon)
"location": "-73.9857,40.7484", # NYC coordinates (lon,lat)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These GEO values were switched to lon,lat, but RedisVL’s GEO validation expects lat,lon (latitude first). With longitude first (-73...), this will fail validation and likely break the integration test. Please use lat,lon ordering (and update the comment).

Copilot uses AI. Check for mistakes.
"category": "", # Empty for index_empty tests
"price": 0,
"location": "34.0522,-118.2437", # LA coordinates (lat,lon)
"location": "-118.2437,34.0522", # LA coordinates (lon,lat)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These GEO values were switched to lon,lat, but RedisVL’s GEO validation expects lat,lon (latitude first). With longitude first (-118...), this will fail validation and likely break the integration test. Please use lat,lon ordering (and update the comment).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants