Skip to content

feat(migrate): [1/7] Core planner, validator, sync executor, and basic CLI#560

Closed
nkanu17 wants to merge 9 commits intomainfrom
feat/migrate-core
Closed

feat(migrate): [1/7] Core planner, validator, sync executor, and basic CLI#560
nkanu17 wants to merge 9 commits intomainfrom
feat/migrate-core

Conversation

@nkanu17
Copy link
Copy Markdown
Collaborator

@nkanu17 nkanu17 commented Apr 1, 2026

Summary

Adds the foundation for the Index Migrator feature: a schema-aware migration planner, validation engine, synchronous executor, and a basic CLI entry point (rvl migrate).

This PR is fully usable on its own for simple single-index migrations.

Usage

# List available indexes
rvl migrate list --url redis://localhost:6379

# Generate a migration plan
rvl migrate plan --index my_index --schema-patch patch.yaml

# Apply the migration (sync)
rvl migrate apply --plan migration_plan.yaml

# Validate the result
rvl migrate validate --plan migration_plan.yaml

# Estimate disk space
rvl migrate estimate --plan migration_plan.yaml

# Show guidance
rvl migrate helper

What is included

  • Models (redisvl/migration/models.py): Pydantic models for MigrationPlan, MigrationReport, DiffClassification, ValidationResult, and related types.
  • Planner (redisvl/migration/planner.py): MigrationPlanner that introspects a live Redis index, diffs it against a schema patch or target schema, classifies changes (supported vs blocked), and produces a serializable MigrationPlan.
  • Validator (redisvl/migration/validation.py): MigrationValidator that checks schema match, document count, key samples, and indexing failures after migration.
  • Executor (redisvl/migration/executor.py): MigrationExecutor (sync) with the 8-step migration pipeline: enumerate keys, BGSAVE, drop index, quantize vectors, rename fields, rename keys, create index, wait for re-indexing.
  • Utilities (redisvl/migration/utils.py): Shared helpers for YAML I/O, disk space estimation, AOF detection, index listing, report writing, and timestamp generation.
  • CLI (redisvl/cli/migrate.py): Basic Migrate class with helper, list, plan, apply, estimate, and validate subcommands.
  • CLI wiring (redisvl/cli/main.py, redisvl/cli/utils.py): Registers rvl migrate and adds add_redis_connection_options utility.
  • Unit tests (tests/unit/test_migration_planner.py): 15 tests covering planner logic, diff classification, and edge cases.
  • Integration tests (tests/integration/test_migration_comprehensive.py, tests/integration/test_migration_v1.py, tests/integration/test_migration_routes.py): End-to-end tests against a live Redis instance.

PR Stack

PR Branch Description
#560 (this) feat/migrate-core Core planner, validator, sync executor, and basic CLI
#561 feat/migrate-executor Quantization reliability layer
#562 feat/migrate-async Async planner, executor, and validator
#563 feat/migrate-batch Batch migration with multi-index support
#564 feat/migrate-wizard Interactive migration wizard
#565 feat/migrate-cli-docs Full CLI commands and documentation
#566 feat/migrate-benchmarks Performance benchmarks

…alidation, and shared utilities

Schema-aware migration planning infrastructure: Pydantic models for
plans, reports, schema patches, and field renames. Planner generates
migration plans by diffing source and target index schemas. Validator
checks post-migration state. Shared utilities for index introspection
and key enumeration.

Includes planner unit tests and field modifier ordering integration tests.
@jit-ci
Copy link
Copy Markdown

jit-ci bot commented Apr 1, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb4fc9cb36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces the initial, schema-aware migration planning/validation foundation for RedisVL, including Pydantic models for migration artifacts, a planner that diffs schemas and produces plans (with warnings/rename ops), and shared utilities for schema comparison and key/index introspection.

Changes:

  • Added migration domain models (plans, reports, schema patches, rename operations, disk space estimates) and public exports under redisvl.migration.
  • Implemented a schema-aware MigrationPlanner (diffing, patch merge, rename-op extraction, key sampling, SVS-VAMANA requirement warnings).
  • Implemented MigrationValidator plus shared utilities for schema canonicalization/comparison and disk space estimation; expanded tests and Redis vector attr parsing.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
redisvl/migration/__init__.py Exposes migration planner/validator/models via package API.
redisvl/migration/models.py Adds Pydantic models for plans/reports/patches/validation and disk space estimates.
redisvl/migration/planner.py Adds schema-aware planner (snapshot, patch merge, diff classification, warnings, key sampling).
redisvl/migration/utils.py Adds YAML helpers, schema comparison/canonicalization, index readiness wait, disk estimation utilities.
redisvl/migration/validation.py Adds post-migration validator (schema/doc count/key sample/query checks).
redisvl/redis/connection.py Extends FT.INFO vector attribute parsing to include HNSW params (m, ef_construction).
tests/unit/test_migration_planner.py Adds unit tests for planner diff classification and patch/target schema handling.
tests/integration/test_field_modifier_ordering_integration.py Adds integration coverage for INDEXEMPTY/UNF/NOINDEX modifier behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nkanu17 nkanu17 requested a review from Copilot April 1, 2026 22:43
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb4fc9cb36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 1, 2026

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb4fc9cb36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

nkanu17 added 3 commits April 1, 2026 19:38
- Fix BatchState.success_count checking 'succeeded' instead of 'success'
- Fix unbound 'ready' variable in wait_for_index_ready
- Fix classify_diff to resolve update_fields through rename mappings (both directions)
- Fix empty prefix falsy check (empty string '' is a valid prefix change)
- Eliminate redundant snapshot_source call in create_plan
- Prevent in-place mutation of SchemaPatch during prefix normalization
- Fix validation.py prefix separator handling and empty prefix support
- Allow negative indexing failure deltas (improvements should not error)
- Fix memory_savings_after_bytes to handle negative values (increases)
- Enhance get_vector_datatype_changes to support renamed vector fields
Test docstring claimed NOINDEX field can be sorted but not searched,
but only asserted sorting. Now also verifies filter returns 0 results.
…tting

- Fix key_sample transformation that produced 'prefix::suffix' instead of 'prefix:suffix'
- Apply code formatting to validation.py and planner.py
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3314ebf48c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Honor ValidationPolicy.require_schema_match and require_doc_count_match
- Fix functional check to verify index is queryable (total > 0) not exact count
- Use consistent 'is not None' checks in RenameOperations.has_operations
- Handle empty-prefix indexes in _sample_keys
- Normalize prefix separators in validation key rewriting
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f2d4fcf56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

try:
search_result = target_index.search(Query("*").paging(0, 1))
total_found = search_result.total
passed = total_found > 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow empty indexes to pass wildcard functional check

This check marks every migration of an empty index as failed: when expected_doc_count is 0, a healthy Query("*") returns total_found == 0, but passed = total_found > 0 sets the query check to failed and validate() then appends a hard error. That produces false-negative validation results for valid migrations where the source and target doc counts are both zero.

Useful? React with 👍 / 👎.

Comment on lines +474 to +476
if not has_prefix_change:
blocked_reasons.append(
"Changing index prefixes requires document migration (not yet supported)."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize equivalent single-prefix values before blocking

Prefix comparison here uses raw values from the patch, so semantically equivalent forms (e.g., source prefix 'docs' from FT.INFO vs target prefix ['docs'] in schema YAML) are treated as different. Because _extract_rename_operations() normalizes this to “no prefix change,” this branch still adds a blocked reason, causing no-op migrations to be incorrectly classified as unsupported.

Useful? React with 👍 / 👎.

keys_to_check = []
for k in key_sample:
if old_base and k.startswith(old_base):
keys_to_check.append(new_base + k[len(old_base) :])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validator prefix key transformation diverges from documented executor logic

Medium Severity

The comment states the validator must match the executor's RENAME logic exactly: new_key = new_prefix + key[len(old_prefix):]. However, the implementation applies rstrip(key_sep) to both prefixes before slicing, using old_base and new_base instead of old_prefix and new_prefix. When old and new prefixes have asymmetric trailing separators (e.g., old "docs:" → new "items"), the validator computes different transformed keys than the documented formula, causing false validation failures.

Fix in Cursor Fix in Web

if field_update.attrs:
merged_attrs = dict(existing_field.get("attrs", {}))
merged_attrs.update(field_update.attrs)
existing_field["attrs"] = merged_attrs
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attrs merge preserves stale source attributes in target schema

Medium Severity

When normalize_target_schema_to_patch generates update_fields, it includes the complete target field definition. But merge_patch additively merges attrs (source attrs + update attrs) rather than replacing them. Source attrs absent from the target are incorrectly preserved in merged_target_schema, causing the merged schema to diverge from the intended target. For example, removing no_stem from a text field would silently fail.

Additional Locations (1)
Fix in Cursor Fix in Web

- Fix empty index validation: wildcard check passes when expected_doc_count is 0
- Normalize single-element list prefixes in classify_diff to avoid false blocked reasons
Copilot AI review requested due to automatic review settings April 2, 2026 14:49
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

source_value = source_dict["index"].get(index_key)
if source_value != target_value:
changes["index"][index_key] = target_value

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Target schema diff misses source-only index settings

Medium Severity

normalize_target_schema_to_patch only iterates over keys present in target_dict["index"], so any index setting that exists in the source but is absent from the target (e.g., custom stopwords) is never captured as a change. Because merge_patch only adds or updates keys, those source-only settings silently survive into the merged schema, meaning the --target-schema path can produce a merged result that doesn't match the intended target.

Fix in Cursor Fix in Web

"""
start = time.perf_counter()
deadline = start + timeout_seconds
latest_info = index.info()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant remote index.info() call before loop

Low Severity

wait_for_index_ready calls latest_info = index.info() on line 260, but the very first line inside the while loop on line 265 immediately overwrites latest_info with another index.info() call. The initial call is a wasted Redis network roundtrip that is never consumed.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
keys_to_check = []
for k in key_sample:
if old_base and k.startswith(old_base):
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When change_prefix is set and the source prefix is empty (so old_base == ""), the if old_base and k.startswith(old_base) guard prevents rewriting keys. That makes the key-sample existence check diverge from the documented rename logic (new_key = new_prefix + key[len(old_prefix):]) and can incorrectly fail validation for empty-prefix indexes. Consider removing the old_base guard (empty string always matches) or handling the empty-prefix case explicitly so sampled keys are always transformed when a prefix change is planned.

Suggested change
if old_base and k.startswith(old_base):
if k.startswith(old_base):

Copilot uses AI. Check for mistakes.
Comment on lines +342 to +348
# Field renames from explicit rename_fields
rename_fields: List[FieldRename] = list(changes.rename_fields)
for field_rename in rename_fields:
warnings.append(
f"Field rename: '{field_rename.old_name}' -> '{field_rename.new_name}' "
"(requires read/write for all documents, may be slow for large datasets)"
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_extract_rename_operations includes all entries from changes.rename_fields in both the returned RenameOperations and the warnings list, even when old_name == new_name (a no-op rename). Since merge_patch explicitly treats these as no-ops, it would be less confusing to filter them out here as well (skip adding the warning and omit them from rename_fields).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdb6941b80

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +70 to +71
if old_base and k.startswith(old_base):
keys_to_check.append(new_base + k[len(old_base) :])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remap sampled keys when old prefix is empty

When change_prefix is set and the source prefix is "", this guard prevents remapping entirely, so validation checks the pre-migration keys instead of the renamed ones. That causes deterministic false failures (key_sample_exists=False) for successful prefix-change migrations from empty-prefix indexes, because the executor-side rename logic prepends the new prefix to every key.

Useful? React with 👍 / 👎.

Comment on lines +498 to +503
for field in changes.add_fields:
if field["type"] == "vector":
blocked_reasons.append(
f"Adding vector field '{field['name']}' requires document migration (not yet supported)."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Block same-name remove/add field replacements

This loop only blocks newly added vector fields, so a patch can remove_fields: ["x"] and add_fields: [{"name": "x", ...}] to change a field’s type/path without going through update_fields. Because type/path incompatibility checks run only for update_fields, that unsupported mutation can be classified as supported in drop_recreate, leading to runtime indexing failures or incorrect results on preserved documents.

Useful? React with 👍 / 👎.

Move the sync MigrationExecutor into the core migration PR.
This version handles: field renames, key renames (prefix change),
drop/recreate index, wait for indexing, and validation.

Quantization reliability (checkpoints, BGSAVE, undo buffers) is
intentionally excluded and will be layered in PR #561.

Also includes:
- utils.py: logger, try/except in current_source_matches_snapshot,
  warnings in build_scan_match_patterns
- Integration tests (non-quantization): comprehensive + v1 flow
MigrationTimings,
MigrationValidation,
)
from redisvl.migration.planner import MigrationPlanner
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import of MigrationPlanner in executor module

Low Severity

MigrationPlanner is imported on line 16 but never referenced anywhere in executor.py. The executor only uses MigrationPlan (the model), not the planner class. This unused import adds an unnecessary coupling between the executor and planner modules.

Fix in Cursor Fix in Web

@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3755b0542f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +256 to +260
results = pipe.execute()
for j, r in enumerate(results):
if r is True or r == 1:
renamed += 1
else:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preflight key collisions before issuing RENAMENX batch

The batch executes all RENAMENX commands before checking for collisions, so if any destination key already exists, some earlier keys in the same batch may already be renamed when the method raises. In apply(), this runs after the source index is dropped, which can leave production in a partially-renamed, no-index state that is hard to recover from. This should check destination-key collisions up front (or use an atomic server-side script) so a collision cannot produce partial key movement.

Useful? React with 👍 / 👎.

Comment on lines +67 to +71
new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
keys_to_check = []
for k in key_sample:
if old_base and k.startswith(old_base):
keys_to_check.append(new_base + k[len(old_base) :])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align key-sample prefix rewrite with executor rename logic

validate() rewrites sampled keys using stripped prefixes, but _rename_keys() slices with the raw old_prefix and concatenates the raw new_prefix. When prefixes differ in trailing separator usage (for example old_prefix='doc:', new_prefix='article'), executor renames to article1 while validation checks article:1, causing false key_sample_exists failures even though migration operations succeeded as implemented. The validation transform should use the same exact formula as the executor.

Useful? React with 👍 / 👎.

…imate)

Adds the rvl migrate subcommand with core sync-only commands:
- helper: show migration guidance and supported capabilities
- list: list all available indexes
- plan: generate a migration plan
- apply: execute a migration plan (sync)
- estimate: estimate disk space for a migration
- validate: validate a completed migration

Also adds add_redis_connection_options to cli/utils.py and fixes
the SSL URL construction bug (was appending rediss:// after redis://).
Copilot AI review requested due to automatic review settings April 2, 2026 15:28
@nkanu17 nkanu17 changed the title feat(migrator): [1/7] Migration foundation with models, schema-aware planner, validation, and shared utilities feat(migrate): [1/7] Core planner, validator, sync executor, and basic CLI Apr 2, 2026
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 7 total unresolved issues (including 5 from previous reviews).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

"dims": dims,
}

return changes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static method get_vector_datatype_changes is never called

Low Severity

MigrationPlanner.get_vector_datatype_changes is defined but never called anywhere in the codebase. A grep for its name returns only the definition itself. This appears to be dead code, possibly scaffolding for future quantization support (PR 2/7), but it's exported as part of the public class API without any consumers.

Fix in Cursor Fix in Web

exit(0)

def migrate(self):
Migrate()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing exit(0) in CLI migrate command handler

Low Severity

The migrate() method in RedisVlCLI does not call exit(0) after Migrate(), unlike index(), version(), and stats() which all explicitly call exit(0). This is inconsistent and means the process relies on natural termination rather than an explicit exit code, which could matter if the CLI is called from scripts checking exit codes.

Fix in Cursor Fix in Web

@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for key in batch:
# Compute new key name
if key.startswith(old_prefix):
new_key = new_prefix + key[len(old_prefix) :]
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix rename computes new_key = new_prefix + key[len(old_prefix):], which breaks when old_prefix (or new_prefix) includes the configured key_separator (e.g. schema prefix "docs:" with key_separator=":" produces docs_v21 instead of docs_v2:1). This can rename keys incorrectly and cause data loss / validation failures. Normalize prefixes the same way as BaseStorage._key() (rstrip separator) and build new_key using the normalized bases (matching the validator’s transformation), or reuse the shared key-construction helper to avoid mismatches.

Suggested change
new_key = new_prefix + key[len(old_prefix) :]
suffix = key[len(old_prefix) :]
# Handle cases where old_prefix includes a trailing separator
# but new_prefix does not, to avoid producing keys like "docs_v21"
# instead of "docs_v2:1".
if (
suffix
and old_prefix
and suffix[0] == old_prefix[-1]
and not new_prefix.endswith(old_prefix[-1])
):
# Insert the separator between the new prefix and the rest
new_key = new_prefix + old_prefix[-1] + suffix[1:]
else:
new_key = new_prefix + suffix

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +193
normalized_prefixes = [
p.decode() if isinstance(p, bytes) else str(p) for p in prefixes
]
except Exception as e:
logger.warning(f"Failed to get prefix from index info: {e}")
normalized_prefixes = []

seen_keys: set[str] = set()
for match_pattern in build_scan_match_patterns(
normalized_prefixes, key_separator
):
cursor = 0
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SCAN fallback cannot determine prefixes from FT.INFO, normalized_prefixes becomes empty and build_scan_match_patterns() will return "*", causing a full keyspace scan. For migrations that rename fields/keys, this can unintentionally touch unrelated keys and be extremely expensive. Prefer using plan.source.keyspace.prefixes (already snapshotted) for SCAN matching, and if no safe prefix list is available, abort with an explicit error instead of scanning the entire DB.

Copilot uses AI. Check for mistakes.
Comment on lines +318 to +324
logger.warning(
"Field '%s' already exists in key '%s'; "
"overwriting with value from '%s'",
new_name,
key,
old_name,
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field rename in hashes overwrites the destination field when new_name already exists (only logs a warning). In a migration tool, this can silently clobber user data and make the operation non-reversible. Consider failing fast on collisions by default (or using HSETNX/explicit overwrite=True flag in the plan) so users must opt in to destructive behavior.

Suggested change
logger.warning(
"Field '%s' already exists in key '%s'; "
"overwriting with value from '%s'",
new_name,
key,
old_name,
)
# Fail fast on collisions to avoid silently clobbering data
logger.error(
"Cannot rename field '%s' to '%s' in key '%s': "
"destination field already exists.",
old_name,
new_name,
key,
)
raise RuntimeError(
f"Field rename collision for key '{key}': "
f"destination field '{new_name}' already exists."
)

Copilot uses AI. Check for mistakes.
Comment on lines +657 to +669
# Document-dependent properties (not yet supported)
if source_attrs.get("dims") != target_attrs.get("dims"):
blocked_reasons.append(
f"Changing vector field '{field_name}' dims from {source_attrs.get('dims')} "
f"to {target_attrs.get('dims')} requires document migration (not yet supported). "
"Vectors are stored with incompatible dimensions."
)

# Datatype changes are now ALLOWED - executor will re-encode vectors
# before recreating the index

# All other vector changes are index-only and allowed
return blocked_reasons
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classifier marks vector datatype changes as supported and states the executor will re-encode vectors, but MigrationExecutor.apply() currently performs no datatype conversion for hash-backed documents. Recreating the index with a different vector datatype will typically cause reindexing failures due to wrong blob length/encoding. Either (a) keep datatype changes blocked until conversion is implemented, or (b) add an executor step that rewrites stored vector payloads for affected fields before recreating the index.

Copilot uses AI. Check for mistakes.

_notify("create", "Creating index with new schema...")
recreate_started = time.perf_counter()
target_index.create()
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target_index.create() is called without overwrite=True, and SearchIndex.create() silently returns when the index already exists. If the target index name already exists (e.g., index rename collision), the migration could drop the source index and then no-op on create, leaving an unexpected target schema/data. Consider explicitly checking target_index.exists() and failing, or calling create(overwrite=True, drop=False) after confirming it’s safe to overwrite.

Suggested change
target_index.create()
# Ensure we do not silently no-op if the target index already exists.
if target_index.exists():
raise ResponseError(
f"Target index '{target_index.name}' already exists; "
"aborting migration to avoid overwriting an existing index."
)
target_index.create(overwrite=True, drop=False)

Copilot uses AI. Check for mistakes.
@nkanu17
Copy link
Copy Markdown
Collaborator Author

nkanu17 commented Apr 2, 2026

Closing in favor of restructured PR stack (Option A ordering).

@nkanu17 nkanu17 closed this Apr 2, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86896810ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +665 to +666
# Datatype changes are now ALLOWED - executor will re-encode vectors
# before recreating the index
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block datatype changes until vector re-encoding is implemented

The planner explicitly treats vector datatype changes as supported, but the executor path does not perform any vector payload rewrite before recreating the index. For hash-backed vectors this means stored blobs stay in the old byte width while the new schema expects a different one, which causes indexing failures after FT.CREATE/reindex. This should be blocked at planning time (or implemented in apply) to avoid producing “supported” plans that cannot succeed.

Useful? React with 👍 / 👎.

Comment on lines +548 to +552
source_index.delete(drop=False)
drop_duration = round(time.perf_counter() - drop_started, 3)
_notify("drop", f"done ({drop_duration}s)")

# STEP 4: Key renames (after drop, before recreate)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid dropping index before unsafe prefix renames

The migration drops the source index and only then performs key prefix renames. If _rename_keys hits a destination collision (RENAMENX returns 0) it raises, and the exception path does not restore the dropped index or roll back already-renamed keys, leaving a partially migrated keyspace without an index. Add a collision preflight (or move rename before drop) so failures do not strand the system mid-migration.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants