
feat: expose extended vector indexing options on createVectorIndex #2505

Draft
erichare wants to merge 2 commits into main from feat/vector-index-options-2487

@erichare (Contributor)

What this PR does:

Exposes additional Cassandra SAI vector indexing configuration options on the createVectorIndex
command (tables). Until now, only 2 of the 7 SAI vector index-creation options were configurable
(metric → similarity_function, sourceModel → source_model). This adds a new
indexingOptions field under definition.options that accepts either:

  • a String naming a predefined profile — expanded by an in-code VectorIndexProfiles
    registry into a set of SAI options (e.g. "small-high-recall"), or
  • an Object of raw Cassandra SAI indexing options — passed through verbatim using
    Cassandra's snake_case names (forward-compatible: new SAI options need no code change).

Any other non-null value (number, boolean, array) is rejected with an error. This follows the approach
decided in the issue (a combination of named profiles and raw options). The existing metric and
sourceModel fields are unchanged for backwards compatibility and remain the dedicated way to set
similarity_function and source_model; those two keys are rejected if they also appear inside the raw
options object.
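A minimal sketch of that String-or-Object dispatch, in plain Java so it runs without Jackson. It is not the PR's applyIndexingOptions: the JSON value is modelled as an Object rather than a JsonNode, the class name IndexingOptionsSketch is hypothetical, errors are plain IllegalArgumentExceptions carrying the error-code names, and the profile's option values are placeholders because the real profile → option mappings are not listed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: the JSON value of "indexingOptions" is modelled as a plain Object
// (String, Map, or anything else) instead of a Jackson JsonNode, to stay dependency-free.
class IndexingOptionsSketch {

  // Hypothetical registry: option values are placeholders, not the PR's real profile mapping.
  static final Map<String, Map<String, String>> PROFILES =
      Map.of(
          "small-high-recall",
          Map.of("maximum_node_connections", "32", "construction_beam_width", "200"));

  // Keys owned by the dedicated "metric" / "sourceModel" fields.
  static final Set<String> RESERVED = Set.of("similarity_function", "source_model");

  // Returns the extra SAI options to add to the index (empty map when the field is absent).
  static Map<String, String> apply(Object indexingOptions) {
    if (indexingOptions == null) {
      return new TreeMap<>(); // absent field: nothing to add
    }
    if (indexingOptions instanceof String) {
      String name = (String) indexingOptions;
      Map<String, String> profile = PROFILES.get(name);
      if (profile == null) {
        throw new IllegalArgumentException("UNKNOWN_VECTOR_INDEXING_PROFILE: " + name);
      }
      return new TreeMap<>(profile); // sorted copy for deterministic output
    }
    if (indexingOptions instanceof Map) {
      Map<String, String> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) indexingOptions).entrySet()) {
        String key = String.valueOf(e.getKey());
        if (RESERVED.contains(key)) {
          throw new IllegalArgumentException(
              "INVALID_VECTOR_INDEXING_OPTIONS: set '" + key + "' via its dedicated field");
        }
        // Raw keys pass through verbatim; numbers and booleans become strings.
        out.put(key, String.valueOf(e.getValue()));
      }
      return out;
    }
    throw new IllegalArgumentException(
        "INVALID_VECTOR_INDEXING_OPTIONS: expected a profile name or an object");
  }

  public static void main(String[] args) {
    System.out.println(apply("small-high-recall"));
    // {construction_beam_width=200, maximum_node_connections=32}

    Map<String, Object> raw = new LinkedHashMap<>();
    raw.put("enable_hierarchy", true);
    raw.put("maximum_node_connections", 32);
    System.out.println(apply(raw));
    // {enable_hierarchy=true, maximum_node_connections=32}
  }
}
```

Stringifying each value matches the "number/bool → String" behaviour covered by the unit tests; CQL index options are a map of text to text.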

Example requests:

{ "createVectorIndex": { "name": "idx", "definition": { "column": "v",
  "options": { "metric": "cosine", "sourceModel": "openai-v3-small",
               "indexingOptions": "small-high-recall" } } } }
{ "createVectorIndex": { "name": "idx", "definition": { "column": "v",
  "options": { "metric": "cosine", "sourceModel": "openai-v3-small",
               "indexingOptions": { "enable_hierarchy": true, "maximum_node_connections": 32 } } } } }

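For orientation, a hypothetical sketch of where the second request could end up: the dedicated fields and the raw indexingOptions are merged into one SAI OPTIONS map on a CREATE CUSTOM INDEX statement. The helper names, the ks/tbl placeholders, and the COSINE / OPENAI_V3_SMALL values (standing in for however the API translates metric and sourceModel) are assumptions; the generated CQL is not shown in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: merge dedicated-field options with pass-through indexingOptions
// and render a CQL CREATE CUSTOM INDEX statement for a Storage-Attached Index.
class VectorIndexCqlSketch {

  // Renders a CQL string literal, escaping embedded single quotes by doubling them.
  static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  static Map<String, String> mergeOptions(
      String similarityFunction, String sourceModel, Map<String, String> indexingOptions) {
    Map<String, String> all = new LinkedHashMap<>();
    all.put("similarity_function", similarityFunction);
    all.put("source_model", sourceModel);
    // Safe to putAll: the two keys above are rejected earlier if present in indexingOptions.
    all.putAll(indexingOptions);
    return all;
  }

  // Identifiers are left unquoted for brevity; production code should quote them.
  static String createIndexCql(
      String keyspace, String table, String index, String column, Map<String, String> options) {
    String body =
        options.entrySet().stream()
            .map(e -> quote(e.getKey()) + ": " + quote(e.getValue()))
            .collect(Collectors.joining(", "));
    return "CREATE CUSTOM INDEX " + index + " ON " + keyspace + "." + table + " (" + column
        + ") USING 'StorageAttachedIndex' WITH OPTIONS = {" + body + "}";
  }

  public static void main(String[] args) {
    Map<String, String> extra = new LinkedHashMap<>();
    extra.put("enable_hierarchy", "true");
    extra.put("maximum_node_connections", "32");
    Map<String, String> options = mergeOptions("COSINE", "OPENAI_V3_SMALL", extra);
    System.out.println(createIndexCql("ks", "tbl", "idx", "v", options));
  }
}
```

Because similarity_function and source_model are rejected inside the raw object, the merge never has to decide which source wins.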
Implementation notes:

  • Mirrors the existing ApiTextIndex.analyzer JsonNode pattern (no custom deserializer).
  • New SchemaException codes UNKNOWN_VECTOR_INDEXING_PROFILE and
    INVALID_VECTOR_INDEXING_OPTIONS with errors.yaml templates.
  • listIndexes renders the resolved options back under indexingOptions, excluding the structural
    (class_name / target) and dedicated-field (source_model / similarity_function) keys.
  • Scope is tables-first; applying the same field to createCollection is a follow-up.
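The listIndexes read-back filter can be sketched as follows (hypothetical names; the stored options are illustrative values of the kind Cassandra records for an SAI index). Every key other than the structural and dedicated-field ones is returned under indexingOptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the listIndexes read-back: drop keys the API sets itself
// (class_name, target) or exposes through dedicated fields (similarity_function, source_model).
class RenderIndexingOptionsSketch {

  static final Set<String> NOT_RENDERED =
      Set.of("class_name", "target", "similarity_function", "source_model");

  static Map<String, String> render(Map<String, String> storedOptions) {
    Map<String, String> visible = new TreeMap<>(); // sorted for stable output
    storedOptions.forEach(
        (key, value) -> {
          if (!NOT_RENDERED.contains(key)) {
            visible.put(key, value);
          }
        });
    return visible;
  }

  public static void main(String[] args) {
    // Illustrative stored options for an SAI vector index.
    Map<String, String> stored =
        Map.of(
            "class_name", "org.apache.cassandra.index.sai.StorageAttachedIndex",
            "target", "v",
            "similarity_function", "COSINE",
            "maximum_node_connections", "32");
    System.out.println(render(stored)); // prints {maximum_node_connections=32}
  }
}
```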

⚠️ Backend support and the "pass-through" design: the new tuning options require the target cluster to
allow custom SAI HNSW parameters (HCD/cndb). On a backend that disallows them (e.g. DSE 6.9, which
guards them behind SAI_HNSW_ALLOW_CUSTOM_PARAMETERS), the API forwards the options unchanged and
surfaces the database's INVALID_DATABASE_QUERY error. This is intentional ("let the DB reject"): the
Data API does not pre-validate option support per environment.

Which issue(s) this PR fixes:
Fixes #2487

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

Testing

  • Unit (deterministic, no DB): ApiVectorIndexTest covers applyIndexingOptions (profile
    expansion, raw pass-through incl. number/bool → String, reserved-key rejection, unknown-profile,
    non-String/Object rejection, null/empty no-ops) and renderIndexingOptions (read-back filtering);
    VectorIndexProfilesTest covers the registry.
  • Integration (CreateTableIndexIntegrationTest): API-validation cases (unknown profile, reserved
    key, wrong type) — backend-agnostic, pass on DSE and HCD.
  • DB-acceptance happy-path/round-trip ITs are intentionally omitted because they require a
    custom-params-enabled cluster (see the backend note above); re-adding them gated to an HCD lane is
    a follow-up.
  • Verified locally: full unit suite + the two index IT classes pass (./mvnw verify, DSE 6.9.21).

Follow-ups

  • Apply the same indexingOptions field to createCollection.
  • Confirm/tune the concrete profile → option mappings and externalize profiles to configuration.

…2487)

Add an `indexingOptions` field to the createVectorIndex command's
`definition.options`. It accepts either:

- a String naming a predefined profile (expanded by the in-code
  VectorIndexProfiles registry into a set of SAI options), or
- an Object of raw Cassandra SAI indexing options, passed through
  verbatim using Cassandra's snake_case names (forward-compatible).

Anything else is rejected. The existing `metric` / `sourceModel` fields
are unchanged and remain the dedicated way to set similarity_function /
source_model; those keys are rejected inside the raw options object.

Implemented by mirroring the existing ApiTextIndex.analyzer JsonNode
pattern. Adds two SchemaException codes
(UNKNOWN_VECTOR_INDEXING_PROFILE, INVALID_VECTOR_INDEXING_OPTIONS) with
errors.yaml templates. listIndexes renders the resolved options back
under indexingOptions (excluding the structural and dedicated-field
keys).

Note: the new tuning options require the target backend to allow custom
SAI HNSW parameters; per the "pass-through" design, the API forwards the
options and surfaces the database error on backends that disallow them.

github-actions Bot commented Jun 15, 2026

📈 Unit Test Coverage Delta vs Main Branch

| Metric | Value |
|---|---|
| Main Branch | 52.50% |
| This PR | 52.59% |
| Delta | 🟢 +0.08% |

✅ Coverage improved!

github-actions Bot commented Jun 15, 2026

Unit Test Coverage Report

| | Coverage | |
|---|---|---|
| Overall Project | 52.59% -0.02% | 🍏 |
| Files changed | 90.67% | 🍏 |

| File | Coverage | |
|---|---|---|
| VectorIndexProfiles.java | 100% | 🍏 |
| VectorConstants.java | 100% | 🍏 |
| SchemaException.java | 100% | 🍏 |
| ApiVectorIndex.java | 44.21% -3.48% | 🍏 |
| VectorIndexDefinitionDesc.java | 0% | 🍏 |

github-actions Bot commented Jun 15, 2026

📈 Integration Test Coverage Delta vs Main Branch (dse69-it)

| Metric | Value |
|---|---|
| Main Branch | 72.50% |
| This PR | 72.52% |
| Delta | 🟢 +0.01% |

✅ Coverage improved!

github-actions Bot commented Jun 15, 2026

Integration Test Coverage Report (dse69-it)

| | Coverage | |
|---|---|---|
| Overall Project | 72.52% -0.04% | 🍏 |
| Files changed | 80% | 🍏 |

| File | Coverage | |
|---|---|---|
| VectorIndexDefinitionDesc.java | 100% | 🍏 |
| VectorConstants.java | 100% | 🍏 |
| SchemaException.java | 100% | 🍏 |
| VectorIndexProfiles.java | 93.33% -6.67% | 🍏 |
| ApiVectorIndex.java | 75.33% -7.12% | 🍏 |

github-actions Bot commented Jun 15, 2026

📈 Integration Test Coverage Delta vs Main Branch (hcd-it)

| Metric | Value |
|---|---|
| Main Branch | 73.84% |
| This PR | 73.86% |
| Delta | 🟢 +0.01% |

✅ Coverage improved!

github-actions Bot commented Jun 15, 2026

Integration Test Coverage Report (hcd-it)

| | Coverage | |
|---|---|---|
| Overall Project | 73.86% -0.04% | 🍏 |
| Files changed | 80% | 🍏 |

| File | Coverage | |
|---|---|---|
| VectorIndexDefinitionDesc.java | 100% | 🍏 |
| VectorConstants.java | 100% | 🍏 |
| SchemaException.java | 100% | 🍏 |
| VectorIndexProfiles.java | 93.33% -6.67% | 🍏 |
| ApiVectorIndex.java | 75.33% -7.12% | 🍏 |

…ew cleanups

Address review feedback on #2487:

- Reject raw indexingOptions keys class_name/target (set automatically by
  the API) with INVALID_VECTOR_INDEXING_OPTIONS, symmetric with how
  renderIndexingOptions filters them on read. Adds unit + IT coverage.
- @Schema description for indexingOptions: use concatenated string literals
  (matching metric/sourceModel) and drop type=OBJECT to match the analyzer
  precedent for String-or-Object fields.
- Mark applyIndexingOptions/renderIndexingOptions @VisibleForTesting.
- Drop @DisplayName from the new unit tests to match repo convention.
- Remove unused CQLAnnIndex constants (neighborhood_overflow, alpha,
  enable_hierarchy); keep the two used by profiles.
- Use bare assertTableCommand in the new IT cases.
- errors.yaml: revert unrelated whitespace churn on UNKNOWN_VECTOR_METRIC.
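One way to keep the new write-side rejection symmetric with the read-side filter is to derive both from a single set of API-owned keys. This is a hypothetical sketch (class and method names are illustrative); the PR may organize its constants differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a single set of API-owned keys drives both the write-side
// rejection (applyIndexingOptions) and the read-side filter (renderIndexingOptions).
class ApiOwnedKeysSketch {

  // class_name/target are set automatically; similarity_function/source_model have dedicated fields.
  static final Set<String> API_OWNED =
      Set.of("class_name", "target", "similarity_function", "source_model");

  // Write side: throws if any API-owned key appears in the raw indexingOptions object.
  static void validateRawKeys(Map<String, ?> rawOptions) {
    Set<String> clashes = new TreeSet<>(rawOptions.keySet()); // sorted for a stable message
    clashes.retainAll(API_OWNED);
    if (!clashes.isEmpty()) {
      throw new IllegalArgumentException(
          "INVALID_VECTOR_INDEXING_OPTIONS: reserved keys " + clashes);
    }
  }

  // Read side: only keys a caller could have supplied are rendered back.
  static boolean isRendered(String key) {
    return !API_OWNED.contains(key);
  }

  public static void main(String[] args) {
    validateRawKeys(Map.of("maximum_node_connections", 32)); // accepted
    try {
      validateRawKeys(Map.of("target", "v", "class_name", "x"));
    } catch (IllegalArgumentException e) {
      // prints INVALID_VECTOR_INDEXING_OPTIONS: reserved keys [class_name, target]
      System.out.println(e.getMessage());
    }
  }
}
```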

Development

Successfully merging this pull request may close these issues.

Add new Vector indexing options for CreateCollection (beyond existing "source_model", "metric")

1 participant