feat: expose extended vector indexing options on createVectorIndex #2505
Draft
erichare wants to merge 2 commits into
…2487)

Add an `indexingOptions` field to the createVectorIndex command's `definition.options`. It accepts either:

- a String naming a predefined profile (expanded by the in-code VectorIndexProfiles registry into a set of SAI options), or
- an Object of raw Cassandra SAI indexing options, passed through verbatim using Cassandra's snake_case names (forward-compatible).

Anything else is rejected. The existing `metric` / `sourceModel` fields are unchanged and remain the dedicated way to set similarity_function / source_model; those keys are rejected inside the raw options object.

Implemented by mirroring the existing ApiTextIndex.analyzer JsonNode pattern. Adds two SchemaException codes (UNKNOWN_VECTOR_INDEXING_PROFILE, INVALID_VECTOR_INDEXING_OPTIONS) with errors.yaml templates. listIndexes renders the resolved options back under indexingOptions (excluding the structural and dedicated-field keys).

Note: the new tuning options require the target backend to allow custom SAI HNSW parameters; per the "pass-through" design, the API forwards the options and surfaces the database error on backends that disallow them.
…ew cleanups

Address review feedback on #2487:

- Reject raw indexingOptions keys class_name/target (set automatically by the API) with INVALID_VECTOR_INDEXING_OPTIONS, symmetric with how renderIndexingOptions filters them on read. Adds unit + IT coverage.
- @Schema description for indexingOptions: use concatenated string literals (matching metric/sourceModel) and drop type=OBJECT to match the analyzer precedent for String-or-Object fields.
- Mark applyIndexingOptions/renderIndexingOptions @VisibleForTesting.
- Drop @DisplayName from the new unit tests to match repo convention.
- Remove unused CQLAnnIndex constants (neighborhood_overflow, alpha, enable_hierarchy); keep the two used by profiles.
- Use bare assertTableCommand in the new IT cases.
- errors.yaml: revert unrelated whitespace churn on UNKNOWN_VECTOR_METRIC.
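The first cleanup makes write-side rejection and read-side filtering agree on the same reserved keys. A minimal sketch of that symmetry, assuming a hypothetical `ReservedIndexKeys` helper that holds one shared key set; the PR's real methods are `applyIndexingOptions` and `renderIndexingOptions`, whose exact signatures are not shown here, and the error is modelled as an `IllegalArgumentException` rather than a `SchemaException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: one definition of the reserved keys, used on both write and read. */
final class ReservedIndexKeys {

  // Structural keys the API sets automatically when it creates the index.
  static final Set<String> STRUCTURAL = Set.of("class_name", "target");

  // Keys owned by the dedicated metric / sourceModel request fields.
  static final Set<String> DEDICATED = Set.of("similarity_function", "source_model");

  static boolean isReserved(String key) {
    return STRUCTURAL.contains(key) || DEDICATED.contains(key);
  }

  /** Write path: reject a raw options object that tries to set a reserved key. */
  static void validate(Map<String, ?> raw) {
    for (String key : raw.keySet()) {
      if (isReserved(key)) {
        throw new IllegalArgumentException(
            "INVALID_VECTOR_INDEXING_OPTIONS: '" + key + "' cannot be set in indexingOptions");
      }
    }
  }

  /** Read path: drop the same keys before echoing stored options back under indexingOptions. */
  static Map<String, String> render(Map<String, String> stored) {
    Map<String, String> out = new LinkedHashMap<>();
    stored.forEach(
        (key, value) -> {
          if (!isReserved(key)) {
            out.put(key, value);
          }
        });
    return out;
  }
}
```

Keeping a single key set means a key can never be accepted on write yet hidden on read (or the reverse), which is the asymmetry the review comment pointed out.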
What this PR does:
Exposes additional Cassandra SAI vector indexing configuration options on the `createVectorIndex` command (tables). Until now only 2 of the 7 SAI vector index-creation options were configurable (`metric` → `similarity_function`, `sourceModel` → `source_model`). This adds a new `indexingOptions` field under `definition.options` that accepts either:

- a String naming a predefined profile, expanded by the in-code `VectorIndexProfiles` registry into a set of SAI options (e.g. `"small-high-recall"`), or
- an Object of raw Cassandra SAI indexing options, passed through verbatim using Cassandra's snake_case names (forward-compatible: new SAI options need no code change).

Anything else returns an error. This follows the approach decided on the issue (a combination of named profiles and raw options). The existing `metric`/`sourceModel` fields are unchanged for backwards compatibility and remain the dedicated way to set `similarity_function`/`source_model`; those two keys are rejected if they are also supplied inside the raw options object.
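A minimal sketch of that String-or-Object dispatch, for illustration only. It assumes a hypothetical `IndexingOptionsSketch.resolve(Object)` that takes a plain `String`/`Map` instead of Jackson's `JsonNode` (the PR itself follows the `ApiTextIndex.analyzer` `JsonNode` pattern), signals errors with `IllegalArgumentException` messages prefixed by the PR's error codes, and uses a made-up `"small-high-recall"` entry: the real profile values live in `VectorIndexProfiles` and are not shown in this PR description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of resolving indexingOptions into the SAI option map. */
final class IndexingOptionsSketch {

  // Hypothetical registry entry; the option values here are invented for illustration.
  static final Map<String, Map<String, String>> PROFILES =
      Map.of(
          "small-high-recall",
          Map.of("maximum_node_connections", "32", "construction_beam_width", "200"));

  // Keys owned by metric/sourceModel, or set automatically by the API.
  static final Set<String> RESERVED =
      Set.of("similarity_function", "source_model", "class_name", "target");

  static Map<String, String> resolve(Object indexingOptions) {
    if (indexingOptions == null) {
      return Map.of(); // field absent: no extra options
    }
    if (indexingOptions instanceof String) {
      // Named profile: expand it, or reject an unknown name.
      Map<String, String> profile = PROFILES.get((String) indexingOptions);
      if (profile == null) {
        throw new IllegalArgumentException("UNKNOWN_VECTOR_INDEXING_PROFILE: " + indexingOptions);
      }
      return profile;
    }
    if (indexingOptions instanceof Map) {
      // Raw options: keys pass through verbatim; String.valueOf turns numbers
      // and booleans into text, since CQL index options are text values.
      Map<String, String> resolved = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) indexingOptions).entrySet()) {
        String key = String.valueOf(entry.getKey());
        if (RESERVED.contains(key)) {
          throw new IllegalArgumentException(
              "INVALID_VECTOR_INDEXING_OPTIONS: '" + key + "' cannot be set here");
        }
        resolved.put(key, String.valueOf(entry.getValue()));
      }
      return resolved;
    }
    // Neither a String nor an Object (e.g. a number or an array).
    throw new IllegalArgumentException(
        "INVALID_VECTOR_INDEXING_OPTIONS: expected a String or an Object");
  }
}
```

Because the raw branch never inspects option names beyond the reserved set, a newly added SAI option is forwarded without code changes, which is the forward-compatibility property described above.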
Example requests:

```json
{
  "createVectorIndex": {
    "name": "idx",
    "definition": {
      "column": "v",
      "options": {
        "metric": "cosine",
        "sourceModel": "openai-v3-small",
        "indexingOptions": "small-high-recall"
      }
    }
  }
}
```

```json
{
  "createVectorIndex": {
    "name": "idx",
    "definition": {
      "column": "v",
      "options": {
        "metric": "cosine",
        "sourceModel": "openai-v3-small",
        "indexingOptions": {
          "enable_hierarchy": true,
          "maximum_node_connections": 32
        }
      }
    }
  }
}
```

Implementation notes:
- Mirrors the existing `ApiTextIndex.analyzer` `JsonNode` pattern (no custom deserializer).
- Adds `SchemaException` codes `UNKNOWN_VECTOR_INDEXING_PROFILE` and `INVALID_VECTOR_INDEXING_OPTIONS` with `errors.yaml` templates.
- `listIndexes` renders the resolved options back under `indexingOptions`, excluding the structural (`class_name`/`target`) and dedicated-field (`source_model`/`similarity_function`) keys.
- `createCollection` support is a follow-up.
- Backend note: the new tuning options require a backend that allows custom SAI HNSW parameters (HCD/cndb). On a backend that disallows them (e.g. DSE 6.9, which guards them behind `SAI_HNSW_ALLOW_CUSTOM_PARAMETERS`), the API faithfully forwards the options and surfaces the database's `INVALID_DATABASE_QUERY` error. This is intentional ("let the DB reject"): the Data API does not pre-validate option support by environment.
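To make the "let the DB reject" point concrete, here is a hedged sketch of how resolved options could end up in an SAI index statement. The `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex' WITH OPTIONS = {...}` shape is standard Cassandra SAI syntax, but the helper names (`SaiCqlSketch`, `createVectorIndexCql`), the use of `IF NOT EXISTS`, and the identifier quoting are assumptions; the Data API builds its statements through its own code path, which this PR description does not show.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: copy resolved options into an SAI CREATE INDEX statement. */
final class SaiCqlSketch {

  /** Double-quotes a CQL identifier, escaping embedded double quotes. */
  static String ident(String name) {
    return "\"" + name.replace("\"", "\"\"") + "\"";
  }

  /** Single-quotes a CQL string literal, escaping embedded single quotes. */
  static String literal(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  static String createVectorIndexCql(
      String keyspace, String table, String index, String column, Map<String, String> options) {
    String base =
        "CREATE CUSTOM INDEX IF NOT EXISTS " + ident(index)
            + " ON " + ident(keyspace) + "." + ident(table) + " (" + ident(column) + ")"
            + " USING 'StorageAttachedIndex'";
    if (options.isEmpty()) {
      return base; // nothing to forward
    }
    StringJoiner opts = new StringJoiner(", ", "{", "}");
    // Every key/value is forwarded as-is with no per-backend check; a backend that
    // disallows custom HNSW parameters rejects the statement server-side.
    options.forEach((key, value) -> opts.add(literal(key) + ": " + literal(value)));
    return base + " WITH OPTIONS = " + opts;
  }
}
```

With this design the API holds no list of which backend supports which option; the database remains the single source of truth, and its error is what the client sees as `INVALID_DATABASE_QUERY`.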
Which issue(s) this PR fixes:
Fixes #2487
Checklist
Testing

- Unit: `ApiVectorIndexTest` covers `applyIndexingOptions` (profile expansion, raw pass-through incl. number/bool → String, reserved-key rejection, unknown profile, non-String/Object rejection, null/empty no-ops) and `renderIndexingOptions` (read-back filtering); `VectorIndexProfilesTest` covers the registry.
- Integration (`CreateTableIndexIntegrationTest`): API-validation cases (unknown profile, reserved key, wrong type). These are backend-agnostic and pass on both DSE and HCD.
- … custom-params-enabled cluster (see the backend note above); re-adding them, gated to an HCD lane, is a follow-up.
- … (`./mvnw verify`, DSE 6.9.21).

Follow-ups

- Add the `indexingOptions` field to `createCollection`.