Skip to content

DRAFT - Refactor Collection Table Matcher#2499

Open
amorton wants to merge 49 commits into
mainfrom
ajm/refactor-collection-table-matcher
Open

DRAFT - Refactor Collection Table Matcher#2499
amorton wants to merge 49 commits into
mainfrom
ajm/refactor-collection-table-matcher

Conversation

@amorton

@amorton amorton commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What this PR does:

Refactor the ColectionTableMatcher into the SuperShreddingTablePredicate , and create a standard way to define what is in a SuperShredding table, via the SuperShreddingBuilder. We can then use this builder to create different components we need, such as predciate and cql statements to create the table.

This PR sets a foundation by putting the definition of super shredding in one place, and building a way to test / verify we are creating CQL we expect. With the idea that above that low level CQL testing verification we run tests against the tableMetadata etc.

This PR only updates the table matcher, next PR's will update how CreateCollectionOperation and related classes work to make a super shredding table.

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@amorton amorton requested a review from a team as a code owner June 11, 2026 02:10
@amorton amorton marked this pull request as draft June 11, 2026 02:11

public static String errFmt(DataType dataType) {
return nullSafe(dataType, d -> d.asCql(true, true));
return nullSafe(dataType, d -> d.asCql(false, true));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to not force frozen when it was not needed

/** Physical table column name that stores the lexical content. */
String LEXICAL_INDEX_COLUMN_NAME = "query_lexical_value";
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all names of columns to be in SuperShreddingMetadata

@github-actions

Copy link
Copy Markdown
Contributor

📈 Unit Test Coverage Delta vs Main Branch

Metric Value
Main Branch 52.50%
This PR 52.99%
Delta 🟢 +0.49%
✅ Coverage improved!

@github-actions

Copy link
Copy Markdown
Contributor

Unit Test Coverage Report

Overall Project 52.99% -0.4% 🍏
Files changed 83.11% 🍏

File Coverage
SuperShreddingMetadataBuilder.java 100% 🍏
ColumnMetadataPredicate.java 100% 🍏
CollectionIndexUsage.java 100% 🍏
ExtendedVectorType.java 100% 🍏
DocumentConstants.java 100% 🍏
SuperShreddingCQL.java 98.86% -1.14% 🍏
SuperShreddingCQLBuilder.java 95.22% -4.78% 🍏
AllCollectionFilter.java 91.8% 🍏
SuperShreddingMetadata.java 89.18% -10.82% 🍏
CqlIdentifierUtil.java 86.67% 🍏
SuperShreddingBinding.java 85.88% -14.12% 🍏
SuperShreddingBuilder.java 84.16% -15.84% 🍏
FindCollectionOperation.java 80% -0.4%
MapCollectionFilter.java 77.66% -3.9%
CQLSAIIndex.java 71.53% -5.11% 🍏
CreateCollectionOperation.java 68.76% 🍏
ErrorFormatters.java 65.36% 🍏
StringUtil.java 60% -5.71% 🍏
ApiIndexFunction.java 58.44% -23.38%
SuperShreddingTablePredicate.java 42.57% -57.43%
FindCollectionsCollectionOperation.java 22.46% 🍏
InCollectionFilter.java 13.47% -4.15%
CollectionSchemaObject.java 8.82% -1.05%
SchemaObjectFactory.java 6.25% -1.47%
SuperShreddingPredicateBuilder.java 0%
MatchCollectionFilter.java 0% -11.48%
MetadataDBTask.java 0% -2.2%

@github-actions

Copy link
Copy Markdown
Contributor

📉 Integration Test Coverage Delta vs Main Branch (dse69-it)

Metric Value
Main Branch 72.50%
This PR 71.40%
Delta 🔴 -1.10%
⚠️ Coverage decreased

@github-actions

Copy link
Copy Markdown
Contributor

Integration Test Coverage Report (dse69-it)

Overall Project 71.4% -1.6% 🍏
Files changed 32.99%

File Coverage
CollectionIndexUsage.java 100% 🍏
ExtendedVectorType.java 100% 🍏
FindCollectionsCollectionOperation.java 94.2% 🍏
AllCollectionFilter.java 91.8% 🍏
MetadataDBTask.java 90.11% 🍏
FindCollectionOperation.java 88.48% -0.13% 🍏
SchemaObjectFactory.java 87.5% 🍏
ErrorFormatters.java 86.79% 🍏
DocumentConstants.java 78.57% 🍏
CreateCollectionOperation.java 75.29% 🍏
CQLSAIIndex.java 73.72% -14.6%
MapCollectionFilter.java 68.79% 🍏
ApiIndexFunction.java 64.94% -35.06%
CollectionSchemaObject.java 63.38% 🍏
CqlIdentifierUtil.java 62.22% -8.89%
InCollectionFilter.java 59.33% 🍏
SuperShreddingMetadata.java 56.96% -43.04%
SuperShreddingTablePredicate.java 40.56% -59.44%
ColumnMetadataPredicate.java 36.57% -63.43%
StringUtil.java 25.71% -34.29%
SuperShreddingBinding.java 15.88% -84.12%
SuperShreddingCQLBuilder.java 0%
SuperShreddingBuilder.java 0%
SuperShreddingCQL.java 0%
SuperShreddingMetadataBuilder.java 0%
SuperShreddingPredicateBuilder.java 0%
MatchCollectionFilter.java 0% -11.48%

@amorton amorton marked this pull request as ready for review June 14, 2026 20:07
@github-actions

Copy link
Copy Markdown
Contributor

📉 Integration Test Coverage Delta vs Main Branch (hcd-it)

Metric Value
Main Branch 73.84%
This PR 72.72%
Delta 🔴 -1.12%
⚠️ Coverage decreased

@github-actions

Copy link
Copy Markdown
Contributor

Integration Test Coverage Report (hcd-it)

Overall Project 72.72% -1.59% 🍏
Files changed 33.31%

File Coverage
CollectionIndexUsage.java 100% 🍏
ExtendedVectorType.java 100% 🍏
FindCollectionOperation.java 94.3% 🍏
FindCollectionsCollectionOperation.java 94.2% 🍏
AllCollectionFilter.java 91.8% 🍏
MetadataDBTask.java 90.11% 🍏
SchemaObjectFactory.java 87.5% 🍏
ErrorFormatters.java 86.79% 🍏
DocumentConstants.java 78.57% 🍏
CreateCollectionOperation.java 78.33% 🍏
CQLSAIIndex.java 73.72% -14.6%
MapCollectionFilter.java 68.79% 🍏
CollectionSchemaObject.java 65.92% 🍏
ApiIndexFunction.java 64.94% -35.06%
CqlIdentifierUtil.java 62.22% -8.89%
InCollectionFilter.java 59.33% 🍏
SuperShreddingMetadata.java 56.96% -43.04%
SuperShreddingTablePredicate.java 40.56% -59.44%
MatchCollectionFilter.java 39.34% 🍏
ColumnMetadataPredicate.java 36.57% -63.43%
StringUtil.java 25.71% -34.29%
SuperShreddingBinding.java 15.88% -84.12%
SuperShreddingCQLBuilder.java 0%
SuperShreddingBuilder.java 0%
SuperShreddingCQL.java 0%
SuperShreddingMetadataBuilder.java 0%
SuperShreddingPredicateBuilder.java 0%

@clun clun left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some findings and hopefully helpful new classes as a review

this.textIndexTag |= other.textIndexTag;
this.timestampIndexTag |= other.timestampIndexTag;
this.nullIndexTag |= other.nullIndexTag;
this.vectorIndexTag |= other.vectorIndexTag;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing the lexicalIndexTag, adding a line 68

this.vectorIndexTag |= other.vectorIndexTag;
this.lexicalIndexTag |= other.lexicalIndexTag;

* <p>This pulls the options from the {@link SuperShreddingBinding} and puts them into maps of the
* values each index definition needs
*/
protected Stream<IndexDef> indexDefs(SuperShreddingBinding binding) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter binding is unused, i think there is no need to introduce the parameter except if you want to override the internal one which would be a mutation different for the rest of the class => delete.

protected final DataType type;

protected ColumnMetadataPredicate(CqlIdentifier name, DataType type) {
// no null checks in the ctor, so a subclass can fully override if they want to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok subclasses can fully override but they would also have to override the ToString or you raise a NPE.

i would not make name nor type Objects.requireNonNull() in the toString() or still add the test in the ctor

private final List<ColumnMetadataPredicate> strictMatch;

// A def that represents the rules used by the old `CollectionTableMatcher`
private static final SuperShreddingBinding BACKWARDS_COMPAT =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SuperShreddingBinding.builder().build()

It seems you are putting all default builder values, now i would understand you want to make them explicitely visible at this level

*
* <p>This class used to be called <code>CollectionTableMatcher</code>
*
* <p><b>NOTE:</b> As of June 2026, there is no check the indexes are valid, this will be future

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, but let me see if i can create a baby index validator after the column valudation in SchemaObjectFactory

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The predicate checks columns but not indexes. A collection with correct columns but missing/broken SAI indexes passes the predicate and is treated as healthy -- downstream queries then fail with confusing CQL errors.

I've prototyped a SuperShreddingIndexValidator on branch clun/review-2499-index-validator:

  • SuperShreddingIndexValidator -- validates SAI indexes after the predicate passes, returns a ValidationResult with missing/present/unexpected indexes
  • SuperShreddingIndexValidatorTest -- 13 tests covering healthy tables, missing required/optional indexes, unexpected indexes, and warn logging
  • TableMetadataTestUtil additions -- removeIndex, removeAllIndexes, clearAllIndexes, addIndex
  • SchemaObjectFactory sketch -- shows where to wire validate() after IS_COLLECTION_PREDICATE.test() (old code kept as comment)

Branch: clun/review-2499-index-validator (based on this PR's branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants