Skip to content

APIST-1653 : Changes to enhance existing length op#312

Open
sarthak77 wants to merge 8 commits into
mainfrom
pc_13
Open

APIST-1653 : Changes to enhance existing length op#312
sarthak77 wants to merge 8 commits into
mainfrom
pc_13

Conversation

@sarthak77

Copy link
Copy Markdown
Member

No description provided.

@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.41%. Comparing base (53d4149) to head (bc24f4c).

Files with missing lines Patch % Lines
.../v1/vistors/PostgresFunctionExpressionVisitor.java 68.75% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #312      +/-   ##
============================================
- Coverage     81.43%   81.41%   -0.02%     
- Complexity     1549     1551       +2     
============================================
  Files           242      242              
  Lines          7514     7529      +15     
  Branches        726      729       +3     
============================================
+ Hits           6119     6130      +11     
- Misses          943      946       +3     
- Partials        452      453       +1     
Flag Coverage Δ
integration 81.41% <72.22%> (-0.02%) ⬇️
unit 55.46% <38.88%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

Test Results

  124 files  ±0    124 suites  ±0   35s ⏱️ -1s
  840 tests ±0    839 ✅ ±0  1 💤 ±0  0 ❌ ±0 
1 178 runs  +2  1 177 ✅ +2  1 💤 ±0  0 ❌ ±0 

Results for commit bc24f4c. ± Comparison against base commit 53d4149.

♻️ This comment has been updated with latest results.

sarthak77 and others added 4 commits June 19, 2026 16:33
When sorting on an array field that is missing or empty, the Postgres
backend now correctly returns 0. Uses jsonb_array_length for raw jsonb
field access and ARRAY_LENGTH for native PG arrays from aggregations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach used PostgresDataAccessorIdentifierExpressionVisitor
which casts jsonb to numeric via ->>. jsonb_array_length requires a raw
jsonb value (via -> accessor), so switch to PostgresFieldIdentifierExpressionVisitor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the cross-datastore LENGTH function semantics so that taking the length of an absent/null list does not fail, and instead returns 0 (Mongo) / 0 via COALESCE (Postgres). This aligns LENGTH behavior across query generation paths and adds an integration test to validate sorting by computed list size when the field is missing.

Changes:

  • Mongo: wrap $size operands with $ifNull: [<expr>, []] so missing/null arrays produce length 0 instead of an error.
  • Postgres: wrap length computation with COALESCE(..., 0) and route JSONB arrays through jsonb_array_length(...).
  • Tests: update expected Mongo pipelines and add an integration test for sorting by LENGTH(tags) with a missing field.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
document-store/src/test/resources/mongo/pipeline/with_projections.json Updates expected $size projection to default missing/null arrays to [].
document-store/src/test/resources/mongo/pipeline/simple.json Updates expected $size projection to use $ifNull.
document-store/src/test/resources/mongo/pipeline/optimize_sorts_simple_sort_with_aggregation_selection.json Updates expected $size projection to use $ifNull.
document-store/src/test/resources/mongo/pipeline/field_count.json Updates expected $size projection to use $ifNull.
document-store/src/test/resources/mongo/pipeline/distinct_count.json Updates expected $size projection to use $ifNull.
document-store/src/test/java/org/hypertrace/core/documentstore/postgres/query/v1/PostgresQueryParserTest.java Adjusts expected SQL to COALESCE(ARRAY_LENGTH(...), 0) for LENGTH-of-aggregate cases.
document-store/src/main/java/org/hypertrace/core/documentstore/postgres/query/v1/vistors/PostgresFunctionExpressionVisitor.java Changes Postgres LENGTH SQL generation to be null-safe and support JSONB arrays.
document-store/src/main/java/org/hypertrace/core/documentstore/mongo/query/parser/MongoFunctionExpressionParser.java Makes Mongo LENGTH parsing null-safe by wrapping $size with $ifNull.
document-store/src/integrationTest/java/org/hypertrace/core/documentstore/DocStoreQueryV1Test.java Adds integration test to ensure LENGTH on a missing array field returns 0 and sorts correctly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +87 to +99
private String buildLengthExpression(final SelectTypeExpression operand) {
Optional<String> identifier = Optional.ofNullable(operand.accept(identifierExpressionVisitor));
Optional<String> resolvedSelection =
identifier.map(v -> getPostgresQueryParser().getPgSelections().get(v));
if (resolvedSelection.isPresent()) {
return String.format(
"COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", resolvedSelection.get(), ARRAY_DIMENSION);
}
PostgresFieldIdentifierExpressionVisitor jsonbVisitor =
new PostgresFieldIdentifierExpressionVisitor(getPostgresQueryParser());
String parsedExpression = operand.accept(jsonbVisitor);
return String.format("COALESCE( jsonb_array_length( %s ), 0 )", parsedExpression);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b7b7439. The code now checks getDocumentType() on the pgColTransformer:

  • FLAT collections (native PG arrays like TEXT[]): uses ARRAY_LENGTH(column, 1)
  • NESTED collections (jsonb document fields): uses jsonb_array_length(document->'field')

Both are wrapped with COALESCE(..., 0) so NULL/missing fields return 0.

Comment on lines +922 to +926
@ParameterizedTest
@ArgumentsSource(AllProvider.class)
public void testSortByListSizeWithMissingField(String dataStoreName) throws IOException {
Datastore datastore = datastoreMap.get(dataStoreName);
String collectionName = "list_size_sort_collection";

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration test runs against nested collections (the standard document store pattern) which exercises the jsonb_array_length path. The flat collection path (ARRAY_LENGTH) is covered by the existing unit tests in PostgresQueryParserTest which validate SQL generation for LENGTH on aggregation aliases. The flat collection LENGTH on a direct column field follows the same ARRAY_LENGTH codepath and is validated by the existing tests that assert correct SQL output.

For flat collections where fields are native PG arrays (e.g. TEXT[]),
use ARRAY_LENGTH. For nested collections with jsonb document fields,
use jsonb_array_length. Both wrapped with COALESCE to return 0 for
NULL/missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines +924 to +929
public void testSortByListSizeWithMissingField(String dataStoreName) throws IOException {
Datastore datastore = datastoreMap.get(dataStoreName);
String collectionName = "list_size_sort_collection";
datastore.deleteCollection(collectionName);
datastore.createCollection(collectionName, null);
Collection collection = datastore.getCollection(collectionName);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Wrapped the test body in try-finally in be9514a so deleteCollection runs regardless of assertion failures.

Ensures deleteCollection runs even if assertions throw.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

return String.format(
"COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", parsedExpression, ARRAY_DIMENSION);
}
return String.format("COALESCE( jsonb_array_length( %s ), 0 )", parsedExpression);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bc24f4c. Replaced COALESCE(jsonb_array_length(...), 0) with the CASE WHEN jsonb_typeof(...) = 'array' THEN ... ELSE '[]'::jsonb END pattern already used elsewhere in the codebase (e.g. PostgresFromTypeExpressionVisitor, PostgresFilterTypeExpressionVisitor). This handles SQL NULL (missing key), JSON null, and non-array types — all return 0.

COALESCE only handles SQL NULL (missing key). When a field has an
explicit JSON null value, jsonb_array_length errors. Use the same
CASE WHEN jsonb_typeof = 'array' pattern used elsewhere in the codebase
to safely return 0 for missing, null, and non-array fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

document-store/src/main/java/org/hypertrace/core/documentstore/mongo/query/parser/MongoFunctionExpressionParser.java:83

  • FunctionOperator.LENGTH is only valid with a single operand, but for numArgs > 1 this parser currently emits { "$size": [ ... ] }, which MongoDB will reject at runtime. Since this PR is tightening LENGTH semantics, consider explicitly validating arity for LENGTH (and throwing an IllegalArgumentException) so callers get a clear error instead of a server-side pipeline failure.
    if (numArgs == 1) {
      Object value = expression.getOperands().get(0).accept(parser);
      // $size fails when the operand resolves to a missing/absent (or null) field. Default such a
      // value to an empty array so that LENGTH of an absent field is 0 instead of throwing an
      // error.
      if (operator == LENGTH) {
        value = Map.of("$ifNull", List.of(value, List.of()));
      }
      return Map.of(key, value);
    }

    List<Object> values =
        expression.getOperands().stream().map(op -> op.accept(parser)).collect(Collectors.toList());
    return Map.of(key, values);

Comment on lines +89 to +95
Optional<String> identifier = Optional.ofNullable(operand.accept(identifierExpressionVisitor));
Optional<String> resolvedSelection =
identifier.map(v -> getPostgresQueryParser().getPgSelections().get(v));
if (resolvedSelection.isPresent()) {
return String.format(
"COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", resolvedSelection.get(), ARRAY_DIMENSION);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, the pgSelections path is only reached when LENGTH operates on an alias defined in a prior selection — and those aliases always come from aggregation expressions (ARRAY_AGG, DISTINCT_ARRAY) that produce native PG arrays. There's no existing usage pattern where a raw jsonb field is aliased and then LENGTH is applied to that alias. The direct field access case (without an alias) is handled by the branch below which uses jsonb_typeof for nested collections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants