Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #312 +/- ##
============================================
- Coverage 81.43% 81.41% -0.02%
- Complexity 1549 1551 +2
============================================
Files 242 242
Lines 7514 7529 +15
Branches 726 729 +3
============================================
+ Hits 6119 6130 +11
- Misses 943 946 +3
- Partials 452 453 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
When sorting on an array field that is missing or empty, the Postgres backend now correctly returns 0. Uses jsonb_array_length for raw jsonb field access and ARRAY_LENGTH for native PG arrays from aggregations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach used PostgresDataAccessorIdentifierExpressionVisitor which casts jsonb to numeric via ->>. jsonb_array_length requires a raw jsonb value (via -> accessor), so switch to PostgresFieldIdentifierExpressionVisitor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the cross-datastore LENGTH function semantics so that taking the length of an absent/null list does not fail, and instead returns 0 (Mongo) / 0 via COALESCE (Postgres). This aligns LENGTH behavior across query generation paths and adds an integration test to validate sorting by computed list size when the field is missing.
Changes:
- Mongo: wrap
$sizeoperands with$ifNull: [<expr>, []]so missing/null arrays produce length0instead of an error. - Postgres: wrap length computation with
COALESCE(..., 0)and route JSONB arrays throughjsonb_array_length(...). - Tests: update expected Mongo pipelines and add an integration test for sorting by
LENGTH(tags)with a missing field.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| document-store/src/test/resources/mongo/pipeline/with_projections.json | Updates expected $size projection to default missing/null arrays to []. |
| document-store/src/test/resources/mongo/pipeline/simple.json | Updates expected $size projection to use $ifNull. |
| document-store/src/test/resources/mongo/pipeline/optimize_sorts_simple_sort_with_aggregation_selection.json | Updates expected $size projection to use $ifNull. |
| document-store/src/test/resources/mongo/pipeline/field_count.json | Updates expected $size projection to use $ifNull. |
| document-store/src/test/resources/mongo/pipeline/distinct_count.json | Updates expected $size projection to use $ifNull. |
| document-store/src/test/java/org/hypertrace/core/documentstore/postgres/query/v1/PostgresQueryParserTest.java | Adjusts expected SQL to COALESCE(ARRAY_LENGTH(...), 0) for LENGTH-of-aggregate cases. |
| document-store/src/main/java/org/hypertrace/core/documentstore/postgres/query/v1/vistors/PostgresFunctionExpressionVisitor.java | Changes Postgres LENGTH SQL generation to be null-safe and support JSONB arrays. |
| document-store/src/main/java/org/hypertrace/core/documentstore/mongo/query/parser/MongoFunctionExpressionParser.java | Makes Mongo LENGTH parsing null-safe by wrapping $size with $ifNull. |
| document-store/src/integrationTest/java/org/hypertrace/core/documentstore/DocStoreQueryV1Test.java | Adds integration test to ensure LENGTH on a missing array field returns 0 and sorts correctly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private String buildLengthExpression(final SelectTypeExpression operand) { | ||
| Optional<String> identifier = Optional.ofNullable(operand.accept(identifierExpressionVisitor)); | ||
| Optional<String> resolvedSelection = | ||
| identifier.map(v -> getPostgresQueryParser().getPgSelections().get(v)); | ||
| if (resolvedSelection.isPresent()) { | ||
| return String.format( | ||
| "COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", resolvedSelection.get(), ARRAY_DIMENSION); | ||
| } | ||
| PostgresFieldIdentifierExpressionVisitor jsonbVisitor = | ||
| new PostgresFieldIdentifierExpressionVisitor(getPostgresQueryParser()); | ||
| String parsedExpression = operand.accept(jsonbVisitor); | ||
| return String.format("COALESCE( jsonb_array_length( %s ), 0 )", parsedExpression); | ||
| } |
There was a problem hiding this comment.
Addressed in b7b7439. The code now checks getDocumentType() on the pgColTransformer:
- FLAT collections (native PG arrays like
TEXT[]): usesARRAY_LENGTH(column, 1) - NESTED collections (jsonb document fields): uses
jsonb_array_length(document->'field')
Both are wrapped with COALESCE(..., 0) so NULL/missing fields return 0.
| @ParameterizedTest | ||
| @ArgumentsSource(AllProvider.class) | ||
| public void testSortByListSizeWithMissingField(String dataStoreName) throws IOException { | ||
| Datastore datastore = datastoreMap.get(dataStoreName); | ||
| String collectionName = "list_size_sort_collection"; |
There was a problem hiding this comment.
The integration test runs against nested collections (the standard document store pattern) which exercises the jsonb_array_length path. The flat collection path (ARRAY_LENGTH) is covered by the existing unit tests in PostgresQueryParserTest which validate SQL generation for LENGTH on aggregation aliases. The flat collection LENGTH on a direct column field follows the same ARRAY_LENGTH codepath and is validated by the existing tests that assert correct SQL output.
For flat collections where fields are native PG arrays (e.g. TEXT[]), use ARRAY_LENGTH. For nested collections with jsonb document fields, use jsonb_array_length. Both wrapped with COALESCE to return 0 for NULL/missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| public void testSortByListSizeWithMissingField(String dataStoreName) throws IOException { | ||
| Datastore datastore = datastoreMap.get(dataStoreName); | ||
| String collectionName = "list_size_sort_collection"; | ||
| datastore.deleteCollection(collectionName); | ||
| datastore.createCollection(collectionName, null); | ||
| Collection collection = datastore.getCollection(collectionName); |
There was a problem hiding this comment.
Good catch. Wrapped the test body in try-finally in be9514a so deleteCollection runs regardless of assertion failures.
Ensures deleteCollection runs even if assertions throw. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| return String.format( | ||
| "COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", parsedExpression, ARRAY_DIMENSION); | ||
| } | ||
| return String.format("COALESCE( jsonb_array_length( %s ), 0 )", parsedExpression); |
There was a problem hiding this comment.
Fixed in bc24f4c. Replaced COALESCE(jsonb_array_length(...), 0) with the CASE WHEN jsonb_typeof(...) = 'array' THEN ... ELSE '[]'::jsonb END pattern already used elsewhere in the codebase (e.g. PostgresFromTypeExpressionVisitor, PostgresFilterTypeExpressionVisitor). This handles SQL NULL (missing key), JSON null, and non-array types — all return 0.
COALESCE only handles SQL NULL (missing key). When a field has an explicit JSON null value, jsonb_array_length errors. Use the same CASE WHEN jsonb_typeof = 'array' pattern used elsewhere in the codebase to safely return 0 for missing, null, and non-array fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
document-store/src/main/java/org/hypertrace/core/documentstore/mongo/query/parser/MongoFunctionExpressionParser.java:83
FunctionOperator.LENGTHis only valid with a single operand, but fornumArgs > 1this parser currently emits{ "$size": [ ... ] }, which MongoDB will reject at runtime. Since this PR is tightening LENGTH semantics, consider explicitly validating arity for LENGTH (and throwing anIllegalArgumentException) so callers get a clear error instead of a server-side pipeline failure.
if (numArgs == 1) {
Object value = expression.getOperands().get(0).accept(parser);
// $size fails when the operand resolves to a missing/absent (or null) field. Default such a
// value to an empty array so that LENGTH of an absent field is 0 instead of throwing an
// error.
if (operator == LENGTH) {
value = Map.of("$ifNull", List.of(value, List.of()));
}
return Map.of(key, value);
}
List<Object> values =
expression.getOperands().stream().map(op -> op.accept(parser)).collect(Collectors.toList());
return Map.of(key, values);
| Optional<String> identifier = Optional.ofNullable(operand.accept(identifierExpressionVisitor)); | ||
| Optional<String> resolvedSelection = | ||
| identifier.map(v -> getPostgresQueryParser().getPgSelections().get(v)); | ||
| if (resolvedSelection.isPresent()) { | ||
| return String.format( | ||
| "COALESCE( ARRAY_LENGTH( %s, %s ), 0 )", resolvedSelection.get(), ARRAY_DIMENSION); | ||
| } |
There was a problem hiding this comment.
In practice, the pgSelections path is only reached when LENGTH operates on an alias defined in a prior selection — and those aliases always come from aggregation expressions (ARRAY_AGG, DISTINCT_ARRAY) that produce native PG arrays. There's no existing usage pattern where a raw jsonb field is aliased and then LENGTH is applied to that alias. The direct field access case (without an alias) is handled by the branch below which uses jsonb_typeof for nested collections.
No description provided.