This project provides a usability-first Java DSL library for building Elastic queries that serialize to the Elastic JSON DSL using Jackson. It is designed with usability and concision in mind: as a DSL it provides type safety and helps avoid mistakes, and it has also been designed to express standard situations more directly.
It is a (quasi) self-contained library with no direct dependency on the Elastic or OpenSearch clients, or even on an HTTP client (the choice is yours), so it can be integrated into your project with very little overhead.
Its only non-test dependency is Jackson, which handles serialization and deserialization. The use of Jackson is limited to very stable parts of its API, so overriding the version should be hassle-free. All other dependencies (JUnit and AssertJ) are used only for testing.
NOTE: The DSL works for both Elastic and OpenSearch, as their APIs follow each other closely.
This DSL is more straightforward than the standard OpenSearch DSL, which relies heavily on lambdas. It is also somewhat easier to use than Elastic's DSL, which strictly follows the API structure (and therefore inherits some of its complexities).
For example, instead of having to (cumbersomely) write:
```json
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "birthdate": {
              "gte": "1990-01-01",
              "lte": "2000-01-01"
            }
          }
        }
      ],
      "must": [
        {
          "term": {
            "firstname": "benjamin"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "city": "biel"
          }
        }
      ]
    }
  }
}
```
The Java DSL lets you express this as:
```java
var query = query(
    newBool()
        .must(term("firstname", "benjamin"))
        .should(range("birthdate", LocalDate.parse("1990-01-01"), LocalDate.parse("2000-01-01")))
        .filter(term("city", "biel"))
        .build()
);
```
To get started, add the dependency to your Maven build:
```xml
<dependency>
    <groupId>tech.habegger.elastic</groupId>
    <artifactId>elastic-dsl</artifactId>
    <version>1.0.0</version>
</dependency>
```
Import the constructs you need (or let your IDE do it for you ;)):
```java
import com.fasterxml.jackson.databind.ObjectMapper;
import static tech.habegger.elastic.search.ElasticBooleanClause.newBool;
import static tech.habegger.elastic.search.ElasticSearchRequest.query;
import static tech.habegger.elastic.search.ElasticTermClause.term;
```
And just use the DSL:
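```java
var mapper = new ObjectMapper();
var elasticQuery = query(
    newBool()
        .must(term("lastname", "habegger"))
        .should(term("firstname", "benjamin"))
        .build()
);
// writeValueAsString returns the JSON as a String (it throws the checked JsonProcessingException)
var queryAsString = mapper.writeValueAsString(elasticQuery);
```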
For a complete example, check out SampleIndexAndSearch, which demonstrates how to use the DSL with Java's built-in HTTP client on an index named `playground`. The example:
- Creates a settings item using the DSL
- Deletes the playground index
- Creates the playground index using the serialization of the DSL settings
- Pushes an example document (using plain old java record)
- Creates a query using the DSL
- Searches the index using the serialized query.
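The last step can be sketched with the JDK's `java.net.http` API alone. This is a minimal sketch, not the code of SampleIndexAndSearch: the class and method names, the `http://localhost:9200` base URL and the hard-coded JSON string are assumptions made for illustration. In real code the JSON would come from serializing a DSL query with Jackson.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, for illustration only; not part of the library.
class SearchRequestSketch {

    // Builds (but does not send) a POST request to the _search endpoint of an index.
    static HttpRequest searchRequest(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/" + index + "/_search"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(queryJson))
            .build();
    }

    public static void main(String[] args) {
        // Stand-in for the output of mapper.writeValueAsString(elasticQuery).
        String queryJson = "{\"query\":{\"term\":{\"city\":\"biel\"}}}";
        // Assumed local cluster address; adjust to your environment.
        HttpRequest request = searchRequest("http://localhost:9200", "playground", queryJson);
        System.out.println(request.method() + " " + request.uri());
        // To execute it against a running cluster:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the library has no HTTP dependency of its own, the same serialized string could be handed to any other client instead.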
Most constructs made available through the DSL should have unit tests. Please have a look at the test suite for example syntax.
The DSL has been designed to strike a good compromise between completeness (being able to express any Elastic query or aggregation) and conciseness (being able to do so easily). To achieve this, the following principles have been followed as far as possible:
- Mandatory (or very frequently used) parameters are included in the main builder method (e.g. `terms` must have a field name and values, so those are passed as direct arguments of the `terms` method).
- Optional, less frequently used parameters that change the behavior use modifier methods (e.g. the `boxPlot` aggregation takes the field as its single argument and has a modifier method `withCompression` to set the `compression` when needed).
- Only really complex situations use a more advanced "Builder" pattern requiring a final `build()` method call to return the serializable version of the Elastic expression. In this case, the initial building method is prefixed with `new` (e.g. `newBool()` starts a bool expression builder).
- In some cases, the initial `newXX` builder takes mandatory parameters (e.g. the `newPinned` method takes an Elastic clause as argument to define the query of the "organic" documents and defers the "pinning" to later calls).
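The three conventions can be illustrated with a toy model. The sketch below is hypothetical and is not the library's implementation: the class `DslPatternsSketch`, its nested types and the `Map`-based representation are assumptions chosen so the example runs with the JDK alone. Only the method names `terms`, `boxPlot`, `withCompression`, `newBool` and `build` mirror names mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-DSL showing the naming conventions; NOT the library's classes.
class DslPatternsSketch {

    // Convention 1: mandatory parameters are direct arguments of the factory method.
    static Map<String, Object> terms(String field, String... values) {
        return Map.of("terms", Map.of(field, List.of(values)));
    }

    // Convention 2: optional parameters are set through "with..." modifier methods.
    record BoxPlot(String field, Double compression) {
        BoxPlot withCompression(double value) {
            return new BoxPlot(field, value);
        }

        Map<String, Object> toMap() {
            Map<String, Object> body = new LinkedHashMap<>();
            body.put("field", field);
            if (compression != null) {
                body.put("compression", compression); // only emitted when set
            }
            return Map.of("boxplot", body);
        }
    }

    static BoxPlot boxPlot(String field) {
        return new BoxPlot(field, null);
    }

    // Convention 3: complex expressions start with newXX() and end with build().
    static final class BoolBuilder {
        private final Map<String, List<Object>> clauses = new LinkedHashMap<>();

        BoolBuilder must(Map<String, Object> clause) { return add("must", clause); }
        BoolBuilder filter(Map<String, Object> clause) { return add("filter", clause); }

        private BoolBuilder add(String kind, Object clause) {
            clauses.computeIfAbsent(kind, k -> new ArrayList<>()).add(clause);
            return this;
        }

        Map<String, Object> build() {
            return Map.of("bool", clauses);
        }
    }

    static BoolBuilder newBool() {
        return new BoolBuilder();
    }

    public static void main(String[] args) {
        System.out.println(terms("city", "biel", "bern"));
        System.out.println(boxPlot("load_time").withCompression(200).toMap());
        System.out.println(newBool().must(terms("city", "biel")).build());
    }
}
```

The point of the split is that the common case stays a one-liner, while the builder form is reserved for expressions that accumulate several parts before they can be serialized.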
- Removes most of the JSON-related boilerplate
- Avoids typos and structural mistakes when writing the queries
- Usability driven
- More straightforward than the API structure (and the official DSLs which strictly follow this structure)
This is an initial version of the DSL, so not all query shapes are supported yet. However, custom clauses are supported, which compensates somewhat for the gaps. Feel free to propose a merge request adding the unsupported clauses ;)
See test class ElasticSearchCompoundQueryTest.java
| Query Type | Supported | Tests |
|---|---|---|
| Boolean | ✅ | bool* |
| Boosting | ✅ | boostingQuery |
| Constant score | ✅ | constantScoreQuery |
| Disjunction max | ✅ | disjunctionMaxQuery |
| Function score | ✅ | functionScoreQuery |
See test class ElasticSearchFullTextQueryTest
| Query Type | Supported | Test method(s) |
|---|---|---|
| Intervals | 🔲 | |
| Match | ✅ | matchQuery |
| Match boolean prefix | ✅ | matchBoolPrefixQuery |
| Match phrase | ✅ | matchPhraseQuery |
| Match phrase prefix | ✅ | matchPhrasePrefixQuery |
| Combined fields | ✅ | combinedFieldsQuery |
| Multi-match | ✅ | multiMatchQuery |
| Query string | 🔲 | |
| Simple query string | 🔲 | |
See test class ElasticSearchGeoQueryTest
| Query Type | Supported | Tests |
|---|---|---|
| Geo-bounding box | ✅ | geoBoundingBoxQuery |
| Geo-distance | ✅ | geoDistanceQuery |
| Geo-grid | ✅ | geoHashQuery |
| Geo-polygon | ✅ | geoPolygonQuery |
| Geoshape | ✅ | geoShapeInlineQuery, geoShapeIndexedQuery |
See test class ElasticSearchJoinQueryTest
| Query Type | Supported | Tests |
|---|---|---|
| Nested | ✅ | nestedQuery |
| Has child | 🔲 | |
| Has parent | 🔲 | |
| Parent ID | 🔲 | |
| Query Type | Supported |
|---|---|
| Span queries | 🔲 |
| Span containing | 🔲 |
| Span field masking | 🔲 |
| Span first | 🔲 |
| Span multi-term | 🔲 |
| Span near | 🔲 |
| Span not | 🔲 |
| Span or | 🔲 |
| Span term | 🔲 |
| Span within | 🔲 |
See test class ElasticSearchSpecializedQueryTest
| Query Type | Supported | Tests | Notes |
|---|---|---|---|
| Distance feature | ✅ | distanceFeatureTemporalQuery, distanceFeatureDistanceQuery | |
| More like this | ✅ | moreLikeThisQuery, moreLikeThisQueryWithInlineDoc | |
| Percolate | ✅ | percolateQuery | |
| Knn | ✅ | knnQuery | |
| Rank feature | ✅ | rankFeatureQuery | Missing function object parameters |
| Script | 🔲 | | |
| Script score | ✅ | scriptScoreQuery | |
| Wrapper | ✅ | wrapperQuery | |
| Pinned Query | ✅ | pinnedQuery | |
| Rule | 🔲 | | |
See test class ElasticSearchTermLevelQueryTest
| Query Type | Supported | Test method(s) |
|---|---|---|
| Exists | ✅ | existsQuery |
| Fuzzy | ✅ | fuzzySimple, fuzzyComplex |
| IDs | ✅ | idsQuery |
| Prefix | ✅ | prefixQuery |
| Range | ✅ | rangeBoth, rangeQueryGteOnly, rangeQueryLteOnly |
| Regexp | ✅ | regexpQuerySimple, regexpQueryMultipleFlags |
| Term | ✅ | termQuery |
| Terms | ✅ | termsQuery |
| Terms set | ✅ | termsSetQueryWithScript |
| Wildcard | ✅ | wildcardQuery |
See test class ElasticSearchOtherQueryTest
| Query Type | Supported | Tests |
|---|---|---|
| Shape | 🔲 | |
| Match All | ✅ | matchAllQuery |
| Text expansion query | 🔲 | |
See test class ElasticBucketAggregationsTest
| Aggregation Type | Supported | Tests | Notes |
|---|---|---|---|
| Adjacency matrix | ✅ | adjacencyMatrixAggregation | |
| Auto-interval date histogram | ✅ | autoDateHistogramAggregation,... | |
| Categorize text | ✅ | categorizeTextAggregation,... | |
| Children | 🔲 | | |
| Composite | 🔲 | | |
| Date histogram | ✅ | dateHistogramWithCalendarInterval, dateHistogramWithFixedInterval,... | |
| Date range | ✅ | dateRangeAggregation, ... | |
| Diversified sampler | ✅ | diversifiedSamplerAggregation | |
| Filter | ✅ | filterAggregation | |
| Filters | ✅ | filtersAggregation | |
| Frequent item sets | ✅ | frequentItemSetsAggregation,... | |
| Geo-distance | ✅ | geoDistanceAggregation,... | |
| Geohash grid | ✅ | geoHashGridAggregation,... | |
| Geohex grid | ✅ | geoHexGridAggregation,... | |
| Geotile grid | ✅ | geoTileGridAggregation,... | |
| Global | ✅ | globalAggregation | |
| Histogram | ✅ | histogramAggregation,... | |
| IP prefix | ✅ | ipPrefixAggregation,... | |
| IP range | ✅ | ipRangeAggregation,... | |
| Missing | ✅ | missingAggregation | |
| Multi Terms | ✅ | multiTermsAggregation,... | |
| Nested | ✅ | nestedAggregation | |
| Parent | 🔲 | | |
| Random sampler | 🔲 | | |
| Range | ✅ | rangeAggregation,... | |
| Rare terms | ✅ | rareTermsAggregation,... | |
| Reverse nested | 🔲 | | |
| Sampler | ✅ | samplerAggregation | |
| Significant terms | ✅ | significantTermsAggregation | |
| Significant text | ✅ | significantTextAggregation | |
| Terms | ✅ | termsAggregation | |
| Time series | ✅ | timeSeriesAggregation | |
| Variable width histogram | ✅ | variableWidthHistogramAggregation | |
See test class ElasticMetricsAggregationsTest
| Aggregation Type | Supported | Tests |
|---|---|---|
| Avg | ✅ | avgAggregation |
| Boxplot | ✅ | boxPlotAggregation,... |
| Cardinality | ✅ | cardinalityAggregation |
| Extended stats | ✅ | extendedStatsAggregation |
| Geo-bounds | ✅ | geoBoundsAggregation |
| Geo-centroid | ✅ | geoCentroidAggregation |
| Geo-line | ✅ | geoLineAggregation |
| Cartesian-bounds | ✅ | cartesianBoundsAggregation |
| Cartesian-centroid | ✅ | cartesianCentroidAggregation |
| Matrix stats | ✅ | matrixStatsAggregation |
| Max | ✅ | maxAggregation |
| Median absolute deviation | ✅ | medianAbsolutDeviationAggregation |
| Min | ✅ | minAggregation |
| Percentile ranks | ✅ | percentileRanksAggregation |
| Percentiles | ✅ | percentilesAggregation,... |
| Rate | ✅ | rateAggregation,... |
| Scripted metric | 🔲 | |
| Stats | ✅ | |
| String stats | ✅ | stringStatsAggregation,... |
| Sum | ✅ | sumAggregation |
| T-test | ✅ | tTestAggregation,... |
| Top hits | ✅ | topHitsAggregation |
| Top metrics | 🔲 | |
| Value count | ✅ | valueCountAggregation |
| Weighted avg | ✅ | weightAvgAggregation,... |
| Aggregation Type | Supported |
|---|---|
| Average bucket | 🔲 |
| Bucket script | 🔲 |
| Bucket count K-S test | 🔲 |
| Bucket correlation | 🔲 |
| Bucket selector | 🔲 |
| Bucket sort | 🔲 |
| Change point | 🔲 |
| Cumulative cardinality | 🔲 |
| Cumulative sum | 🔲 |
| Derivative | 🔲 |
| Extended stats bucket | 🔲 |
| Inference bucket | 🔲 |
| Max bucket | 🔲 |
| Min bucket | 🔲 |
| Moving function | 🔲 |
| Moving percentiles | 🔲 |
| Normalize | 🔲 |
| Percentiles bucket | 🔲 |
| Serial differencing | 🔲 |
| Stats bucket | 🔲 |
| Sum bucket | 🔲 |
The current version also provides minimal, generically typed support for deserializing Elastic responses.
For example, given the domain model record:
```java
record Person(
    String firstname,
    String birthdate,
    String city
) {}
```
Elastic search responses can be parsed using:
```java
// rawResponse holds the JSON body returned by the search call
ObjectMapper mapper = new ObjectMapper();
ElasticSearchResponse<Person> actual = mapper.readValue(rawResponse, new TypeReference<>() {});
```
HINT: Supporting `LocalDate` for the `birthdate` field simply requires adding the Jackson Java time module:
```xml
<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-jsr310</artifactId>
    <version>2.6.0</version>
</dependency>
```
and registering it:
```java
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new JavaTimeModule());
```
| Setting | Supported |
|---|---|
| index.number_of_shards | ✅ |
| index.number_of_routing_shards | 🔲 |
| index.codec | 🔲 |
| index.routing_partition_size | 🔲 |
| index.soft_deletes.retention_lease.period | 🔲 |
| index.load_fixed_bitset_filters_eagerly | 🔲 |
| index.shard.check_on_startup | 🔲 |
| Setting | Supported |
|---|---|
| index.number_of_replicas | ✅ |
| index.auto_expand_replicas | 🔲 |
| index.search.idle.after | 🔲 |
| index.refresh_interval | ✅ |
| index.max_result_window | 🔲 |
| index.max_inner_result_window | 🔲 |
| index.max_rescore_window | 🔲 |
| index.max_docvalue_fields_search | 🔲 |
| index.max_script_fields | 🔲 |
| index.max_ngram_diff | 🔲 |
| index.max_shingle_diff | 🔲 |
| index.max_refresh_listeners | 🔲 |
| index.analyze.max_token_count | 🔲 |
| index.highlight.max_analyzed_offset | 🔲 |
| index.max_terms_count | 🔲 |
| index.max_regex_length | 🔲 |
| index.query.default_field | 🔲 |
| index.routing.allocation.enable | 🔲 |
| index.gc_deletes | 🔲 |
| index.default_pipeline | 🔲 |
| index.final_pipeline | 🔲 |
| index.hidden | 🔲 |
| index.dense_vector.hnsw_filter_heuristic | 🔲 |
| index.esql.stored_fields_sequential_proportion | 🔲 |
| Token Filter | Supported |
|---|---|
| CJK bigram | 🔲 |
| Common grams | 🔲 |
| Conditional | ✅ |
| Delimited payload | 🔲 |
| Dictionary decompounder | ✅ |
| Edge n-gram | 🔲 |
| Elision | 🔲 |
| Fingerprint | 🔲 |
| Flatten graph | 🔲 |
| Hunspell | 🔲 |
| Hyphenation decompounder | 🔲 |
| Keep types | 🔲 |
| Keep words | 🔲 |
| Keyword marker | 🔲 |
| Length | 🔲 |
| Limit token count | 🔲 |
| Lowercase | 🔲 |
| MinHash | 🔲 |
| Multiplexer | 🔲 |
| N-gram | 🔲 |
| Pattern capture | 🔲 |
| Pattern replace | 🔲 |
| Predicate script | 🔲 |
| Shingle | ✅ |
| Stemmer | 🔲 |
| Stemmer override | 🔲 |
| Stop | 🔲 |
| Synonym | 🔲 |
| Synonym graph | 🔲 |
| Truncate | 🔲 |
| Unique | 🔲 |
| Word delimiter | 🔲 |
| Word delimiter graph | 🔲 |
The current version also provides a (still limited) DSL for mapping definitions.
| Type | Supported |
|---|---|
| binary | ✅ |
| boolean | ✅ |
| keyword | ✅ |
| constant_keyword | 🔲 |
| wildcard | 🔲 |
| long | ✅ |
| integer | ✅ |
| short | ✅ |
| byte | ✅ |
| double | ✅ |
| float | ✅ |
| half_float | ✅ |
| scaled_float | ✅ |
| unsigned_long | ✅ |
| date | ✅ |
| date_nanos | ✅ |
| object | ✅ |
| flattened | 🔲 |
| nested | ✅ |
| join | ✅ |
| passthrough | 🔲 |
| integer_range | 🔲 |
| float_range | 🔲 |
| long_range | 🔲 |
| double_range | 🔲 |
| date_range | 🔲 |
| ip_range | 🔲 |
| ip | 🔲 |
| version | 🔲 |
| aggregate_metric_double | 🔲 |
| histogram | 🔲 |
| text | ✅ |
| match_only_text | 🔲 |
| search_as_you_type | 🔲 |
| semantic_text | 🔲 |
| token_count | 🔲 |
| dense_vector | 🔲 |
| sparse_vector | 🔲 |
| rank_feature | 🔲 |
| rank_features | 🔲 |
| geo_point | 🔲 |
| geo_shape | 🔲 |
| point | 🔲 |
| shape | 🔲 |
- Indexing requests
- Ensuring field compatibility between index mappings and queries (using type-safety)