bhabegger/elastic-dsl

Lightweight User-friendly Elastic/OpenSearch Query Java DSL

This project provides a usability-first Java DSL library for building Elastic queries that serialize to the Elastic JSON DSL using Jackson. It is designed with usability and concision in mind. As a DSL it provides type safety and helps avoid mistakes, and it has also been designed to express standard situations more directly.

It's a (quasi) self-contained library with no direct dependency on either the Elastic or OpenSearch clients, or even on an HTTP client (the choice is yours), so it can be integrated into your project with very little overhead.

Its only non-test dependency is Jackson, which handles serialization and deserialization. Usage of Jackson is kept to very stable parts of its API, so overriding the Jackson version should be hassle-free. All other dependencies (JUnit and AssertJ) are used only for testing.

NOTE: The DSL works for both Elastic and OpenSearch, as their APIs follow each other closely.

This DSL is more straightforward than the standard OpenSearch DSL, which relies heavily on lambdas. It is also somewhat easier to use than Elastic's DSL, which strictly follows the API structure (and therefore inherits some of its complexity).

For example, instead of having to write the following JSON by hand:

{
    "query": {
        "bool": {
            "should": [
                {
                    "range": {
                        "birthdate": {
                            "gte": "1990-01-01",
                            "lte": "2000-01-01"
                        }
                    }
                }
            ],
            "must": [
                {
                    "term": {
                        "firstname": "benjamin"
                    }
                }
            ],
            "filter": [
                {
                    "term": {
                        "city": "biel"
                    }
                }
            ]
        }
    }
}

The Java DSL lets you express this as:

var query = query(
    newBool()
        .must(term("firstname", "benjamin"))
        .should(range("birthdate", LocalDate.parse("1990-01-01"), LocalDate.parse("2000-01-01")))
        .filter(term("city", "biel"))
    .build()
);

Usage

Add the Maven dependency:

<dependency>
    <groupId>tech.habegger.elastic</groupId>
    <artifactId>elastic-dsl</artifactId>
    <version>1.0.0</version>
</dependency> 

Example

Import the constructs you need (or let your IDE do it for you ;)):

import com.fasterxml.jackson.databind.ObjectMapper;

import static tech.habegger.elastic.search.ElasticBooleanClause.newBool;
import static tech.habegger.elastic.search.ElasticSearchRequest.query;
import static tech.habegger.elastic.search.ElasticTermClause.term;

And just use the DSL:

var mapper = new ObjectMapper();
var elasticQuery = query(
    newBool()
        .must(term("lastname", "habegger"))
        .should(term("firstname", "benjamin"))
    .build()
);
// writeValueAsString returns the JSON as a String (writeValue needs an output target)
var queryAsString = mapper.writeValueAsString(elasticQuery);

For a complete example, check out SampleIndexAndSearch, which demonstrates how to use the DSL with Java's built-in HTTP client on an index named playground. The example:

  • Creates a settings item using the DSL
  • Deletes the playground index
  • Creates the playground index using the serialization of the DSL settings
  • Pushes an example document (using a plain old Java record)
  • Creates a query using the DSL
  • Searches the index using the serialized query.
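
The HTTP side of these steps can be sketched with the JDK's java.net.http API alone. This is a minimal sketch, not the code of SampleIndexAndSearch: the class name PlaygroundRequests, the base URL http://localhost:9200 (a local, unsecured node) and the hand-written JSON strings are assumptions for illustration. In real use, the JSON bodies would come from serializing the DSL objects with Jackson.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.List;

// Hypothetical helper, not part of elastic-dsl: builds the HTTP requests for each step.
final class PlaygroundRequests {
    // Assumption: a local node without authentication; adjust to your cluster.
    static final String BASE = "http://localhost:9200";
    static final String INDEX = "playground";

    // Common builder for requests that carry a JSON body.
    private static HttpRequest.Builder json(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
            .header("Content-Type", "application/json");
    }

    // DELETE /playground
    static HttpRequest deleteIndex() {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + INDEX)).DELETE().build();
    }

    // PUT /playground with the serialized settings as body
    static HttpRequest createIndex(String settingsJson) {
        return json("/" + INDEX).PUT(BodyPublishers.ofString(settingsJson)).build();
    }

    // POST /playground/_doc with the serialized document as body
    static HttpRequest indexDocument(String documentJson) {
        return json("/" + INDEX + "/_doc").POST(BodyPublishers.ofString(documentJson)).build();
    }

    // POST /playground/_search with the serialized query as body
    static HttpRequest search(String queryJson) {
        return json("/" + INDEX + "/_search").POST(BodyPublishers.ofString(queryJson)).build();
    }

    public static void main(String[] args) {
        // Hand-written stand-ins for what Jackson would produce from the DSL objects.
        String settings = "{\"settings\":{\"index\":{\"number_of_shards\":1,\"number_of_replicas\":0}}}";
        String document = "{\"firstname\":\"benjamin\",\"birthdate\":\"1990-01-01\",\"city\":\"biel\"}";
        String query = "{\"query\":{\"term\":{\"city\":\"biel\"}}}";

        List<HttpRequest> steps = List.of(
            deleteIndex(), createIndex(settings), indexDocument(document), search(query));

        // Only print the requests here; nothing is sent over the network.
        steps.forEach(request -> System.out.println(request.method() + " " + request.uri()));
    }
}
```

To actually run a step, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Keeping request construction separate from sending makes each step easy to inspect without a running cluster.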

Check out the tests

Most constructs made available through the DSL should have unit tests. Please have a look at the test suite for example syntax.

Design

The DSL has been designed to strike a good compromise between completeness (being able to express any Elastic query or aggregation) and conciseness (being able to do so easily). To achieve this, the design tries to follow these principles:

  • Mandatory (or very frequently used) parameters are included in the main builder method (e.g. terms must have a field name and values, so those are passed as direct arguments of the terms method).
  • Optional, less frequently used parameters that change the behavior are set through modifier methods (e.g. the boxPlot aggregation takes the field as its single argument and has a modifier method withCompression to set the compression when needed).
  • Only really complex situations use a more advanced "Builder" pattern, which requires a final build() call to return the serializable version of the Elastic expression. In this case, the initial building method is prefixed with new (e.g. newBool() starts a bool expression builder).
  • In some cases, the initial newXX builder method takes mandatory parameters (e.g. the newPinned method takes an Elastic clause as argument to define the query for the "organic" documents and defers the "pinning" to later calls).
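
The principles above can be illustrated with a toy DSL written in plain Java. This is only a sketch of the pattern: the types DesignSketch, Clause, BoxPlot and BoolBuilder are hypothetical and are not the library's classes, and the JSON is assembled by naive string concatenation (no escaping) where the real library uses Jackson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration only: these types are hypothetical, NOT the elastic-dsl classes.
final class DesignSketch {

    /** Anything that can render itself as an Elastic JSON fragment. */
    interface Clause {
        String toJson();
    }

    // Principle 1: mandatory parameters go straight into the factory method.
    static Clause term(String field, String value) {
        return () -> "{\"term\":{\"" + field + "\":\"" + value + "\"}}";
    }

    // Principle 2: optional parameters are set through "with..." modifier methods.
    record BoxPlot(String field, Integer compression) implements Clause {
        BoxPlot withCompression(int value) {
            return new BoxPlot(field, value);
        }

        @Override
        public String toJson() {
            String extra = compression == null ? "" : ",\"compression\":" + compression;
            return "{\"boxplot\":{\"field\":\"" + field + "\"" + extra + "}}";
        }
    }

    static BoxPlot boxPlot(String field) {
        return new BoxPlot(field, null);
    }

    // Principle 3: complex expressions start with a "new..." method and end with build().
    static final class BoolBuilder {
        private final List<Clause> must = new ArrayList<>();
        private final List<Clause> filter = new ArrayList<>();

        BoolBuilder must(Clause clause) { must.add(clause); return this; }
        BoolBuilder filter(Clause clause) { filter.add(clause); return this; }

        Clause build() {
            // Snapshot the lists so later builder calls cannot change the built clause.
            List<Clause> m = List.copyOf(must);
            List<Clause> f = List.copyOf(filter);
            return () -> "{\"bool\":{\"must\":" + array(m) + ",\"filter\":" + array(f) + "}}";
        }

        private static String array(List<Clause> clauses) {
            return clauses.stream().map(Clause::toJson).collect(Collectors.joining(",", "[", "]"));
        }
    }

    static BoolBuilder newBool() {
        return new BoolBuilder();
    }

    public static void main(String[] args) {
        System.out.println(boxPlot("load_time").withCompression(200).toJson());
        System.out.println(newBool()
            .must(term("firstname", "benjamin"))
            .filter(term("city", "biel"))
            .build()
            .toJson());
    }
}
```

The split keeps the common case short (a single method call) while the explicit build() step marks the point at which a complex expression becomes a finished, serializable value.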

Advantages

  • Removes most of the JSON-related boilerplate
  • Avoids typos and structural mistakes when writing the queries
  • Usability driven
  • More straightforward than the API structure (and the official DSLs which strictly follow this structure)

Current query support

This is an initial version of the DSL, so not all query types are supported yet. However, custom clauses are supported to compensate somewhat for the gaps. Feel free to propose a merge request to add the unsupported clauses ;)

See test class ElasticSearchCompoundQueryTest

| Query Type | Supported | Tests |
|---|---|---|
| Boolean | ✅ | bool* |
| Boosting | ✅ | boostingQuery |
| Constant score | ✅ | constantScoreQuery |
| Disjunction max | ✅ | disjunctionMaxQuery |
| Function score | ✅ | functionScoreQuery |
See test class ElasticSearchFullTextQueryTest

| Query Type | Supported | Test method(s) |
|---|---|---|
| Intervals | 🔲 | |
| Match | ✅ | matchQuery |
| Match boolean prefix | ✅ | matchBoolPrefixQuery |
| Match phrase | ✅ | matchPhraseQuery |
| Match phrase prefix | ✅ | matchPhrasePrefixQuery |
| Combined fields | ✅ | combinedFieldsQuery |
| Multi-match | ✅ | multiMatchQuery |
| Query string | 🔲 | |
| Simple query string | 🔲 | |

See test class ElasticSearchGeoQueryTest#geoHashQuery

| Query Type | Supported | Tests |
|---|---|---|
| Geo-bounding box | ✅ | geoBoundingBoxQuery |
| Geo-distance | ✅ | geoDistanceQuery |
| Geo-grid | ✅ | geoHashQuery |
| Geo-polygon | ✅ | geoPolygonQuery |
| Geoshape | ✅ | geoShapeInlineQuery, geoShapeIndexedQuery |

See test class ElasticSearchJoinQueryTest

| Query Type | Supported | Tests |
|---|---|---|
| Nested | ✅ | nestedQuery |
| Has child | 🔲 | |
| Has parent | 🔲 | |
| Parent ID | 🔲 | |

| Query Type | Supported |
|---|---|
| Span queries | 🔲 |
| Span containing | 🔲 |
| Span field masking | 🔲 |
| Span first | 🔲 |
| Span multi-term | 🔲 |
| Span near | 🔲 |
| Span not | 🔲 |
| Span or | 🔲 |
| Span term | 🔲 |
| Span within | 🔲 |

See test class ElasticSearchSpecializedQueryTest

| Query Type | Supported | Tests | Notes |
|---|---|---|---|
| Distance feature | ✅ | distanceFeatureTemporalQuery, distanceFeatureDistanceQuery | |
| More like this | ✅ | moreLikeThisQuery, moreLikeThisQueryWithInlineDoc | |
| Percolate | ✅ | percolateQuery | |
| Knn | ✅ | knnQuery | |
| Rank feature | ✅ | rankFeatureQuery | Missing function object parameters |
| Script | 🔲 | | |
| Script score | ✅ | scriptScoreQuery | |
| Wrapper | ✅ | wrapperQuery | |
| Pinned Query | ✅ | pinnedQuery | |
| Rule | 🔲 | | |

See test class ElasticSearchTermLevelQueryTest

| Query Type | Supported | Test method(s) |
|---|---|---|
| Exists | ✅ | existsQuery |
| Fuzzy | ✅ | fuzzySimple, fuzzyComplex |
| IDs | ✅ | idsQuery |
| Prefix | ✅ | prefixQuery |
| Range | ✅ | rangeBoth, rangeQueryGteOnly, rangeQueryLteOnly |
| Regexp | ✅ | regexpQuerySimple, regexpQueryMultipleFlags |
| Term | ✅ | termQuery |
| Terms | ✅ | termsQuery |
| Terms set | ✅ | termsSetQueryWithScript |
| Wildcard | ✅ | wildcardQuery |

Other queries

See test class ElasticSearchOtherQueryTest

| Query Type | Supported | Tests |
|---|---|---|
| Shape | 🔲 | |
| Match All | ✅ | matchAllQuery |
| Text expansion query | 🔲 | |

Current aggregation support

See test class ElasticBucketAggregationsTest

| Aggregation Type | Supported | Tests | Notes |
|---|---|---|---|
| Adjacency matrix | ✅ | adjacencyMatrixAggregation | |
| Auto-interval date histogram | ✅ | autoDateHistogramAggregation,... | |
| Categorize text | ✅ | categorizeTextAggregation,... | |
| Children | 🔲 | | |
| Composite | 🔲 | | |
| Date histogram | ✅ | dateHistogramWithCalendarInterval, dateHistogramWithFixedInterval,... | |
| Date range | ✅ | dateRangeAggregation, ... | |
| Diversified sampler | ✅ | diversifiedSamplerAggregation | |
| Filter | ✅ | filterAggregation | |
| Filters | ✅ | filtersAggregation | |
| Frequent item sets | ✅ | frequentItemSetsAggregation,... | |
| Geo-distance | ✅ | geoDistanceAggregation,... | |
| Geohash grid | ✅ | geoHashGridAggregation,... | |
| Geohex grid | ✅ | geoHexGridAggregation,... | |
| Geotile grid | ✅ | geoTileGridAggregation,... | |
| Global | ✅ | globalAggregation | |
| Histogram | ✅ | histogramAggregation,... | |
| IP prefix | ✅ | ipPrefixAggregation,... | |
| IP range | ✅ | ipRangeAggregation,... | |
| Missing | ✅ | missingAggregation | |
| Multi Terms | ✅ | multiTermsAggregation,... | |
| Nested | ✅ | nestedAggregation | |
| Parent | 🔲 | | |
| Random sampler | 🔲 | | |
| Range | ✅ | rangeAggregation,... | |
| Rare terms | ✅ | rareTermsAggregation,... | |
| Reverse nested | 🔲 | | |
| Sampler | ✅ | samplerAggregation | |
| Significant terms | ✅ | significantTermsAggregation | |
| Significant text | ✅ | significantTextAggregation | |
| Terms | ✅ | termsAggregation | |
| Time series | ✅ | timeSeriesAggregation | |
| Variable width histogram | ✅ | variableWidthHistogramAggregation | |

See test class ElasticMetricsAggregationsTest

| Aggregation Type | Supported | Tests |
|---|---|---|
| Avg | ✅ | avgAggregation |
| Boxplot | ✅ | boxPlotAggregation,... |
| Cardinality | ✅ | cardinalityAggregation |
| Extended stats | ✅ | extendedStatsAggregation |
| Geo-bounds | ✅ | geoBoundsAggregation |
| Geo-centroid | ✅ | geoCentroidAggregation |
| Geo-line | ✅ | geoLineAggregation |
| Cartesian-bounds | ✅ | cartesianBoundsAggregation |
| Cartesian-centroid | ✅ | cartesianCentroidAggregation |
| Matrix stats | ✅ | matrixStatsAggregation |
| Max | ✅ | maxAggregation |
| Median absolute deviation | ✅ | medianAbsolutDeviationAggregation |
| Min | ✅ | minAggregation |
| Percentile ranks | ✅ | percentileRanksAggregation |
| Percentiles | ✅ | percentilesAggregation,... |
| Rate | ✅ | rateAggregation,... |
| Scripted metric | 🔲 | |
| Stats | ✅ | |
| String stats | ✅ | stringStatsAggregation,... |
| Sum | ✅ | sumAggregation |
| T-test | ✅ | tTestAggregation,... |
| Top hits | ✅ | topHitsAggregation |
| Top metrics | 🔲 | |
| Value count | ✅ | valueCountAggregation |
| Weighted avg | ✅ | weightAvgAggregation,... |

| Aggregation Type | Supported |
|---|---|
| Average bucket | 🔲 |
| Bucket script | 🔲 |
| Bucket count K-S test | 🔲 |
| Bucket correlation | 🔲 |
| Bucket selector | 🔲 |
| Bucket sort | 🔲 |
| Change point | 🔲 |
| Cumulative cardinality | 🔲 |
| Cumulative sum | 🔲 |
| Derivative | 🔲 |
| Extended stats bucket | 🔲 |
| Inference bucket | 🔲 |
| Max bucket | 🔲 |
| Min bucket | 🔲 |
| Moving function | 🔲 |
| Moving percentiles | 🔲 |
| Normalize | 🔲 |
| Percentiles bucket | 🔲 |
| Serial differencing | 🔲 |
| Stats bucket | 🔲 |
| Sum bucket | 🔲 |

Current query response support

The current version also provides minimal, generically typed support for deserializing Elastic responses.

For example, given the domain model record:

record Person(
    String firstname,
    String birthdate,
    String city
) {}

Elastic search responses can then be parsed using:

ObjectMapper mapper = new ObjectMapper();
ElasticSearchResponse<Person> actual = mapper.readValue(rawResponse, new TypeReference<>() {});

HINT: Supporting LocalDate for the birthdate field simply requires adding Jackson's Java time (JSR-310) module:

<dependency>
    <groupId>com.fasterxml.jackson.datatype</groupId>
    <artifactId>jackson-datatype-jsr310</artifactId>
    <version>2.6.0</version>
</dependency>

and registering it:

ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new JavaTimeModule());

Current index settings support

Static settings

| Setting | Supported |
|---|---|
| index.number_of_shards | ✅ |
| index.number_of_routing_shards | 🔲 |
| index.codec | 🔲 |
| index.routing_partition_size | 🔲 |
| index.soft_deletes.retention_lease.period | 🔲 |
| index.load_fixed_bitset_filters_eagerly | 🔲 |
| index.shard.check_on_startup | 🔲 |

Dynamic settings

| Setting | Supported |
|---|---|
| index.number_of_replicas | ✅ |
| index.auto_expand_replicas | 🔲 |
| index.search.idle.after | 🔲 |
| index.refresh_interval | ✅ |
| index.max_result_window | 🔲 |
| index.max_inner_result_window | 🔲 |
| index.max_rescore_window | 🔲 |
| index.max_docvalue_fields_search | 🔲 |
| index.max_script_fields | 🔲 |
| index.max_ngram_diff | 🔲 |
| index.max_shingle_diff | 🔲 |
| index.max_refresh_listeners | 🔲 |
| index.analyze.max_token_count | 🔲 |
| index.highlight.max_analyzed_offset | 🔲 |
| index.max_terms_count | 🔲 |
| index.max_regex_length | 🔲 |
| index.query.default_field | 🔲 |
| index.routing.allocation.enable | 🔲 |
| index.gc_deletes | 🔲 |
| index.default_pipeline | 🔲 |
| index.final_pipeline | 🔲 |
| index.hidden | 🔲 |
| index.dense_vector.hnsw_filter_heuristic | 🔲 |
| index.esql.stored_fields_sequential_proportion | 🔲 |

Current analysis definition support

Customizable Token filters

| Token Filter | Supported |
|---|---|
| CJK bigram | 🔲 |
| Common grams | 🔲 |
| Conditional | ✅ |
| Delimited payload | 🔲 |
| Dictionary decompounder | ✅ |
| Edge n-gram | 🔲 |
| Elision | 🔲 |
| Fingerprint | 🔲 |
| Flatten graph | 🔲 |
| Hunspell | 🔲 |
| Hyphenation decompounder | 🔲 |
| Keep types | 🔲 |
| Keep words | 🔲 |
| Keyword marker | 🔲 |
| Length | 🔲 |
| Limit token count | 🔲 |
| Lowercase | 🔲 |
| MinHash | 🔲 |
| Multiplexer | 🔲 |
| N-gram | 🔲 |
| Pattern capture | 🔲 |
| Pattern replace | 🔲 |
| Predicate script | 🔲 |
| Shingle | ✅ |
| Stemmer | 🔲 |
| Stemmer override | 🔲 |
| Stop | 🔲 |
| Synonym | 🔲 |
| Synonym graph | 🔲 |
| Truncate | 🔲 |
| Unique | 🔲 |
| Word delimiter | 🔲 |
| Word delimiter graph | 🔲 |

Current mapping support

The current version also provides a (still limited) DSL for mapping definitions.

| Type | Supported |
|---|---|
| binary | ✅ |
| boolean | ✅ |
| keyword | ✅ |
| constant_keyword | 🔲 |
| wildcard | 🔲 |
| long | ✅ |
| integer | ✅ |
| short | ✅ |
| byte | ✅ |
| double | ✅ |
| float | ✅ |
| half_float | ✅ |
| scaled_float | ✅ |
| unsigned_long | ✅ |
| date | ✅ |
| date_nanos | ✅ |
| object | ✅ |
| flattened | 🔲 |
| nested | ✅ |
| join | ✅ |
| passthrough | 🔲 |
| integer_range | 🔲 |
| float_range | 🔲 |
| long_range | 🔲 |
| double_range | 🔲 |
| date_range | 🔲 |
| ip_range | 🔲 |
| ip | 🔲 |
| version | 🔲 |
| aggregate_metric_double | 🔲 |
| histogram | 🔲 |
| text | ✅ |
| match_only_text | 🔲 |
| search_as_you_type | 🔲 |
| semantic_text | 🔲 |
| token_count | 🔲 |
| dense_vector | 🔲 |
| sparse_vector | 🔲 |
| rank_feature | 🔲 |
| rank_features | 🔲 |
| geo_point | 🔲 |
| geo_shape | 🔲 |
| point | 🔲 |
| shape | 🔲 |

Not yet supported

  • Indexing requests
  • Ensuring field compatibility between index mappings and queries (using type-safety)
