Commit 1554988

Merge pull request #36 from dvla/feature/add-kaping-gem
Feature/add kaping gem
2 parents 3177d89 + 23dac6e commit 1554988

4 files changed

Lines changed: 207 additions & 2 deletions

config.yml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 baseURL: "https://dvla.github.io/"
 title: DVLA Engineering
-paginate: 5
+pagination:
+  pagerSize: 5
 theme: PaperMod
 
 enableRobotsTXT: true

content/open-source/index.md

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,10 @@ Open Source library to apply Royal Mail Rules & Exceptions to PAF (Postcode Addr
 
 This gem has pre-configured browser drivers that you can use out-of-the-box for the development of your automated test suite or application.
 
+### [dvla-kaping](https://github.com/dvla/kaping)
+
+Ka-Ping! An idiomatic Ruby way to construct ElasticSearch queries.
+
 # Dynamics 365
 
 ### [dataverse-helper](https://github.com/dvla/dataverse-helper)
Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
---
author: "Kevin Upstill"
title: "KA-PING! A gem to integrate with Elastic and OpenSearch"
description: "An idiomatic way to build complex OpenSearch queries in Ruby."
draft: false
date: 2025-04-24
tags: ["Ruby", "ElasticSearch", "OpenSearch"]
categories: ["Ruby", "Elastic"]
ShowToc: true
TocOpen: true
---

***You can only stretch elastic so far before it goes KA-PING!***

The starting point for creating a new DVLA gem to integrate with AWS OpenSearch and ElasticSearch was the amount of documentation
there is surrounding the technology. There is a lot for a reason: it's a very powerful and useful tool,
but the flip side is the learning curve involved in understanding all the features and the query structure.

Our use case was that we wanted squads to have a simple way to retrieve specific driver test data to use in their
tests without having to fully understand Elasticsearch and all its capabilities.

A lot of our tests have very specific data requirements, which can result in complex search queries with multiple parameters and
search terms to consider. The code for these complex queries can balloon into very deeply nested JSON.

Our test platform architecture is based on Ruby, so we wanted a Ruby way to write large queries easily, which led to the
development of a query builder using the dot notation concept to chain the query terms together.

We didn't need every aspect of the full OpenSearch capabilities to begin with, so we started with a subset of search terms
which we commonly use. With this in mind, there is further development work to cover more of the OpenSearch functions down the line.


## The Ka-Ping Ruby Gem

The Ka-Ping Ruby Gem enables the user to build complex ElasticSearch DSL queries for searching and filtering large data sets without
having to worry about formatting the JSON payloads.

Using intuitive search terms and operations, it's easier to construct human-readable search definitions without needing a deep
understanding of the Query DSL syntax.

### What a complex search query looks like with JSON notation
Probably the best way to demonstrate Kaping is to look at the traditional query construct, then the Kaping way.


#### Traditional JSON query
```ruby
def multi
  # Time is part of Ruby core, so today's date needs no extra require.
  date = Time.now.strftime('%Y-%m-%d')

  query = {
    query: {
      bool: {
        # Clauses that must all match
        must: [
          { match_phrase: { 'fruit.destination': 'Market' } },
          { match_phrase: { 'fruit.code': 'LA-MOP' } },
          { match_phrase: { fruitStatus: 'Ripe' } },
          { match_phrase: { 'current.address.first_line': 'Plantation Row' } },
          { match: { 'fruit.category': 'Tropical' } },
          { range: { 'fruit.pickedDate': { gte: '2025-08-21', lte: date } } },
          { range: { 'fruit.inspection.taken': { gte: date } } },
          { range: { 'fruit.importDate': { gte: date } } },
        ],
        # Clauses that exclude a document if any of them match
        must_not: [
          { exists: { field: 'fruitEndDate' } },
          { exists: { field: 'fruit.import.USATariffs' } },
          { exists: { field: 'fruit.status.goneBad' } },
          { match_phrase: { 'fruit.category': { query: 'B A NN NA AN', operator: 'or' } } },
          { wildcard: { 'fruit.address.import.code': 'NIT*' } },
          { wildcard: { 'fruit.address.import.code': 'NAT*' } },
          { wildcard: { 'fruit.address.import.code': 'NOO*' } },
        ],
        filter: { term: { 'licence.fruit': 'Valid' } },
      },
      match: { 'fruit.type': 'Tropical' },
    },
  }
end
```
As the example above shows, nested JSON can become quite a handful. Kaping solves this by making the code more human-readable.


#### What the query looks like with Kaping
```ruby
def multi
  date = Time.now.strftime('%Y-%m-%d')

  q = DVLA::Kaping::Query.new('bool')
  q.must.match('fruit.destination', 'Market').
    match_phrase('fruit.code', 'LA-MOP').
    match_phrase('fruitStatus', 'Ripe').
    match_phrase('current.address.first_line', 'Plantation Row').
    between('fruit.pickedDate', '2025-08-21', date).
    between('fruit.inspection.taken', '2025-08-21', date).
    between('fruit.importDate', '2025-08-21', date).
    match('fruit.category', 'Tropical')
  q.must_not.
    exists('field', 'fruitEndDate').
    exists('field', 'fruit.import.USATariffs').
    exists('field', 'fruit.status.goneBad').
    match_phrase('fruit.category', query: 'B A NN NA AN', operator: 'or').
    wildcard('fruit.address.import.code', 'NIT*').
    wildcard('fruit.address.import.code', 'NAT*').
    wildcard('fruit.address.import.code', 'NOO*').
    filter('licence.fruit', 'Valid')
  q.to_json
end
```

The key point here is that you don't need to worry about the nested JSON structure; the naming convention is intuitive and closely resembles
the OpenSearch syntax.

Let's break that down further.

```ruby
my_query = DVLA::Kaping::Query.new('bool')
```
This is the starting point for creating a query definition: create an instance of the `DVLA::Kaping::Query`
class and assign it to a variable, e.g. `my_query`.

We then set the type of query we want. The common ones are 'bool' or 'match',
depending on your search context.

If you don't require a complex query, you could do a very basic match query:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
```
This is equivalent to writing the following query in JSON:

```ruby
my_query = { "query":
  { "match_phrase":
    { "foo": "Bar" }
  }
}
```
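
To check that equivalence without the gem installed, here is a minimal sketch that serialises the same hash with Ruby's standard `json` library. It does not call Kaping at all, and the variable names are only illustrative.

```ruby
require 'json'

# Hand-written equivalent of the match_phrase query above (no Kaping involved).
raw_query = { query: { match_phrase: { foo: 'Bar' } } }

# to_json produces the compact payload that would be sent to the search endpoint.
payload = raw_query.to_json
puts payload

# Parsing it back shows the nesting that a builder hides from the caller.
parsed = JSON.parse(payload)
puts parsed.dig('query', 'match_phrase', 'foo')
```

Even for a single term the hash is three levels deep, which is the nesting the builder is meant to take off your hands.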

With Kaping, JSON construction and formatting are taken care of by the common Ruby call `.to_json`.

In the large example above, each line is a new search term definition. There are various terms you can use depending on what
functionality you require. You can group these terms in positive or negative boolean operations.

#### The current subset of terms is:

- **match_phrase** - match documents that contain an exact phrase
- **match** - full-text search on a specific document field
- **exists** - search for documents that contain a specific field
- **wildcard** - match a wildcard pattern, such as `He**o`
- **term** - search for an exact term in a field
- **prefix** - search for terms that begin with a specific prefix
- **regex** - search for terms that match a regular expression, e.g. `[a-zA-Z]amlet`
- **between** - search for a range of values in a field

These are a mix of full-text queries and term-level queries, chosen because they are the ones most commonly used for our kind of searches;
others can easily be added as requirements dictate.

The next part is the data field you want to search on:

> DVLA::Kaping::Query.new('match_phrase', **foo**: 'Bar')

The last part is the data you are looking for. This can be, for example, a range, an exact-match string, a regex or a filter:

> DVLA::Kaping::Query.new('match_phrase', foo: **'Bar'**)

## Configuration
The Gem can be configured through a `kaping.yml` file. Settings such as the logging level, result size and the AWS configuration
can be set in the YAML. The config file can also be used to pick up any environment settings, which is useful when running
in a CI pipeline.
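
As a rough illustration of how a YAML file can pick up environment settings, the sketch below renders ERB tags before parsing the YAML. Everything in it is an assumption: the key names (`log_level`, `result_size`, `aws`), the `KAPING_LOG_LEVEL` variable and the ERB approach are placeholders for illustration, not the gem's documented schema or loading mechanism.

```ruby
require 'yaml'
require 'erb'

# Hypothetical kaping.yml content: key names are placeholders, not the gem's schema.
SAMPLE_CONFIG = <<~YAML
  log_level: <%= ENV.fetch('KAPING_LOG_LEVEL', 'info') %>
  result_size: 50
  aws:
    region: <%= ENV.fetch('AWS_REGION', 'eu-west-2') %>
    profile: <%= ENV.fetch('AWS_PROFILE', 'default') %>
YAML

# Render ERB first so environment variables are substituted, then parse the YAML.
def load_config(text)
  YAML.safe_load(ERB.new(text).result)
end

config = load_config(SAMPLE_CONFIG)
puts config['log_level']
puts config.dig('aws', 'region')
```

With this pattern a CI pipeline only needs to export the relevant variables, and local runs fall back to the defaults given to `ENV.fetch`.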


## The Client
Currently we have an AWS OpenSearch client which takes care of the SigV4 signing. The AWS credentials can either be supplied
as ENV variables or through a profile.

## Search
As long as your client is configured, you can also use the optional built-in search facility:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
response = DVLA::Kaping.search(my_query)
# The matching documents sit under hits.hits in the response
response.dig('hits', 'hits')
```
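
To show what `dig('hits', 'hits')` is pulling out, here is a small sketch that runs against a made-up response. The sample data is invented for illustration and only follows the general shape of an OpenSearch search response, where each hit carries the stored document under `_source`.

```ruby
# Made-up sample shaped like an OpenSearch search response; a real call would
# return this structure from the cluster.
sample_response = {
  'took' => 3,
  'hits' => {
    'total' => { 'value' => 2, 'relation' => 'eq' },
    'hits' => [
      { '_id' => '1', '_source' => { 'foo' => 'Bar' } },
      { '_id' => '2', '_source' => { 'foo' => 'Bar', 'extra' => 'x' } }
    ]
  }
}

# dig returns nil rather than raising when a key is missing, so fall back to an empty list.
documents = (sample_response.dig('hits', 'hits') || []).map { |hit| hit['_source'] }
puts documents.length
puts documents.first['foo']
```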

## Open Source

[External link to Kaping in rubygems](https://rubygems.org/gems/dvla-kaping)

We wanted to open source this Ruby Gem to a wider audience so that others can also benefit from simplifying their OpenSearch queries. We also
welcome feedback to further enhance the Gem and increase its scope.


## Further Development

As mentioned before, not all the functionality of OpenSearch has been implemented, but any requests to expand will be taken into consideration.
The code base for Kaping is small: the query builder is 40 lines of code. The separation of terms into their own module makes it easy to
add additional query terms as required.
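
To illustrate why keeping terms in their own module helps, here is a hypothetical mini builder in plain Ruby. It is not Kaping's source code: the names `SketchTerms` and `SketchBoolQuery` and their internals are assumptions made up for this sketch, which only shows the general pattern of chainable term methods mixed into a builder.

```ruby
require 'json'

# Hypothetical sketch, NOT Kaping's actual source: term methods live in their
# own module, so supporting a new term means adding one method here.
module SketchTerms
  def match_phrase(field, value)
    add_clause(match_phrase: { field => value })
  end

  def wildcard(field, pattern)
    add_clause(wildcard: { field => pattern })
  end

  # Inclusive range, mirroring the gte/lte form used in the JSON example.
  def between(field, from, to)
    add_clause(range: { field => { gte: from, lte: to } })
  end
end

class SketchBoolQuery
  include SketchTerms

  def initialize
    @clauses = { must: [], must_not: [] }
    @group = :must
  end

  # Select the boolean group that subsequent terms are appended to.
  def must
    @group = :must
    self
  end

  def must_not
    @group = :must_not
    self
  end

  def to_json(*args)
    { query: { bool: @clauses } }.to_json(*args)
  end

  private

  # Returning self from every term is what allows dot-chaining.
  def add_clause(clause)
    @clauses[@group] << clause
    self
  end
end

q = SketchBoolQuery.new
q.must.match_phrase('fruit.code', 'LA-MOP').between('fruit.pickedDate', '2025-08-21', '2025-08-31')
q.must_not.wildcard('fruit.address.import.code', 'NIT*')
puts q.to_json
```

In this arrangement the builder class never changes when a term is added; a new method in the terms module that calls `add_clause` is enough.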

themes/PaperMod
