---
author: "Kevin Upstill"
title: "KA-PING! A gem to integrate with Elastic and OpenSearch"
description: "An idiomatic way to build complex OpenSearch queries in Ruby."
draft: false
date: 2025-04-24
tags: ["Ruby", "ElasticSearch", "OpenSearch"]
categories: ["Ruby", "Elastic"]
ShowToc: true
TocOpen: true
---

***You can only stretch elastic so far before it goes KA-PING!***

The motivation for creating a new DVLA gem to integrate with AWS OpenSearch and Elasticsearch was the sheer amount of documentation
surrounding the technology. There is a lot of it for good reason: it's a very powerful and useful tool. The flip side is the
learning curve involved in understanding all the features and the query structure.

Our use case: we wanted squads to have a simple way to retrieve specific driver test data for their tests, without having
to fully understand Elasticsearch and all its capabilities.

Many of our tests have very specific data requirements, which can result in complex search queries with multiple parameters and
search terms to consider. The code for these queries can balloon into very deeply nested JSON.

Our test platform architecture is based on Ruby, so we wanted a Ruby way to write large queries easily. This led to the
development of a query builder that uses dot notation (method chaining) to chain query terms together.

We didn't need every OpenSearch capability to begin with, so we started with a subset of the search terms we commonly use.
Further development work will cover more of the OpenSearch functions down the line.

## The Ka-Ping Ruby Gem

The Ka-Ping Ruby gem lets you build complex Elasticsearch Query DSL queries for searching and filtering large data sets without
having to worry about formatting the JSON payloads.

Using intuitive search terms and operations, it's easier to construct human-readable search definitions without needing a deep
understanding of the Query DSL syntax.

### What a complex search query looks like
Probably the best way to demonstrate Kaping is to look at the traditional query construct first, then the Kaping way.

#### Traditional JSON query
```ruby
require 'date'

def multi
  date = DateTime.now.strftime('%Y-%m-%d')

  query = {
    query: {
      bool: {
        # Every clause in must has to match
        must: [
          { match_phrase: { 'fruit.destination': 'Market' } },
          { match_phrase: { 'fruit.code': 'LA-MOP' } },
          { match_phrase: { fruitStatus: 'Ripe' } },
          { match_phrase: { 'current.address.first_line': 'Plantation Row' } },
          { match: { 'fruit.category': 'Tropical' } },
          # The query object takes a single top-level clause, so this match
          # lives inside must rather than beside bool
          { match: { 'fruit.type': 'Tropical' } },
          { range: { 'fruit.pickedDate' => { gte: '2025-08-21', lte: date } } },
          { range: { 'fruit.inspection.taken' => { gte: date } } },
          { range: { 'fruit.importDate' => { gte: date } } },
        ],
        # No clause in must_not may match
        must_not: [
          { exists: { field: 'fruitEndDate' } },
          { exists: { field: 'fruit.import.USATariffs' } },
          { exists: { field: 'fruit.status.goneBad' } },
          { match_phrase: { 'fruit.category': { query: 'B A NN NA AN', operator: 'or' } } },
          { wildcard: { 'fruit.address.import.code': 'NIT*' } },
          { wildcard: { 'fruit.address.import.code': 'NAT*' } },
          { wildcard: { 'fruit.address.import.code': 'NOO*' } },
        ],
        # filter clauses must match but do not affect the relevance score
        filter: { term: { 'licence.fruit': 'Valid' } },
      },
    },
  }
end
```
In the example above, the nested JSON quickly becomes a handful. Kaping solves this by making the code more human-readable.

#### What the query looks like with Kaping
```ruby
require 'date'

def multi
  date = DateTime.now.strftime('%Y-%m-%d')

  # Start a bool query definition, then chain terms onto must / must_not
  q = DVLA::Kaping::Query.new('bool')
  q.must.match('fruit.destination', 'Market').
    match_phrase('fruit.code', 'LA-MOP').
    match_phrase('fruitStatus', 'Ripe').
    match_phrase('current.address.first_line', 'Plantation Row').
    between('fruit.pickedDate', '2025-08-21', date.to_s).
    between('fruit.inspection.taken', '2025-08-21', date.to_s).
    between('fruit.importDate', '2025-08-21', date.to_s).
    match('fruit.category', 'Tropical')
  q.must_not.
    exists('field', 'fruitEndDate').
    exists('field', 'fruit.import.USATariffs').
    exists('field', 'fruit.status.goneBad').
    match_phrase('fruit.category', query: 'B A NN NA AN', operator: 'or').
    wildcard('fruit.address.import.code', 'NIT*').
    wildcard('fruit.address.import.code', 'NAT*').
    wildcard('fruit.address.import.code', 'NOO*').
    filter('licence.fruit', 'Valid')
  # Serialise the finished query definition to a JSON string
  q.to_json
end
```

The key point here is that you don't need to worry about the nested JSON structure; the naming convention is intuitive and closely resembles
the OpenSearch syntax.

Let's break that down further.

```ruby
my_query = DVLA::Kaping::Query.new('bool')
```
This is the starting point for creating a query definition: create an instance of the `DVLA::Kaping::Query`
class and assign it to a variable, e.g. `my_query`.

The argument passed to `new` sets the type of query we want. The common ones are `'bool'` or `'match'`,
depending on your search context.

If you don't require a complex query, you could do a very basic match query:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
```
This is equivalent to writing the following query in JSON:

```json
{
  "query": {
    "match_phrase": {
      "foo": "Bar"
    }
  }
}
```

With Kaping, building and formatting the JSON is taken care of by the familiar Ruby call `.to_json`.
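
For readers newer to Ruby: `to_json` comes from the standard `json` library, which adds it to core types such as `Hash` once required. The sketch below builds the same `match_phrase` query as a plain Ruby hash (no Kaping involved) to show the serialisation step that Kaping handles for you.

```ruby
require 'json'

# A plain Ruby hash mirroring the match_phrase query above.
# Symbol keys such as :query are written out as JSON strings.
query = { query: { match_phrase: { foo: 'Bar' } } }

compact = query.to_json                # single-line JSON string
pretty  = JSON.pretty_generate(query)  # indented, human-readable JSON

puts compact # prints {"query":{"match_phrase":{"foo":"Bar"}}}
puts pretty
```

`JSON.pretty_generate` is handy when logging a query for debugging; the compact form is what you would normally send as a request body.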

In the larger example above, each chained call is a new search term definition. There are various terms you can use depending on the
functionality you require, and you can group them into positive (`must`) or negative (`must_not`) boolean clauses.

#### The current subset of terms

- **match_phrase** - match documents that contain an exact phrase
- **match** - full-text search on a specific document field
- **exists** - search for documents that contain a specific field
- **wildcard** - match a wildcard pattern, such as `He*o`
- **term** - search for an exact term in a field
- **prefix** - search for terms that begin with a specific prefix
- **regex** - search for terms that match a regular expression, e.g. `[a-zA-Z]amlet`
- **between** - search for a range of values in a field

These are a mix of full-text queries and term-level queries. They are the ones we most commonly use for our kind of searches;
others can easily be added as requirements dictate.
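
To make the list above more concrete, the sketch below maps each term to the raw OpenSearch Query DSL clause it roughly corresponds to. It is plain Ruby, not Kaping's code: `dsl_clause` is a hypothetical helper, and mapping **regex** to OpenSearch's `regexp` query and **between** to an inclusive `range` query (`gte`/`lte`) is our assumption, based on the JSON example earlier.

```ruby
require 'json'

# Hypothetical helper (not part of Kaping): returns the raw OpenSearch
# Query DSL clause that each term in the list above roughly corresponds to.
def dsl_clause(term, field, value = nil, to: nil)
  case term
  when :match, :match_phrase, :term, :prefix, :wildcard
    # These clause types share the shape { type: { field => value } }
    { term => { field => value } }
  when :regex
    # The OpenSearch query type is spelled "regexp"
    { regexp: { field => value } }
  when :exists
    # exists only needs the field name
    { exists: { field: field } }
  when :between
    # Inclusive range: greater-than-or-equal / less-than-or-equal
    { range: { field => { gte: value, lte: to } } }
  else
    raise ArgumentError, "unsupported term: #{term}"
  end
end

puts dsl_clause(:between, 'fruit.pickedDate', '2025-08-21', to: '2025-08-31').to_json
# prints {"range":{"fruit.pickedDate":{"gte":"2025-08-21","lte":"2025-08-31"}}}
puts dsl_clause(:regex, 'name', '[a-zA-Z]amlet').to_json
# prints {"regexp":{"name":"[a-zA-Z]amlet"}}
puts dsl_clause(:exists, 'fruitEndDate').to_json
# prints {"exists":{"field":"fruitEndDate"}}
```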

The next part is the data field you want to search on:

> DVLA::Kaping::Query.new('match_phrase', **foo**: 'Bar')

The last part is the data you are looking for. Depending on the term, this can be a range, an exact-match string, a regex or a filter value, for example:

> DVLA::Kaping::Query.new('match_phrase', foo: **'Bar'**)

## Configuration
The gem can be configured through a `kaping.yml` file. Settings such as the logging level, result size and AWS configuration
can be set in the YAML. The config file can also pick up environment settings, which is useful when running
in a CI pipeline.
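
We haven't reproduced Kaping's real `kaping.yml` schema here. As a sketch of one common Ruby technique for letting a YAML config pick up environment settings, the file can be rendered through ERB before it is parsed. The key names (`log_level`, `result_size`, `aws_region`) and the `KAPING_*` environment variables are hypothetical; `AWS_REGION` is the standard AWS variable.

```ruby
require 'erb'
require 'yaml'

# Hypothetical config contents: the key names and KAPING_* variables are
# illustrative only and are not taken from Kaping's documentation.
CONFIG_TEMPLATE = <<~YAML
  log_level: <%= ENV.fetch('KAPING_LOG_LEVEL', 'info') %>
  result_size: <%= ENV.fetch('KAPING_RESULT_SIZE', '10') %>
  aws_region: <%= ENV.fetch('AWS_REGION', 'eu-west-2') %>
YAML

# Render the ERB tags first (so environment variables take effect),
# then parse the resulting YAML into a Hash.
def load_config(template)
  YAML.safe_load(ERB.new(template).result)
end

config = load_config(CONFIG_TEMPLATE)
puts config
```

Because the ERB step runs first, a CI pipeline can override any value by exporting the matching environment variable, while local runs fall back to the defaults.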


## The Client
Currently we have an AWS OpenSearch client which takes care of the SigV4 (AWS Signature Version 4) request signing. The AWS credentials can be supplied either
as environment variables or through a profile.

## Search
As long as your client is configured, you can also use the optional built-in search facility:

```ruby
my_query = DVLA::Kaping::Query.new('match_phrase', foo: 'Bar')
my_query.to_json
response = DVLA::Kaping.search(my_query)
# The matching documents are under hits.hits in the search response
response.dig('hits', 'hits')
```
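
Assuming `DVLA::Kaping.search` returns a Hash in the standard OpenSearch search-response shape (as the `dig('hits', 'hits')` call suggests), each hit wraps the stored document under `_source`. The sketch below uses a made-up, illustrative response to show how the matching documents can be pulled out; the field values are not real data.

```ruby
# Illustrative response shaped like an OpenSearch search response.
# The documents are made up; only the structure matters here.
response = {
  'hits' => {
    'total' => { 'value' => 2, 'relation' => 'eq' },
    'hits' => [
      { '_id' => '1', '_source' => { 'fruit' => { 'code' => 'LA-MOP' } } },
      { '_id' => '2', '_source' => { 'fruit' => { 'code' => 'LA-MAP' } } }
    ]
  }
}

# dig returns nil rather than raising if a key is missing, so fall back to [].
hits = response.dig('hits', 'hits') || []

# Each hit wraps the stored document in _source.
documents = hits.map { |hit| hit['_source'] }
codes = documents.map { |doc| doc.dig('fruit', 'code') }

puts codes.inspect # prints ["LA-MOP", "LA-MAP"]
```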

## Open Source

[External link to Kaping in rubygems](https://rubygems.org/gems/dvla-kaping)

We wanted to open source this Ruby gem for a wider audience so that others can also benefit from simplifying their OpenSearch queries. We also
welcome feedback to further enhance the gem and increase its scope.

## Further Development

As mentioned before, not all OpenSearch functionality has been implemented, but any requests to expand it will be taken into consideration.
The code base for Kaping is small: the query builder is 40 lines of code. Separating the terms into their own module makes it easy to
add query terms as required.
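
To illustrate why a dot-notation builder can stay so small, here is a hypothetical sketch of the pattern in plain Ruby. It is not Kaping's source: `MiniQuery` and `Terms` are names we made up, and the method signatures differ from Kaping's. Each term method records a clause in the active bool section and returns `self`, which is what makes chaining work, while the terms themselves live in their own module.

```ruby
require 'json'

# Hypothetical sketch of a chainable bool-query builder (NOT Kaping's code).
# Terms live in their own module; each one records a clause and returns self.
module Terms
  def match(field, value)
    add(match: { field => value })
  end

  def match_phrase(field, value)
    add(match_phrase: { field => value })
  end

  def exists(field)
    add(exists: { field: field })
  end

  def between(field, from, to)
    add(range: { field => { gte: from, lte: to } })
  end
end

class MiniQuery
  include Terms

  def initialize
    @clauses = { must: [], must_not: [] }
    @current = :must
  end

  # Switch which bool section subsequent terms are added to.
  def must
    @current = :must
    self
  end

  def must_not
    @current = :must_not
    self
  end

  # Build the nested hash, omitting empty bool sections.
  def to_h
    { query: { bool: @clauses.reject { |_, list| list.empty? } } }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end

  private

  # Append a clause to the active section and return self for chaining.
  def add(clause)
    @clauses[@current] << clause
    self
  end
end

q = MiniQuery.new
q.must.match_phrase('fruitStatus', 'Ripe').between('fruit.pickedDate', '2025-08-21', '2025-08-31')
q.must_not.exists('fruitEndDate')
puts q.to_json
```

With this shape, supporting a new term such as `prefix` would mean adding one method to `Terms`; the builder class itself would not change.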