Add Hybrid Query Builder by jazairi · Pull Request #960 · MITLibraries/timdex

jazairi · 2026-04-23T18:05:26Z

Why these changes are being introduced:

Now that we have separate builders for semantic
and lexical queries, we can add an option to
combine the two.

Relevant ticket(s):

How this addresses that need:

This adds a new query builder model for hybrid
queries in OpenSearch, and exposes the new query
type in the GraphQL API.

Side effects of this change:

We may need to manually weight the scoring due to the different scoring algorithms used by
semantic and lexical search. That felt out of
scope of this change, but may warrant further
exploration.
The query builder checks only for errors in the semantic builder. We don't have any error
handling in the lexical builder because it
doesn't involve any external calls. This is
probably okay because it's highly unlikely for
lexical search to error, but worth noting.

Developer

All new ENV is documented in README
All new ENV has been added to Heroku Pipeline, Staging and Prod
ANDI or Wave has been run in accordance to
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

The commit message is clear and follows our guidelines
(not just this pull request message)
There are appropriate tests covering any new functionality
The documentation has been updated or is unnecessary
The changes have been verified
New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

NO

Why these changes are being introduced: Now that we have separate builders for semantic and lexical queries, we can add an option to combine the two. Relevant ticket(s): - [USE-413](https://mitlibraries.atlassian.net/browse/USE-413) - [USE-414](https://mitlibraries.atlassian.net/browse/USE-414) How this addresses that need: This adds a new query builder model for hybrid queries in OpenSearch, and exposes the new query type in the GraphQL API. Side effects of this change: - We may need to manually weight the scoring due to the different scoring algorithms used by semantic and lexical search. That felt out of scope of this change, but may warrant further exploration. - The query builder checks only for errors in the semantic builder. We don't have any error handling in the lexical builder because it doesn't involve any external calls. This is probably okay because it's highly unlikely for lexical search to error, but worth noting.

qltysh · 2026-04-23T18:07:08Z

❌ 6 blocking issues (8 total)

Tool	Category	Rule	Count
rubocop	Lint	Method has too many lines. [13/10]	3	❌
rubocop	Lint	Assignment Branch Condition size for `combine\_queries` is too high. [<7, 14, 11> 19.13/17]	1	❌
rubocop	Lint	Cyclomatic complexity for `combine\_queries` is too high. [11/7]	1	❌
rubocop	Lint	Perceived complexity for `combine\_queries` is too high. [12/8]	1	❌
qlty	Structure	Function with high complexity (count = 9): combine_queries	2

Copilot

Pull request overview

Adds support for a new “hybrid” OpenSearch query mode that combines existing semantic (vector) and lexical (keyword) query builders, and exposes the mode via the GraphQL API.

Changes:

Add HybridQueryBuilder to combine semantic + lexical queries with a fallback to lexical-only on semantic failure.
Route query_mode: 'hybrid' to the new builder in Opensearch#query.
Update GraphQL query_mode argument description and add tests for the new builder selection and hybrid builder behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`app/models/opensearch.rb`	Routes `query_mode == 'hybrid'` to `HybridQueryBuilder`.
`app/models/hybrid_query_builder.rb`	Implements hybrid query composition and semantic-failure fallback behavior.
`app/graphql/types/query_type.rb`	Documents the new `query_mode` option in the GraphQL API.
`test/models/opensearch_test.rb`	Adds a unit test asserting `HybridQueryBuilder` is selected for `query_mode: hybrid`.
`test/models/hybrid_query_builder_test.rb`	Adds unit tests for hybrid builder blank-query behavior, composition, key normalization, and fallback.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

This implements some feedback from Copilot during code review.

JPrevost

I don't really understand some of the implementation decisions and added comments where I could use some more details.

I'd also like confirmation that you manually confirmed expected behavior before generating mocks. Mocks are great if they are used with caution, but with zero tests included that are not mocks I'd like to better understand how much you worked with actual responses in order to create the correct mocks.

I think we are missing GraphQL tests for both Hybrid and Semantic. Maybe if we add those and use VCR instead of mocks I'd feel more comfortable with the mocks here.

JPrevost · 2026-04-28T13:54:02Z

+
+  # Logs semantic query builder failures for observability
+  def log_semantic_error(error)
+    return unless defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger


This line feels overly cautious? In what situation do we think Rails won't exist and/or the logger won't be configured? We use it all the time in this application without a block like this right?

Ha! Yeah, if Rails doesn't exist, we have bigger problems. Good catch.

JPrevost · 2026-04-28T13:57:50Z

+  end
+
+  # Only fall back to lexical for Lambda/invocation errors from SemanticQueryBuilder.
+  # All other errors (parsing, validation, unexpected bugs) should be re-raised.


I feel like all StandardErrors are raised as a LambdaError from SemanticBuilder. Does this do what the comment says?

I think so, yeah. Only the invoke call is wrapped in LambdaError. parse_lambda_payload and parse_lambda_response calls happen outside the rescue block, so JSON parsing and response validation errors should propagate as runtime errors.

JPrevost · 2026-04-28T13:59:01Z

+  end
+
+  # Recursively converts all string keys to symbols in hashes and nested structures.
+  # This ensures consistency since semantic builder returns string keys while lexical uses symbols.


Is this indicative of an inconsistency we need to fix in one of the Builders rather than normalizing it here? Why are Lexical and Semantic different?

Yeah, I think that's a reasonable conclusion. The issue is that SemanticQueryBuilder receives JSON from the Lambda (which has string keys) and returns it as-is, while LexicalQueryBuilder builds hashes with symbol keys natively. It might be better to have SemanticQueryBuilder return symbol keys. I can make that change.

If you'd prefer to treat the SemanticBuilder change as maintenance we can add a ticket or add it to the Confluence page that notes tech debt for this repo. Your call if we do this later or now.

JPrevost · 2026-04-28T14:00:46Z

+        payload: payload.to_json
+      )
+    rescue StandardError => e
+      # Only Lambda invocation errors are wrapped in LambdaError for graceful fallback


I'm curious about this comment and the logic in the Hybrid builder that relies on this. Are we not just raising all standard errors as LambdaErrors here? What does "Only Lambda invocation errors are wrapped in LambdaError for graceful fallback" mean here?

Yeah, we are. I can see how the comment might suggest we're being more selective. What I meant is that all errors from the invoke call get wrapped (allowing graceful fallback), while errors from parsing the response (below the rescue block) don't get wrapped and fail fast. I'll clarify the comment.

JPrevost · 2026-04-28T14:02:45Z

+
+    result = @builder.build(params)
+
+    # When q is blank, should return lexical query (preserves filters)


Does the API actually support this scenario? I think probably?

It does, yeah. Whether it should is another question. :)

JPrevost · 2026-04-28T14:10:46Z

+
+    result = @builder.build(params)
+
+    # Without filters, minimum_should_match should not be set


Can you help me understand why a minimum should match should be different when no filters are in place? I understand why it was in place for the filters, but does it change any results if we kept it for all queries? I'm not asking for change, just help understanding as to what this test

It doesn't change anything; it's just redundant to include minimum_should_match in non-filter searches. We don't necessarily have to test that behavior if you think it's confusing. (Or, we could change the behavior to always set minimum_should_match.)

This makes sense, thanks for helping me understand it.

JPrevost · 2026-04-28T14:13:22Z

+    lexical_mock.stubs(:build).returns({ bool: { should: [], must: [], filter: [] } })
+    semantic_mock = mock
+    # Non-Lambda errors (parsing, validation) should be re-raised
+    semantic_mock.stubs(:build).raises(StandardError.new('Invalid semantic query builder response: missing query key'))


This is the situation I don't understand if we can actually get to in code even if we can mock it. I may be misinterpreting how it works, but it looks like we just re-raise all standard errors as lambda errors and therefore we can't ever actually get to this outside of mocks.

Yeah, I think you're right. I'll drop this one.

JPrevost · 2026-04-28T14:17:10Z

+
+      # Both succeeded - combine them with should clause while preserving filters
+      combine_queries(semantic_query, lexical_query)
+    rescue StandardError => e


This feels slightly odd to rescue Standard and then filter what we actually want to raise rather than rescuing the actual things we want to handle. Is there a reason you went with StandardError here rather than specific error types?

Yeah, it does feel overly defensive. We can just rescue SemanticQueryBuilder::LambdaError here catching StandardError and filtering.

qltysh · 2026-04-28T20:55:15Z

+
+    {
+      bool: hybrid_bool
+    }


Found 5 issues:

1. Function with high complexity (count = 9): combine_queries [qlty:function-complexity]

2. Assignment Branch Condition size for combine_queries is too high. [<7, 14, 11> 19.13/17] [rubocop:Metrics/AbcSize]

3. Cyclomatic complexity for combine_queries is too high. [11/7] [rubocop:Metrics/CyclomaticComplexity]

4. Method has too many lines. [23/10] [rubocop:Metrics/MethodLength]

5. Perceived complexity for combine_queries is too high. [12/8] [rubocop:Metrics/PerceivedComplexity]

qltysh · 2026-04-28T20:55:15Z


+    # Response parsing below is outside the rescue block, so JSON/validation errors
+    # propagate as-is and fail fast rather than triggering graceful fallback.
    parse_lambda_payload(response.payload)


Method has too many lines. [12/10] [rubocop:Metrics/MethodLength]

qltysh · 2026-04-28T20:55:15Z

+      value.map { |item| normalize_keys(item) }
+    else
+      value
+    end


Function with high complexity (count = 7): normalize_keys [qlty:function-complexity]

jazairi · 2026-04-28T21:04:05Z

@JPrevost Thanks for the thorough review! Could you let me know if the latest commit addresses your concerns? I added a few VCR tests, but I still stubbed the Lambda responses because I was getting auth errors. I assume it's set up to accept requests only from specific domains, but happy to explore further if that assumption is incorrect.

JPrevost

This looks good. Only remaining minor nit is calling a logging method instead of just doing the Rails.logger call directly.

I'm also nervous we're just logging and not throwing to Sentry when we silently fallback like this. That should be considered exceptional so maybe it's worth adding that here too as we'll never notice this in the logs?

No change required, but the lambdas are not restricted from being called from specific URLs so your local .env was likely just not updated to support them. You need the same lambda tier as your open search tier to ensure they work as expected.

JPrevost · 2026-04-29T18:13:38Z

-      # Semantic builder failed (Lambda error, etc.) - use lexical only
+    rescue SemanticQueryBuilder::LambdaError => e
+      # Lambda service failure - gracefully fall back to lexical search
      log_semantic_error(e)


I'd bring that logging statement into this block. It currently only does a logging statement so this is mostly just indirection.

qltysh · 2026-04-29T20:54:41Z

+        "HybridQueryBuilder semantic query failed: #{e.class}: #{e.message}"
+      )
+      lexical_query
+    end


Method has too many lines. [13/10] [rubocop:Metrics/MethodLength]

jazairi · 2026-05-01T16:08:36Z

@JPrevost Could you please check the latest commit to wrap the Lambda calls with VCR? It's not a huge change, but I'd like a second set of eyes before we merge.

JPrevost · 2026-05-01T18:26:30Z

-      secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY')
-    )
-  end
+  options = {


Yeah this is probably better configuration even if you weren't adding the new feature to override the endpoint :)

JPrevost · 2026-05-01T18:27:28Z

 EMAIL_URL_HOST=localhost:3000
 JWT_SECRET_KEY=3862fc949629030de4259b88f6e8f7c3702b2fabfc68d00d46fb7f9f70110690b526997ef4d77765ffa010d8aba440286af39947d0c85287174d99be2db14987
 OPENSEARCH_INDEX=all-current
+AWS_ENDPOINT_URL_LAMBDA=http://localhost:9200


I think I'm okay with this not being documented as we don't anticipate it ever being used anywhere but the tests.

Oops, I did mean to document it, just to say that it's test env only. Will do so before merging.

qltysh Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread app/graphql/types/query_type.rb

Comment thread app/models/hybrid_query_builder.rb

Comment thread app/models/hybrid_query_builder.rb Outdated

mitlib temporarily deployed to timdex-api-p-use-413-us-r5j62i April 23, 2026 18:08 Inactive

jazairi requested review from JPrevost, Copilot and matt-bernhardt April 23, 2026 20:47

Copilot started reviewing on behalf of jazairi April 23, 2026 20:48 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread app/models/hybrid_query_builder.rb

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread test/models/hybrid_query_builder_test.rb

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 17:21 Inactive

qltysh Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/hybrid_query_builder.rb

Comment thread app/models/hybrid_query_builder.rb

jazairi requested a review from Copilot April 27, 2026 17:24

Copilot AI reviewed Apr 27, 2026

View reviewed changes

jazairi requested a review from Copilot April 27, 2026 17:48

Copilot started reviewing on behalf of jazairi April 27, 2026 17:49 View session

Copilot AI reviewed Apr 27, 2026

View reviewed changes

Comment thread test/models/hybrid_query_builder_test.rb

Comment thread test/models/hybrid_query_builder_test.rb Outdated

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread app/models/hybrid_query_builder.rb

jazairi force-pushed the use-413-use-414 branch from bcd1cf0 to d4356bf Compare April 27, 2026 17:59

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 18:00 Inactive

jazairi requested a review from Copilot April 27, 2026 18:00

Copilot started reviewing on behalf of jazairi April 27, 2026 18:00 View session

qltysh Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/hybrid_query_builder.rb

Copilot AI reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread test/models/hybrid_query_builder_test.rb

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 18:25 Inactive

jazairi requested a review from Copilot April 27, 2026 18:25

Copilot started reviewing on behalf of jazairi April 27, 2026 18:25 View session

qltysh Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/hybrid_query_builder.rb

Copilot AI reviewed Apr 27, 2026

View reviewed changes

Comment thread test/models/hybrid_query_builder_test.rb Outdated

Comment thread app/models/hybrid_query_builder.rb Outdated

Comment thread app/models/hybrid_query_builder.rb Outdated

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 20:06 Inactive

jazairi requested a review from Copilot April 27, 2026 20:06

Copilot started reviewing on behalf of jazairi April 27, 2026 20:07 View session

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 20:11 Inactive

qltysh Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/semantic_query_builder.rb

Address observability and filter issues in hybrid search

7468316

This implements some feedback from Copilot during code review.

jazairi force-pushed the use-413-use-414 branch from 810aed4 to 7468316 Compare April 27, 2026 20:28

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 27, 2026 20:28 Inactive

qltysh Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread app/models/semantic_query_builder.rb

JPrevost self-assigned this Apr 28, 2026

JPrevost reviewed Apr 28, 2026

View reviewed changes

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 28, 2026 20:53 Inactive

qltysh Bot reviewed Apr 28, 2026

View reviewed changes

Address code review feedback

ab32c4e

jazairi force-pushed the use-413-use-414 branch from a3bf55a to ab32c4e Compare April 28, 2026 20:57

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 28, 2026 20:57 Inactive

jazairi requested a review from JPrevost April 28, 2026 21:08

JPrevost reviewed Apr 29, 2026

View reviewed changes

Fix logging and add Sentry reporting

612845b

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i April 29, 2026 20:52 Inactive

qltysh Bot reviewed Apr 29, 2026

View reviewed changes

jazairi requested a review from JPrevost April 29, 2026 22:02

Reconfigure lambda to allow for VCR recordings

73d5e1f

JPrevost approved these changes May 1, 2026

View reviewed changes

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i May 1, 2026 16:04 Inactive

jazairi requested a review from JPrevost May 1, 2026 16:08

JPrevost approved these changes May 1, 2026

View reviewed changes

Document AWS_ENDPOINT_URL_LAMBDA value in .env.test

eacbff5

jazairi temporarily deployed to timdex-api-p-use-413-us-r5j62i May 4, 2026 20:12 Inactive

jazairi merged commit e915ecb into main May 4, 2026
2 checks passed

jazairi deleted the use-413-use-414 branch May 4, 2026 20:19


		result = @builder.build(params)

		# When q is blank, should return lexical query (preserves filters)


		result = @builder.build(params)

		# Without filters, minimum_should_match should not be set

Conversation

jazairi commented Apr 23, 2026 • edited by atlassian Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why these changes are being introduced:

Relevant ticket(s):

How this addresses that need:

Side effects of this change:

Developer

Code Reviewer

Requires database migrations?

Includes new or updated dependencies?

Uh oh!

qltysh Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JPrevost left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

jazairi commented Apr 23, 2026 •

edited by atlassian Bot

Loading

qltysh Bot commented Apr 23, 2026 •

edited

Loading