This repository was archived by the owner on Nov 7, 2025. It is now read-only.
🚧 [WIP] Fixing "surrounding documents" view with new _id implementation#1446
Draft
Merged
_id fixing "surrounding documents" view with new _id implementation
github-merge-queue (bot) pushed a commit that referenced this pull request on Jun 6, 2025
An attempt to introduce unique object IDs in the ClickHouse realm.
A returned document id (`_id` field) would have the following syntax:
```
{hex-encoded timestamp field}qqq{hex-encoded hash of the document}
```
Of course, the `{hex-encoded hash of the document}` lives only in Quesma
memory:
1. It needs to be re-calculated when returning search hits
2. Quesma needs to filter hits based on it when rendering them
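As a rough illustration of the ID syntax above, here is a minimal Go sketch that builds and splits such an ID. The function names, the choice of SHA-256, and hashing the serialized document bytes are assumptions made for the example, not necessarily what Quesma does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// idSeparator is unambiguous as a delimiter because 'q' is not a hex digit.
const idSeparator = "qqq"

// makeID builds "{hex timestamp}qqq{hex hash}" for one document.
// Assumption: the hash is SHA-256 over the serialized document.
func makeID(timestamp string, serializedDoc []byte) string {
	sum := sha256.Sum256(serializedDoc)
	return hex.EncodeToString([]byte(timestamp)) + idSeparator + hex.EncodeToString(sum[:])
}

// splitID recovers the plain timestamp and the hex-encoded hash from an ID.
func splitID(id string) (string, string, error) {
	tsHex, hash, found := strings.Cut(id, idSeparator)
	if !found {
		return "", "", errors.New("missing separator in _id")
	}
	raw, err := hex.DecodeString(tsHex)
	if err != nil {
		return "", "", fmt.Errorf("bad timestamp part: %w", err)
	}
	return string(raw), hash, nil
}

func main() {
	id := makeID("2025-06-06 10:00:00.000", []byte(`{"message":"hello"}`))
	ts, hash, err := splitID(id)
	fmt.Println(id, ts, hash, err)
}
```

Only the timestamp part can be turned back into something usable in a query; the hash part can only be compared against a hash recomputed from a fetched row.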
Therefore, when fetching a document with a specific `_id`, we can first
select the documents with a matching timestamp and then narrow those
down to the one whose fields produce the matching hash, returning the
right entry.
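The two-step lookup can be sketched as follows. The `Row` type, `docHash`, and `findByID` are hypothetical names for this example; the first comparison stands in for the timestamp condition that would really run in ClickHouse, and SHA-256 over the JSON-serialized fields is an assumed hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Row is a hypothetical, simplified row fetched from the database.
type Row struct {
	Timestamp string
	Fields    map[string]any
}

// docHash hashes the row's fields. json.Marshal emits map keys in sorted
// order, so the result is deterministic for the same content.
func docHash(r Row) string {
	b, err := json.Marshal(r.Fields)
	if err != nil {
		return ""
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// findByID returns the row matching both parts of an _id.
func findByID(rows []Row, wantTimestamp, wantHash string) (Row, bool) {
	for _, r := range rows {
		if r.Timestamp != wantTimestamp {
			continue // step 1: timestamp filter (a WHERE clause in practice)
		}
		if docHash(r) == wantHash {
			return r, true // step 2: hash comparison at render time
		}
	}
	return Row{}, false
}

func main() {
	rows := []Row{
		{Timestamp: "2025-06-06 10:00:00", Fields: map[string]any{"message": "first"}},
		{Timestamp: "2025-06-06 10:00:00", Fields: map[string]any{"message": "second"}},
	}
	got, ok := findByID(rows, rows[1].Timestamp, docHash(rows[1]))
	fmt.Println(got.Fields["message"], ok)
}
```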
This fixes the issue where the JSON view of a single document could
return a random object from the search hits, not necessarily the one
clicked.
<img width="1458" alt="image"
src="https://github.com/user-attachments/assets/0b808dba-0255-4be0-ac5f-acb7a21569ba"
/>
**However** (and it's a pretty big however), the "surrounding documents"
view **cannot return the surrounding documents**. While the query
doesn't error, it also doesn't return any documents. The experiment to
get that working [is carried out in a separate
PR](#1446), although it's not
clear yet whether this approach is going to guarantee 100%
correctness.
The challenge here is to fix the "surrounding documents" view in Kibana given the new unique ID (`_id` field) implementation (ref: #1435). See the Related screenshots section.
At this moment Kibana just says "No documents newer/older than the anchor could be found".
Present situation
Our current implementation of the `_id` field renders this value dynamically, after fetching all the data from ClickHouse. It has the form `{hex-encoded timestamp field}qqq{hex-encoded hash of the document}`.
While at the query parsing/execution phase we can of course access the timestamp field, the "hash of the document" part is computed only during JSON response rendering.
The current implementation stores a list of IDs in `ClickhouseQueryTranslator` (`UniqueIDs`), so we know that the query used the `_id` field and that we have to apply extra logic during JSON response rendering. The situation is quite clear for a simple filtering query.
When building the SQL we simply take the first (timestamp) part of the `_id`, emit a `WHERE` clause that filters out all the non-matching timestamps, and then compare the document hashes during JSON response rendering (see `platform/parsers/elastic_query_dsl/query_translator.go`). Of course, we have to make sure that we don't fall back to the default `LIMIT 10` in the SQL query, because we might then not have enough documents to filter from (see `(cw *ClickhouseQueryTranslator) parseSize`).
Problem
When fetching "surrounding documents", Kibana sends a query in which the anchor document's `_id` appears in an `ids` clause under `must_not`.
That `must_not` clause becomes quite problematic. Obviously, when fetching the next/previous N documents we don't want to include the anchor document itself.
At the SQL query level, however, we cannot filter out all the documents with a matching timestamp, because the next document might have exactly the same timestamp. One possible approach is to add a schema transformer for this case.
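To make the timestamp collision concrete, here is a small Go sketch with hypothetical types and function names. It compares excluding by timestamp alone (what a naive SQL-level filter would do) with excluding only the hit whose timestamp and hash both match the anchor.

```go
package main

import "fmt"

// Hit is a hypothetical, simplified search hit.
type Hit struct {
	Timestamp string
	Hash      string
}

// excludeByTimestamp mimics a SQL-level "timestamp != anchor" filter.
func excludeByTimestamp(hits []Hit, anchor Hit) []Hit {
	var kept []Hit
	for _, h := range hits {
		if h.Timestamp != anchor.Timestamp {
			kept = append(kept, h)
		}
	}
	return kept
}

// excludeAnchor drops only the hit whose timestamp and hash both match.
func excludeAnchor(hits []Hit, anchor Hit) []Hit {
	var kept []Hit
	for _, h := range hits {
		if h.Timestamp == anchor.Timestamp && h.Hash == anchor.Hash {
			continue
		}
		kept = append(kept, h)
	}
	return kept
}

func main() {
	anchor := Hit{Timestamp: "2025-06-06 10:00:00", Hash: "aa"}
	hits := []Hit{
		anchor,
		{Timestamp: "2025-06-06 10:00:00", Hash: "bb"}, // same timestamp, different document
		{Timestamp: "2025-06-06 10:00:01", Hash: "cc"},
	}
	// prints "1 2": the timestamp-only filter also loses the "bb" neighbour
	fmt.Println(len(excludeByTimestamp(hits, anchor)), len(excludeAnchor(hits, anchor)))
}
```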
On response rendering, we cannot rely on the current logic, which keeps only the hits with matching IDs, because here we want the opposite effect. However, this is the post-query phase, and at that level we're unaware of the query: we only have the hits as the result and don't know whether `ids` was inside `must_not` (or any other logical clause).
There are a few gotchas here:
- `size` passed in the query lands in the SQL `LIMIT` clause, so we have to ignore it
- `_id` in any aggregation might produce completely absurd results
Possible solution
TODO: check this PR
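One conceivable direction, sketched here purely as an illustration and not taken from the PR referenced above, is to remember at parse time whether the `ids` clause was negated, so that the rendering phase can invert its filter. The `IDFilter` type and its fields are hypothetical.

```go
package main

import "fmt"

// IDFilter is a hypothetical record kept from parse time until render time.
type IDFilter struct {
	Hashes  map[string]bool // hash parts of the requested _id values
	Negated bool            // true when the ids clause sat under must_not
}

// Apply keeps the matching hits, or everything except them when Negated is set.
func (f IDFilter) Apply(hitHashes []string) []string {
	var kept []string
	for _, h := range hitHashes {
		if f.Hashes[h] != f.Negated {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	hits := []string{"aa", "bb", "cc"}
	only := IDFilter{Hashes: map[string]bool{"aa": true}}
	except := IDFilter{Hashes: map[string]bool{"aa": true}, Negated: true}
	fmt.Println(only.Apply(hits), except.Apply(hits))
}
```

This covers only a single `ids` clause directly under `must_not`; nested boolean clauses would need more than one flag.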
Related screenshots