Faster selector execution by pre-parsing fields #5869
Draft
+149
−26
Overview
While working on `mango_selector:match()` to implement detailed match failure information, I noticed that every time a non-operator field is evaluated, we call `mango_doc:get_field()`, which calls `mango_util:parse_field()` if its input is a binary. We can avoid re-parsing fields so often by changing `norm_fields` so that it normalizes to the parsed form `[<<"a">>, <<"b">>]` instead of `<<"a.b">>`.
The changes have made the benchmarks I added about 30-40% faster. (I wasn't sure where benchmarks like this should live, if anywhere. They're mostly here for me to check the effect, and I'm happy to move or remove them before merging.)
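To make the cost difference concrete, here is a minimal sketch of the idea, not the actual Mango code. The module name `field_sketch`, the simplified `parse_field/1` (which ignores escaped dots and any other edge cases the real `mango_util:parse_field()` may handle), and the use of maps instead of CouchDB's document representation are all assumptions made for illustration.

```erlang
-module(field_sketch).
-export([parse_field/1, get_field/2]).

%% Hypothetical, simplified parser: split a dotted field name into path
%% segments. Escaped dots and other edge cases are deliberately ignored.
parse_field(Field) when is_binary(Field) ->
    binary:split(Field, <<".">>, [global]).

%% Look up a field in a nested map (an assumption; real documents are not
%% maps). A binary field is parsed on every call, while an already-parsed
%% path (a list of segments) is walked directly with no parsing step.
get_field(Doc, Field) when is_binary(Field) ->
    get_field(Doc, parse_field(Field));
get_field(Value, []) ->
    Value;
get_field(Doc, [Key | Rest]) when is_map(Doc) ->
    case maps:find(Key, Doc) of
        {ok, Value} -> get_field(Value, Rest);
        error -> not_found
    end;
get_field(_NotAMap, [_ | _]) ->
    not_found.
```

If the selector stores `[<<"a">>, <<"b">>]` up front, every evaluation takes the list clauses and skips the `binary:split/3` call, which is the kind of repeated work this PR aims to remove.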
In the first commit here I make this change and then change all the other logic that interacts with selectors to also use this form. This ended up having a bigger blast radius than I would have liked, so I undid most of it and changed the `mango_selector` functions and a couple of others so that they hide how normalized selector fields actually look from other callers. There are only a couple of places where these normalized selectors need to be turned back into valid JSON, and the `mango_util:join_keys()` function turns the array keys back into binaries as needed.
There are a couple of places in the logic for choosing indexes for queries where we now have to parse fields to match the form in which they appear in selectors, but I figure doing this once per query during index selection is better than doing it once per selector-field evaluation during index updates or filtering.
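For readers unfamiliar with the reverse direction, the conversion back to a JSON-friendly key could look roughly like the following. This is a hedged sketch only: the module name `join_sketch` is hypothetical, and this `join_keys/1` is a guess at the general shape of such a helper, not the implementation in this PR. In particular, it does not escape dots, so a segment containing a literal `.` would not round-trip.

```erlang
-module(join_sketch).
-export([join_keys/1]).

%% Hypothetical helper: turn a parsed path such as [<<"a">>, <<"b">>]
%% back into the dotted binary <<"a.b">>. Keys that are already binaries
%% are returned unchanged. No dot escaping is attempted in this sketch.
join_keys(Key) when is_binary(Key) ->
    Key;
join_keys(Keys) when is_list(Keys) ->
    iolist_to_binary(lists:join(<<".">>, Keys)).
```

Keeping this conversion behind one function means callers that only need valid JSON never have to know which form the selector uses internally.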
I assume the latter commit is the state we'd actually want to keep, so I'm happy to squash all the other edits away. I also need to write proper commit messages; I'll tidy up the history once I know what should be kept.
Related Issues or Pull Requests
Checklist
`rel/overlay/etc/default.ini`
`src/docs` folder