feat: Add prepared/compiled/cached parametrized queries#2131
feat: Add prepared/compiled/cached parametrized queries#2131RuslanUC wants to merge 53 commits intotortoise:developfrom
Conversation
…ist/etc.) with different sizes
…reparedQuerySet that is prepared (._prepared = True)
add support for parameters in PreparedQuerySet.limit and PreparedQuerySet.offset
implement prepared ValuesListQuery class;
… cached in EXECUTOR_CACHE
…Query._clone; only store query and used query info in PreparedQuerySet; fix style and typing issues;
…update, .delete, .count, .exists
fix typing issue in _PreparedQueryMixin.sql
|
Hi! I looked through changes and have few concerns (I understand that it is only draft) What would be cache invalidation strategy? Is it manual only? Did you benchmark it on postgre or any other remote database? Does it provide any significant difference in that case? |
|
Hi.
For prepared queries (PreparedQuerySet, etc., not sql queries) I planned only manual invalidation (because cache can't grow "by itself", if someone prepared/cached 10 different queries - cache would contain only that 10 queries). For cached sql, I plan to add lru with max size controlled by ORM user.
Benchmarked on postgres 14.20 right now (running in local docker container, cant benchmark it using remote server), uploaded results to the repo with benchmarks. Difference is similar to one in sqlite benchmarks.
I intend it for queries that either have complex filters/joins, or executed very frequently (and queries that dont select a lot of rows, because then all time spent either on db side, or creating python objects from query results). Thats why it should be documented and described when it is better to use prepared queries (but i think right now is too early to write documentation). |
tortoise/queryset_prepared.py
Outdated
| return cast(PreparedValuesQuery, self) | ||
|
|
||
| def delete(self) -> DeleteQuery: | ||
| return cast(DeleteQuery, self) |
There was a problem hiding this comment.
is it intended behaviour? Seems more logical to throw error on trying to modify prepared query than silently returning same query without changes
There was a problem hiding this comment.
Casting to DeleteQuery instead of PreparedDeleteQuery is not an intended behaviour (i'll fix it), but returning same query with different type (e.g. in values_list, values, update, etc.) is an intended behaviour (at least for now). This is because when query is already in cache and is returned from QuerySet.prepare_sql(), all filter and other queryset methods will be executed, e.g.:
query = SomeModel.prepare_sql("some-query").filter(id=123).delete()First time the code is executed, .prepare_sql returns PreparingQuerySet, then .filter returns PreparingQuerySet and then .delete returns PreparedDeleteQuery (and places it in cache).
Next time the code is executed, .prepare_sql returns PreparedDeleteQuery (not PreparingQuerySet like it was in the first time), then .filter and .delete both retur same PreparedDeleteQuery and in the end, in both cases we have same query. If .delete would throw an error - second execution of the code will fail.
|
Honestly current implementation seems a bit over the top for what it actually does, as duplicating whole queryset, making implicit cache storages and etc is quite big change with a lot of new APIs. Have you considered simplifying it other way? E.g. instead of all that - add new "compile" method to queryset, that will returned prepared query, that user will manage himself, giving responsibility and control for cache management (and possible memory leak) to user Here is example of what that compiled query could be: General .compile method can look like this, probably overriden un QuerySet children classes: It's debatable if we should bake-in all execute variables like this, maybe we can pass them to compiled query and make them mutable, but I don't see much of advatages for that Overall, it would encapsulate majority of new logic to this new module, would not require us to maintain mirrored api of prepared queryset, it looks less prone to memory leaks and gives more control to user on how he wants to manage his queries. |
I thought about moving _cache_key and cached_sql from Prepared* classes to regular ones. I tried it a some time before and didn't like it for some reason (don't remember why now), but I know that overall implementation was different, but now I don't think that merging regular and prepared classes is that bad of an idea. I'll try it. I looked a bit into your code, I thought (around the time of first commits) about just adding method like your "compile", that should be called after while query is constructed. Now that I better know how queryset works, I think this is a good idea.
That's why I created this draft pr instead of continuing to make changes in a separate fork. Thanks for feedback and ideas. |
add ability to only compile queries after queryset was constructed
Description
This pr adds "prepared" (or compiled/cached) parametrized queries that store ready-to-execute sql and list of parameters that need to be replaced when executing such query. Such queries avoid re-calculating filters/joins/etc every time query is being constructed, which improves performance (2x in best-case-scenario).
Motivation and Context
When measuring execution time for one of my projects, I noticed that sometimes _make_query took 2x more time than _execute.
Every time query is created and then executed, Tortoise-ORM and pypika recreate sql query itself and it's parameters list (code 1), though it is probably not the case when query is created first (not await'ed) and then executed after (code 2).
But even though second code reuses sql, it has an issue - parameters can't be changed.
So I added ability to both prepare/cache queries and use placeholder parameters in such queries.
How Has This Been Tested?
Ran tests via
make testand then separately for all databases.BUT, changes are not heavily-tested right now (only ~20 tests).
My notes about implementation
Currently, pr is only a draft, because of lack of proper testing, some unresolved todos and me being unsure about some details.
When adding prepared queries, i tried to not touch existing code as much as i can to a) not break anything; b) make sure that I dont negatively affect performance of existing code.
TODOs:
Details im not sure about:
PreparingQuerySetcan be removed if_cache_keywould be in QuerySet class - i think it'd better this way (when changing/adding something - no need to change it in two places), but it also would require changing limit/offset methods, as well as slightly change UpdateQuery/DeleteQuery/CountQuery/ValuesListQuery/ValuesQuery classes.query = SomeModel.prepare_sql("query-name").filter(...).prepared()), maybeprepare_sqlandpreparedmethods could be renamed/shortened?test_queryset.py, not counting separate test files for filters, etc.), copying (or writing same tests from scratch) would result in a lot of duplication, i think.Benchmarking
I benchmarked regular and prepared queries, code and results available here.
Prepared queries perform best with .get/.get_or_none/.first queries (especially with complex queies, lots of jois, or both).
Worst case scenario for current implementation is queries with
__infilter that every time use collections with different sizes (cache miss and sql recalculation every time), but even then performance is same as with regular queries.Checklist: