Optimize stats() functions, fall back on IntegrityError#4036
Open
flodolo wants to merge 6 commits intomozilla:mainfrom
Open
Optimize stats() functions, fall back on IntegrityError#4036flodolo wants to merge 6 commits intomozilla:mainfrom
flodolo wants to merge 6 commits intomozilla:mainfrom
Conversation
Codecov Report❌ Patch coverage is 🚀 New features to boost your workflow:
|
Collaborator
Author
|
I ended up with a ton of code overlap between Actually, I should be able to put it in translations after getting rid of one import. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #2263
Going for a very long explanation, since folks will understand the code much better than me, and maybe there's a better approach to this.
The code in
pontoon/base/models/translation.pytakes a snapshot of the stats, at entity level, before saving the translation, then takes another after saving, and tries to store the delta viaadjust_stats().That leads to things breaking when human translator and pretranslation work on the same entity at the same time.
A possible solution is to drop the delta approach, and calculate the stats for the entire resource after saving (using
calculate_stats(). That's completely safe compared to the current approach, but costly for large resources. These are the top 20 resources in prodIn the process of explaining the code, Claude pointed out that
calculate_stats()can be made more efficient (reducing the number of queries), so that takes away part of the performance hit. But that's potentially still 5x worse in production :-(In the end (last commit) I went for a middle ground: use the same optimization for
get_stats(). The delta is still applied viaadjust_stats(), but in case of anIntegrityErrorit falls back to a fullcalculate_stats(). Also added a UI error notification, because I don't think we're showing anything at the moment?Performance benchmarks
I got Claude to come up with a couple of benchmark scripts.
calculate_stats() before and after
Script: https://gist.github.com/flodolo/187a9d7d497282eae4d3378dabd4953b
Analyzed Italian, largest 10 resources.
Locally I can get 9x improvement, in prod closer to 5x.
Local Docker install
Top 10 resources:
firefox-for-ios|firefox-ios.xliff— 1700 strings (resource_id=39)firefox-for-android|mozilla-mobile/fenix/app/src/main/res/values/strings.xml— 1680 strings (resource_id=38)firefox|browser/browser/preferences/preferences.ftl— 1016 strings (resource_id=143)firefox|browser/browser/browser.ftl— 514 strings (resource_id=105)firefox|devtools/client/netmonitor.properties— 419 strings (resource_id=349)firefox|browser/browser/newtab/newtab.ftl— 378 strings (resource_id=127)firefox|devtools/client/debugger.properties— 373 strings (resource_id=332)firefox|toolkit/toolkit/pdfviewer/viewer.ftl— 357 strings (resource_id=244)firefox|dom/chrome/dom/dom.properties— 335 strings (resource_id=180)firefox|toolkit/toolkit/neterror/nsserrors.ftl— 331 strings (resource_id=241)Overall totals (20 runs each):
Production
Top 10 resources:
sumo|LC_MESSAGES/django.po— 2611 strings (resource_id=564)marketplace|LC_MESSAGES/django.po— 1810 strings (resource_id=2614)firefox-for-ios|firefox-ios.xliff— 1700 strings (resource_id=580)firefox-for-android|mozilla-mobile/fenix/app/src/main/res/values/strings.xml— 1680 strings (resource_id=3436)amo|LC_MESSAGES/django.po— 1501 strings (resource_id=578)seamonkey|suite/chatzilla/chrome/chatzilla.properties— 1154 strings (resource_id=4291)firefox|browser/browser/preferences/preferences.ftl— 1016 strings (resource_id=3124)mozilla-accounts|settings.ftl— 974 strings (resource_id=4198)thunderbirdnet|LC_MESSAGES/messages.po— 883 strings (resource_id=3168)amo-frontend|LC_MESSAGES/amo.po— 807 strings (resource_id=2790)Overall totals (20 runs each):
Delta vs calculate_stats()
Script: https://gist.github.com/flodolo/21e66cc03bc5e8ddcc8275db1375a26a
This benchmark was used to rule out calling
calculate_stats()all the time as the solution.Benchmark with 50 translations, 5 largest resources.
Local Docker install
Production