
Optimize stats() functions, fall back on IntegrityError#4036

Open
flodolo wants to merge 6 commits into mozilla:main from flodolo:issue2263

flodolo commented Mar 23, 2026

Fixes #2263

Going for a very long explanation, since folks will understand the code much better than me, and maybe there's a better approach to this.

The code in pontoon/base/models/translation.py takes a snapshot of the stats, at entity level, before saving the translation, then takes another after saving, and tries to store the delta via adjust_stats().
This breaks when a human translator and pretranslation work on the same entity at the same time.
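To make the failure mode concrete, here is a deliberately simplified model of the snapshot/delta approach in plain Python. It is a sketch under assumptions, not Pontoon's code: the status names, the helper functions, and the ordering of the two writers are all hypothetical, and it assumes the stored counters are not allowed to go negative.

```python
from collections import Counter


def entity_snapshot(status):
    """Stats contributed by a single entity in the given status (hypothetical model)."""
    return Counter({status: 1})


def delta(before, after):
    """Per-status difference between two snapshots of the same entity."""
    return {key: after[key] - before[key] for key in set(before) | set(after)}


def apply_delta(stats, change):
    """Add a delta to the stored resource-level counters."""
    for key, value in change.items():
        stats[key] = stats.get(key, 0) + value


# One entity with an unreviewed suggestion; the stored counters agree with that.
resource_stats = {"approved": 0, "pretranslated": 0, "unreviewed": 1}

# Both writers take their "before" snapshot before either of them saves.
before_human = entity_snapshot("unreviewed")
before_pretranslation = entity_snapshot("unreviewed")

# The human translator saves first and applies a correct delta.
apply_delta(resource_stats, delta(before_human, entity_snapshot("approved")))

# Pretranslation saves second, but its "before" snapshot is now stale,
# so "unreviewed" is subtracted a second time.
apply_delta(
    resource_stats, delta(before_pretranslation, entity_snapshot("pretranslated"))
)

print(resource_stats)
```

In this model a single entity ends up counted as both approved and pretranslated, and `unreviewed` drops to -1. If the real counters carry a non-negative constraint (an assumption here), that final update is the kind of write that would surface as an IntegrityError.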

A possible solution is to drop the delta approach and calculate the stats for the entire resource after saving (using calculate_stats()). That's completely safe compared to the current approach, but costly for large resources. These are the top 20 resources in prod:

| Project | Resource Path | Strings |
| --- | --- | --- |
| sumo | LC_MESSAGES/django.po | 2611 |
| firefox-for-ios | firefox-ios.xliff | 1700 |
| firefox-for-android | mozilla-mobile/fenix/app/src/main/res/values/strings.xml | 1680 |
| amo | LC_MESSAGES/django.po | 1501 |
| seamonkey | suite/chatzilla/chrome/chatzilla.properties | 1154 |
| firefox | browser/browser/preferences/preferences.ftl | 1016 |
| mozilla-accounts | settings.ftl | 974 |
| thunderbirdnet | LC_MESSAGES/messages.po | 883 |
| amo-frontend | LC_MESSAGES/amo.po | 807 |
| thunderbird | mail/chrome/messenger/messenger.dtd | 750 |
| thunderbird | mail/messenger/preferences/preferences.ftl | 525 |
| firefox | browser/browser/browser.ftl | 514 |
| seamonkey | suite/chrome/mailnews/messenger.dtd | 511 |
| mozilla-accounts | LC_MESSAGES/client.po | 506 |
| mozilla-vpn-client | mozillavpn.xliff | 490 |
| common-voice | web/locales/common-voice/en/pages/common.ftl | 479 |
| thunderbird | calendar/chrome/calendar/timezones.properties | 443 |
| firefox | devtools/client/netmonitor.properties | 419 |
| mozilla-accounts | payments-next.ftl | 380 |
| firefox | browser/browser/newtab/newtab.ftl | 378 |

In the process of explaining the code, Claude pointed out that calculate_stats() can be made more efficient (reducing the number of queries), so that takes away part of the performance hit. But that's potentially still 5x worse in production :-(
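As an illustration of what "reducing the number of queries" can look like, the sketch below compares one COUNT query per status with a single conditional-aggregate query. It uses the standard-library `sqlite3` module so it runs on its own; the table layout and the five status names are hypothetical and are not taken from Pontoon's schema or from this PR's diff.

```python
import sqlite3

# Hypothetical status names, chosen only so that the loop below issues 5 queries.
STATUSES = ("approved", "pretranslated", "errors", "warnings", "unreviewed")


def make_db():
    """Create a tiny in-memory table with a few translations for two resources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE translation (resource_id INTEGER, status TEXT)")
    rows = (
        [(1, "approved")] * 3
        + [(1, "pretranslated")] * 2
        + [(1, "unreviewed")]
        + [(2, "approved")]
    )
    conn.executemany("INSERT INTO translation VALUES (?, ?)", rows)
    return conn


def stats_one_query_per_status(conn, resource_id):
    """Baseline: one COUNT(*) round trip per status."""
    stats = {}
    for status in STATUSES:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM translation WHERE resource_id = ? AND status = ?",
            (resource_id, status),
        ).fetchone()
        stats[status] = count
    return stats


def stats_single_aggregate(conn, resource_id):
    """Same numbers from a single query using conditional sums."""
    columns = ", ".join(
        "COALESCE(SUM(CASE WHEN status = ? THEN 1 ELSE 0 END), 0)" for _ in STATUSES
    )
    row = conn.execute(
        f"SELECT {columns} FROM translation WHERE resource_id = ?",
        (*STATUSES, resource_id),
    ).fetchone()
    return dict(zip(STATUSES, row))


conn = make_db()
per_status = stats_one_query_per_status(conn, 1)
aggregated = stats_single_aggregate(conn, 1)
print(per_status == aggregated, aggregated)
```

In Django ORM terms, this pattern is usually written as conditional aggregation, for example `Count("pk", filter=Q(...))` inside a single `aggregate()` call. Whether the PR is implemented exactly that way is not stated in this description.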

In the end (last commit) I went for a middle ground: use the same optimization for get_stats(). The delta is still applied via adjust_stats(), but in case of an IntegrityError it falls back to a full calculate_stats(). Also added a UI error notification, because I don't think we're showing anything at the moment?
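The control flow of that middle ground can be sketched as follows. This is a minimal stand-in built on `sqlite3`, assuming a CHECK constraint that keeps counters non-negative; the schema is hypothetical, and the `adjust_stats()` and `calculate_stats()` bodies here only borrow the names from the PR. In Django the exception caught would be `django.db.IntegrityError` rather than `sqlite3.IntegrityError`.

```python
import sqlite3


def make_db():
    """In-memory DB whose stored stats are stale (unreviewed is 0, but one row exists)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE translation (entity_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE resource_stats (
            id INTEGER PRIMARY KEY,
            approved INTEGER NOT NULL CHECK (approved >= 0),
            unreviewed INTEGER NOT NULL CHECK (unreviewed >= 0)
        );
        INSERT INTO translation VALUES (1, 'approved'), (2, 'unreviewed');
        INSERT INTO resource_stats VALUES (1, 1, 0);
        """
    )
    return conn


def adjust_stats(conn, approved=0, unreviewed=0):
    """Cheap path: apply a delta to the stored counters."""
    conn.execute(
        "UPDATE resource_stats SET approved = approved + ?, "
        "unreviewed = unreviewed + ? WHERE id = 1",
        (approved, unreviewed),
    )


def calculate_stats(conn):
    """Expensive path: recount everything and overwrite the stored counters."""
    approved, unreviewed = conn.execute(
        "SELECT COALESCE(SUM(status = 'approved'), 0), "
        "COALESCE(SUM(status = 'unreviewed'), 0) FROM translation"
    ).fetchone()
    conn.execute(
        "UPDATE resource_stats SET approved = ?, unreviewed = ? WHERE id = 1",
        (approved, unreviewed),
    )


def apply_delta_with_fallback(conn, approved=0, unreviewed=0):
    """Try the delta first; recalculate from scratch if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            adjust_stats(conn, approved, unreviewed)
        return "delta"
    except sqlite3.IntegrityError:
        with conn:
            calculate_stats(conn)
        return "recalculated"


def set_status(conn, entity_id, status):
    with conn:
        conn.execute(
            "UPDATE translation SET status = ? WHERE entity_id = ?", (status, entity_id)
        )


def read_stats(conn):
    return conn.execute(
        "SELECT approved, unreviewed FROM resource_stats WHERE id = 1"
    ).fetchone()


conn = make_db()

# Approving entity 2 would push the stale "unreviewed" counter to -1.
set_status(conn, 2, "approved")
first_outcome = apply_delta_with_fallback(conn, approved=+1, unreviewed=-1)
stats_after_first = read_stats(conn)

# With accurate counters, the next change goes through the cheap delta path.
set_status(conn, 2, "unreviewed")
second_outcome = apply_delta_with_fallback(conn, approved=-1, unreviewed=+1)
stats_after_second = read_stats(conn)

print(first_outcome, stats_after_first, second_outcome, stats_after_second)
```

The trade-off in this sketch matches the one described above: the common case keeps the cost of a delta update, and the full recount only runs when the database rejects the delta.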

Performance benchmarks

I got Claude to come up with a couple of benchmark scripts.

calculate_stats() before and after

Script: https://gist.github.com/flodolo/187a9d7d497282eae4d3378dabd4953b

Analyzed Italian, largest 10 resources.

Locally I can get 9x improvement, in prod closer to 5x.

Local Docker install

Top 10 resources:

  • firefox-for-ios|firefox-ios.xliff — 1700 strings (resource_id=39)
  • firefox-for-android|mozilla-mobile/fenix/app/src/main/res/values/strings.xml — 1680 strings (resource_id=38)
  • firefox|browser/browser/preferences/preferences.ftl — 1016 strings (resource_id=143)
  • firefox|browser/browser/browser.ftl — 514 strings (resource_id=105)
  • firefox|devtools/client/netmonitor.properties — 419 strings (resource_id=349)
  • firefox|browser/browser/newtab/newtab.ftl — 378 strings (resource_id=127)
  • firefox|devtools/client/debugger.properties — 373 strings (resource_id=332)
  • firefox|toolkit/toolkit/pdfviewer/viewer.ftl — 357 strings (resource_id=244)
  • firefox|dom/chrome/dom/dom.properties — 335 strings (resource_id=180)
  • firefox|toolkit/toolkit/neterror/nsserrors.ftl — 331 strings (resource_id=241)

| project | resource | strings | current (ms) | aggregate (ms) | speedup |
| --- | --- | --- | --- | --- | --- |
| firefox-for-ios | firefox-ios.xliff | 1700 | 24.0 | 2.8 | 8.5x |
| firefox-for-android | mozilla-mobile/fenix/app/src/main/res/values/strings.xml | 1680 | 24.0 | 2.8 | 8.5x |
| firefox | browser/browser/preferences/preferences.ftl | 1016 | 23.7 | 2.6 | 9.2x |
| firefox | browser/browser/browser.ftl | 514 | 24.1 | 2.4 | 9.8x |
| firefox | devtools/client/netmonitor.properties | 419 | 23.8 | 2.4 | 10.1x |
| firefox | browser/browser/newtab/newtab.ftl | 378 | 24.2 | 2.4 | 10.2x |
| firefox | devtools/client/debugger.properties | 373 | 23.7 | 2.4 | 10.1x |
| firefox | toolkit/toolkit/pdfviewer/viewer.ftl | 357 | 23.8 | 2.3 | 10.2x |
| firefox | dom/chrome/dom/dom.properties | 335 | 24.1 | 2.3 | 10.2x |
| firefox | toolkit/toolkit/neterror/nsserrors.ftl | 331 | 24.2 | 2.3 | 10.4x |

Overall totals (20 runs each):

  • Current (5 queries): 4.855s, 24.27ms avg per call
  • Aggregate (1 query): 0.496s, 2.48ms avg per call
  • Overall speedup: 9.8x

Production

Top 10 resources:

  • sumo|LC_MESSAGES/django.po — 2611 strings (resource_id=564)
  • marketplace|LC_MESSAGES/django.po — 1810 strings (resource_id=2614)
  • firefox-for-ios|firefox-ios.xliff — 1700 strings (resource_id=580)
  • firefox-for-android|mozilla-mobile/fenix/app/src/main/res/values/strings.xml — 1680 strings (resource_id=3436)
  • amo|LC_MESSAGES/django.po — 1501 strings (resource_id=578)
  • seamonkey|suite/chatzilla/chrome/chatzilla.properties — 1154 strings (resource_id=4291)
  • firefox|browser/browser/preferences/preferences.ftl — 1016 strings (resource_id=3124)
  • mozilla-accounts|settings.ftl — 974 strings (resource_id=4198)
  • thunderbirdnet|LC_MESSAGES/messages.po — 883 strings (resource_id=3168)
  • amo-frontend|LC_MESSAGES/amo.po — 807 strings (resource_id=2790)

| project | resource | strings | current (ms) | aggregate (ms) | speedup |
| --- | --- | --- | --- | --- | --- |
| sumo | LC_MESSAGES/django.po | 2611 | 56.3 | 14.6 | 3.8x |
| marketplace | LC_MESSAGES/django.po | 1810 | 46.8 | 15.8 | 3.0x |
| firefox-for-ios | firefox-ios.xliff | 1700 | 49.1 | 13.3 | 3.7x |
| firefox-for-android | mozilla-mobile/fenix/app/src/main/res/values/strings.xml | 1680 | 45.5 | 11.8 | 3.8x |
| amo | LC_MESSAGES/django.po | 1501 | 40.8 | 13.6 | 3.0x |
| seamonkey | suite/chatzilla/chrome/chatzilla.properties | 1154 | 35.4 | 9.6 | 3.7x |
| firefox | browser/browser/preferences/preferences.ftl | 1016 | 186.0 | 7.0 | 26.7x |
| mozilla-accounts | settings.ftl | 974 | 36.7 | 9.5 | 3.9x |
| thunderbirdnet | LC_MESSAGES/messages.po | 883 | 33.4 | 9.4 | 3.6x |
| amo-frontend | LC_MESSAGES/amo.po | 807 | 34.9 | 11.9 | 2.9x |

Overall totals (20 runs each):

  • Current (5 queries): 10.531s, 52.66ms avg per call
  • Aggregate (1 query): 2.238s, 11.19ms avg per call
  • Overall speedup: 4.7x
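
The overall totals in both environments can be cross-checked with a few lines of arithmetic. The sketch below assumes each variant was called 200 times (10 resources times 20 runs); that call count is inferred from the reported totals and averages, not read from the benchmark script.

```python
def summarize(total_current_s, total_aggregate_s, resources=10, runs=20):
    """Derive per-call averages (ms) and the overall speedup from total times (s)."""
    calls = resources * runs  # assumed: every resource is measured `runs` times
    return {
        "current_ms": total_current_s / calls * 1000,
        "aggregate_ms": total_aggregate_s / calls * 1000,
        "speedup": total_current_s / total_aggregate_s,
    }


local = summarize(4.855, 0.496)
production = summarize(10.531, 2.238)

for name, result in (("local", local), ("production", production)):
    print(
        f"{name}: {result['current_ms']:.2f} ms -> "
        f"{result['aggregate_ms']:.2f} ms ({result['speedup']:.1f}x)"
    )
```

Under that assumption the derived figures agree with the reported ones to within rounding: roughly 24.3 ms versus 2.5 ms per call locally (about 9.8x), and roughly 52.7 ms versus 11.2 ms in production (about 4.7x).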

Delta vs calculate_stats()

Script: https://gist.github.com/flodolo/21e66cc03bc5e8ddcc8275db1375a26a

This benchmark was used to rule out calling calculate_stats() all the time as the solution.

Benchmark with 50 translations, 5 largest resources.

Local Docker install

  Old approach (get_stats x2 + adjust_stats):
    Avg queries : 8.0
    Avg time    : 2.25 ms

  New approach (calculate_stats):
    Avg queries : 5.0
    Avg time    : 3.98 ms

  Query reduction : 3.0 fewer queries per save (38%)
  Time change  : 1.73 ms per save (77%)

Production

  Old approach (get_stats x2 + adjust_stats):
    Avg queries : 8.2
    Avg time    : 7.32 ms

  New approach (calculate_stats):
    Avg queries : 9.1
    Avg time    : 49.07 ms

  Query reduction : -0.9 fewer queries per save (-11%)
  Time change  : 41.76 ms per save (570%)
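
For reference, the "Time change" lines follow from the average times as an absolute and a relative difference. The helper below is a hypothetical reconstruction of that arithmetic, not the benchmark script's code.

```python
def time_change(old_ms, new_ms):
    """Return (absolute change in ms, relative change in %, slowdown factor)."""
    diff = new_ms - old_ms
    return diff, diff / old_ms * 100, new_ms / old_ms


local_change = time_change(2.25, 3.98)
production_change = time_change(7.32, 49.07)

for name, (diff, percent, factor) in (
    ("local", local_change),
    ("production", production_change),
):
    print(f"{name}: +{diff:.2f} ms per save ({percent:.0f}%), {factor:.1f}x slower")
```

From the rounded averages this gives about +1.73 ms (77%) locally and about +41.75 ms (570%) in production, in line with the reported values up to rounding. Expressed as a factor, always calling calculate_stats() is about 1.8x slower per save locally and about 6.7x slower in production.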

@flodolo flodolo requested review from eemeli and mathjazz March 23, 2026 08:55

codecov-commenter commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.83%. Comparing base (e04d98e) to head (6c68ea7).

flodolo changed the title from "Optimize and use calculate_stats() instead of delta when saving" to "Optimize calculate_stats(), fall back on IntegrityError" Mar 23, 2026
flodolo changed the title from "Optimize calculate_stats(), fall back on IntegrityError" to "Optimize stats() functions, fall back on IntegrityError" Mar 23, 2026

flodolo commented Mar 24, 2026

I ended up with a ton of code overlap between calculate_stats() and get_stats(), so I extracted it into a helper function, aggregate_translation_stats() (I had to put it in a separate file to avoid circular dependencies).

Actually, I should be able to put it in translations after getting rid of one import.

Development

Successfully merging this pull request may close these issues.

Approving translation causes IntegrityError

2 participants