Introduce per geometry and overall limits on number of expire tiles #2449
+442
−238
The number of tiles to be expired can be quite large if the input geometries are large or if there are many geometries. Numbers of tiles in the billions can crash osm2pgsql because it runs out of memory. Such large numbers can also overwhelm any kind of re-rendering mechanism run after osm2pgsql to bring tiles up to date. In day-to-day processing this should not happen, but it can happen due to vandalism or misconfiguration.
To protect against this problem, this change introduces limits on the number of tiles that can be affected by a single geometry and the overall number of tiles that an expire output will generate for each run of osm2pgsql.
- If a single geometry affects more tiles than `max_tiles_geometry`, this geometry will be ignored for the purposes of expiry. Note that the geometry will still be written to the database, but no tiles will be added to the expire output.
- Once an expire output has reached `max_tiles_overall` tiles, no further tiles will be written to this output.
Limits are per expire output, of which you can have several. The limits can be set in the flex expire output configuration, but sensible defaults are provided. For the (legacy) expire output configured on the command line with the `-e` and `-o` options, the settings cannot be changed; you will always get the default values.
To choose the default values for these settings I looked at real-world values as follows:
- `max_tiles_geometry`.
- `max_tiles_overall`: Paul Norman analyzed the number of tiles expired by typical minutely updates in https://www.openstreetmap.org/user/pnorman/diary/403266. For zoom level 14 the most he got was 119801 tiles. The same analysis also shows that for longer time frames (2 minutes and 5 minutes were checked, but the same should be true for larger intervals) the number of tiles doesn't go up, because these huge numbers only happen very rarely.
Rounding these numbers and adding a safety factor, values of 10,000,000 and 50,000,000 seem reasonable for the single geometry and the overall number of tiles per run. Memory use in osm2pgsql is about 32 bytes per tile, so the overall limit caps memory use at about 1.6 GB (50,000,000 × 32 bytes), which should be no problem at all.
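The memory estimate and the headroom over the observed maximum can be checked with a few lines of C++. This is only a back-of-the-envelope sketch: the constant names mirror the settings described here, `bytes_per_tile` is the approximate figure quoted in the description (not a measured `sizeof`), and `worst_case_bytes` is a hypothetical helper that does not exist in osm2pgsql.

```cpp
#include <cassert>
#include <cstdint>

// Default limits as quoted in the description.
constexpr std::uint64_t max_tiles_geometry = 10'000'000;
constexpr std::uint64_t max_tiles_overall = 50'000'000;

// Approximate per-tile memory use from the description (an assumption,
// not derived from the actual data structures).
constexpr std::uint64_t bytes_per_tile = 32;

// Largest zoom level 14 tile count reported in the linked diary entry.
constexpr std::uint64_t observed_max_tiles = 119'801;

// Hypothetical helper: upper bound on memory for a given number of tiles.
constexpr std::uint64_t worst_case_bytes(std::uint64_t tiles)
{
    return tiles * bytes_per_tile;
}

// How many times larger the overall default is than the observed maximum
// (integer division, so the result is rounded down).
constexpr std::uint64_t headroom = max_tiles_overall / observed_max_tiles;

static_assert(worst_case_bytes(max_tiles_overall) == 1'600'000'000ULL,
              "50,000,000 tiles at 32 bytes each is 1.6 GB");
static_assert(max_tiles_geometry <= max_tiles_overall,
              "a single geometry can never need more than the overall limit");
```

With these numbers the overall default is a bit more than 400 times the largest value seen in that analysis, which is why the defaults should practically never trigger in normal operation.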
The defaults are chosen so that they will practically never be triggered, which means users upgrading from existing versions of osm2pgsql will not suddenly be affected. Users are encouraged to tune the settings to their own needs. Once we have more operational experience with this, we can adjust the defaults.
I considered using different default max values for different zoom levels, but this will make configuration more complicated.
Change file processing in osm2pgsql runs in parallel threads. The old code stored the to-be-expired tiles in one list per thread and merged them later. This has two problems:
a) because the lists might contain some of the same tiles, all lists
together can use much more memory than a single list would take
b) we cannot easily check the number of tiles in those lists against
the configured maximum.
So this commit changes the way the list is kept: We only keep a single list in the expire_output_t and use a mutex to control access to this list. (There might still be overlapping lists if you have more than one expire output, but that's by design.)
Objects of expire_tiles_t class now only keep a temporary list for each geometry added. Once all tiles affected by a single geometry are identified, this list is added to the overall list in expire_output_t and the temporary list is cleared.
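
The following is a minimal sketch of that design, not the code from this change. The class `expire_output_sketch`, its member names, and the use of a plain 64-bit integer as a tile id are all assumptions made for illustration; the real `expire_output_t` and `expire_tiles_t` differ. It also assumes the temporary per-geometry list contains no duplicates, so its size can be compared directly against the per-geometry limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tile identifier (not the real type).
using tile_id = std::uint64_t;

// Sketch of one expire output: a single shared tile set guarded by a mutex.
class expire_output_sketch
{
public:
    expire_output_sketch(std::size_t max_tiles_geometry,
                         std::size_t max_tiles_overall)
    : m_max_tiles_geometry(max_tiles_geometry),
      m_max_tiles_overall(max_tiles_overall)
    {}

    // Merge the temporary tile list of one geometry into the shared set.
    // Returns the number of tiles that were newly added.
    std::size_t add_geometry_tiles(std::vector<tile_id> const &tiles)
    {
        // Per-geometry limit: the geometry is ignored for expiry purposes.
        if (tiles.size() > m_max_tiles_geometry) {
            return 0;
        }

        // Only the merge needs the lock; collecting the tiles of a
        // geometry happens in the caller's own temporary list.
        std::lock_guard<std::mutex> const guard{m_mutex};
        std::size_t added = 0;
        for (auto const tile : tiles) {
            // Overall limit: once reached, nothing more is stored.
            if (m_tiles.size() >= m_max_tiles_overall) {
                break;
            }
            if (m_tiles.insert(tile).second) {
                ++added;
            }
        }
        return added;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> const guard{m_mutex};
        return m_tiles.size();
    }

private:
    std::size_t m_max_tiles_geometry;
    std::size_t m_max_tiles_overall;
    mutable std::mutex m_mutex;
    std::unordered_set<tile_id> m_tiles;
};
```

Each worker thread would build its temporary list without holding the lock and call `add_geometry_tiles` once per geometry, then clear the list. Because there is only one set per output, duplicates across threads are stored once and the overall limit can be checked at the point of insertion.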
Fixes #2190