perf: improve parallelism of SortBotMerge by jodavies · Pull Request #831 · form-dev/form

jodavies · 2026-05-18T18:23:16Z

This change comes from discussion and benchmarking efforts in collaboration with @AnaIPereira. See the commit message for a full description.

On my 12-core 7900X, the results are as follows. @AnaIPereira is finishing up some testing on higher core-count sytems, to make sure we have a good value for MINWRITENUMBEROFTERMS and that the adjusted, lower value of MINIMUMNUMBEROFTERMS does not affect performance.

Speedup w.r.t. master:

Benchmark	form	-w2	-w4	-w6	-w12	-w24 (HT)
`chromatic`	0.99 ± 0.01	1.03 ± 0.02	1.08 ± 0.02	1.14 ± 0.01	1.12 ± 0.01	1.08 ± 0.00
`color`	0.99 ± 0.01	1.01 ± 0.01	1.02 ± 0.01	1.03 ± 0.02	1.03 ± 0.02	1.03 ± 0.01
`fmft`	1.01 ± 0.01	1.02 ± 0.00	1.08 ± 0.01	1.13 ± 0.01	1.14 ± 0.01	1.09 ± 0.01
`forcer-exp`	1.00 ± 0.00	1.00 ± 0.00	1.04 ± 0.01	1.09 ± 0.03	1.13 ± 0.00	1.14 ± 0.00
`forcer`	1.00 ± 0.01	1.01 ± 0.00	1.07 ± 0.01	1.14 ± 0.00	1.20 ± 0.01	1.21 ± 0.00
`hyperform`	1.00 ± 0.01	1.03 ± 0.01	1.06 ± 0.01	1.09 ± 0.01	1.08 ± 0.01	1.05 ± 0.01
`mass-fact`	1.00 ± 0.01	1.02 ± 0.02	1.01 ± 0.03	1.02 ± 0.02	1.06 ± 0.07	1.03 ± 0.05
`mbox1l`	1.00 ± 0.00	1.01 ± 0.00	1.07 ± 0.01	1.15 ± 0.02	1.30 ± 0.03	1.38 ± 0.04
`minceex`	1.00 ± 0.00	1.04 ± 0.00	1.06 ± 0.01	1.08 ± 0.01	1.08 ± 0.01	1.06 ± 0.01
`mincer`	1.01 ± 0.02	1.01 ± 0.01	1.03 ± 0.01	1.03 ± 0.01	1.02 ± 0.01	1.02 ± 0.01
`mzv-dm`	1.00 ± 0.01	0.99 ± 0.02	1.00 ± 0.01	1.00 ± 0.02	1.01 ± 0.04	1.00 ± 0.02
`sort-disk`	1.02 ± 0.01	1.01 ± 0.02	1.01 ± 0.01	1.01 ± 0.01	1.02 ± 0.02	1.00 ± 0.01
`sort-large`	1.02 ± 0.01	0.98 ± 0.01	0.99 ± 0.01	1.00 ± 0.02	1.01 ± 0.01	1.01 ± 0.01
`sort-small`	1.02 ± 0.02	0.98 ± 0.02	1.00 ± 0.02	1.02 ± 0.03	1.07 ± 0.04	1.07 ± 0.08
`trace`	1.00 ± 0.02	0.99 ± 0.01	0.99 ± 0.01	1.00 ± 0.01	1.00 ± 0.01	0.99 ± 0.03

coveralls · 2026-05-18T21:34:38Z

coverage: 61.506% (+0.07%) from 61.438% — jodavies:sortbot-blocks into form-dev:master

Currently, the sortbot stage of tform sorting does not achieve good parallelism. Primarily, this is because the "sortblock" size has grown with default SmallSize and LargeSize adjustments, such that in many cases the sortbot levels do not run in parallel at all, because each thread's output fits in a single block. This commit adjusts the logic for filling and unlocking the blocks such that sortbot threads can start work as soon as possible: - Only put complete terms into the blocks, no splitting over the blocks. - Track the number of terms in each block, and use this when reading the data to determine when a block is complete, rather than waiting for a term to overlap the "stop" pointer. - When filling blocks (in PutToMaster), if a certain (small) minimum number of terms has been written, probe if a reading thread is waiting on the current block by attempting to lock the previous block. If a thread is waiting (so the lock fails), unlock the current block immediately and write the term into the next. Also reduce the MINIMUMNUMBEROFTERMS parameter from 10 to 1, so that the small+large buffer does not scale (so much) with large MaxTermSize and many threads, and similarly reduce NUMBEROFBLOCKSINSORT to its minimal value of 8.

jodavies · 2026-05-21T08:13:09Z

I have removed the second commit, which get rid of "block 0": there is another circumstance in which it is required, separate from copying a partial term from the last block, such that it is contiguous with the tail in block 1, which is when adding terms with polyratfun. When terms merge, if the result is larger, it is copied into the space before "term1" in the merge: this requires block 0 if term1 is the first term of block 1. I added commentary to the code to highlight this.

jodavies force-pushed the sortbot-blocks branch from 6ef8662 to 5c7c957 Compare May 18, 2026 21:16

jodavies mentioned this pull request May 19, 2026

wip: reduce size of thread buckets at larger maxtermsize #829

Draft

jodavies force-pushed the sortbot-blocks branch from 5c7c957 to 541e928 Compare May 19, 2026 21:08

jodavies force-pushed the sortbot-blocks branch from 541e928 to 3e45fac Compare May 21, 2026 05:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: improve parallelism of SortBotMerge#831

perf: improve parallelism of SortBotMerge#831
jodavies wants to merge 1 commit into
form-dev:masterfrom
jodavies:sortbot-blocks

jodavies commented May 18, 2026 •

edited

Loading

Uh oh!

coveralls commented May 18, 2026 •

edited

Loading

Uh oh!

jodavies commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jodavies commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coveralls commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jodavies commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jodavies commented May 18, 2026 •

edited

Loading

coveralls commented May 18, 2026 •

edited

Loading