graph generation speedups (three different ones) #681
base: master
Conversation
cherrypick 73b6045 manually + 1f326fa and bits of ddc5a7c
b2d5fc7 use segment_name when no GraphListEntry system restrictions

Of the three ideas here, this one's been kicking around the longest, and has the most straightforward diff. Each graph edge is already constructed with a […]
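Going only by the commit title, a minimal hypothetical C++ sketch of the idea might look like the following; GraphListEntry and segment_name are names from the commit, while the surrounding struct and its members are illustrative stand-ins rather than the real siteupdate code.

```cpp
#include <string>

struct GraphListEntry {
    bool system_restrictions = false;   // does this graph filter by highway system?
};

struct HighwaySegment {
    std::string segment_name;           // full edge label, built once at construction

    // Label used when writing this segment as a graph edge.  If the graph
    // applies no system restrictions, the precomputed segment_name is already
    // the right label, so reuse it instead of rebuilding the string per edge.
    std::string edge_label(const GraphListEntry& g) const {
        if (!g.system_restrictions)
            return segment_name;                        // cheap: reuse the existing string
        return segment_name + " (restricted subset)";   // placeholder for the slower path
    }
};
```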
28be0f0 store+retrieve formatted vertex coord strings

The last big round of graph generation speedups revealed a bottleneck in formatting numbers into text strings. Switching to the […]

Every vertex's coordinates are put into at least 9 graph files. There are 2 dimensions to this: […]
How's that work out for redundancy in formatting the coordinate strings?
The thing to do is, in a separate (threaded!) pass before graph generation proper, format each string once and store it in a small buffer in the […]

This commit has the greatest impact out of the three on my Linux machines.
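A minimal sketch of what that pre-formatting pass could look like, assuming a vertex type with a small fixed-size character buffer; the field names, buffer size, format string, and work-splitting scheme are all assumptions, not the actual 28be0f0 code.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

struct HGVertex {
    double lat = 0, lng = 0;
    char fstr[48] = {};   // coordinates formatted once, then copied into every graph file

    void format_coords() {
        // Format string & precision are placeholders for whatever the graph files need.
        std::snprintf(fstr, sizeof fstr, " %.10g %.10g", lat, lng);
    }
};

// Separate threaded pass, run before graph generation proper.
void format_all_vertex_coords(std::vector<HGVertex>& vertices, unsigned nthreads) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&vertices, t, nthreads] {
            for (size_t i = t; i < vertices.size(); i += nthreads)
                vertices[i].format_coords();
        });
    for (auto& th : pool)
        th.join();
}
```

Graph generation then just writes each vertex's fstr buffer, so the floating-point formatting cost is paid once per vertex instead of once per graph file.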
e76ad22 vertex nums cache locality & fmt::print

This is the silver bullet for FreeBSD performance! Like the commit message says, this one actually has 2 components: improving cache locality of the vertex numbers, and switching output to fmt::print.
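For concreteness, here is a rough hypothetical sketch of both pieces, with assumed data structures: the vertex numbers for the graph being written live in one contiguous array so the edge-writing loop streams through memory, and output goes through fmt::print on a FILE* (using the {fmt} library adopted after #626).

```cpp
#include <cstdio>
#include <vector>
#include <fmt/format.h>

struct Edge { int v1, v2; };   // endpoints as indices into the vertex list

// vertex_num[i] is vertex i's sequential number in this particular graph file.
// Keeping those numbers densely packed keeps the lookups cache-friendly, and
// fmt::print avoids the locale / virtual-call overhead of ostream output.
void write_edges(std::FILE* out, const std::vector<Edge>& edges,
                 const std::vector<int>& vertex_num)
{
    for (const Edge& e : edges)
        fmt::print(out, "{} {}\n", vertex_num[e.v1], vertex_num[e.v2]);
}
```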
Another thing I tried out was keeping an array of pre-formatted strings similar to 28be0f0, but that performed worse. Holy blap! FreeBSD performs better than CentOS now!
These graph generation speedups finally break through the FreeBSD bottleneck, bringing its performance in line with Linux.
Noreaster is gonna love this.
What's going on under the hood is discussed in separate posts for each of the 3 commits below.
Here are the charts for how performance is affected by all 3 commits in this pull request combined.
The second image is the same thing on a logarithmic scale.
This makes it easier to distinguish the before & after lines and the different machines, with everything less squished together in the lower left corner.
It also makes it easier to see where the sweet spot is for each machine, before efficiency starts decreasing.
Benchmarks are performed using a RAM disk now.
For those who remember the #626 fiasco (which led to adopting {fmt} in the 1st place), this avoids both the inconsistently slow times that can result from writing to disk, and the falsely fast times recorded when writing to /dev/null.
Never thought I'd see this, but we're just shy of breaking the 1 second barrier.
For graph generation proper (not counting the now-separate "Formatting vertex coordinate strings" task), lab6 averages 1.0707 s @ 7 threads. Individual passes have taken as little as 1.0191 s (6 threads in that case).
I have a few more tweaks in the pipeline long-term that can get us there -- at least, as long as I use old enough HighwayData & UserData revisions. At some point the data will increase to the point that sub-1s graph generation will be permanently out of reach.
Eventually, returns will diminish and there will be no more efficiency left to wring out of the process.