perf: welford's algorithm for mean-var aggregation#4147
Conversation
| out[cat, col] += data.data[j] | ||
|
|
||
|
|
||
| @njit |
There was a problem hiding this comment.
We should make these nogil or provide an option for fau to provide nogil njit
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4147 +/- ##
==========================================
- Coverage 79.68% 79.65% -0.04%
==========================================
Files 120 120
Lines 12801 12795 -6
==========================================
- Hits 10201 10192 -9
- Misses 2600 2603 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
@ilan-gold I reviewed since this plays into our other work
I don't see any issues with the welfords implementation, lgtm!
and overall I'm new with njit, OMP, TBB. But I see your point about nogil, and was able to create some scenarios that trigger that path. So agreed on the suggestion... I also found that it does also slightly improve the chan njit work when fau falls back on the serial build, and I guess it would generally fix any of these types of things across scanpy, so maybe that does belong in fau?
There's probably some threading stuff I'm not fully grasping with scanpy, fau, those libraries yet though; I will be more confident in that in a few weeks. I figure I will revisit threading and numba as a whole as part of building a good understanding of these libraries, and if I find any thread issues throughout scanpy/fau I will raise them at that point.
Yes the issue with
That would be amazing because it is something we have (clearly) struggled with |
Benchmark changes
Comparison: https://github.com/scverse/scanpy/compare/39f12414fea9cea9439a3d9f665d1e17636092a9..25e6bfcdbb14be0ca79a555e0939fe81dfe34bdb More details: https://github.com/scverse/scanpy/pull/4147/checks?check_run_id=82708559802 |
flying-sheep
left a comment
There was a problem hiding this comment.
Was bd85e03 supposed to be a response to the benchmark results? because it didn’t change them.
Looks great, but maybe the test can be made slightly more clear.
It was certainly an attempt! The number did go down a bit, but not much |
…n-var aggregation) (#4176) Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
From discussions with @zboldyga.
This should then in theory be reused with #4143 instead of its custom moments calculation