[EXTENDED] PR 26 Parallelized multiple right-hand side support, fully typed tools#27
Conversation
… test that test more different cases
…proved import chain; improved code readability
… cover the edge cases
… f-string; fixed typo
…hon level; fixed typos
…utput and warning behaviour; tested it altogether
…tate development without compromising on build
|
The last commit currently does not change the internal logic at all. The idea was taken from LAPACK. It returns an |
…ers I and II: - now, they share one common processing logic that - calls the solvers it needs depending on which solver was called Besides, a shape check was introduced and the `info`-variable was extended. All of this is now reflected in `core`, where the solvers and errors are not structured a bit better.
… auxiliary functions in `core`; added NumPy-solver error case test
|
To be able to cope with #28 and #29 in the future without compromising anything on the solvers side, I rearranged the Cython core (not the interface) as follows:
This is probably a bit too abstract (I barely know how phrase it), so please let me provide the following As of now, this PR basically implements the |
|
@MuellerSeb Sorry for all these iterations. It tooks me some time to get the right strategy for Cython, the GIL, Memory Allocation, the Parallelization, and a testing framework (they all pass still ✅). |
…trix Reason: it is the main diagonal of the matrix L for A = LU just like ps is the main diagonal of U for PTRANS-II. So now, the main diagonals of the non unit triangular factors are always in the central column which makes the most sense.
MuellerSeb
left a comment
There was a problem hiding this comment.
Gave it a first view. Was not to much in depth, since I got no time at this point as mentioned. I really like your work and it seems to be solid and improves the package.
- made OpenMP-link in `setup.py` optional based on an environment variable - changed thread number evaluation to be OpenMP-based (if at all) - dropped `psutil` dependency - renamed `workers` to `num_threads` and made `None` a possible option and the default - adapted tests to the new `num_threads` - updated documentation - adapted to `pylint` comments
|
@MothNik sorry for a late reply here. I have to admit that this is awesome but totally overwhelming. The last year was packed with private events and the whole stack of pentapy->anaflow->welltestpy really turned into a lowlevel side project where I have next to no time for feature development. At this point I got the feeling, that I am/was blocking your work and I feel sorry to not support your enthusiasm extending this little package. Since it is almost a year with no progress, I guess you already moved on and lost interest in pushing this forward, right? If not, I am a bit puzzled how to move forward, since my time schedule wont change anytime soon. Maybe we could have a short chat at some point. |
|
@MuellerSeb no worries! You communicated back then that you would not have a lot of time. In the meantime, I indeed moved on and released Long story short: don't beat yourself up! I'm here if you need assistance. |
|
Hey great! So maybe we could add pybandee as a new backend in the future? |
Update June 10, 2024
I temporarily converted this to a draft because the decisions for #28 heavily affect this PR because this might involve another change in the Cython-level API.
Changes
This pull request is an improved version of #26, so please refer to the basic changes in there. On top of these changes, this pull request
tools-modulesolver.pyxdso that the solvers only accept C-contiguous Arrays. This will not be backwards compatible because it could already be that users pass Fortran-contiguous Arrays (maybe without even knowing).setup.pyto include OpenMPworker-argument to the Python interfaceSince all that is a breaking change and the Cython interface is not backwards-compatible, I suggest that this is a major version jump from 1 to 2 (maybe as an$\alpha$ -version of it?).
Again, I tried to update the changelog, but it might be that you need to still look over it.
Tests
The installation and parallelisation were tested on Windows 11 and Ubuntu (WSL). ✅
Tests now cover serial and parallelel solves, but they have to run with
pytest --cov=pentapy .\tests --cov-report html -xnow to prevent interfering with the multithreading. ✅
⚠️ Note that the ⚠️
-xcancels the process for the first failure which is mandatory for this delicate topic where we deal with pointersDepending on the OS, it might be that the doctests of
tests/util_funcsfail because the Array output includes thedtypeon Windows while it does not on Linux. ❌Given that the C-implementation is already very fast in serial, the parallelization does not cause a quantum leap. With 8 threads on a relatively old laptop (there it is, the grain of salt 🧂), I observe a threefold speedup for huge systems (1,000 x 1,000 with 10,000 right-hand sides):
On massively parallel systems, this will give better speedups.
However, it does not negatively affect the serial solves when
workers=1: