How does one do multithreading with this library? #7272
jonasdedden asked this question in Q&A (unanswered, 0 replies)
Hey!
We've been enjoying using this crate and have built our own Rust-Python Parquet tooling with PyO3 to circumvent the numerous and massive memory leak issues we've encountered using pyarrow. So far it's been a blast!

One thing that struck us, though, is that most of the pyarrow methods appear to be heavily multithreaded in their underlying C++ code. At least, pyarrow seems to use more than one core and is actually faster than our own Rust tooling.

Is there some easy way to enable multithreading with this crate? Or does one have to implement everything on their own?

How are even trivial methods multithreaded in pyarrow? For example, just reading a RowGroup as a RecordBatch seems to use many cores. Is there one thread per RowGroup column, with the data then "merged" somehow afterwards, or how exactly is this achieved?

Thanks!
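To make the "one thread per column, then merge" idea above concrete, here is a rough, std-only sketch of the pattern we have in mind. It is only a guess at the general shape, not how pyarrow or this crate actually works. `ColumnChunk`, `DecodedColumn`, `decode_column` and `decode_row_group_parallel` are hypothetical names invented for illustration and are not part of any real API.

```rust
use std::thread;

// Hypothetical stand-in for one encoded column chunk of a row group.
struct ColumnChunk {
    name: &'static str,
    encoded: Vec<u8>,
}

// Hypothetical stand-in for a decoded column
// (a real reader would produce an Arrow array instead).
#[derive(Debug, PartialEq)]
struct DecodedColumn {
    name: &'static str,
    values: Vec<i64>,
}

// Placeholder "decode" step: simply widens each byte to an i64.
fn decode_column(chunk: &ColumnChunk) -> DecodedColumn {
    DecodedColumn {
        name: chunk.name,
        values: chunk.encoded.iter().map(|&b| i64::from(b)).collect(),
    }
}

// Assumed pattern: spawn one scoped thread per column, then "merge" by
// joining the threads in the original column order.
fn decode_row_group_parallel(chunks: &[ColumnChunk]) -> Vec<DecodedColumn> {
    thread::scope(|s| {
        // Collecting the handles first ensures all threads are started
        // before any of them is joined.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || decode_column(chunk)))
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().expect("decode thread panicked"))
            .collect()
    })
}

fn main() {
    let chunks = vec![
        ColumnChunk { name: "a", encoded: vec![1, 2, 3] },
        ColumnChunk { name: "b", encoded: vec![10, 20] },
    ];

    // The resulting Vec plays the role of a RecordBatch: one entry per column.
    for col in decode_row_group_parallel(&chunks) {
        println!("{}: {:?}", col.name, col.values);
    }
}
```

In this sketch the "merge" is nothing more than collecting the per-column results back into a single ordered collection, which is why we wonder whether something similar happens in the C++ implementation.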