Add getcpu system call support for getting a NUMA node of the current thread #1575
Eugene-Usachev wants to merge 4 commits into bytecodealliance:main
/// [Linux]: https://man7.org/linux/man-pages/man2/getcpu.2.html
#[cfg(linux_kernel)]
#[inline]
pub fn getcpu() -> (usize, usize) {
According to the Linux docs, the values written by getcpu have type unsigned int.
int getcpu(unsigned int *_Nullable cpu, unsigned int *_Nullable node);
Would it be better to reflect them here as u32, rather than usize?
I see that sched_getcpu already returns usize, but that appears to be an error.
I tried to do the same as sched_getcpu. To be honest, I am not sure which type users would prefer here. I use numa_node: usize in my code, but I can't say everyone does. If you want me to change it, I can, but I don't think it is important.
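To make the trade-off in this thread concrete, here is a minimal sketch of how a caller could consume `u32` values while still indexing with `usize`. The `CpuAndNode` type and its `node_index` method are hypothetical names used only for illustration; they are not part of this PR or of the library's API.

```rust
use std::convert::TryFrom;

/// Hypothetical pair of values mirroring the two `unsigned int`
/// outputs of `getcpu(2)`. Not a type from this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuAndNode {
    pub cpu: u32,
    pub node: u32,
}

impl CpuAndNode {
    /// Converts the NUMA node to `usize` so it can index per-node data.
    /// The conversion cannot fail on 32-bit and 64-bit targets.
    pub fn node_index(self) -> usize {
        usize::try_from(self.node).expect("u32 fits in usize on this target")
    }
}
```

With this shape, the API keeps the kernel's width, and callers who store `numa_node: usize` pay one explicit conversion at the point of use.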
target_arch = "s390x",
))]
#[cold]
#[inline(never)]
Is there a reason for inline(never) here? It's already marked #[cold], which should have the desired effect. If a compiler decides it really wants to inline this, even given what we've told it, that seems fine.
I am still sure that not inlining gives better performance, but I agree with both points: the difference is tiny and the compiler can do as it wants. Still, I wanted to explicitly tell the compiler to generate exactly one out-of-line copy of this function, which should be called only once. If you don't like this, I can roll back this change. After all, the main goal of the PR is adding getcpu.
I'm not so sure, myself. In a compiler with hot/cold basic block partitioning, inlining and then moving the cold blocks away from the hot path achieves a similar result to not-inlining, except that the compiler can more easily use an effectively custom calling convention. I don't know how much it matters in the code in question here, but in general, I don't like prohibiting compilers from doing things unless I have specific reasons.
I have removed this line.
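The pattern discussed in this thread can be sketched as follows: a rarely taken initialization path marked only `#[cold]`, leaving the inlining decision to the compiler. The names `cached_value`, `init_slow`, and `SLOW_CALLS`, and the value 42, are made up for the example and are not taken from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 means "not initialized yet" in this sketch.
static CACHED: AtomicUsize = AtomicUsize::new(0);
// Counts how often the slow path runs, for demonstration only.
static SLOW_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Slow path. `#[cold]` hints that calls are unlikely; without
/// `#[inline(never)]` the compiler remains free to inline the body
/// and move it off the hot path itself.
#[cold]
fn init_slow() -> usize {
    SLOW_CALLS.fetch_add(1, Ordering::Relaxed);
    let value = 42; // stand-in for an expensive one-time lookup
    CACHED.store(value, Ordering::Relaxed);
    value
}

/// Fast path: one atomic load once the value is cached.
#[inline]
pub fn cached_value() -> usize {
    match CACHED.load(Ordering::Relaxed) {
        0 => init_slow(),
        value => value,
    }
}
```

In a single-threaded caller the slow path runs once. If several threads race on the first call, it may run more than once, which is harmless here because every run stores the same value.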
Not long ago I needed to get the NUMA node of the current thread. Linux provides the getcpu system call for this, but I did not find any Rust library that exposes it. I read this repository's code and I like what you are doing, so I want to help extend the library's functionality.

I implemented the getcpu syscall for Linux only. Also, I develop with WSL, which adds \r\n at the end of lines, so I patched the tests so that they do not read the line endings.
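One way to make such comparisons independent of line endings is `str::lines`, which splits on both `\n` and `\r\n` and drops the terminator. The sketch below shows the idea. The helper name `same_lines` is hypothetical, and this is not necessarily how the tests in this PR were patched.

```rust
/// Returns true when both texts contain the same lines, regardless of
/// whether each line ends in "\n" or "\r\n".
pub fn same_lines(expected: &str, actual: &str) -> bool {
    // `lines()` yields each line without its trailing "\n" or "\r\n".
    expected.lines().eq(actual.lines())
}
```

A test could then compare a checked-out file against generated output with `same_lines(&expected, &actual)` instead of a byte-for-byte `assert_eq!`.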