Skip to content

Add getcpu system call support for getting a NUMA node of the current thread#1575

Open
Eugene-Usachev wants to merge 4 commits intobytecodealliance:mainfrom
Eugene-Usachev:getcpu
Open

Add getcpu system call support for getting a NUMA node of the current thread#1575
Eugene-Usachev wants to merge 4 commits intobytecodealliance:mainfrom
Eugene-Usachev:getcpu

Conversation

@Eugene-Usachev
Copy link

Not much time ago I needed to get a NUMA node of the current thread. Linux provides the getcpu system call for it. But I did not find any Rust library that provides this function. I read this repository code, and I like what you are doing, so I want to help you to extend the functionality of this library.

I implemented the getcpu syscall for Linux only. Also, I develop with WSL, which adds \r\n at the end of lines, so I patched tests for not reading the end of lines.

/// [Linux]: https://man7.org/linux/man-pages/man2/getcpu.2.html
#[cfg(linux_kernel)]
#[inline]
pub fn getcpu() -> (usize, usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Linux docs, the values written by getcpu have type unsigned int.

   int getcpu(unsigned int *_Nullable cpu, unsigned int *_Nullable node);

Would it be better to reflect them here as u32, rather than usize?

I see that sched_getcpu already returns usize, but that appears to be an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do the same as sched_getcpu. To be honest, I am not sure what users prefer more here. I use numa_node: usize in my code, but I can't say everyone does it. If you want me to change it I can do it but I think it is not important.

target_arch = "s390x",
))]
#[cold]
#[inline(never)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for inline(never) here? It's already marked #[cold], which should have the desired effect. It a compiler decides it really wants to inline this, even given what we've told it, that seems fine.

Copy link
Author

@Eugene-Usachev Eugene-Usachev Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still sure that non-inlining provides a better performance, but I agree with both points: the difference is tiny and the compiler can do it as it wants. But I want to explicitly tell the compiler to generate exactly one call method for the calling this function that should be called only once. If you don't like this, I can't roll back this change. After all, the main goal of the PR is adding getcpu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure, myself. In a compiler with hot/cold basic block partitioning, inlining and then moving the cold blocks away from the hot path achieves a similar result to not-inlining, except that the compiler can more easily use an effectively custom calling convention. I don't know how much it matters in the code in question here, but in general, I don't like prohibiting compilers from doing things unless I have specific reasons.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants