
refactor(mmio): deduplicate code and avoid allocation#2319

Open
cagatay-y wants to merge 3 commits into hermit-os:main from cagatay-y:refactor-mmio

Conversation

@cagatay-y
Contributor

@cagatay-y cagatay-y commented Mar 12, 2026

The third commit (e8a3932) in the series is meant as a demo of whether `enum_dispatch` would work. The answer seems to be "no", as it requires a nightly feature and some trait boilerplate because `#[enum_dispatch(...)]` does not support associated types.
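To illustrate the limitation: when a trait has an associated type, the dispatch enum that `enum_dispatch` would normally derive has to be written by hand. A minimal sketch of that manual workaround, with hypothetical trait and type names (not from the Hermit code base):

```rust
// A trait with an associated type, which #[enum_dispatch] cannot handle.
trait Transport {
    type Addr;
    fn read(&self, addr: Self::Addr) -> u32;
}

struct MmioTransport;
impl Transport for MmioTransport {
    type Addr = usize;
    fn read(&self, addr: usize) -> u32 {
        addr as u32 // placeholder behavior for the sketch
    }
}

struct PciTransport;
impl Transport for PciTransport {
    type Addr = usize;
    fn read(&self, addr: usize) -> u32 {
        (addr as u32).wrapping_mul(2) // placeholder behavior for the sketch
    }
}

// The boilerplate enum_dispatch would ideally generate: an enum over the
// implementors, with each method forwarded by a manual match.
enum AnyTransport {
    Mmio(MmioTransport),
    Pci(PciTransport),
}

impl AnyTransport {
    fn read(&self, addr: usize) -> u32 {
        match self {
            AnyTransport::Mmio(t) => t.read(addr),
            AnyTransport::Pci(t) => t.read(addr),
        }
    }
}

fn main() {
    // Static dispatch through the enum, no Box<dyn Transport> allocation.
    assert_eq!(AnyTransport::Mmio(MmioTransport).read(7), 7);
    assert_eq!(AnyTransport::Pci(PciTransport).read(7), 14);
}
```

The upside over `dyn Transport` is that no allocation or vtable indirection is needed; the downside is exactly the per-method match boilerplate shown above.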


@github-actions github-actions bot left a comment


Benchmark Results

| Benchmark | Current: 6faef5b | Previous: 38bd7fd | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 92.33 s | 90.07 s | 1.03 |
| startup_benchmark File Size | 0.80 MB | 0.80 MB | 1.00 |
| Startup Time - 1 core | 0.88 s (±0.03 s) | 0.79 s (±0.02 s) | 1.11 |
| Startup Time - 2 cores | 0.92 s (±0.02 s) | 0.80 s (±0.03 s) | 1.14 |
| Startup Time - 4 cores | 0.91 s (±0.02 s) | 0.80 s (±0.03 s) | 1.15 |
| multithreaded_benchmark Build Time | 92.89 s | 86.68 s | 1.07 |
| multithreaded_benchmark File Size | 0.90 MB | 0.90 MB | 1 |
| Multithreaded Pi Efficiency - 2 Threads | 89.83 % (±6.98 %) | 91.22 % (±9.86 %) | 0.98 |
| Multithreaded Pi Efficiency - 4 Threads | 44.42 % (±2.37 %) | 45.35 % (±3.73 %) | 0.98 |
| Multithreaded Pi Efficiency - 8 Threads | 25.55 % (±1.81 %) | 26.62 % (±1.86 %) | 0.96 |
| micro_benchmarks Build Time | 94.41 s | 93.40 s | 1.01 |
| micro_benchmarks File Size | 0.91 MB | 0.91 MB | 1.00 |
| Scheduling time - 1 thread | 71.04 ticks (±4.65 ticks) | 69.35 ticks (±3.92 ticks) | 1.02 |
| Scheduling time - 2 threads | 40.01 ticks (±6.46 ticks) | 41.25 ticks (±6.01 ticks) | 0.97 |
| Micro - Time for syscall (getpid) | 3.04 ticks (±0.26 ticks) | 2.92 ticks (±0.26 ticks) | 1.04 |
| Memcpy speed - (built_in) block size 4096 | 76374.85 MByte/s (±53015.48 MByte/s) | 84400.50 MByte/s (±58168.02 MByte/s) | 0.90 |
| Memcpy speed - (built_in) block size 1048576 | 29646.69 MByte/s (±24227.80 MByte/s) | 30139.92 MByte/s (±24548.52 MByte/s) | 0.98 |
| Memcpy speed - (built_in) block size 16777216 | 28347.90 MByte/s (±23434.85 MByte/s) | 28400.11 MByte/s (±23402.48 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 4096 | 76445.44 MByte/s (±53071.65 MByte/s) | 84104.52 MByte/s (±57975.21 MByte/s) | 0.91 |
| Memset speed - (built_in) block size 1048576 | 30392.63 MByte/s (±24651.51 MByte/s) | 30890.47 MByte/s (±24972.05 MByte/s) | 0.98 |
| Memset speed - (built_in) block size 16777216 | 29099.16 MByte/s (±23861.98 MByte/s) | 29155.75 MByte/s (±23839.80 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 4096 | 69663.16 MByte/s (±48797.21 MByte/s) | 72077.42 MByte/s (±50514.07 MByte/s) | 0.97 |
| Memcpy speed - (rust) block size 1048576 | 29706.30 MByte/s (±24323.26 MByte/s) | 29863.39 MByte/s (±24400.53 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 16777216 | 28353.70 MByte/s (±23440.57 MByte/s) | 28417.05 MByte/s (±23425.11 MByte/s) | 1.00 |
| Memset speed - (rust) block size 4096 | 69862.80 MByte/s (±48936.88 MByte/s) | 72456.37 MByte/s (±50752.41 MByte/s) | 0.96 |
| Memset speed - (rust) block size 1048576 | 30477.20 MByte/s (±24767.28 MByte/s) | 30608.68 MByte/s (±24825.48 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 29115.01 MByte/s (±23877.56 MByte/s) | 29161.51 MByte/s (±23851.40 MByte/s) | 1.00 |
| alloc_benchmarks Build Time | 94.16 s | 91.85 s | 1.03 |
| alloc_benchmarks File Size | 0.87 MB | 0.87 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 6472.46 Ticks (±83.35 Ticks) | 7519.73 Ticks (±89.71 Ticks) | 0.86 |
| Allocations - Average Allocation time (no fail) | 6472.46 Ticks (±83.35 Ticks) | 7519.73 Ticks (±89.71 Ticks) | 0.86 |
| Allocations - Average Deallocation time | 1786.85 Ticks (±192.47 Ticks) | 1664.29 Ticks (±120.38 Ticks) | 1.07 |
| mutex_benchmark Build Time | 93.22 s | 90.87 s | 1.03 |
| mutex_benchmark File Size | 0.91 MB | 0.91 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.44 ns (±0.88 ns) | 12.94 ns (±0.70 ns) | 1.04 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 18.36 ns (±4.27 ns) | 15.72 ns (±10.39 ns) | 1.17 |

This comment was automatically generated by a workflow using github-action-benchmark.

@cagatay-y cagatay-y force-pushed the refactor-mmio branch 2 times, most recently from bf1ad94 to 0c4b853 on March 12, 2026 13:09
@cagatay-y cagatay-y force-pushed the refactor-mmio branch 2 times, most recently from 4292def to c0e3718 on April 1, 2026 09:09
Member

@mkroening mkroening left a comment


Thanks, this is looking good! :)

Do you think it would make sense to squash commits 2 and 3, or do you think introducing `dyn` in commit 2 is a useful step towards commit 3?

Lastly, there is currently a bug in this code. This PR does not worsen it, but it touches the code. I wondered whether knowledge of the issue would change the structure in this PR. The issue is that when using multiple virtio devices, they currently reuse the same virtual memory address. See #2255 for details. To reproduce, run:

```shell
cargo xtask ci rs --arch x86_64 --package httpd --no-default-features --features ci,hermit/dhcpv4,hermit/tcp,hermit/virtio-net,hermit/virtio-console qemu --microvm --devices virtio-console-mmio --devices virtio-net-mmio
```

Swapping the device order changes the result.
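The shape of the bug can be sketched in a few lines (all names and addresses here are hypothetical, purely for illustration): if every device's MMIO window is mapped at one fixed virtual page, the second device's mapping replaces the first, whereas bump-allocating a fresh page per device keeps the mappings distinct.

```rust
const PAGE_SIZE: usize = 0x1000;

// Illustrative stand-in for the kernel's virtual-address allocator.
struct VirtualMemory {
    next_free: usize,
}

impl VirtualMemory {
    // Buggy pattern: every device is handed the same virtual page,
    // so a second device's mapping aliases (and clobbers) the first.
    fn map_fixed(&self) -> usize {
        0x9000_0000
    }

    // Fixed pattern: bump-allocate a fresh page per device.
    fn map_next(&mut self) -> usize {
        let addr = self.next_free;
        self.next_free += PAGE_SIZE;
        addr
    }
}

fn main() {
    let mut vm = VirtualMemory { next_free: 0x9000_0000 };

    // Two devices, one shared address: a collision.
    assert_eq!(vm.map_fixed(), vm.map_fixed());

    // Two devices, two distinct pages: no collision.
    let a = vm.map_next();
    let b = vm.map_next();
    assert_ne!(a, b);
    assert_eq!(b - a, PAGE_SIZE);
}
```

This is only a model of the symptom described in #2255, not the actual Hermit mapping code.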

@mkroening mkroening self-assigned this Apr 11, 2026
When multiple virtio-mmio devices are present, QEMU assigns them IRQs
and addresses based on their index. Use that index to search and
initialize devices.

This also fixes the previous erroneous behavior of providing the IRQ of
the first device (QEMU fills the devices in descending order, so the
first device has the IRQ value VIRTIO_IRQ_BASE +
VIRTIO_NUM_TRANSPORTS - 1 = 12, our previous constant) and allows the
later devices to receive interrupts.
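The IRQ arithmetic from the commit message above can be sketched as follows. The constant values are illustrative assumptions, chosen only so that VIRTIO_IRQ_BASE + VIRTIO_NUM_TRANSPORTS - 1 = 12 as stated; the real values live in the kernel and QEMU's microvm machine model.

```rust
// Assumed values for illustration: base + num_transports - 1 = 12.
const VIRTIO_IRQ_BASE: u8 = 5;
const VIRTIO_NUM_TRANSPORTS: u8 = 8;

// IRQ of the virtio-mmio transport slot at a given index.
fn irq_for_slot(slot: u8) -> u8 {
    assert!(slot < VIRTIO_NUM_TRANSPORTS);
    VIRTIO_IRQ_BASE + slot
}

fn main() {
    // QEMU fills slots in descending order, so the first device created
    // lands in the highest slot and gets IRQ 12 (the old hard-coded value).
    assert_eq!(irq_for_slot(VIRTIO_NUM_TRANSPORTS - 1), 12);

    // With per-slot IRQs, every slot gets its own line, so later devices
    // can receive interrupts instead of all sharing one constant.
    assert_eq!(irq_for_slot(0), 5);
    assert_eq!(irq_for_slot(1), 6);
}
```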
@cagatay-y
Contributor Author

cagatay-y commented Apr 14, 2026

> Do you think it would make sense to squash commits 2 and 3, or do you think introducing `dyn` in commit 2 is a useful step towards commit 3?

They were separate to ease the review, but they are squashed now.

> Lastly, there is currently a bug in this code. This PR does not worsen it, but it touches the code. I wondered whether knowledge of the issue would change the structure in this PR. The issue is that when using multiple virtio devices, they currently reuse the same virtual memory address. See #2255 for details. To reproduce, run:
>
> ```shell
> cargo xtask ci rs --arch x86_64 --package httpd --no-default-features --features ci,hermit/dhcpv4,hermit/tcp,hermit/virtio-net,hermit/virtio-console qemu --microvm --devices virtio-console-mmio --devices virtio-net-mmio
> ```

It seems that the issue in this particular case was something other than the one in #2255, and it is fixed with 0d156de now. Returning the page to the free list is still incorrect, but the later mappings do not overwrite the mappings of the previous devices in the `guess_device` path, since QEMU places all of the devices on the same frame in our case. I would add a fix for that issue too, but I feel it would be better to have a reproduction example first. When I extract the kernel args that QEMU would provide and pass them manually in an attempt to reproduce the issue, the devices are still on the same frame, and the args do not work at all, since they are not the size assumed by the `check_linux_args` function. (EDIT: I just realized that `check_linux_args` is meant for Firecracker. I will test with it and report back, but the next paragraph stands.)

I don't think the fix will significantly change the structure of what is in this PR, so I think we can merge this one independently.

@cagatay-y cagatay-y requested a review from mkroening April 14, 2026 12:42