Skip remaining PCI devices when device 0 is absent on a bus #2261

Open
jbreitbart wants to merge 1 commit into hermit-os:main from jbreitbart:pci-skip-empty-bus

@jbreitbart
Contributor

Summary

  • Apply a standard PCI enumeration optimization: if device 0, function 0 on a bus returns vendor ID 0xFFFF, skip the remaining device slots on that bus
  • Reduces the PCI legacy scan from 32 buses × 32 devices = 1024 probes to ~63 probes on uhyve, which has a single device at bus 0, device 0 (32 probes on bus 0 plus one probe on each of the other 31 buses)
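
The skip rule can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `scan`, `read_vendor_id`, the constants, and the mock vendor ID 0x1234 are hypothetical names and values chosen for illustration, and the closure stands in for a real config-space read. It only models the probe count for the uhyve layout described above.

```rust
/// Vendor ID returned by a config-space read when no device responds.
const NO_DEVICE: u16 = 0xFFFF;
/// Scan range assumed from the summary: 32 buses x 32 device slots.
const BUSES: u8 = 32;
const DEVICES: u8 = 32;

/// Scans all (bus, device) slots and returns the devices found together
/// with the number of probes issued. `read_vendor_id` is a stand-in for
/// the real config-space access (hypothetical helper).
fn scan<F>(mut read_vendor_id: F, skip_empty_buses: bool) -> (Vec<(u8, u8)>, usize)
where
    F: FnMut(u8, u8) -> u16,
{
    let mut found = Vec::new();
    let mut probes = 0;
    for bus in 0..BUSES {
        for device in 0..DEVICES {
            probes += 1;
            let vendor = read_vendor_id(bus, device);
            if vendor != NO_DEVICE {
                found.push((bus, device));
            } else if skip_empty_buses && device == 0 {
                // Device 0 is absent: treat the whole bus as empty.
                break;
            }
        }
    }
    (found, probes)
}

fn main() {
    // Mock of the uhyve layout: a single device at bus 0, device 0.
    // The vendor ID 0x1234 is an arbitrary placeholder.
    let uhyve = |bus: u8, device: u8| if (bus, device) == (0, 0) { 0x1234 } else { NO_DEVICE };

    let (full, full_probes) = scan(uhyve, false);
    let (fast, fast_probes) = scan(uhyve, true);
    assert_eq!(full, fast); // both scans discover the same device
    println!("full scan: {full_probes} probes, with skip: {fast_probes} probes");
}
```

With this mock, the full scan issues 1024 probes and the skipping scan issues 63, matching the figures in the summary.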

Performance

Measured with RDTSC-based boot instrumentation on uhyve (3.7 GHz TSC):

  • PCI legacy scan: 13,237 us → 797 us (16.6x faster)
  • Total boot to application: ~17,900 us → ~5,500 us (3.3x faster)
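
For readers who want to check the figures, the following sketch shows how TSC tick deltas map to microseconds at the stated 3.7 GHz and how the speedup factors follow from the reported durations. The helper names `ticks_to_us` and `speedup` and the sample tick delta are assumptions for illustration; the actual boot instrumentation is not shown in this PR description.

```rust
/// Converts a TSC tick delta to microseconds for a TSC frequency in Hz.
fn ticks_to_us(ticks: u64, tsc_hz: u64) -> f64 {
    ticks as f64 * 1_000_000.0 / tsc_hz as f64
}

/// Speedup factor between a "before" and an "after" duration.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

fn main() {
    // 3.7 GHz TSC, as stated in the measurement setup above.
    const TSC_HZ: u64 = 3_700_000_000;

    // Hypothetical tick delta: 3,700 ticks correspond to one microsecond.
    println!("{:.1} us", ticks_to_us(3_700_000, TSC_HZ));

    // Durations taken from the bullets above.
    println!("PCI scan speedup: {:.1}x", speedup(13_237.0, 797.0));
    println!("boot speedup: {:.1}x", speedup(17_900.0, 5_500.0));
}
```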

Test plan

  • Built and booted the hello_world example via uhyve; the device at 00:00 is still discovered correctly

🤖 Generated with Claude Code

Standard PCI enumeration optimization: if device 0 function 0 on a bus
returns 0xFFFF, no devices exist on that bus. This reduces the scan from
32 buses × 32 devices = 1024 probes to ~63 probes on uhyve (which has
a single device at bus 0 device 0).

PCI legacy scan drops from ~13 ms to ~0.8 ms, cutting total boot time
from ~18 ms to ~5.5 ms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@github-actions github-actions bot left a comment

Benchmark Results

| Benchmark | Current: 596d628 | Previous: dd4adf7 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 99.42 s | 100.04 s | 0.99 |
| startup_benchmark File Size | 0.86 MB | 0.86 MB | 1.00 |
| Startup Time - 1 core | 0.87 s (±0.03 s) | 0.97 s (±0.03 s) | 0.90 |
| Startup Time - 2 cores | 0.90 s (±0.03 s) | 0.97 s (±0.03 s) | 0.93 |
| Startup Time - 4 cores | 0.89 s (±0.03 s) | 0.96 s (±0.04 s) | 0.93 |
| multithreaded_benchmark Build Time | 102.32 s | 101.89 s | 1.00 |
| multithreaded_benchmark File Size | 0.96 MB | 0.96 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 90.72 % (±9.15 %) | 91.00 % (±9.06 %) | 1.00 |
| Multithreaded Pi Efficiency - 4 Threads | 44.55 % (±4.15 %) | 43.85 % (±3.35 %) | 1.02 |
| Multithreaded Pi Efficiency - 8 Threads | 26.26 % (±2.46 %) | 25.74 % (±2.15 %) | 1.02 |
| micro_benchmarks Build Time | 111.89 s | 113.74 s | 0.98 |
| micro_benchmarks File Size | 0.96 MB | 0.96 MB | 1.00 |
| Scheduling time - 1 thread | 71.27 ticks (±3.18 ticks) | 70.24 ticks (±3.59 ticks) | 1.01 |
| Scheduling time - 2 threads | 39.86 ticks (±5.29 ticks) | 39.12 ticks (±4.49 ticks) | 1.02 |
| Micro - Time for syscall (getpid) | 2.97 ticks (±0.20 ticks) | 2.99 ticks (±0.29 ticks) | 0.99 |
| Memcpy speed - (built_in) block size 4096 | 63635.92 MByte/s (±45158.67 MByte/s) | 65087.76 MByte/s (±46625.38 MByte/s) | 0.98 |
| Memcpy speed - (built_in) block size 1048576 | 30045.91 MByte/s (±25007.78 MByte/s) | 29936.81 MByte/s (±24889.06 MByte/s) | 1.00 |
| Memcpy speed - (built_in) block size 16777216 | 25077.57 MByte/s (±20994.58 MByte/s) | 25527.51 MByte/s (±21459.57 MByte/s) | 0.98 |
| Memset speed - (built_in) block size 4096 | 64370.40 MByte/s (±45726.54 MByte/s) | 65536.56 MByte/s (±46897.60 MByte/s) | 0.98 |
| Memset speed - (built_in) block size 1048576 | 30813.46 MByte/s (±25447.25 MByte/s) | 30723.73 MByte/s (±25327.45 MByte/s) | 1.00 |
| Memset speed - (built_in) block size 16777216 | 25807.48 MByte/s (±21447.22 MByte/s) | 26327.86 MByte/s (±21978.97 MByte/s) | 0.98 |
| Memcpy speed - (rust) block size 4096 | 57979.32 MByte/s (±42784.34 MByte/s) | 57856.95 MByte/s (±42652.78 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 1048576 | 30065.26 MByte/s (±25142.63 MByte/s) | 30116.15 MByte/s (±25100.80 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 24377.68 MByte/s (±20550.82 MByte/s) | 24868.32 MByte/s (±20982.14 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 58833.97 MByte/s (±43243.81 MByte/s) | 58845.68 MByte/s (±43409.33 MByte/s) | 1.00 |
| Memset speed - (rust) block size 1048576 | 30866.86 MByte/s (±25573.40 MByte/s) | 30915.72 MByte/s (±25554.06 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 25161.37 MByte/s (±21071.74 MByte/s) | 25661.65 MByte/s (±21502.61 MByte/s) | 0.98 |
| alloc_benchmarks Build Time | 103.92 s | 106.25 s | 0.98 |
| alloc_benchmarks File Size | 0.93 MB | 0.93 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 7698.02 Ticks (±309.70 Ticks) | 13150.35 Ticks (±161.03 Ticks) | 0.59 |
| Allocations - Average Allocation time (no fail) | 7698.02 Ticks (±309.70 Ticks) | 13150.35 Ticks (±161.03 Ticks) | 0.59 |
| Allocations - Average Deallocation time | 2230.97 Ticks (±605.34 Ticks) | 1127.38 Ticks (±770.43 Ticks) | 1.98 |
| mutex_benchmark Build Time | 104.88 s | 104.47 s | 1.00 |
| mutex_benchmark File Size | 0.96 MB | 0.96 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.10 ns (±0.67 ns) | 13.02 ns (±0.58 ns) | 1.01 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 15.84 ns (±0.78 ns) | 15.66 ns (±0.86 ns) | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
