
[RVV] Add missing maxpool and avgpool rvv kernels #9622

Merged
copybara-service[bot] merged 4 commits into google:master from ken-unger:maxpool-rvv
Apr 17, 2026

Conversation

@ken-unger
Contributor

  • Add rvv script to generate f32-avgpool and f16-avgpool rvv kernels
  • Rewrite rvv script to generate f32-maxpool and add f16-maxpool, s8-maxpool, u8-maxpool rvv kernels
  • The script now closely follows the SIMD version.

Relevant tests and benchmarks were executed and pass (on a BPI-F3 board).
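The generated kernels use RVV intrinsics to vectorize simple per-window reductions. As a rough illustration of what those kernels compute, here is a scalar C reference of the max and average reductions; the function names are hypothetical, not XNNPACK's actual microkernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative scalar references for the reductions the generated RVV
 * kernels vectorize across a pooling window. Names are hypothetical,
 * not the XNNPACK microkernel API. */
static float maxpool_f32_ref(const float* x, size_t n) {
  float m = x[0];
  for (size_t i = 1; i < n; i++) {
    if (x[i] > m) m = x[i];
  }
  return m;
}

static float avgpool_f32_ref(const float* x, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) {
    sum += x[i];
  }
  return sum / (float)n;
}
```

The RVV versions replace the inner loops with strip-mined vector loops over `vl` elements at a time, which is why a single generator script can cover f32, f16, s8, and u8 by swapping the element type and the reduction op.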

@ken-unger
Contributor Author

RE: RVV hardware detection support for FP16

@dsharlet just FYI, I've opened a PR in pytorch/cpuinfo for this purpose. Once that is reviewed and merged, I can make the needed changes to hardware-config.c. pytorch/cpuinfo#375

While other changes in cpuinfo are desirable (e.g. cache info for RISC-V uarchs), I tried to keep the changes to a minimum this round in the hope that the review goes smoothly.
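For context on why a cpuinfo change is needed at all: on Linux, single-letter RISC-V base extensions are reported as bits in `AT_HWCAP` ('A' is bit 0 through 'Z' at bit 25), so the vector extension 'V' is easy to detect, but multi-letter extensions such as Zvfh (vector FP16) are not in `AT_HWCAP` and need a different mechanism. A minimal sketch of decoding the 'V' bit from a given hwcap value (the value itself would come from `getauxval(AT_HWCAP)` on Linux):

```c
#include <assert.h>
#include <stdbool.h>

/* Decode the RISC-V 'V' (vector) bit from a Linux AT_HWCAP value.
 * Single-letter ISA extensions map to bits 'A'..'Z' -> 0..25, so
 * 'V' is bit 21. Multi-letter extensions like Zvfh (vector FP16)
 * are NOT encoded here, which is why extra detection support in
 * cpuinfo is needed for the FP16 kernels. */
static bool hwcap_has_rvv(unsigned long hwcap) {
  return (hwcap & (1UL << ('V' - 'A'))) != 0;
}
```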

@ken-unger
Contributor Author

@fbarchard please also give this PR a look over when you have a few minutes. Thank you.

@ken-unger
Contributor Author

@fbarchard could you review this when you have a few free minutes? Thank you.

@ken-unger
Contributor Author

ken-unger commented Apr 14, 2026

@dsharletg I've updated this PR to latest master and retested. Please review and merge when you are able. Thank you!

With this, subgraph-mobilenet-bench now runs with the FP16 datatype.

2026-04-13T14:19:40-07:00
Running subgraph-mobilenet-bench
Run on (8 X 1600 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x8)
  L1 Data 32 KiB (x8)
  L2 Unified 512 KiB (x2)
Load Average: 2.63, 2.47, 1.48
***WARNING*** ASLR is enabled, the results may have unreproducible noise in them.
------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------
FP32MobileNetV1/process_time/real_time          186104 us       186098 us            4 cpufreq=1.6G
FP32MobileNetV2/process_time/real_time          185933 us       185941 us            4 cpufreq=1.6G
FP32MobileNetV3Large/process_time/real_time     224285 us       224279 us            3 cpufreq=1.6G
FP32MobileNetV3Small/process_time/real_time      72538 us        72547 us           10 cpufreq=1.6G
FP16MobileNetV1/process_time/real_time          112145 us       112153 us            7 cpufreq=1.6G
FP16MobileNetV2/process_time/real_time           91548 us        91556 us            8 cpufreq=1.6G
FP16MobileNetV3Large/process_time/real_time      75921 us        75930 us            9 cpufreq=1.6G
FP16MobileNetV3Small/process_time/real_time      25811 us        25821 us           27 cpufreq=1.6G
QS8MobileNetV2/process_time/real_time            54616 us        54625 us           13 cpufreq=1.6G
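For a sense of scale, the FP16 variants above run roughly 1.7x to 3x faster than their FP32 counterparts. A trivial helper to compute the ratio from the `real_time` column (microseconds):

```c
#include <assert.h>

/* Speedup of the FP16 path over FP32, using the real_time column of
 * the benchmark table above (values in microseconds). */
static double speedup(double fp32_us, double fp16_us) {
  return fp32_us / fp16_us;
}
```

For example, MobileNetV2 goes from 185933 us (FP32) to 91548 us (FP16), a speedup of about 2x.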

@ken-unger
Contributor Author

@dsharletg thank you for all the previous PR approvals and merges. A very friendly ping on this one. I think we (or at least I) are getting very close to done.

copybara-service bot pushed a commit that referenced this pull request Apr 17, 2026
--
7596bbc by Ken Unger <ken.j.unger@gmail.com>:

rvv maxpool for f32, f16, s8, u8

--
afaabc2 by Ken Unger <ken.j.unger@gmail.com>:

cleanup rvv maxpool, add rvv avgpool

FUTURE_COPYBARA_INTEGRATE_REVIEW=#9622 from ken-unger:maxpool-rvv e1f759c
PiperOrigin-RevId: 901419799
@copybara-service copybara-service bot merged commit dd5ce30 into google:master Apr 17, 2026
21 checks passed
@ken-unger ken-unger deleted the maxpool-rvv branch April 20, 2026 16:09