Intel QuickAssist: multi-device utilization + software-fallback / Cavium fixes #10772
Draft
dgarske wants to merge 6 commits into
…udo-free docs; Cavium async req_count OOB fix
…xes software-fallback RSA/cert-verify -142/-140/-173)
… rings, hugepages, AER, heartbeat)
…NDING_E gap; add QAT_NO_DEV_INTERLEAVE)
…drop manual -j1 make check docs
Pull request overview
This PR improves hardware-accelerated crypto offload behavior for Intel QuickAssist (QAT) and Cavium/Nitrox, focusing on better multi-device utilization, more robust software fallback when QAT isn’t available, and a Nitrox polling safety fix.
Changes:
- Reorders QAT crypto instances to interleave across devices by default (opt-out via `QAT_NO_DEV_INTERLEAVE`) to improve utilization at lower thread counts.
- Fixes software-fallback behavior in the QAT NUMA allocator when the QAT service isn’t started, allowing crypto to proceed in software.
- Fixes a Cavium/Nitrox multi-request polling OOB condition by resetting `req_count` after buffer flush; also corrects an RSA public free heap parameter and expands Intel QAT documentation.
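
To make the `req_count` fix concrete, here is a minimal sketch of a fixed-size batch buffer that is flushed when full. It is not the wolfSSL code: `MAX_POLL`, `multi_req_t`, `flush_batch()` and `add_request()` are hypothetical names standing in for `CAVIUM_MAX_POLL` and the multi-request poll buffer. The point it illustrates is that the counter must return to zero after a flush, otherwise the next store indexes past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POLL 4 /* hypothetical stand-in for CAVIUM_MAX_POLL */

typedef struct {
    int    req[MAX_POLL]; /* pending request ids */
    size_t req_count;     /* number of valid entries in req[] */
    size_t flushed;       /* total requests handed off so far */
} multi_req_t;

/* Hand the pending batch off and empty the buffer. */
static void flush_batch(multi_req_t *m)
{
    /* A real driver would poll the hardware for req[0..req_count-1] here. */
    m->flushed += m->req_count;
    m->req_count = 0; /* the reset: without it the next add writes req[MAX_POLL] */
}

/* Queue one request, flushing automatically when the buffer fills. */
static void add_request(multi_req_t *m, int id)
{
    assert(m->req_count < MAX_POLL); /* invariant the reset preserves */
    m->req[m->req_count++] = id;
    if (m->req_count == MAX_POLL) {
        flush_batch(m);
    }
}
```

Calling `add_request()` ten times on a zero-initialized `multi_req_t` flushes two full batches and leaves two entries pending at `req[0]` and `req[1]`.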
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| wolfssl/wolfcrypt/port/intel/quickassist_mem.h | Adds an internal “is QAT started” query used by the QAT memory layer to decide when to fall back to regular memory. |
| wolfcrypt/src/port/intel/README.md | Updates Intel QAT usage docs (non-sudo operation, serialized testing guidance, multi-device benchmarking, diagnostics). |
| wolfcrypt/src/port/intel/quickassist.c | Adds IntelQaIsStarted() and instance interleaving logic; fixes RSA public free heap usage. |
| wolfcrypt/src/port/intel/quickassist_mem.c | Adds fallback to regular malloc when NUMA allocation fails and QAT service is not started. |
| wolfcrypt/src/async.c | Resets Cavium req_count after flushing multi-request poll buffer to avoid OOB writes. |
| Makefile.am | Serializes make execution when Intel QAT is enabled via .NOTPARALLEL. |
Comment on lines +326 to +333

    /* Returns nonzero when the QAT crypto service is running. The memory layer
     * uses this to decide whether a failed NUMA allocation should fall back to
     * regular memory (service not started -> software mode) or remain NULL (real
     * NUMA exhaustion while the device is in use). */
    int IntelQaIsStarted(void)
    {
        return (g_cyServiceStarted == CPA_TRUE) ? 1 : 0;
    }

Comment on lines +413 to +422

    /* If the QAT memory subsystem is not available (async device not
     * opened, e.g. "Running without async") fall back to regular memory
     * so software crypto can proceed. A NULL while the subsystem IS up
     * means real NUMA exhaustion and is left NULL so the QAT operation
     * fails cleanly rather than receiving non-DMA memory. */
    if (ptr == NULL && !IntelQaIsStarted()) {
        isNuma = 0;
        page_offset = QAE_NOT_NUMA_PAGE;
        ptr = malloc(size + sizeof(qaeMemHeader));
    }

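
The fallback only works if the free path can later tell which allocator produced a block. Below is a self-contained sketch of that idea, under stated assumptions: `mem_header`, `numa_alloc()`, `g_started`, `qa_alloc()` and `qa_free()` are hypothetical stand-ins (the real code uses `qaeMemHeader`, `QAE_NOT_NUMA_PAGE` and `IntelQaIsStarted()`), and the NUMA allocator is stubbed to always fail so the fallback branch is exercised.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every block; the union keeps
 * the user pointer suitably aligned. */
typedef union {
    int         is_numa; /* 1 = NUMA/DMA memory, 0 = plain malloc */
    max_align_t pad;
} mem_header;

static int g_started = 0; /* stand-in for the IntelQaIsStarted() state */

/* Stub NUMA allocator: always fails, as when the device is not opened. */
static void *numa_alloc(size_t n) { (void)n; return NULL; }

static void *qa_alloc(size_t size)
{
    int is_numa = 1;
    mem_header *hdr = numa_alloc(size + sizeof(mem_header));

    /* Fall back only when the service is down; with a live device a NULL
     * means real exhaustion and must stay NULL (no non-DMA memory). */
    if (hdr == NULL && !g_started) {
        is_numa = 0;
        hdr = malloc(size + sizeof(mem_header));
    }
    if (hdr == NULL) {
        return NULL;
    }
    hdr->is_numa = is_numa; /* record the origin for qa_free() */
    return hdr + 1;
}

static void qa_free(void *p)
{
    if (p == NULL) {
        return;
    }
    mem_header *hdr = (mem_header *)p - 1;
    if (!hdr->is_numa) {
        free(hdr);
    }
    /* else: a real implementation would return the block to the NUMA pool */
}
```

With `g_started` at 0 the allocation succeeds from regular memory; setting it to 1 makes the same request return NULL, which mirrors the gating described in the comment above.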
Changes
- `cpaCyGetInstances()` returns instances grouped by device, so with fewer threads than instances the per-thread round-robin piled every thread onto device 0. `IntelQaInterleaveInstances()` reorders the list by device so consecutive threads land on different devices. On by default; opt out with `QAT_NO_DEV_INTERLEAVE`.
- Software fallback: RSA and cert-verify failed (-142/-140/-173) whenever the device couldn't be opened. The QAT memory layer now falls back to regular memory so crypto runs in software, gated by `IntelQaIsStarted()` so a live device still gets a clean error on real NUMA exhaustion.
- Cavium `req_count` OOB write: `wolfAsync_EventQueuePoll()` did not reset `req_count` after the multi-request flush, indexing past `multi_req.req[CAVIUM_MAX_POLL]`. `HAVE_CAVIUM`-gated. (CWE-787, Project Vanessa.)
- RSA public free: passed `dev` instead of `dev->heap`.
- Docs (`port/intel/README.md`): sudo-free operation, serial `make check`, multi-device benchmark guidance, and a QAT health-diagnostics section.

Performance (3x Intel C62x, RSA-2048 sign, ops/sec)
The interleave spreads load across all 3 devices at thread counts below the instance count (18); neutral above that. AES unchanged vs master.
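
As an illustration of the reordering idea (not the actual `IntelQaInterleaveInstances()` implementation), the sketch below takes a list of instances grouped by device and emits them round-robin across devices. The function name, the `dev_of[]` array and the `MAX_INST` cap are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INST 64 /* hypothetical cap for this sketch */

/* Reorder n instances so consecutive output slots cycle through devices.
 * dev_of[i] is the device index (0..num_dev-1) of inst[i]. Returns the
 * number of slots written to out[]. */
static size_t interleave_by_device(const int *inst, const int *dev_of,
                                   size_t n, int num_dev, int *out)
{
    unsigned char used[MAX_INST] = {0};
    size_t filled = 0;

    assert(n <= MAX_INST);
    while (filled < n) {
        size_t before = filled;
        for (int d = 0; d < num_dev && filled < n; d++) {
            /* take the first unused instance that lives on device d */
            for (size_t i = 0; i < n; i++) {
                if (!used[i] && dev_of[i] == d) {
                    used[i] = 1;
                    out[filled++] = inst[i];
                    break;
                }
            }
        }
        if (filled == before) {
            break; /* no progress: some dev_of[] entry is out of range */
        }
    }
    return filled;
}
```

For six instances on three devices in grouped order (devices 0,0,1,1,2,2), the output order is instances 0, 2, 4, 1, 3, 5. Three threads taking slots 0 to 2 then reach three different devices instead of sharing one.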