buffer: improve performance of multiple Buffer operations#61871
buffer: improve performance of multiple Buffer operations#61871thisalihassan wants to merge 7 commits intonodejs:mainfrom
Conversation
|
Review requested:
|
d2ba38f to
495feb5
Compare
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
These fast callbacks are non-identical to the conventional callbacks they shadow.
- The existing callbacks validate their argument and throws to JS if invalid, whereas your fast callbacks hard-crash the process. It might be better to validate in the JS layer, then use the same unwrapping logic on both sides.
- Your fast callback cannot have a different return convention to the conventional callback. You will need to remove the return value from the conventional callback.
There was a problem hiding this comment.
Was removing the validation instead of moving it to js intentional?
Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?
There was a problem hiding this comment.
Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?
This is outdated, the fast API doesn't use fallback any more (since Node.js v23.x).
However, any validation in a JS wrapper should be shadowed by a CHECK or DCHECK in the C++ binding.
There was a problem hiding this comment.
SPREAD_BUFFER_ARG already contains CHECK((val)->IsArrayBufferView()) on all Slow & Fast swap methods
lib/buffer.js
Outdated
| byteOffset, | ||
| encodingsMap.ascii, | ||
| dir), | ||
| indexOfString(buf, val, byteOffset, encodingsMap.latin1, dir), |
There was a problem hiding this comment.
Same behaviour AFAICT, but encodingsMap.ascii seems more appropriate.
There was a problem hiding this comment.
Yes same behaviour IndexOfString only has branches for UCS2, UTF8, and Latin1, Adding an ASCII branch to IndexOfString would just be a duplicate of the Latin1 branch since ASCII is a subset of Latin1.
There was a problem hiding this comment.
That's my point, these are still discrete encodings even though the behaviour here is the same, so this should pass encodingsMap.ascii to the binding and IndexOfString should add a condition to send this down the same path.
| void FastSwap16(Local<Value> receiver, | ||
| Local<Value> buffer_obj, | ||
| // NOLINTNEXTLINE(runtime/references) | ||
| FastApiCallbackOptions& options) { | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); | ||
| CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length())); | ||
| } |
There was a problem hiding this comment.
Fast callbacks should include debug tracking and call tests.
There was a problem hiding this comment.
Thanks for sharing these I will update the code
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61871 +/- ##
========================================
Coverage 89.65% 89.65%
========================================
Files 676 676
Lines 206231 206365 +134
Branches 39505 39534 +29
========================================
+ Hits 184898 185019 +121
- Misses 13463 13475 +12
- Partials 7870 7871 +1
🚀 New features to boost your workflow:
|
1395d2f to
01ba74f
Compare
src/node_buffer.cc
Outdated
| FastApiCallbackOptions& options) { | ||
| TRACK_V8_FAST_API_CALL("buffer.swap16"); | ||
| HandleScope scope(options.isolate); | ||
| ArrayBufferViewContents<char> buffer(buffer_obj); |
There was a problem hiding this comment.
ArrayBufferViewContents is wrong here, as buffer.data() may be a stack-allocated copy of the byte data rather than the data itself. SPREAD_BUFFER_ARG is the correct macro to use here, as per the conventional callback.
There was a problem hiding this comment.
@Renegade334 replaced ArrayBufferViewContents with SPREAD_BUFFER_ARG in all three fast swap callbacks
There was a problem hiding this comment.
For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)
See also #60249 (comment)
| } else if (value.length === 1) { | ||
| // Fast path: If `value` fits into a single byte, use that numeric value. | ||
| if (normalizedEncoding === 'utf8') { | ||
| if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') { |
There was a problem hiding this comment.
Currently, ascii behaves exactly like latin1
Unsure if by design or accidentally
There was a problem hiding this comment.
yes, this is safe by design I am just extending the existing single byte numeric optimization to cover ASCII, since the guard already constrains it to the valid ASCII range.
|
Note on toBase64 / toBase64url: I also tried replacing the C++ base64Slice/base64urlSlice bindings with V8's Uint8Array.prototype.toBase64() (similar to the toHex change) but it caused a 35-54% regression across all buffer sizes so I reverted base64/base64url and kept only the toHex optimization which showed a clear +26-37% win. |
|
@thisalihassan toHex doesn't show a win anymore with Instead, it's ~3x slower. See nodejs/nbytes#12 |
|
Hi @ChALkeR thanks for flagging I was not aware. I benchmarked the nibble approach locally and it's indeed a much bigger win (~3x vs my ~30% with toHex). Reverted the toHex path entirely the other changes in this PR are unaffected. Should I include the nbytes nibble HexEncode optimization in this PR or keep them as separate PRs?
|
Hi @ChALkeR I see that your changes landed, congratualtions on that huge win. Is there benchmark-ci that we can run on this PR? I can rebase it |
|
I re-ran the benchmark against the latest main (which contains the nibble improvement) and found out there are few regressions caused by my code:
I tried fixing this regression but unavoidable Attaching Details below: buffer-benchmark-all-rebased.csv -- raw benchmark CSV |
|
compare-R-output.txt-- full output from Rscript benchmark/compare.R (shared as a file to avoid cluttering the PR comment). |
- copyBytesFrom: calculate byte offsets directly instead of
slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions
- Guard ensureUint8ArrayToHex against --no-js-base-64 flag by falling back to C++ hexSlice when toHex is unavailable - Remove THROW_AND_RETURN_UNLESS_BUFFER and return value from slow Swap16/32/64 to match fast path conventions (JS validates) - Add TRACK_V8_FAST_API_CALL to FastSwap16/32/64 - Add test/parallel/test-buffer-swap-fast.js for fast API verification
Remove V8 Uint8Array.prototype.toHex() path for Buffer.toString('hex')
in favour of the upcoming nbytes HexEncode improvement (nodejs/nbytes#12)
which is ~3x faster through the existing C++ hexSlice path.
Refs: nodejs/nbytes#12
59f5d09 to
48abfc5
Compare
|
PS: I resolved some conflicts |
|
Hi @ChALkeR @anonrig @Renegade334 is this PR ready to land? can you also please check the latest benchmarks i have posted, I have compiled PDF for this benchmark in one my comments above for more detail
|
Qard
left a comment
There was a problem hiding this comment.
Generally LGTM, but one small nit.
|
Results from the CI, however it tells a slightly different story than my local benchmark at n=30 |
| return this; | ||
| } | ||
| return _swap16(this); | ||
| _swap16(this); |
There was a problem hiding this comment.
what would happen if it's called on not a TypeArrayView now?
There was a problem hiding this comment.
i.e. Buffer.prototype.swap16.apply(new Array(128))
There was a problem hiding this comment.
Good catch, I will add a JS side guard
There was a problem hiding this comment.
It changes the method to not being callable on a typedarray that is not an uint8array, but the previous behavior was inconsistent in that case anyway as the js part swapped elements and the src part swapped bytes
So I think explicitly blocking non-uint8arr should be fine and not a semver-major?
(perhaps cc @nodejs/lts just in case)
There was a problem hiding this comment.
Currently node -e 'Buffer.prototype.swap16.apply(new Float32Array(128))' does not throw. If we add a check for Uint8Array, that's a potential breaking change
There was a problem hiding this comment.
@aduh95 actually the point is it should always be thrown error on TypedArray because the current behavior silently corrupts data on the C++ path and on the JS loop path it swaps float elements instead of bytes. So a clean error is strictly better than silent data corruption. this is what @ChALkeR pointed
const f = new Float32Array(128);
f[0] = 1.5; // bytes: 00 00 C0 3F
Buffer.prototype.swap16.call(f);
console.log(f[0]); // -2.984375 (bytes: 00 00 3F C0)
Failed to start CI⚠ Commits were pushed since the last approving review: ⚠ - buffer: improve performance of multiple Buffer operations ⚠ - buffer: address PR feedback ⚠ - buffer: revert toHex in favour of nbytes HexEncode update ⚠ - resolve feedback ⚠ - resolve feedback on fast callbacks ⚠ - added comments and updated tests ⚠ - resolve feedback ✘ Refusing to run CI on potentially unsafe PRhttps://github.com/nodejs/node/actions/runs/22977319444 |
| const byteLength = TypedArrayPrototypeGetByteLength(view); | ||
|
|
||
| if (offset !== undefined || length !== undefined) { | ||
| if (offset !== undefined) { | ||
| validateInteger(offset, 'offset', 0); | ||
| if (offset >= viewLength) return new FastBuffer(); |
There was a problem hiding this comment.
In case we return here, we have called TypedArrayPrototypeGetByteLength for nothing. Can we move it further down?
There was a problem hiding this comment.
good call, let me flatten the structure currently it's quite nested to accomedate this
|
|
||
| function fromArrayLike(obj) { | ||
| if (obj.length <= 0) | ||
| const len = obj.length; |
There was a problem hiding this comment.
nit
| const len = obj.length; | |
| const { length } = obj; |
| // do the swap in javascript. For larger buffers, | ||
| // dropping down to the native code is faster. | ||
| if (!isUint8Array(this)) | ||
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); |
There was a problem hiding this comment.
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); | |
| throw new ERR_INVALID_THIS('Uint8Array'); |
There was a problem hiding this comment.
@aduh95 I believe ERR_INVALID_ARG_TYPE gives better error message, but if you think ERR_INVALID_THIS follows the spec better happy to align with the change
=== swap16 (ERR_INVALID_ARG_TYPE) ===
code: ERR_INVALID_ARG_TYPE
message: The "this" argument must be an instance of Buffer or Uint8Array. Received an instance of Float32Array
=== swap32 (ERR_INVALID_THIS) ===
code: ERR_INVALID_THIS
message: Value of "this" must be of type Uint8Array
There was a problem hiding this comment.
Overriding the this value is not really supported on Node.js APIs, so we don't really care about the error message. ERR_INVALID_ARG_TYPE is not suited, as this is not an argument. My preference would be to go with #61871 (comment), if not to use ERR_INVALID_THIS
There was a problem hiding this comment.
Oh then I think TypedArrayPrototypeGetLength might be a bit confusing as it doesn't block Float32Array
Buffer.prototype.swap32.call(new Float32Array(4));
it's a TypedArray check, not a Uint8Array check
what do you think @aduh95? check #61861_comment_by_ChaLker
| // do the swap in javascript. For larger buffers, | ||
| // dropping down to the native code is faster. | ||
| if (!isUint8Array(this)) | ||
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); |
There was a problem hiding this comment.
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); | |
| throw new ERR_INVALID_THIS('Uint8Array'); |
There was a problem hiding this comment.
same as above
| if (!isUint8Array(this)) | ||
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); | ||
| const len = this.length; |
There was a problem hiding this comment.
Alternatively, we can use the primordials to get the brand check for free
| if (!isUint8Array(this)) | |
| throw new ERR_INVALID_ARG_TYPE('this', ['Buffer', 'Uint8Array'], this); | |
| const len = this.length; | |
| const len = TypedArrayPrototypeGetLength(this); |
Or do we want TypedArrayPrototypeGetByteLength here?
There was a problem hiding this comment.
Alternatively, we can use the primordials to get the brand check for free
This would result in an unhelpful error.
TypeError: Method get TypedArray.prototype.length called on incompatible receiver
There was a problem hiding this comment.
How is that unhelpful? The full error message would be TypeError: Method get TypedArray.prototype.length called on incompatible receiver [object Array] btw
There was a problem hiding this comment.
Yup here's all three error messages:
=== swap16 (ERR_INVALID_ARG_TYPE) ===
code: ERR_INVALID_ARG_TYPE
message: The "this" argument must be an instance of Buffer or Uint8Array. Received an instance of Float32Array
=== swap32 (ERR_INVALID_THIS) ===
code: ERR_INVALID_THIS
message: Value of "this" must be of type Uint8Array
=== swap64 ===
code: undefined
message: Method get TypedArray.prototype.length called on incompatible receiver [object Array]




Multiple performance improvements to Buffer operations, verified with benchmarks (15-30 runs, comparing old vs new binaries built from same tree).
Buffer.copyBytesFrom()(+100-210%)Avoid intermediate
TypedArrayPrototypeSliceallocation by calculating byte offsets directly into the source TypedArray's underlying ArrayBuffer.Buffer.prototype.fill("t", "ascii")(+26-37%)ASCII
indexOf(+14-46%)Call
indexOfStringdirectly for ASCII encoding instead of first converting the search value to a Buffer viafromStringFastand then callingindexOfBuffer. ASCII and Latin-1 share the same byte values for characters 0-127.swap16/32/64(+3-38%)Add V8 Fast API C++ functions (
FastSwap16/32/64) alongside the existing slow path. Largest gains at len=256 (+35%).Benchmark results
Key results (15-30 runs,
***= p < 0.001):Attaching Details below:
detail.pdf -- visual breakdown
buffer-benchmark-all-rebased.csv -- raw benchmark CSV
compare-R-output.txt-- full output