kvstorage: Add batching to the WAGTruncator#168351
kvstorage: Add batching to the WAGTruncator#168351iskettaneh wants to merge 2 commits intocockroachdb:masterfrom
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link) |
4236da9 to
55a5cbd
Compare
55a5cbd to
5ca62af
Compare
iskettaneh
left a comment
There was a problem hiding this comment.
@iskettaneh made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
pkg/kv/kvserver/kvstorage/wag_truncator_test.go line 528 at r6 (raw file):
// thing, but it should give an idea of the improvement of different batch // sizes. func BenchmarkWAGTruncation(b *testing.B) {
@pav-kv I am not sure if the Benchmark is needed really, it just helped me make sure that the batching works and helped me pick a default batchSize.
pav-kv
left a comment
There was a problem hiding this comment.
The non-test code LGTM. I'll review tests a bit later, stepping off for today.
| if err = t.clearReplicaRaftLogAndSideloaded(ctx, | ||
| Raft{RO: t.eng.LogEngine(), WO: b}, event.Addr.RangeID, event.Addr.Index); err != nil { | ||
| return false, err | ||
| } |
There was a problem hiding this comment.
nit (idiomatic): err :=, bring err != nil to next line
if err := t.clearReplicaRaftLogAndSideloaded(
ctx, Raft{RO: t.eng.LogEngine(), WO: b}, event.Addr.RangeID, event.Addr.Index,
); err != nil {
return false, err
}| } | ||
| return index, nil | ||
|
|
||
| if err = b.Commit(false); err != nil { |
| settings.SystemOnly, | ||
| "kv.wag.truncator_batch_size", | ||
| "number of WAG nodes to delete per write batch during truncation", | ||
| 8, |
There was a problem hiding this comment.
How about 32 or 64? Looks 2x better than 8 according to benchmarks. Wonder if 64 being slower than 32 is a flake, or there is some actual slowdown.
There was a problem hiding this comment.
I reran the benchmark (for 5 times), and it gets kinda noisy after 16:
BenchmarkWAGTruncation/batchSize=1 100000 15413 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 17085 ns/op 959 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 16959 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 15240 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 15508 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7110 ns/op 611 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 5439 ns/op 610 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 5314 ns/op 610 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7148 ns/op 590 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7059 ns/op 612 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 3677 ns/op 539 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5388 ns/op 540 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5329 ns/op 539 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 3750 ns/op 540 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5428 ns/op 541 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 3021 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 4555 ns/op 496 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 2834 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 2827 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 4440 ns/op 501 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2374 ns/op 487 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 3944 ns/op 484 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2514 ns/op 486 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2406 ns/op 485 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 4016 ns/op 484 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 3946 ns/op 479 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2328 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2247 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 3747 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2192 ns/op 480 B/op 9 allocs/op
PASS
I think I will pick 16. There is one downside that I can think of with larger batch size. If there is some transient disk error or something of that sort, the larger batch size will be a bit more likely to encounter it.
| "kv.wag.truncator_batch_size", | ||
| "number of WAG nodes to delete per write batch during truncation", | ||
| 8, | ||
| settings.IntInRange(1, 1024), |
There was a problem hiding this comment.
1024 seems a reasonable cap. Not just using IntWithMinimum to make things "safe" in some sense?
It probably wouldn't be much of a win to raise it much higher anyway?
There was a problem hiding this comment.
Yeah I don't see a reason why we might want to increase it above 1024.
| // their events have been applied to the state engine. For nodes containing | ||
| // EventDestroy or EventSubsume events, it also clears the corresponding raft | ||
| // log prefix from the engine and the sideloaded entries storage. | ||
| // log prefix from the engine and the sideloaded entries. |
There was a problem hiding this comment.
nit: revert "storage"? The old thing reads to me as "clears the ... raft log prefix ... from .. the sideloaded entries storage". The sentence seems broken without "storage".
| if count == 0 { | ||
| return false, nil | ||
| } | ||
| if err := iter.Error(); err != nil { |
There was a problem hiding this comment.
Put this check first? count=0 is possible on an error, so we should probably pritoritize returning an error in this case. Could also squash as:
if err := iter.Error(); err != nil || count == 0 {
return false, err
}| if err := iter.Error(); err != nil { | ||
| return false, err | ||
| } | ||
| if err := b.Commit(false); err != nil { |
| if err := b.Commit(false); err != nil { | ||
| return false, err | ||
| } | ||
| t.truncIndex.Store(targetIndex - 1) // targetIndex is pointing at the last index truncated + 1. |
There was a problem hiding this comment.
How about flipping the script a bit, so that we don't +- 1 so much?
truncated := t.truncIndex.Load()
for ... {
if index != truncated+1 && index > t.initIndex {
// We cannot ignore gaps for WAG indices > initIndex.
break
}
...
truncated = index
count++
...
}
...
t.truncIndex.Store(truncated)
return true, nil"-1" needs a "proof" and relies on the "count == 0" early exit above. Whereas this way, no proof needed, and even an accidental unconditional Store would be correct in the no-op case.
There was a problem hiding this comment.
Yeah that makes sense
…learRaftState Move batch creation, commit, and truncIndex advancement from truncateAppliedNodes into truncateAppliedWAGNodeAndClearRaftState, making the latter fully self-contained. This simplifies the caller loop and makes the method signature cleaner (bool instead of uint64). Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Previously, truncateAppliedWAGNodeAndClearRaftState deleted one WAG node per batch and committed immediately. This commit does the following: 1) Rename truncateAppliedWAGNodeAndClearRaftState() to truncateBatch(). 2) Introduce a cluster-setting that controls the batch size. 3) Try to fit up-to batchSize deletion in each call to truncateBatch(). Benchmark results: ``` BenchmarkWAGTruncation/batchSize=1 100000 15413 ns/op 1003 B/op 20 allocs/op BenchmarkWAGTruncation/batchSize=1 100000 17085 ns/op 959 B/op 20 allocs/op BenchmarkWAGTruncation/batchSize=1 100000 16959 ns/op 1003 B/op 20 allocs/op BenchmarkWAGTruncation/batchSize=1 100000 15240 ns/op 1003 B/op 20 allocs/op BenchmarkWAGTruncation/batchSize=1 100000 15508 ns/op 1003 B/op 20 allocs/op BenchmarkWAGTruncation/batchSize=4 100000 7110 ns/op 611 B/op 11 allocs/op BenchmarkWAGTruncation/batchSize=4 100000 5439 ns/op 610 B/op 11 allocs/op BenchmarkWAGTruncation/batchSize=4 100000 5314 ns/op 610 B/op 11 allocs/op BenchmarkWAGTruncation/batchSize=4 100000 7148 ns/op 590 B/op 11 allocs/op BenchmarkWAGTruncation/batchSize=4 100000 7059 ns/op 612 B/op 11 allocs/op BenchmarkWAGTruncation/batchSize=8 100000 3677 ns/op 539 B/op 10 allocs/op BenchmarkWAGTruncation/batchSize=8 100000 5388 ns/op 540 B/op 10 allocs/op BenchmarkWAGTruncation/batchSize=8 100000 5329 ns/op 539 B/op 10 allocs/op BenchmarkWAGTruncation/batchSize=8 100000 3750 ns/op 540 B/op 10 allocs/op BenchmarkWAGTruncation/batchSize=8 100000 5428 ns/op 541 B/op 10 allocs/op BenchmarkWAGTruncation/batchSize=16 100000 3021 ns/op 495 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=16 100000 4555 ns/op 496 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=16 100000 2834 ns/op 495 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=16 100000 2827 ns/op 495 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=16 100000 4440 ns/op 501 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=32 100000 2374 ns/op 487 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=32 100000 3944 ns/op 484 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=32 100000 2514 ns/op 486 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=32 100000 2406 ns/op 485 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=32 100000 4016 ns/op 484 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=64 100000 3946 ns/op 479 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=64 100000 2328 ns/op 480 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=64 100000 2247 ns/op 480 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=64 100000 3747 ns/op 480 B/op 9 allocs/op BenchmarkWAGTruncation/batchSize=64 100000 2192 ns/op 480 B/op 9 allocs/op ``` Release note: None Epic: none
5ca62af to
cf21b21
Compare
iskettaneh
left a comment
There was a problem hiding this comment.
@iskettaneh made 5 comments and resolved 3 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on pav-kv).
| settings.SystemOnly, | ||
| "kv.wag.truncator_batch_size", | ||
| "number of WAG nodes to delete per write batch during truncation", | ||
| 8, |
There was a problem hiding this comment.
I reran the benchmark (for 5 times), and it gets kinda noisy after 16:
BenchmarkWAGTruncation/batchSize=1 100000 15413 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 17085 ns/op 959 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 16959 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 15240 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=1 100000 15508 ns/op 1003 B/op 20 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7110 ns/op 611 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 5439 ns/op 610 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 5314 ns/op 610 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7148 ns/op 590 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=4 100000 7059 ns/op 612 B/op 11 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 3677 ns/op 539 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5388 ns/op 540 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5329 ns/op 539 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 3750 ns/op 540 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=8 100000 5428 ns/op 541 B/op 10 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 3021 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 4555 ns/op 496 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 2834 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 2827 ns/op 495 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=16 100000 4440 ns/op 501 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2374 ns/op 487 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 3944 ns/op 484 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2514 ns/op 486 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 2406 ns/op 485 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=32 100000 4016 ns/op 484 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 3946 ns/op 479 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2328 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2247 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 3747 ns/op 480 B/op 9 allocs/op
BenchmarkWAGTruncation/batchSize=64 100000 2192 ns/op 480 B/op 9 allocs/op
PASS
I think I will pick 16. There is one downside that I can think of with larger batch size. If there is some transient disk error or something of that sort, the larger batch size will be a bit more likely to encounter it.
| "kv.wag.truncator_batch_size", | ||
| "number of WAG nodes to delete per write batch during truncation", | ||
| 8, | ||
| settings.IntInRange(1, 1024), |
There was a problem hiding this comment.
Yeah I don't see a reason why we might want to increase it above 1024.
| if count == 0 { | ||
| return false, nil | ||
| } | ||
| if err := iter.Error(); err != nil { |
| if err := iter.Error(); err != nil { | ||
| return false, err | ||
| } | ||
| if err := b.Commit(false); err != nil { |
| if err := b.Commit(false); err != nil { | ||
| return false, err | ||
| } | ||
| t.truncIndex.Store(targetIndex - 1) // targetIndex is pointing at the last index truncated + 1. |
There was a problem hiding this comment.
Yeah that makes sense
This PR adds the ability to truncate multiple WAG nodes in a single batch. It adds the cluster setting
kv.wag.truncator_batch_sizeto control the batch size.Batch sizes benchmark:
References: #167607
Release note: None
Co-Authored-By: roachdev-claude roachdev-claude-bot@cockroachlabs.com