Add tbf (token bucket filter) qdisc to support egress traffic shaping / rate limiting#13104
Conversation
996d5fb to
b1a787e
Compare
|
Can you show benchmark results? |
|
@EtiennePerot sure. Used iperf client inside the sandbox and the server outside (in the same kind cluster), host gso enabled, qdisc-tbf-rate = 100000000000 (100 Gbps), qdisc-tbf-burst = 134217728 (128 MiB), 20 iterations for each setup:
similar, but 12 iterations per setup (building up to 32 concurrent streams):
bounding the client pace to more realistic numbers - iperf client is sending in
|
| // psched_ratecfg_precompute__ in net/sched/sch_generic.c. | ||
| func len2TimeNS(rate uint64, len uint32) uint64 { | ||
| const nsecPerSec = 1000000000 | ||
| return uint64(len) * nsecPerSec / rate |
There was a problem hiding this comment.
I can get rid of this division by precalculating mul and shift params in New, for the "cost" of complicating the code a bit (the precalculation and storing those less intuitive mul and shift for later use in len2TimeNS)
There was a problem hiding this comment.
If you think the pre-computation would affect the benchmarks; then sure update it.
There was a problem hiding this comment.
I have a follow up PR where I implement a latency parameter to bound the queue size based on the max acceptable latency (see https://man7.org/linux/man-pages/man8/tc-tbf.8.html) - I can add it there / another PR after benchmarking it a bit. To be honest the branchmarks comparing this implementation to fifo/none qdisc made me think that optimization isn't very important, especially assuming qdisc is used to shape traffic to sane limits (and not, say, 30Gpbs), there the tbf implementation is competitive as it is...
| @@ -0,0 +1,109 @@ | |||
| // Copyright 2022 The gVisor Authors. | |||
There was a problem hiding this comment.
I just moved it from the fifo package so I can reuse the logic in tbf, I don't know why git recognizes it as "new" code. The diff is basically exporting the relevant symbols and adding a peek method
There was a problem hiding this comment.
That's alright, I guess. Maybe try git mv; if you haven't.
There was a problem hiding this comment.
yeah that's what I did, I am surprised as well it treats it as a new file.
| // +checklocksignore: we don't have to hold locks during initialization. | ||
| func New(lower stack.LinkEndpoint, clock tcpip.Clock, rate uint64, burst, queueLen uint32) (stack.QueueingDiscipline, error) { | ||
| if rate == 0 { | ||
| return nil, fmt.Errorf("qdisc=tbf requires setting qdisc-tbf-rate") |
There was a problem hiding this comment.
should I use some tcpip error in this file instead of returning fmt.Errorfs?
| // psched_ratecfg_precompute__ in net/sched/sch_generic.c. | ||
| func len2TimeNS(rate uint64, len uint32) uint64 { | ||
| const nsecPerSec = 1000000000 | ||
| return uint64(len) * nsecPerSec / rate |
There was a problem hiding this comment.
If you think the pre-computation would affect the benchmarks; then sure update it.
| @@ -0,0 +1,109 @@ | |||
| // Copyright 2022 The gVisor Authors. | |||
There was a problem hiding this comment.
That's alright, I guess. Maybe try git mv; if you haven't.
|
@parth-opensrc thank you for the review! |
|
Could you rebase and repush? There seem to be conflicts. |
Implements a single-rate TBF qdisc modeled on Linux's net/sched/sch_tbf.c and exposes it via --qdisc=tbf, with required --qdisc-tbf-rate and --qdisc-tbf-burst flags. OCI annotations can lower the configured rate and burst ceilings but not raise them without --allow-flag-override. The fifo qdisc's circular packet-buffer list moves into a shared pkg/tcpip/link/qdisc package so both qdiscs share one implementation. Loopback and ingress traffic are not shaped.
b1a787e to
fcb5c35
Compare
done |
… / rate limiting resolves the egress part of #11109. AI usage disclosure: used ai to help me generate some tests and cleanups after the manual implementation I did by hand modeled after `linux/net/sched/sch_tbf.c` TBF implementation. Also used it in guided documentation writing. I understand every line of code written and wrote the core logic manually, and happy to answer questions / revisit parts of the implementation as necessary. Also tested and benchmarked manually against a local Kubernetes kind cluster to ensure it resolves isola-run/isola#290 FUTURE_COPYBARA_INTEGRATE_REVIEW=#13104 from benldrmn:feat/network-traffic-shaping fcb5c35 PiperOrigin-RevId: 916176633
… / rate limiting resolves the egress part of #11109. AI usage disclosure: used ai to help me generate some tests and cleanups after the manual implementation I did by hand modeled after `linux/net/sched/sch_tbf.c` TBF implementation. Also used it in guided documentation writing. I understand every line of code written and wrote the core logic manually, and happy to answer questions / revisit parts of the implementation as necessary. Also tested and benchmarked manually against a local Kubernetes kind cluster to ensure it resolves isola-run/isola#290 FUTURE_COPYBARA_INTEGRATE_REVIEW=#13104 from benldrmn:feat/network-traffic-shaping fcb5c35 PiperOrigin-RevId: 916176633
| // Close implements stack.QueueingDiscipline.Close. | ||
| func (d *discipline) Close() { | ||
| d.closed.Store(qDiscClosed) | ||
| d.closeWaker.Assert() |
There was a problem hiding this comment.
Hi @benldrmn
Can this cause race conditions?
I mean, if one thread calls Close, while the other is at WritePacket:210 ?
There was a problem hiding this comment.
Yeah, I noticed that too during implementation, but prefered to keep the implementation similar to fifo (it has the same issue) to ease the code review. I'll submit a fix to both tbf and fifo, ok?
resolves the egress part of #11109.
AI usage disclosure: used ai to help me generate some tests and cleanups after the manual implementation I did by hand modeled after
linux/net/sched/sch_tbf.cTBF implementation. Also used it in guided documentation writing. I understand every line of code written and wrote the core logic manually, and happy to answer questions / revisit parts of the implementation as necessary.Also tested and benchmarked manually against a local Kubernetes kind cluster to ensure it resolves isola-run/isola#290