Add tbf (token bucket filter) qdisc to support egress traffic shaping / rate limiting #13104

Merged
copybara-service[bot] merged 1 commit into google:master from benldrmn:feat/network-traffic-shaping
May 19, 2026
Conversation

@benldrmn
Contributor

@benldrmn benldrmn commented May 6, 2026

resolves the egress part of #11109.

AI usage disclosure: I used AI to help generate some tests and cleanups after implementing the qdisc by hand, modeled on the TBF implementation in linux/net/sched/sch_tbf.c. I also used it for guided documentation writing. I understand every line of the code, wrote the core logic manually, and am happy to answer questions or revisit parts of the implementation as necessary.

Also tested and benchmarked manually against a local Kubernetes kind cluster to ensure it resolves isola-run/isola#290

@benldrmn benldrmn force-pushed the feat/network-traffic-shaping branch from 996d5fb to b1a787e Compare May 6, 2026 18:17
@parth-opensrc parth-opensrc self-assigned this May 6, 2026
@EtiennePerot
Collaborator

Can you show benchmark results?

@benldrmn
Contributor Author

benldrmn commented May 8, 2026

@EtiennePerot sure.
Ran them on my laptop (so YMMV) with Intel(R) Core(TM) Ultra 7 155H CPU, 20 iterations, controlled the power governor and thermals (waited between runs until the cpu temperature went back to baseline) to avoid throttling affecting the measurements too much.

Used iperf client inside the sandbox and the server outside (in the same kind cluster), host gso enabled, qdisc-tbf-rate = 100000000000 (100 Gbps), qdisc-tbf-burst = 134217728 (128 MiB), 20 iterations for each setup:

| qdisc | streams | throughput (Gbps) | sandbox CPU (%) |
|-------|---------|-------------------|-----------------|
| fifo  | 1       | 44.06 ± 0.31      | 394.7 ± 1.1     |
| tbf   | 1       | 43.98 ± 0.44      | 393.7 ± 1.7     |
| none  | 1       | 23.67 ± 0.33      | 223.0 ± 0.6     |
| fifo  | 4       | 68.38 ± 3.42      | 758.1 ± 58.3    |
| tbf   | 4       | 43.78 ± 0.76      | 414.7 ± 4.0     |
| none  | 4       | 46.33 ± 0.08      | 396.8 ± 1.7     |

Similar setup, but with 12 iterations per setup (building up to 32 concurrent streams):

| streams | fifo (Gbps) | none (Gbps) | tbf (Gbps) |
|---------|-------------|-------------|------------|
| 1       | 44.7 ± 0.8  | 23.8 ± 0.1  | 44.5 ± 0.8 |
| 2       | 73.5 ± 0.4  | 39.6 ± 0.2  | 46.9 ± 0.2 |
| 4       | 74.3 ± 3.0  | 46.4 ± 0.1  | 43.7 ± 0.6 |
| 8       | 53.3 ± 1.5  | 57.8 ± 4.2  | 39.7 ± 0.6 |
| 16      | 52.6 ± 3.0  | 51.9 ± 1.6  | 31.1 ± 0.3 |
| 32      | 54.5 ± 2.0  | 56.2 ± 1.8  | 32.2 ± 0.2 |

Bounding the client pace to more realistic numbers, with the iperf client sending at the target rate (5 iterations for each setup):

| target   | qdisc | sandbox CPU (%) |
|----------|-------|-----------------|
| 10 Mbps  | tbf   | 3.47 ± 0.13     |
| 10 Mbps  | fifo  | 3.61 ± 0.12     |
| 10 Mbps  | none  | 3.40 ± 0.09     |
| 100 Mbps | tbf   | 7.23 ± 0.10     |
| 100 Mbps | fifo  | 7.23 ± 0.06     |
| 100 Mbps | none  | 7.48 ± 0.07     |
| 250 Mbps | tbf   | 13.17 ± 0.48    |
| 250 Mbps | fifo  | 13.08 ± 1.09    |
| 250 Mbps | none  | 13.18 ± 0.72    |
| 500 Mbps | tbf   | 19.01 ± 0.42    |
| 500 Mbps | fifo  | 19.36 ± 0.70    |
| 500 Mbps | none  | 19.04 ± 0.87    |

// psched_ratecfg_precompute__ in net/sched/sch_generic.c.
func len2TimeNS(rate uint64, len uint32) uint64 {
	const nsecPerSec = 1000000000
	return uint64(len) * nsecPerSec / rate
Contributor Author

I can get rid of this division by precalculating mul and shift params in New, for the "cost" of complicating the code a bit (the precalculation and storing those less intuitive mul and shift for later use in len2TimeNS)

Contributor

If you think the pre-computation would affect the benchmarks, then sure, update it.

Contributor Author

I have a follow-up PR where I implement a latency parameter to bound the queue size based on the max acceptable latency (see https://man7.org/linux/man-pages/man8/tc-tbf.8.html). I can add it there or in another PR after benchmarking it a bit. To be honest, the benchmarks comparing this implementation to the fifo/none qdiscs made me think that this optimization isn't very important, especially assuming the qdisc is used to shape traffic to sane limits (and not, say, 30 Gbps), where the tbf implementation is already competitive.
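
For reference, the mul/shift precomputation discussed in this thread can be sketched roughly as follows. This is a minimal sketch modeled on my reading of Linux's psched_ratecfg_precompute__ and psched_l2t_ns, not the code in this PR. The `rateCfg` type, `precompute`, and the method name are hypothetical, and `rate` is assumed to be in bytes per second.

```go
package main

import "fmt"

// rateCfg is a hypothetical holder for the precomputed parameters.
type rateCfg struct {
	mult  uint32
	shift uint8
}

// precompute picks mult and shift so that (len * mult) >> shift
// approximates len * 1e9 / rate without a division on the hot path.
func precompute(rate uint64) rateCfg {
	cfg := rateCfg{mult: 1}
	if rate == 0 {
		return cfg
	}
	factor := uint64(1000000000) // nanoseconds per second
	for {
		cfg.mult = uint32(factor / rate)
		// Stop once mult uses its top bit or factor cannot be doubled again.
		if cfg.mult&(1<<31) != 0 || factor&(1<<63) != 0 {
			break
		}
		factor <<= 1
		cfg.shift++
	}
	return cfg
}

// len2TimeNS returns the approximate time in ns to send length bytes.
func (c rateCfg) len2TimeNS(length uint32) uint64 {
	return (uint64(length) * uint64(c.mult)) >> c.shift
}

func main() {
	cfg := precompute(125000000) // 1 Gbps expressed in bytes per second
	fmt.Println(cfg.len2TimeNS(1500))
}
```

The result rounds down, so it can come out slightly below the exact division for rates that do not divide evenly. Whether that trade-off is worth the extra state is exactly the question raised above.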

@@ -0,0 +1,109 @@
// Copyright 2022 The gVisor Authors.
Contributor Author

I just moved it from the fifo package so I can reuse the logic in tbf, I don't know why git recognizes it as "new" code. The diff is basically exporting the relevant symbols and adding a peek method

Contributor

That's alright, I guess. Maybe try git mv, if you haven't.

Contributor Author

yeah that's what I did, I am surprised as well it treats it as a new file.

// +checklocksignore: we don't have to hold locks during initialization.
func New(lower stack.LinkEndpoint, clock tcpip.Clock, rate uint64, burst, queueLen uint32) (stack.QueueingDiscipline, error) {
	if rate == 0 {
		return nil, fmt.Errorf("qdisc=tbf requires setting qdisc-tbf-rate")
Contributor Author

Should I use a tcpip error in this file instead of returning fmt.Errorf errors?

@benldrmn
Contributor Author

@parth-opensrc thank you for the review!

@ayushr2
Collaborator

ayushr2 commented May 19, 2026

Could you rebase and repush? There seem to be conflicts.

Implements a single-rate TBF qdisc modeled on Linux's net/sched/sch_tbf.c
and exposes it via --qdisc=tbf, with required --qdisc-tbf-rate and
--qdisc-tbf-burst flags. OCI annotations can lower the configured rate
and burst ceilings but not raise them without --allow-flag-override.

The fifo qdisc's circular packet-buffer list moves into a shared
pkg/tcpip/link/qdisc package so both qdiscs share one implementation.
Loopback and ingress traffic are not shaped.
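
To illustrate what "single-rate TBF" means in the description above, here is a minimal sketch of the token accounting, assuming tokens are tracked as nanoseconds of transmission time the way Linux's sch_tbf.c does. The `bucket` type, `newBucket`, and `take` are hypothetical names, `rate` is assumed to be in bytes per second, and this is not the code merged in this PR.

```go
package main

import "fmt"

// bucket is a hypothetical single-rate token bucket. Tokens are measured
// in nanoseconds of transmission time at the configured rate.
type bucket struct {
	rate   uint64 // bytes per second (assumed unit)
	buffer int64  // bucket depth: ns needed to send `burst` bytes
	tokens int64  // ns of transmission time currently available
	last   int64  // timestamp of the last successful send, in ns
}

// len2TimeNS converts a byte length to transmission time in ns.
func len2TimeNS(rate uint64, length uint32) int64 {
	return int64(uint64(length) * 1000000000 / rate)
}

// newBucket starts with a full bucket. rate must be non-zero.
func newBucket(rate uint64, burst uint32, now int64) *bucket {
	buf := len2TimeNS(rate, burst)
	return &bucket{rate: rate, buffer: buf, tokens: buf, last: now}
}

// take reports whether a packet of pktLen bytes may be sent at time now.
// When it may not, the second result is how many ns to wait before retrying.
func (b *bucket) take(now int64, pktLen uint32) (bool, int64) {
	toks := now - b.last // tokens earned since the last send
	if toks > b.buffer {
		toks = b.buffer
	}
	toks += b.tokens
	if toks > b.buffer {
		toks = b.buffer // never accumulate more than one burst
	}
	toks -= len2TimeNS(b.rate, pktLen)
	if toks < 0 {
		return false, -toks // leave state untouched; caller retries later
	}
	b.tokens = toks
	b.last = now
	return true, 0
}

func main() {
	b := newBucket(1000000, 3000, 0) // 1 MB/s with a 3000-byte burst
	for i := 0; i < 3; i++ {
		ok, wait := b.take(0, 1500)
		fmt.Println(ok, wait)
	}
}
```

With a 3000-byte burst, two back-to-back 1500-byte packets drain the bucket and the third has to wait until enough time has passed to earn its tokens.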
@benldrmn benldrmn force-pushed the feat/network-traffic-shaping branch from b1a787e to fcb5c35 Compare May 19, 2026 18:41
@benldrmn
Contributor Author

> Could you rebase and repush? There seem to be conflicts.

done

copybara-service Bot pushed a commit that referenced this pull request May 19, 2026
… / rate limiting

resolves the egress part of #11109.

AI usage disclosure: used ai to help me generate some tests and cleanups after the manual implementation I did by hand modeled after `linux/net/sched/sch_tbf.c` TBF implementation. Also used it in guided documentation writing. I understand every line of code written and wrote the core logic manually, and happy to answer questions / revisit parts of the implementation as necessary.

Also tested and benchmarked manually against a local Kubernetes kind cluster to ensure it resolves isola-run/isola#290

FUTURE_COPYBARA_INTEGRATE_REVIEW=#13104 from benldrmn:feat/network-traffic-shaping fcb5c35
PiperOrigin-RevId: 916176633
@copybara-service copybara-service Bot merged commit 18331ea into google:master May 19, 2026
3 checks passed
// Close implements stack.QueueingDiscipline.Close.
func (d *discipline) Close() {
	d.closed.Store(qDiscClosed)
	d.closeWaker.Assert()
Contributor

Hi @benldrmn

Can this cause a race condition?
I mean, if one thread calls Close while another is at WritePacket:210?

Contributor Author

Yeah, I noticed that too during implementation, but preferred to keep the implementation similar to fifo (it has the same issue) to ease the code review. I'll submit a fix to both tbf and fifo, ok?

Contributor Author

@parth-opensrc here you go: #13280
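
One generic way to avoid this kind of Close/WritePacket race is to check the closed flag and enqueue under the same lock. The sketch below assumes the race is a check-then-enqueue problem, which the thread does not spell out. It is not the fix in #13280 and not gVisor's actual types; `queue`, `errClosed`, and the method bodies are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical error returned for writes after Close.
var errClosed = errors.New("qdisc closed")

// queue is a hypothetical stand-in for a qdisc's packet queue.
type queue struct {
	mu     sync.Mutex
	closed bool
	pkts   [][]byte
}

// WritePacket enqueues p unless the queue is closed. The closed check and
// the enqueue happen under one lock, so Close cannot slip in between them.
func (q *queue) WritePacket(p []byte) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return errClosed
	}
	q.pkts = append(q.pkts, p)
	return nil
}

// Close marks the queue closed and drops anything still queued.
func (q *queue) Close() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.closed = true
	q.pkts = nil
}

func main() {
	q := &queue{}
	fmt.Println(q.WritePacket([]byte{1}))
	q.Close()
	fmt.Println(q.WritePacket([]byte{2}))
}
```

A lock on the write path has a cost, so a real fix may prefer a different mechanism. The point is only that the check and the enqueue need to be atomic with respect to Close.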

Labels

area: networking, ready to pull

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Egress Traffic Shaping

4 participants