Problem
The default buffer sizes and HTTP transport settings in clickhouse-backup are tuned for low-bandwidth environments. On modern 10Gbps+ networks with high-latency object storage (S3, GCS, MinIO), these defaults severely limit throughput.
Default bottlenecks
- PipeBufferSize (ring buffer between compression and upload): 128KB — causes frequent context switches between producer/consumer goroutines
- S3 BufferSize (multipart upload buffer): default from aws-sdk, which results in many small upload requests
- S3 multipart chunk size: 5MB default — creates too many parts for large files, each with its own HTTP round-trip
- HTTP transport connection pool: Go's default MaxIdleConnsPerHost=2 keeps only two idle connections per host, so parallel uploads/downloads that target the same S3 endpoint keep opening new connections instead of reusing pooled ones
- io.Copy buffer: Go's default 32KB — excessive syscalls per file transfer
- GCS PutFile buffer: small default — frequent flushes to the GCS API
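To make the connection-pool item concrete, the following sketch prints the idle-connection limits that Go's standard library applies when http.DefaultTransport is used unchanged. The helper name defaultPoolLimits is hypothetical, and whether the SDK in use overrides these values with its own transport is an assumption that should be checked separately.

```go
package main

import (
	"fmt"
	"net/http"
)

// defaultPoolLimits reports the idle-connection limits in effect when a
// client relies on an unmodified http.DefaultTransport.
func defaultPoolLimits() (maxIdle, maxIdlePerHost int) {
	t, ok := http.DefaultTransport.(*http.Transport)
	if !ok {
		return 0, 0 // DefaultTransport was replaced by something else
	}
	perHost := t.MaxIdleConnsPerHost
	if perHost == 0 {
		// Zero means "use the package default".
		perHost = http.DefaultMaxIdleConnsPerHost
	}
	return t.MaxIdleConns, perHost
}

func main() {
	maxIdle, perHost := defaultPoolLimits()
	fmt.Printf("MaxIdleConns=%d MaxIdleConnsPerHost=%d\n", maxIdle, perHost)
}
```

With a stock Go toolchain this reports 100 idle connections in total and 2 per host.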
Impact
On a 10Gbps link to S3-compatible storage (MinIO), we measured ~200-400 MB/s throughput with defaults. After tuning, throughput reached 800-1200 MB/s — a 3-4x improvement with zero code logic changes.
Proposed Changes
1. Increase PipeBufferSize from 128KB to 8MB
// pkg/storage/general.go
const (
    // PipeBufferSize - size of ring buffer between stream handlers
    PipeBufferSize = 8 * 1024 * 1024 // was 128 * 1024
)
Larger ring buffers reduce context-switch overhead between the compression goroutine and the upload goroutine. An 8MB buffer typically still fits in the L3 cache of a server CPU and allows compression to run ahead of uploads without blocking.
2. Tune S3 HTTP transport for high concurrency
// pkg/storage/s3.go, in NewS3() or equivalent init
httpTransport := &http.Transport{
    MaxIdleConns:          512, // was default 100
    MaxIdleConnsPerHost:   128, // was default 2 (!)
    MaxConnsPerHost:       0,   // unlimited
    IdleConnTimeout:       120 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
    WriteBufferSize:       1 * 1024 * 1024, // 1MB (was 4KB)
    ReadBufferSize:        1 * 1024 * 1024, // 1MB (was 4KB)
    ForceAttemptHTTP2:     true,
    ResponseHeaderTimeout: 0,
    DisableCompression:    true, // we handle compression ourselves
}
awsConfig.HTTPClient = &http.Client{Transport: httpTransport}
The critical fix is MaxIdleConnsPerHost. With Go's default of 2, only two connections per host can be kept in the idle pool, so when download_concurrency=16, up to 14 of the 16 goroutines must establish a new TCP+TLS connection for each request rather than reusing a pooled one. This alone accounts for ~30% of the throughput gap.
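The effect of the idle-pool limit can be observed with the standard library alone. The sketch below is an illustration under assumptions, not a benchmark of clickhouse-backup: countNewConns is a hypothetical helper that sends bursts of concurrent requests to a local plain-HTTP test server and uses net/http/httptrace to count how many connections were freshly dialed. The handler holds every request until the whole burst has arrived, so each burst needs one connection per worker.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"sync/atomic"
)

// countNewConns runs `rounds` bursts of `workers` concurrent GET requests and
// returns how many connections were dialed instead of taken from the idle pool.
func countNewConns(maxIdlePerHost, workers, rounds int) int64 {
	arrived := make(chan struct{})
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		arrived <- struct{}{} // announce arrival
		<-release             // hold the connection until the burst is complete
	}))
	defer srv.Close()

	transport := &http.Transport{MaxIdleConns: 512, MaxIdleConnsPerHost: maxIdlePerHost}
	defer transport.CloseIdleConnections()
	client := &http.Client{Transport: transport}

	var dialed int64
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			if !info.Reused {
				atomic.AddInt64(&dialed, 1)
			}
		},
	}

	for round := 0; round < rounds; round++ {
		var wg sync.WaitGroup
		for i := 0; i < workers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
				if err != nil {
					panic(err)
				}
				req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
				resp, err := client.Do(req)
				if err != nil {
					panic(err)
				}
				resp.Body.Close()
			}()
		}
		for i := 0; i < workers; i++ {
			<-arrived
		}
		for i := 0; i < workers; i++ {
			release <- struct{}{}
		}
		wg.Wait()
	}
	return atomic.LoadInt64(&dialed)
}

func main() {
	fmt.Println("dialed with MaxIdleConnsPerHost=2:  ", countNewConns(2, 16, 3))
	fmt.Println("dialed with MaxIdleConnsPerHost=128:", countNewConns(128, 16, 3))
}
```

In this setup the first burst always dials 16 connections. With a limit of 2, later bursts should have to dial again for most workers, while with a limit of 128 they should be served from the pool. Against a real S3 endpoint each of those extra dials also pays for a TLS handshake.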
3. Tune GCS HTTP transport similarly
// pkg/storage/gcs.go
httpTransport := &http.Transport{
    MaxIdleConns:        256,
    MaxIdleConnsPerHost: 64,
    IdleConnTimeout:     90 * time.Second,
    TLSHandshakeTimeout: 10 * time.Second,
}
4. Pool io.Copy buffers via sync.Pool (1MB)
// pkg/storage/ratelimit.go (or a new file)
var copyBufferPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 1*1024*1024) // 1MB, was 32KB default
        return &buf
    },
}

func GetCopyBuffer() *[]byte {
    return copyBufferPool.Get().(*[]byte)
}

func PutCopyBuffer(buf *[]byte) {
    copyBufferPool.Put(buf)
}
Usage in download/upload paths:
bufPtr := GetCopyBuffer()
defer PutCopyBuffer(bufPtr) // return the buffer on every exit path
_, err := io.CopyBuffer(dst, src, *bufPtr)
This avoids allocating a fresh 32KB buffer for every file (which adds GC pressure with 50k+ files) and raises the copy batch size from 32KB to 1MB, cutting the number of read/write calls per transfer by up to ~32x.
5. Increase S3 multipart chunk size default
The default 5MB chunk size creates too many parts for large backup files (a 10GB file = 2000 parts). Consider raising the default to 64MB or making it auto-scale based on file size:
partSize = AdjustValueByRange(partSize, 5*1024*1024, 5*1024*1024*1024)
// With chunk_size config option defaulting to 64MB instead of 5MB
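The auto-scaling idea can be sketched as follows. The helper names choosePartSize, partCount and clamp are hypothetical and do not come from the project. The sketch relies on the S3 multipart limits of 5 MiB to 5 GiB per part and at most 10,000 parts per upload, and assumes that other S3-compatible stores apply the same limits.

```go
package main

import "fmt"

const (
	minPartSize = 5 * 1024 * 1024        // S3 minimum part size (5 MiB)
	maxPartSize = 5 * 1024 * 1024 * 1024 // S3 maximum part size (5 GiB)
	maxParts    = 10000                  // S3 limit on parts per multipart upload
)

// clamp restricts v to the range [lo, hi].
func clamp(v, lo, hi int64) int64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// choosePartSize starts from the configured part size and raises it when the
// file would otherwise need more than maxParts parts.
func choosePartSize(fileSize, configured int64) int64 {
	partSize := configured
	if needed := (fileSize + maxParts - 1) / maxParts; needed > partSize {
		partSize = needed
	}
	return clamp(partSize, minPartSize, maxPartSize)
}

// partCount returns the number of parts, rounding the last part up.
func partCount(fileSize, partSize int64) int64 {
	return (fileSize + partSize - 1) / partSize
}

func main() {
	const gib = int64(1) << 30
	const mib = int64(1) << 20
	fmt.Println(partCount(10*gib, 5*mib))                        // parts at the 5 MiB default
	fmt.Println(partCount(10*gib, choosePartSize(10*gib, 64*mib))) // parts at a 64 MiB default
	fmt.Println(partCount(100*gib, choosePartSize(100*gib, 5*mib)))
}
```

In binary units a 10 GiB file needs 2048 parts at 5 MiB and 160 parts at 64 MiB, which is in line with the roughly 2000 parts quoted above.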
Configuration
These could be made configurable (e.g., s3.http_max_idle_conns, s3.http_buffer_size) or simply use better defaults. The current defaults are Go stdlib defaults aimed at general-purpose HTTP clients, not high-throughput data transfer.
Benchmarks
Tested on 10Gbps MinIO with a 2TB ClickHouse backup (50k parts):
- Before: ~350 MB/s upload, ~400 MB/s download
- After: ~1000 MB/s upload, ~1100 MB/s download
- Primary gains from connection pool + buffer size changes
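For reference, the figures above can be put in relation to the link's line rate with a few lines of arithmetic. The sketch assumes decimal units (10 Gbps = 1250 MB/s) and ignores protocol overhead. The helper names are hypothetical.

```go
package main

import "fmt"

// linkMBps is the line rate of a 10 Gbps link in MB/s (decimal units),
// ignoring TCP/TLS/HTTP overhead.
const linkMBps = 10000.0 / 8

// speedup returns the ratio of tuned to default throughput.
func speedup(before, after float64) float64 { return after / before }

// utilization returns the fraction of the line rate that is used.
func utilization(mbps float64) float64 { return mbps / linkMBps }

func main() {
	fmt.Printf("upload:   %.2fx, %.0f%% -> %.0f%% of line rate\n",
		speedup(350, 1000), 100*utilization(350), 100*utilization(1000))
	fmt.Printf("download: %.2fx, %.0f%% -> %.0f%% of line rate\n",
		speedup(400, 1100), 100*utilization(400), 100*utilization(1100))
}
```

On these numbers the measured speedup is a little under 3x for both directions, and the tuned upload path uses about 80% of the nominal line rate.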