xtcp2 documentation

This is the documentation hub for xtcp2, a high-performance Linux daemon that streams kernel TCP socket state across every network namespace on a host to a configurable destination. If you just want to build and run it, start with the top-level README. This hub explains how it works and links to a dedicated document for each major feature.

Background

The Linux kernel exposes detailed TCP socket diagnostics through the sock_diag netlink family (NETLINK_SOCK_DIAG) and its inet_diag module, the same mechanism ss --info uses. Compared with parsing /proc/net/tcp, netlink is far cheaper and carries structured per-socket attributes: the full tcp_info struct, the congestion-control algorithm and its private state (BBR, DCTCP, Vegas), socket memory accounting, cgroup and class IDs, and more.

xtcp2 issues inet_diag dump requests and turns each reply into a flat protobuf record. Crucially, a netlink socket only sees the network namespace it was opened in, so to observe containers xtcp2 enters each namespace with setns(CLONE_NEWNET) and opens a socket there. The kernel does define NDIAG_FLAG_LISTEN_ALL_NSID (see the netlink_diag.h definition), but that flag only reports whether a netlink socket has the NETLINK_LISTEN_ALL_NSID option set, which concerns multicast notifications from peer namespaces; it does not make an inet_diag dump return sockets from other namespaces. Opening one socket per namespace is what gives xtcp2 a consistent view of every TCP socket in every namespace.
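The per-namespace pattern can be sketched as "lock the goroutine to its OS thread, setns into the target, do the work, setns back". This is a stdlib-only sketch, not xtcp2's code: inNetns is a hypothetical helper, and the setns syscall numbers are hard-coded for amd64 and arm64 because the syscall package exports no wrapper (golang.org/x/sys/unix.Setns is the usual choice in real code).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// setns(2) syscall numbers for the architectures this sketch supports.
var sysSetns = map[string]uintptr{"amd64": 308, "arm64": 268}[runtime.GOARCH]

const cloneNewnet = 0x40000000 // CLONE_NEWNET

func setns(fd uintptr) error {
	if sysSetns == 0 {
		return errors.New("setns: syscall number unknown for " + runtime.GOARCH)
	}
	if _, _, errno := syscall.Syscall(sysSetns, fd, cloneNewnet, 0); errno != 0 {
		return errno
	}
	return nil
}

// inNetns runs fn on an OS thread that has joined the network namespace at
// nsPath (for example /run/netns/foo), then moves the thread back.
func inNetns(nsPath string, fn func() error) error {
	target, err := os.Open(nsPath)
	if err != nil {
		return err
	}
	defer target.Close()

	// setns changes the namespace of the calling thread only, so pin this
	// goroutine to its thread for the duration.
	runtime.LockOSThread()
	self := fmt.Sprintf("/proc/self/task/%d/ns/net", syscall.Gettid())
	orig, err := os.Open(self)
	if err != nil {
		runtime.UnlockOSThread()
		return err
	}
	defer orig.Close()

	if err := setns(target.Fd()); err != nil {
		runtime.UnlockOSThread()
		return fmt.Errorf("enter %s: %w", nsPath, err)
	}
	fnErr := fn() // e.g. open the NETLINK_SOCK_DIAG socket here
	if err := setns(orig.Fd()); err != nil {
		// The thread is stranded in the wrong namespace. Leave it locked: when
		// the calling goroutine exits, the Go runtime terminates the thread
		// instead of handing it to other goroutines.
		return fmt.Errorf("restore namespace: %w", err)
	}
	runtime.UnlockOSThread()
	return fnErr
}

func main() {
	err := inNetns("/run/netns/example", func() error {
		fmt.Println("inside the namespace")
		return nil
	})
	fmt.Println("result:", err)
}
```

A socket created inside fn stays attached to the namespace it was created in after the thread switches back, which is what lets a long-lived reader keep dumping that namespace. Entering another network namespace normally requires CAP_SYS_ADMIN.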

This project is a complete rewrite of the original xtcp, keeping the concept but rebuilding the internals for throughput and namespace coverage.

Design philosophy

  • Build-tagged destinations. Heavy client libraries (Kafka, NATS, NSQ, Valkey) are gated behind //go:build dest_<scheme> tags so you can compile a slim binary with only the destinations you need. The stdlib destinations (null, udp, unix, unixgram) are always compiled in. See build flavors.
  • Pooled allocations. Packet buffers, netlink headers, and protobuf Envelope / XtcpFlatRecord messages are recycled through type-safe sync.Pool wrappers (pkg/xsync) to keep GC pressure low under high socket counts.
  • Parallelism per namespace. Each namespace gets multiple netlink readers (-netlinkers) so a host with many flows isn't bottlenecked on a single goroutine.
  • Optional io_uring. On Linux 6.1+ an opt-in io_uring path batches netlink recvmsg and raw-socket writes to cut syscall overhead.
  • Fail fast and loud. The daemon verifies its Linux capabilities at startup and refuses to run (with a precise message) when a hard requirement is missing.
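The pooled-allocation idea can be sketched with a small generic wrapper around sync.Pool that removes the type assertions from call sites. This is not the pkg/xsync API, whose names are not shown here: Pool and NewPool below are hypothetical, and the 32 KiB buffer size is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a type-safe wrapper around sync.Pool.
type Pool[T any] struct {
	p sync.Pool
}

// NewPool returns a Pool whose empty slots are filled by newFn.
func NewPool[T any](newFn func() T) *Pool[T] {
	return &Pool[T]{p: sync.Pool{New: func() any { return newFn() }}}
}

// Get returns a pooled value, or a fresh one from newFn.
func (p *Pool[T]) Get() T { return p.p.Get().(T) }

// Put hands a value back for reuse; callers must not touch it afterwards.
func (p *Pool[T]) Put(x T) { p.p.Put(x) }

func main() {
	// Pool *[]byte rather than []byte: putting a bare slice into an `any`
	// would allocate a slice header on every Put.
	bufs := NewPool(func() *[]byte {
		b := make([]byte, 0, 32*1024)
		return &b
	})

	buf := bufs.Get()
	*buf = append((*buf)[:0], "netlink reply"...)
	fmt.Println(len(*buf), cap(*buf))

	*buf = (*buf)[:0] // reset length before returning it to the pool
	bufs.Put(buf)
}
```

The same wrapper works for protobuf messages (for example a pool of *Envelope values), provided each message is reset before Put so stale rows are not reused.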

Core features

Each feature has a dedicated document with its own table of contents and component breakdown.

Reads TCP socket state from the kernel via the inet_diag netlink interface. A registry of 13 attribute deserializers (info, cong, meminfo, skmem, bbr, dctcp, vegas, tos, tc, shut, classid, cgroup, sockopt) decodes each socket's attributes into a flat record; you choose which to decode with -deserializers.
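A minimal sketch of validating a -deserializers value against the 13 names above. It assumes, without confirmation from this document, that the flag takes a comma-separated list; parseDeserializers is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// knownDeserializers lists the 13 attribute deserializer names documented above.
var knownDeserializers = []string{
	"info", "cong", "meminfo", "skmem", "bbr", "dctcp", "vegas",
	"tos", "tc", "shut", "classid", "cgroup", "sockopt",
}

// parseDeserializers turns a comma-separated flag value such as
// "info,cong,bbr" into a set, rejecting unknown names.
func parseDeserializers(flagValue string) (map[string]bool, error) {
	known := make(map[string]bool, len(knownDeserializers))
	for _, n := range knownDeserializers {
		known[n] = true
	}
	enabled := make(map[string]bool)
	for _, raw := range strings.Split(flagValue, ",") {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue // tolerate stray commas
		}
		if !known[name] {
			return nil, fmt.Errorf("unknown deserializer %q", name)
		}
		enabled[name] = true
	}
	return enabled, nil
}

func main() {
	enabled, err := parseDeserializers("info,cong,bbr")
	if err != nil {
		fmt.Println(err)
		return
	}
	names := make([]string, 0, len(enabled))
	for n := range enabled {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Rejecting unknown names at startup matches the fail-fast philosophy: a typo in the flag should stop the daemon rather than silently drop a column.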

Discovers network namespaces under /run/netns/ and /run/docker/netns/, watches them with inotify, and runs one netlink reader per namespace via setns. Namespaces that appear and disappear (container/pod churn) are reconciled continuously, with careful OS thread management to avoid leaks.
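The reconciliation step can be sketched as a rescan of the two directories followed by a set difference. Namespaces present on disk without a reader need one started, and readers whose namespace file has vanished should be stopped. The inotify watch that would trigger the rescan is omitted, and listNamespaces and reconcile are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// listNamespaces returns the namespace files found in dirs. A missing
// directory (for example on a host without Docker) is not an error.
func listNamespaces(dirs ...string) (map[string]bool, error) {
	found := make(map[string]bool)
	for _, dir := range dirs {
		entries, err := os.ReadDir(dir)
		if os.IsNotExist(err) {
			continue
		}
		if err != nil {
			return nil, err
		}
		for _, e := range entries {
			found[filepath.Join(dir, e.Name())] = true
		}
	}
	return found, nil
}

// reconcile compares namespaces with running readers against those on disk.
func reconcile(running, current map[string]bool) (start, stop []string) {
	for ns := range current {
		if !running[ns] {
			start = append(start, ns)
		}
	}
	for ns := range running {
		if !current[ns] {
			stop = append(stop, ns)
		}
	}
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop
}

func main() {
	current, err := listNamespaces("/run/netns", "/run/docker/netns")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	start, stop := reconcile(map[string]bool{}, current)
	fmt.Println("start:", start, "stop:", stop)
}
```

Recomputing the full set on every event, rather than trusting each individual inotify event, keeps the reader set correct even if events are coalesced or dropped during heavy container churn.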

A poll loop dumps every namespace on a fixed interval, deserializes the replies into an in-memory protobuf Envelope, and flushes that batch to the destination when it crosses a row-count or byte-size threshold.
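The two flush thresholds can be sketched as follows. The batcher type is hypothetical. It collects already-marshalled records as byte slices rather than a protobuf Envelope, and the threshold values in main are arbitrary.

```go
package main

import "fmt"

// batcher accumulates marshalled records and hands them to flush as soon as
// either the row-count or the byte-size threshold is reached.
type batcher struct {
	maxRows  int
	maxBytes int
	flush    func(batch [][]byte)

	buf   [][]byte
	bytes int
}

func (b *batcher) add(rec []byte) {
	b.buf = append(b.buf, rec)
	b.bytes += len(rec)
	if len(b.buf) >= b.maxRows || b.bytes >= b.maxBytes {
		b.flushNow()
	}
}

// flushNow sends whatever is pending, e.g. on shutdown.
func (b *batcher) flushNow() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	// Start a fresh slice so the flush callback may keep the old one.
	b.buf, b.bytes = nil, 0
}

func main() {
	b := &batcher{
		maxRows:  3,
		maxBytes: 64 * 1024,
		flush:    func(batch [][]byte) { fmt.Println("flushing", len(batch), "records") },
	}
	for i := 0; i < 7; i++ {
		b.add([]byte(fmt.Sprintf("record-%d", i)))
	}
	b.flushNow()
}
```

The row threshold bounds per-message latency on quiet hosts, while the byte threshold keeps a batch under destination limits such as a Kafka message size cap or a UDP datagram size.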

Four marshallers (protobufList, protoJson, protoText, msgpack) and nine pluggable destinations (Kafka with schema-registry support, NATS, NSQ, Valkey, UDP, Unix stream/datagram, S3/Parquet, and null). Includes the protobufList batch format used for ClickHouse ingestion.

Two gRPC services on :8889: a ConfigService to read and change daemon configuration at runtime, and an XTCPFlatRecordService to stream or poll live records. The xtcp2client binary and vendored grpcurl are the clients.

Prometheus metrics, Go pprof endpoints, optional Pyroscope continuous profiling, and the startup capability check that explains exactly which Linux capabilities are required.
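A startup capability check can be sketched by reading the effective capability mask from /proc/self/status and testing individual bits. The requirement list in main is an illustrative assumption, not xtcp2's actual list. CAP_SYS_ADMIN is needed for setns into another network namespace, and CAP_NET_RAW is what raw sockets require. Unlike the daemon, this sketch only reports missing capabilities and does not exit.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers from linux/capability.h.
const (
	capNetRaw   = 13
	capSysAdmin = 21
)

// effectiveCaps extracts the CapEff bitmask from the text of /proc/<pid>/status.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		if rest, ok := strings.CutPrefix(sc.Text(), "CapEff:"); ok {
			return strconv.ParseUint(strings.TrimSpace(rest), 16, 64)
		}
	}
	return 0, fmt.Errorf("no CapEff line in status")
}

func hasCap(mask uint64, bit uint) bool { return mask&(1<<bit) != 0 }

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println(err)
		return
	}
	mask, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// Illustrative requirement list, not xtcp2's actual one.
	required := []struct {
		name string
		bit  uint
		why  string
	}{
		{"CAP_SYS_ADMIN", capSysAdmin, "setns into other network namespaces"},
		{"CAP_NET_RAW", capNetRaw, "raw sockets"},
	}
	for _, r := range required {
		if !hasCap(mask, r.bit) {
			fmt.Printf("missing %s (needed for %s)\n", r.name, r.why)
		}
	}
}
```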

The optional io_uring reader/writer path, the pkg/xsync typed sync.Pool / sync.Map wrappers, netlinker parallelism, and runtime knobs (GOMAXPROCS, OS thread cap).

The captured netlink .pcap fixture corpus spanning many kernel versions, the reflection-free typed deserializers it validates, the ~800-test suite at over 92% coverage, the benchmarks, and the custom audit tools.
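A reflection-free deserializer boils down to walking the netlink attribute TLVs that follow each inet_diag_msg and switching on the attribute type. The sketch below decodes two real attribute types, INET_DIAG_CONG and INET_DIAG_SHUTDOWN, into a hypothetical flatRecord. In a fixture-driven test the input bytes would come from a captured .pcap rather than the hypothetical appendAttr helper.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Attribute types from linux/inet_diag.h.
const (
	inetDiagCong     = 4 // INET_DIAG_CONG: NUL-terminated algorithm name
	inetDiagShutdown = 8 // INET_DIAG_SHUTDOWN: one byte of shutdown flags
)

const attrHdrLen = 4 // struct rtattr / nlattr: u16 len, u16 type

type attr struct {
	typ  uint16
	data []byte
}

// align rounds up to the 4-byte attribute alignment (NLA_ALIGN).
func align(n int) int { return (n + 3) &^ 3 }

// parseAttrs splits a buffer of netlink attributes without reflection.
func parseAttrs(b []byte) ([]attr, error) {
	var out []attr
	for len(b) >= attrHdrLen {
		l := int(binary.NativeEndian.Uint16(b[0:2]))
		if l < attrHdrLen || l > len(b) {
			return nil, errors.New("malformed attribute length")
		}
		out = append(out, attr{typ: binary.NativeEndian.Uint16(b[2:4]), data: b[attrHdrLen:l]})
		next := align(l)
		if next > len(b) {
			next = len(b) // the final attribute may omit its padding
		}
		b = b[next:]
	}
	return out, nil
}

// flatRecord is a hypothetical stand-in for XtcpFlatRecord.
type flatRecord struct {
	CongestionAlgorithm string
	Shutdown            uint8
}

// decode fills rec with a plain type switch; unknown attributes are skipped.
func decode(rec *flatRecord, attrs []attr) {
	for _, a := range attrs {
		switch a.typ {
		case inetDiagCong:
			name := a.data
			if i := bytes.IndexByte(name, 0); i >= 0 {
				name = name[:i]
			}
			rec.CongestionAlgorithm = string(name)
		case inetDiagShutdown:
			if len(a.data) >= 1 {
				rec.Shutdown = a.data[0]
			}
		}
	}
}

// appendAttr builds one padded attribute, standing in for fixture bytes.
func appendAttr(b []byte, typ uint16, data []byte) []byte {
	l := attrHdrLen + len(data)
	b = binary.NativeEndian.AppendUint16(b, uint16(l))
	b = binary.NativeEndian.AppendUint16(b, typ)
	b = append(b, data...)
	return append(b, make([]byte, align(l)-l)...)
}

func main() {
	var buf []byte
	buf = appendAttr(buf, inetDiagCong, []byte("bbr\x00"))
	buf = appendAttr(buf, inetDiagShutdown, []byte{3})
	attrs, err := parseAttrs(buf)
	if err != nil {
		fmt.Println(err)
		return
	}
	var rec flatRecord
	decode(&rec, attrs)
	fmt.Printf("%+v\n", rec)
}
```

A plain switch on the attribute type avoids reflection entirely, which keeps the hot path allocation-light and lets benchmarks measure just the byte decoding.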

For developers

See CONTRIBUTING.md for the development environment, the full set of Nix build/test targets, the automated test suite (unit, race, per-flavor, and microVM integration tests), linting tiers, and protobuf regeneration. Related references:

Operations

See operations for running the end-to-end ClickHouse / Redpanda pipeline with docker-compose and for querying the resulting data.