Profile your Kubernetes applications with low overhead and zero modifications
kubectl-prof is a powerful kubectl plugin that enables low-overhead profiling of applications running in Kubernetes environments. Generate FlameGraphs, JFR files, thread dumps, heap dumps, and many other diagnostic outputs without modifying your pods.
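For a first taste, a single command (using examples that appear later in this guide) profiles a running pod for one minute and writes a FlameGraph to the current directory:

kubectl prof my-pod -t 1m -l java -o flamegraph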
✨ Key Features:
- 🎯 Zero modification - Profile running pods without any changes to your deployment
- 🌍 Multi-language support - Java, Go, Python, Ruby, Node.js, Rust, Clang/Clang++, PHP, .NET
- 📊 Multiple output formats - FlameGraphs, JFR, SpeedScope, thread dumps, heap dumps, GC dumps, memory dumps, and more
- ⚡ Low overhead - Minimal impact on running applications
- 🔄 Continuous profiling - Support for both discrete and continuous profiling modes
This is an open source fork of kubectl-flame with enhanced features and bug fixes.
| Language | Status | Tools Available |
|---|---|---|
| ☕ Java (JVM) | ✅ Fully Supported | async-profiler, jcmd |
| 🐹 Go | ✅ Fully Supported | eBPF profiling, pprof |
| 🐍 Python | ✅ Fully Supported | py-spy, memray |
| 💎 Ruby | ✅ Fully Supported | rbspy |
| 🟢 Node.js | ✅ Fully Supported | eBPF profiling, perf |
| 🦀 Rust | ✅ Fully Supported | cargo-flamegraph |
| ⚙️ Clang/Clang++ | ✅ Fully Supported | eBPF profiling, perf |
| 🐘 PHP | ✅ Fully Supported | phpspy |
| 🟣 .NET (Core/5+) | ✅ Fully Supported | dotnet-trace, dotnet-gcdump, dotnet-counters, dotnet-dump |
- Containerd: `--runtime=containerd` (default)
- CRI-O: `--runtime=crio`
For eBPF profiling (Go, Node.js, Clang/Clang++), two tools are available:
**BPF (default)**
- Requirements: kernel headers or the kheaders module (`/lib/modules`)
- Usage: automatically used by default (no `--tool` flag needed)
- Compatibility: works on most systems with kernel headers installed

**BTF**
- Requirements:
  - Linux kernel 5.2+ with BTF enabled (check `/sys/kernel/btf/vmlinux`)
  - BPF CPU v2 support (kernel 5.2+)
- Usage: add the `--tool btf` flag to your command
- Benefits:
  - ✅ No kernel headers required - works on DigitalOcean and other cloud providers without kheaders
  - ✅ Uses CO-RE (Compile Once - Run Everywhere) technology
  - ✅ Portable across different kernel versions without recompilation
  - ✅ Smaller Docker image size
- Note: Most modern distributions (Ubuntu 20.04+, RHEL 8+, etc.) include BTF by default and meet the kernel requirements
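Before opting into BTF mode, you can verify that a node actually exposes BTF. A minimal check (a sketch, assuming your cluster allows `kubectl debug` node sessions; `my-node` is a placeholder), using the fact that node debug pods mount the host filesystem under `/host`:

# look for the BTF type information exposed by the kernel
kubectl debug node/my-node -it --image=busybox -- ls -l /host/sys/kernel/btf/vmlinux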
Example using BTF:
kubectl prof my-pod -t 1m -l go --tool btf

Profile a Java application for 1 minute and save the FlameGraph:

kubectl prof my-pod -t 1m -l java

Profile a Python application and save to a specific location:

kubectl prof my-pod -t 1m -l python --local-path=/tmp

Profile a Rust application with cargo-flamegraph:

kubectl prof my-pod -t 1m -l rust

Profile a PHP application and generate a FlameGraph:

kubectl prof my-pod -t 1m -l php

Profile multiple pods using a label selector:

kubectl prof --selector app=myapp -t 5m -l java -o jfr

Profile a Java application for 5 minutes and generate a FlameGraph:

kubectl prof my-pod -t 5m -l java -o flamegraph --local-path=/tmp

💡 Tip: If `--local-path` is omitted, the FlameGraph will be saved to the current directory.
For Java applications running in Alpine-based containers, use the --alpine flag:
kubectl prof mypod -t 1m -l java -o flamegraph --alpine
⚠️ Note: The `--alpine` flag is only required for Java applications.
Using jcmd (default for JFR):
kubectl prof mypod -t 5m -l java -o jfr

Using async-profiler:

kubectl prof mypod -t 5m -l java -o jfr --tool async-profiler

Generate a thread dump using jcmd:

kubectl prof mypod -l java -o threaddump

Generate a heap dump in hprof format:

kubectl prof mypod -l java -o heapdump --tool jcmd

Heap dumps can be large files. Use --output-split-size to split the result into smaller chunks for easier transfer (default: 50M):
# Split into 100 MB chunks
kubectl prof mypod -l java -o heapdump --tool jcmd --output-split-size=100M
# Split into 1 GB chunks
kubectl prof mypod -l java -o heapdump --tool jcmd --output-split-size=1G

💡 Tip: The value follows the format accepted by the `split` Unix command (e.g. `50M`, `200M`, `1G`).
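Because the chunks follow `split` semantics, you can reassemble them locally with `cat` before opening the dump. A minimal sketch, assuming the downloaded chunks share a common prefix (the file names here are placeholders):

# concatenate the chunks in order and open the result as a regular hprof file
cat heapdump.hprof.* > heapdump.hprof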
Generate a heap histogram:
kubectl prof mypod -l java -o heaphistogram --tool jcmd

When using async-profiler, you can specify different event types:
# CPU profiling (default: ctimer)
kubectl prof mypod -t 5m -l java -e cpu
# Memory allocation profiling
kubectl prof mypod -t 5m -l java -e alloc
# Lock contention profiling
kubectl prof mypod -t 5m -l java -e lock

Supported events: cpu, alloc, lock, cache-misses, wall, itimer, ctimer
You can pass additional command-line arguments to async-profiler using the --async-profiler-args flag. This is useful for enabling specific profiling modes or customizing profiler behavior:
# Wall-clock profiling in per-thread mode (most useful for wall-clock profiling)
kubectl prof mypod -t 5m -l java -e wall --async-profiler-args -t
# Multiple additional arguments
kubectl prof mypod -t 5m -l java -e alloc --async-profiler-args -t --async-profiler-args --alloc=2m
# Combine with other options
kubectl prof mypod -t 5m -l java -e wall -o flamegraph --async-profiler-args -t

Common use cases:
- `-t` - Per-thread mode (recommended for wall-clock profiling)
- `--alloc=SIZE` - Set allocation profiling interval
- `--lock=DURATION` - Set lock profiling threshold
- `--cstack=MODE` - Control how native frames are captured
💡 Tip: Refer to the async-profiler documentation for a complete list of available arguments and their descriptions.
Generate a FlameGraph:

kubectl prof mypod -t 1m -l python -o flamegraph --local-path=/tmp

Generate a thread dump:

kubectl prof mypod -l python -o threaddump --local-path=/tmp

Generate a SpeedScope compatible file:

kubectl prof mypod -t 1m -l python -o speedscope --local-path=/tmp

Memray is a memory profiler for Python that tracks every allocation and deallocation made by your code. Unlike py-spy (which profiles CPU usage), memray reveals where your application allocates memory, helping you find memory leaks, reduce peak memory usage, and understand allocation patterns.
Memray attaches to running Python processes via GDB injection -- your application keeps running with zero downtime. No restart, no code changes, no instrumentation required.
Note: You must specify `--tool memray` explicitly. The default Python profiling tool remains py-spy.
Requirements:
- Capabilities: `SYS_PTRACE` and `SYS_ADMIN` are required (for ptrace-based attach and nsenter into the target container's namespaces). Both are added automatically when `--tool memray` is used -- no extra flags needed.
- Python versions: 3.10, 3.11, 3.12, 3.13 (glibc-based images only)
- Not supported: Alpine/musl-based target containers, statically-linked Python builds
Output types:
| Output | Flag | Format | Description |
|---|---|---|---|
| Memory flamegraph | `-o flamegraph` | HTML | Interactive flamegraph showing allocation call stacks and sizes |
| Allocation summary | `-o summary` | Text | Tabular summary of the largest allocators by function |
Memory flamegraph (HTML):
kubectl prof mypod -t 1m -l python --tool memray -o flamegraph --local-path=/tmp

The output is a self-contained HTML file you can open in any browser. Wider frames indicate functions responsible for more memory allocations.
Allocation summary (text):
kubectl prof mypod -t 1m -l python --tool memray -o summary --local-path=/tmp

The output is a text file listing the top allocators by total bytes allocated.
Long profiling sessions and the heartbeat interval:
When profiling for longer durations (e.g. 5-10 minutes), network proxies or load balancers in front of your Kubernetes API server may terminate idle connections. Memray emits periodic heartbeat events to keep the log stream alive. The default interval is 30 seconds. You can adjust it with --heartbeat-interval:
kubectl prof mypod -t 10m -l python --tool memray -o flamegraph --heartbeat-interval=15s

Targeting a specific process:
If your pod runs multiple Python processes, use --pid or --pgrep to target a specific one:
kubectl prof mypod -t 2m -l python --tool memray -o flamegraph --pid 1234
kubectl prof mypod -t 2m -l python --tool memray -o flamegraph --pgrep my-worker

If your Go application exposes the standard net/http/pprof endpoint, you can profile it directly without eBPF or any elevated privileges (HostPID, SYS_ADMIN, or privileged containers are not needed):
💡 How it works: The agent pod connects to the target pod's `net/http/pprof` HTTP endpoint over the network, downloads the binary profile (`.pb.gz`) and delivers it to your machine. No kernel-level access is required. Visualization is done locally with `go tool pprof`.
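Before profiling, you may want to confirm the target actually serves the endpoint. A quick check (a sketch using standard kubectl port-forwarding; adjust the port if your app listens elsewhere):

# forward the pprof port to your machine and list the available profiles
kubectl port-forward mypod 6060:6060 &
curl http://localhost:6060/debug/pprof/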
# CPU profile: raw protobuf (.pb.gz), default
kubectl prof mypod -t 30s -l go --tool pprof
# Same result, explicit flag (-o raw and -o pprof are aliases)
kubectl prof mypod -t 30s -l go --tool pprof -o raw
kubectl prof mypod -t 30s -l go --tool pprof -o pprof
# Custom pprof port (default: 6060)
kubectl prof mypod -t 30s -l go --tool pprof --pprof-port 8080

Open the result locally; the `-http=:` flag starts a browser UI with all views (flamegraph, graph, top, source…):
go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-raw-pprof-1-2026-04-21T08_48_33Z.pb.gz

Or use the interactive CLI shell:
go tool pprof cpu.pb.gz
# then inside the pprof shell:
(pprof) top
(pprof) list MyFunc   # annotated source

Capture a snapshot of the heap allocations from /debug/pprof/heap:
kubectl prof mypod -l go --tool pprof -o heapdump

go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-heapdump-pprof-1-2026-04-21T08_48_33Z.out

Capture the cumulative allocation profile from /debug/pprof/allocs (all allocations since the process started, not just live objects):
kubectl prof mypod -l go --tool pprof -o allocsdump

go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-allocsdump-pprof-1-2026-04-21T08_48_33Z.out

💡 Heap vs Allocs: `/heap` shows only live objects (useful for finding memory leaks), while `/allocs` shows all objects ever allocated (useful for finding allocation hot-spots and GC pressure).
Capture the current state of all goroutines from /debug/pprof/goroutine:
kubectl prof mypod -l go --tool pprof -o goroutinedump

go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-goroutinedump-pprof-1-2026-04-21T08_48_33Z.pb.gz

Available output formats (pprof):

| Format | Flag | Extension | Endpoint | Notes |
|---|---|---|---|---|
| Raw protobuf | `-o raw` | `.pb.gz` | `/debug/pprof/profile` | default |
| Pprof (alias) | `-o pprof` | `.pb.gz` | `/debug/pprof/profile` | same as raw |
| Heap dump | `-o heapdump` | `.out` | `/debug/pprof/heap` | |
| Allocs dump | `-o allocsdump` | `.out` | `/debug/pprof/allocs` | |
| Goroutine dump | `-o goroutinedump` | `.pb.gz` | `/debug/pprof/goroutine` | |
💡 Visualize locally with a single command: open the downloaded `.pb.gz` file in your browser with all visualization options (flamegraph, top, source, graph…):

go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-raw-pprof-1-2026-04-21T08_48_33Z.pb.gz

This starts a local HTTP server and opens the browser automatically. Navigate to View → Flame Graph for an interactive flamegraph.
The pprof profiler does not require any kernel privileges, but it does require network connectivity between the agent pod and the target pod. When both pods run in different namespaces (e.g. the target app in my-app and the profiling agent in profiling), any default-deny NetworkPolicy will block the connection.
Apply a NetworkPolicy in the target application's namespace to allow ingress from the profiling namespace:
# Allow ingress on the pprof port from the profiling namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-pprof-from-profiling
  namespace: my-app          # namespace where the target pod runs
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: profiling   # the profiling agent namespace
      ports:
        - protocol: TCP
          port: 6060         # default pprof port (adjust if using --pprof-port)

If you want to restrict it to specific pods in the target namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-pprof-from-profiling
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: my-go-service     # only allow profiling of pods with this label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: profiling
      ports:
        - protocol: TCP
          port: 6060
⚠️ Note: The namespace label `kubernetes.io/metadata.name` is set automatically by Kubernetes 1.21+. For older clusters, add the label manually: `kubectl label namespace profiling kubernetes.io/metadata.name=profiling`.
Profile a Go application for 1 minute using eBPF (requires SYS_ADMIN or a privileged pod):
kubectl prof mypod -t 1m -l go -o flamegraph

Output formats (eBPF):
- `flamegraph` - FlameGraph visualization (SVG)
- `raw` - Collapsed stack traces (.txt)
kubectl prof mypod -t 1m -l node -o flamegraph

💡 Tip: For JavaScript symbols to be resolved, run your Node.js process with the `--perf-basic-prof` flag.
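For reference, enabling the flag is just a matter of how the process is started (a sketch; `server.js` is a placeholder for your entrypoint). The flag makes V8 write `/tmp/perf-<pid>.map` files that perf and eBPF tools use to resolve JIT-compiled JavaScript symbols:

node --perf-basic-prof server.js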
Generate a heap snapshot:
kubectl prof mypod -l node -o heapsnapshot
⚠️ Requirements: Your Node.js app must be run with `--heapsnapshot-signal=SIGUSR2` (default) or `--heapsnapshot-signal=SIGUSR1`.
If using SIGUSR1:
kubectl prof mypod -l node -o heapsnapshot --node-heap-snapshot-signal=10

Heap snapshots can grow large for memory-heavy applications. Use --output-split-size to split the result into smaller chunks (default: 50M):
kubectl prof mypod -l node -o heapsnapshot --output-split-size=200M

📖 Learn more: Node.js Heap Snapshots
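For reference, the underlying Node.js mechanism can be exercised by hand (a sketch; `server.js` is a placeholder):

# start the app with the snapshot signal enabled (SIGUSR2 here)
node --heapsnapshot-signal=SIGUSR2 server.js &
# sending that signal makes Node write a Heap.*.heapsnapshot file to its working directory
kill -USR2 $!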
Profile a Ruby application:
kubectl prof mypod -t 1m -l ruby -o flamegraph

Available output formats:
- `flamegraph` - FlameGraph visualization
- `speedscope` - SpeedScope format
- `callgrind` - Callgrind format
Profile a Rust application using cargo-flamegraph (default and recommended):
kubectl prof mypod -t 1m -l rust -o flamegraph

kubectl-prof uses cargo-flamegraph as the default profiling tool for Rust applications, offering several advantages:
- 🚀 Rust-optimized profiling - Specifically designed for Rust applications with excellent symbol resolution
- 🎨 Beautiful visualizations - Generates clean, colorized FlameGraphs with a Rust-specific color palette
- ⚡ Low overhead - Minimal performance impact during profiling
- 🔍 Deep insights - Captures detailed stack traces including inline functions and generics
- 🛠️ Built on perf - Leverages the powerful Linux `perf` tool under the hood
Available output format:
- `flamegraph` - Interactive FlameGraph visualization (SVG format)
Clang:
kubectl prof mypod -t 1m -l clang -o flamegraph

Clang++:

kubectl prof mypod -t 1m -l clang++ -o flamegraph

Profile a PHP 7+ application using phpspy, a low-overhead sampling profiler:
kubectl prof mypod -t 1m -l php -o flamegraph --local-path=/tmp

Generate raw stack-trace data that can be post-processed into a FlameGraph:

kubectl prof mypod -t 1m -l php -o raw --local-path=/tmp

Available output formats:
- `flamegraph` - Interactive FlameGraph visualization (SVG format)
- `raw` - Raw stack traces in folded format
⚠️ Requirements: The `SYS_PTRACE` capability is required. It is added automatically by `kubectl-prof`.
💡 Tip: phpspy works with PHP 7+ processes and requires no modifications to your application or PHP configuration.
kubectl-prof supports four specialised tools from the .NET diagnostics suite for profiling .NET Core / .NET 5+ applications running in Kubernetes.
⚠️ Requirements: The target container must be running a .NET Core / .NET 5+ application with the .NET diagnostic socket enabled (default behaviour).
dotnet-trace captures CPU samples and runtime events through the EventPipe mechanism. It is the default tool for .NET when no --tool flag is specified.
SpeedScope format (default):
kubectl prof mypod -t 30s -l dotnet -o speedscope --local-path=/tmp

The output is a .speedscope.json file that can be loaded directly at speedscope.app for interactive flame-graph analysis.
Raw nettrace format:
kubectl prof mypod -t 1m -l dotnet -o raw --local-path=/tmp

The output is a .nettrace binary file that can be opened with:
- PerfView on Windows
- Visual Studio on Windows
- The `dotnet-trace convert` CLI to convert it to other formats
Using --tool flag explicitly:
kubectl prof mypod -t 30s -l dotnet --tool dotnet-trace -o speedscope

| Flag | Output file | Visualiser |
|---|---|---|
| `-o speedscope` | `.speedscope.json` | speedscope.app |
| `-o raw` | `.nettrace` | PerfView, Visual Studio, `dotnet-trace convert` |
dotnet-gcdump captures a snapshot of the managed (GC) heap. It is a lightweight alternative to a full memory dump: only managed objects are captured, so the file is much smaller than a .dmp.
kubectl prof mypod -l dotnet --tool dotnet-gcdump -o gcdump --local-path=/tmp

For large heaps, use --output-split-size to split the result into smaller chunks (default: 50M):
kubectl prof mypod -l dotnet --tool dotnet-gcdump -o gcdump --output-split-size=200M --local-path=/tmp

💡 Tip: `dotnet-gcdump` is the recommended starting point for memory analysis. Use `dotnet-dump` only when you need native frames or a complete memory picture.
The output is a .gcdump file that can be opened with:
- Visual Studio (Heap Snapshot view)
- PerfView (GCDump viewer)
- The `dotnet-gcdump report` CLI for a quick text summary
Quick CLI report from the dump file:
dotnet-gcdump report ./agent-gcdump-<pid>-1.gcdump

dotnet-counters collects runtime and application performance metrics (CPU usage, GC collections, exception rates, thread-pool queue length, etc.) over a configurable duration and writes them to a JSON file.
kubectl prof mypod -t 30s -l dotnet --tool dotnet-counters -o counters --local-path=/tmp

The output is a .json file structured as a time series of counter values. It can be:

- Inspected directly: plain JSON, human-readable
- Visualised with PerfView: open the JSON report
- Post-processed with any standard JSON tooling (`jq`, Python, etc.)
Example: print a quick summary with jq:
jq '.events[] | {name: .name, value: .value}' ./agent-counters-<pid>-1.json

Counters captured by default (from the dotnet-common + dotnet-sampled-thread-time profiles):

| Counter | Description |
|---|---|
| `cpu-usage` | Total CPU usage (%) |
| `working-set` | Working set memory (MB) |
| `gc-heap-size` | GC heap size (MB) |
| `gen-0-gc-count` | Gen 0 GC collections / interval |
| `gen-1-gc-count` | Gen 1 GC collections / interval |
| `gen-2-gc-count` | Gen 2 GC collections / interval |
| `exception-count` | Exceptions thrown / interval |
| `threadpool-queue-length` | Thread-pool work-item queue length |
| `active-timer-count` | Active System.Threading.Timer instances |
dotnet-dump captures a point-in-time full memory dump (.dmp) of the process, including both managed and native frames. This is the most comprehensive diagnostic artefact: use it for crash analysis, deadlock investigation, or when dotnet-gcdump does not capture enough context.
⚠️ Note: `dotnet-dump` does not accept a `--duration` flag; it captures the dump immediately when invoked. The `-t` flag is ignored for this tool.
kubectl prof mypod -l dotnet --tool dotnet-dump -o dump --local-path=/tmp

Full memory dumps can be very large (several GB for production processes). Use --output-split-size to split the result into smaller chunks for easier transfer (default: 50M):

kubectl prof mypod -l dotnet --tool dotnet-dump -o dump --output-split-size=500M --local-path=/tmp

The output is a .dmp file (ELF core dump format on Linux) that can be analysed with:
- `dotnet-dump analyze`: cross-platform interactive SOS shell:

  dotnet-dump analyze ./agent-dump-<pid>-1.dmp

  Useful SOS commands inside the session:

  > clrstack          # managed call stacks for all threads
  > dumpheap -stat    # managed heap statistics
  > gcroot <address>  # find GC roots for an object
  > threads           # list all threads
  > pe                # print last exception on each thread

- Visual Studio on Windows: open the .dmp file for mixed managed/native debugging
- WinDbg with the SOS extension on Windows
- LLDB with the SOS plugin on Linux/macOS:

  lldb --core ./agent-dump-<pid>-1.dmp
| Tool flag | `-o` / Output type | Output file | Default? | Visualiser / Tool |
|---|---|---|---|---|
| `dotnet-trace` (default) | `speedscope` | `.speedscope.json` | ✅ | speedscope.app |
| `dotnet-trace` | `raw` | `.nettrace` | | PerfView, Visual Studio, `dotnet-trace convert` |
| `dotnet-gcdump` | `gcdump` | `.gcdump` | | Visual Studio, PerfView, `dotnet-gcdump report` |
| `dotnet-counters` | `counters` | `.json` | | PerfView, jq, Python |
| `dotnet-dump` | `dump` | `.dmp` | | `dotnet-dump analyze`, Visual Studio, WinDbg, LLDB |
- .NET diagnostics documentation
- `dotnet-trace` reference
- `dotnet-gcdump` reference
- `dotnet-counters` reference
- `dotnet-dump` reference
- Well-known counters in .NET
kubectl prof mypod -t 1m -l java --runtime crio

Supported runtimes: containerd (default), crio
Profile continuously at 60-second intervals for 5 minutes:
kubectl prof mypod -l java -t 5m --interval 60s

📝 Note: In continuous mode, a new result is produced every interval. Only the last result is available by default.
Set CPU and memory limits for the profiling agent pod:
kubectl prof mypod -l java -t 5m \
--cpu-limits=1 \
--cpu-requests=100m \
--mem-limits=200Mi \
--mem-requests=100Mi

Profile a pod in a different namespace:
kubectl prof mypod -n profiling \
--service-account=profiler \
--target-namespace=my-apps \
-l go

Use a custom profiling agent image:
kubectl prof mypod -l java -t 5m \
--image=localhost/my-agent-image-jvm:latest \
--image-pull-policy=IfNotPresent \
--runtime containerd

Profile all pods matching a label selector:
kubectl prof --selector app=myapp -t 5m -l java -o jfr
⚠️ ATTENTION: Use this option with caution as it will profile ALL pods matching the selector.
Control concurrent profiling jobs:
kubectl prof --selector app=myapp -t 5m -l java -o jfr --pool-size-profiling-jobs 5

By default, kubectl-prof attempts to profile all processes in the container. To target a specific process:
Using PID:
kubectl prof mypod -l java --pid 1234

Using process name:

kubectl prof mypod -l java --pgrep java-app-process

For Java profiling, kubectl-prof uses PERFMON and SYSLOG capabilities by default. To use SYS_ADMIN:
kubectl prof my-pod -t 5m -l java --capabilities=SYS_ADMIN

Add multiple capabilities:
kubectl prof my-pod -t 5m -l java \
--capabilities=SYS_ADMIN \
--capabilities=PERFMON

Profile pods on nodes with taints by specifying tolerations:
Tolerate specific taint:
kubectl prof my-pod -t 5m -l java \
--tolerations=node.kubernetes.io/disk-pressure=true:NoSchedule

Multiple tolerations:
kubectl prof my-pod -t 5m -l java \
--tolerations=node.kubernetes.io/disk-pressure=true:NoSchedule \
--tolerations=node.kubernetes.io/memory-pressure:NoExecute \
--tolerations=dedicated=profiling:PreferNoSchedule

Toleration formats:
- `key=value:effect` - Full specification
- `key:effect` - Any value
- `key` - Defaults to NoSchedule
For a complete list of options:
kubectl prof --help

Krew is the plugin manager for kubectl.
- Install Krew (if not already installed)
- Add the kubectl-prof repository and install:
kubectl krew index add kubectl-prof https://github.com/josepdcs/kubectl-prof
kubectl krew search kubectl-prof
kubectl krew install kubectl-prof/prof
kubectl prof --help

Download pre-built binaries from the releases page.
wget https://github.com/josepdcs/kubectl-prof/releases/download/2.2.0/kubectl-prof_2.2.0_linux_amd64.tar.gz
tar xvfz kubectl-prof_2.2.0_linux_amd64.tar.gz
sudo install kubectl-prof /usr/local/bin/

wget https://github.com/josepdcs/kubectl-prof/releases/download/2.2.0/kubectl-prof_2.2.0_darwin_amd64.tar.gz
tar xvfz kubectl-prof_2.2.0_darwin_amd64.tar.gz
sudo install kubectl-prof /usr/local/bin/

Download the Windows binary from the releases page and add it to your PATH.
- Go 1.26 or higher
- Make
- Docker (for building agent containers)
- Clone and install dependencies:
go get -d github.com/josepdcs/kubectl-prof
cd $GOPATH/src/github.com/josepdcs/kubectl-prof
make install-deps

- Build the binary:
make build

The binary will be available in ./bin/kubectl-prof
- Build agent containers (optional):
Modify the DOCKER_BASE_IMAGE property in Makefile, then run:
make build-docker-agents

kubectl-prof launches a Kubernetes Job on the same node as the target pod. The profiling is performed using specialized tools based on the programming language:
async-profiler - For FlameGraphs and JFR files
- FlameGraphs: `--tool async-profiler -o flamegraph` (default)
- JFR files: `--tool async-profiler -o jfr`
- Collapsed/Raw: `--tool async-profiler -o collapsed` or `-o raw`
- Event types: `cpu`, `alloc`, `lock`, `cache-misses`, `wall`, `itimer`, `ctimer` (default)
jcmd - For JFR, thread dumps, heap dumps
- JFR files: `--tool jcmd -o jfr` (default for jcmd)
- Thread dumps: `--tool jcmd -o threaddump`
- Heap dumps: `--tool jcmd -o heapdump`
- Heap histogram: `--tool jcmd -o heaphistogram`
py-spy - Low-overhead Python profiler
- FlameGraphs: `-o flamegraph` (default)
- Thread dumps: `-o threaddump`
- SpeedScope: `-o speedscope`
- Raw output: `-o raw`
memray - Python memory profiler (`--tool memray`)
- Memory flamegraph (HTML): `-o flamegraph`
- Allocation summary (text): `-o summary`
- Attaches to running processes via GDB injection (zero downtime)
- Requires `SYS_PTRACE` + `SYS_ADMIN` capabilities (added automatically)
- Supported target Python versions: 3.10, 3.11, 3.12, 3.13 (glibc-based only)
pprof - Native Go HTTP profiling (no privileges required)
- Connects directly to the application's `net/http/pprof` endpoint over HTTP
- No `HostPID`, `SYS_ADMIN`, or privileged access; only needs network connectivity to the target pod
- The binary profile (`.pb.gz`) is delivered to your machine; visualization is done locally with `go tool pprof`
- Cross-namespace use requires a `NetworkPolicy` allowing ingress on the pprof port from the profiling namespace
- Usage: `--tool pprof`
- Custom port: `--pprof-port <port>` (default: `6060`)
Output formats (pprof), all compatible with `go tool pprof`:

| Format | Flag | Endpoint queried | Notes |
|---|---|---|---|
| Raw protobuf | `-o raw` | `/debug/pprof/profile` | default, CPU profile |
| Pprof (alias) | `-o pprof` | `/debug/pprof/profile` | same as raw |
| Heap dump | `-o heapdump` | `/debug/pprof/heap` | memory allocations, `.out` |
| Allocs dump | `-o allocsdump` | `/debug/pprof/allocs` | cumulative allocations, `.out` |
| Goroutine dump | `-o goroutinedump` | `/debug/pprof/goroutine` | goroutine state |
💡 Visualize locally with a single command: open the downloaded `.pb.gz` file in your browser with all visualization options (flamegraph, top, source, graph…):

go tool pprof -http=: golang-deployment-86f57ddb4-h9fvz-agent-raw-pprof-1-2026-04-21T08_48_33Z.pb.gz

This starts a local HTTP server and opens the browser automatically. Navigate to View → Flame Graph for an interactive flamegraph.
Examples:
kubectl prof my-pod -t 30s -l go --tool pprof # CPU profile (default)
kubectl prof my-pod -t 30s -l go --tool pprof -o raw # CPU profile, explicit
kubectl prof my-pod -t 30s -l go --tool pprof -o pprof # CPU profile alias
kubectl prof my-pod -l go --tool pprof -o heapdump # heap snapshot (live objects)
kubectl prof my-pod -l go --tool pprof -o allocsdump # cumulative allocation profile
kubectl prof my-pod -l go --tool pprof -o goroutinedump # goroutine dump
kubectl prof my-pod -t 30s -l go --tool pprof --pprof-port 8080   # custom port

eBPF Profiling - Two options available (require SYS_ADMIN / privileged pod):
- BPF (default) - BCC-based profiler
  - Uses BCC tools with runtime compilation
  - Requires kernel headers (`/lib/modules`)
  - Usage: no `--tool` flag needed (default)

- BTF - CO-RE eBPF profiler
  - Uses libbpf-tools with CO-RE support
  - No kernel headers required; only needs BTF (available on modern kernels)
  - Usage: add the `--tool btf` flag
  - Example: `kubectl prof my-pod -t 1m -l go --tool btf`
Output formats (eBPF tools):
- FlameGraphs: `-o flamegraph` (default)
- Raw output: `-o raw`
cargo-flamegraph - Rust-optimized profiling tool (default)
- FlameGraphs: `--tool cargo-flamegraph -o flamegraph` (default)
- Rust-specific color palette and symbol resolution
- Low overhead, built on perf
rbspy - Ruby sampling profiler
- FlameGraphs: `-o flamegraph` (default)
- SpeedScope: `-o speedscope`
- Callgrind: `-o callgrind`
phpspy - Low-overhead sampling profiler for PHP 7+
- FlameGraphs: `-o flamegraph` (default)
- Raw output: `-o raw`
Output formats:
- `flamegraph` - Interactive FlameGraph visualization (SVG format)
- `raw` - Raw stack traces in folded format
Four tools from the .NET diagnostics suite are available, each targeting a different diagnostic scenario:
dotnet-trace: CPU and runtime event tracing (default tool for .NET)
- SpeedScope: `--tool dotnet-trace -o speedscope` (default) → `.speedscope.json`
- Raw nettrace: `--tool dotnet-trace -o raw` → `.nettrace`
- Uses EventPipe; no in-process agent overhead
dotnet-gcdump: Lightweight GC heap snapshot
- GC heap dump: `--tool dotnet-gcdump -o gcdump` → `.gcdump`
- Captures managed objects only; much smaller than a full dump
dotnet-counters: Real-time performance counter collection
- Counters: `--tool dotnet-counters -o counters` → `.json`
- Captures CPU, GC, thread-pool, exception rate and other runtime metrics
dotnet-dump: Full process memory dump
- Full dump: `--tool dotnet-dump -o dump` → `.dmp`
- Point-in-time; includes both managed and native frames
- Analysable with `dotnet-dump analyze`, Visual Studio, WinDbg, LLDB+SOS
eBPF Profiling - Two options available (recommended):

- BPF (default) - BCC-based profiler
  - Requires kernel headers (`/lib/modules`)
  - Usage: no `--tool` flag needed (default)

- BTF - CO-RE eBPF profiler
  - No kernel headers required; only needs BTF
  - Usage: add the `--tool btf` flag
  - Example: `kubectl prof my-pod -t 1m -l node --tool btf`
Alternative: perf
- Available as a fallback if eBPF profiling is unavailable
Output formats:
- FlameGraphs: `-o flamegraph` (default)
- Raw output: `-o raw`
- Heap snapshot: `-o heapsnapshot`
💡 Tip: For JavaScript symbol resolution, run Node.js with the `--perf-basic-prof` flag.
💡 Tip: For heap snapshots, run Node.js with the `--heapsnapshot-signal` flag.
eBPF Profiling - Two options available (recommended):

- BPF (default) - BCC-based profiler
  - Requires kernel headers (`/lib/modules`)
  - Usage: no `--tool` flag needed (default)

- BTF - CO-RE eBPF profiler
  - No kernel headers required; only needs BTF
  - Usage: add the `--tool btf` flag
  - Example: `kubectl prof my-pod -t 1m -l clang --tool btf`
Alternative: perf
- Available as a fallback if eBPF profiling is unavailable
Output formats:
- FlameGraphs: `-o flamegraph`
- Raw output: `-o raw`
The raw output is a text file containing profiling data that can be:
- Used to generate FlameGraphs manually
- Visualized at speedscope.app
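For example, the classic FlameGraph scripts can turn the folded stacks into an SVG locally (a sketch, assuming Brendan Gregg's FlameGraph repository; the input file name is a placeholder):

# clone the FlameGraph scripts and render the collapsed stacks as an SVG
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/flamegraph.pl raw-stacks.txt > flamegraph.svg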
Discrete Mode (default)
- Single profiling session
- Result available when profiling completes
- Usage: `-t 5m`
Continuous Mode
- Multiple results at regular intervals
- Only the last result is available by default
- Client responsible for storing all results
- Usage: `-t 5m --interval 60s`
By default, kubectl-prof profiles all processes in the target container matching the specified language.
Warning example:
⚠ Detected more than one PID to profile: [2508 2509].
It will attempt to profile all of them.
Use the --pid flag to profile a specific PID.
Target a specific process:
- By PID: `--pid 1234`
- By name: `--pgrep process-name`
For Java profiling, kubectl-prof uses PERFMON and SYSLOG capabilities by default.
According to the Kernel documentation, these capabilities should be sufficient for collecting performance samples.
To use SYS_ADMIN instead:
kubectl prof my-pod -t 5m -l java --capabilities=SYS_ADMIN

Add multiple capabilities:
kubectl prof my-pod -t 5m -l java \
--capabilities=SYS_ADMIN \
--capabilities=PERFMON

By default, the profiling agent pod is scheduled only on nodes without taints. For nodes with taints, specify tolerations:
Toleration formats:
- `key=value:effect` - Full specification
- `key:effect` - Any value
- `key` - Defaults to NoSchedule
Examples:
# Single toleration
kubectl prof my-pod -t 5m -l java \
--tolerations=node.kubernetes.io/disk-pressure=true:NoSchedule
# Multiple tolerations
kubectl prof my-pod -t 5m -l java \
--tolerations=node.kubernetes.io/disk-pressure=true:NoSchedule \
--tolerations=node.kubernetes.io/memory-pressure:NoExecute \
--tolerations=dedicated=profiling:PreferNoSchedule

We welcome contributions! Please refer to Contributing.md for information about how to get involved.
We welcome:
- 🐛 Bug reports
- 💡 Feature requests
- 📝 Documentation improvements
- 🔧 Pull requests
- Josep DamiΓ Carbonell SeguΓ - josepdcs@gmail.com
Original author of kubectl-flame:
- Eden Federman - efederman@verizonmedia.com
- Verizon Media Code
This project is licensed under the terms of the Apache 2.0 open source license. Please refer to LICENSE for the full terms.