diff --git a/.claude/plugins/collector-dev/.claude-plugin/plugin.json b/.claude/plugins/collector-dev/.claude-plugin/plugin.json new file mode 100644 index 0000000000..0d38f68edb --- /dev/null +++ b/.claude/plugins/collector-dev/.claude-plugin/plugin.json @@ -0,0 +1,9 @@ +{ + "name": "collector-dev", + "description": "Collector development workflows — build, test, CI status, and PR management", + "version": "1.0.0", + "author": { + "name": "RHACS Collector Team" + }, + "repository": "https://github.com/stackrox/collector" +} diff --git a/.claude/plugins/collector-dev/.mcp.json b/.claude/plugins/collector-dev/.mcp.json new file mode 100644 index 0000000000..209568c490 --- /dev/null +++ b/.claude/plugins/collector-dev/.mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "github": { + "type": "http", + "url": "https://api.githubcopilot.com/mcp/" + } + } +} diff --git a/.claude/plugins/collector-dev/skills/build/SKILL.md b/.claude/plugins/collector-dev/skills/build/SKILL.md new file mode 100644 index 0000000000..ef2080aeaa --- /dev/null +++ b/.claude/plugins/collector-dev/skills/build/SKILL.md @@ -0,0 +1,44 @@ +--- +name: build +description: Build collector binary with options (debug, asan, tsan, clean) +allowed-tools: Bash(cmake *), Bash(make *), Bash(nproc), Bash(git describe *), Bash(strip *), Read, Glob +--- + +# Build Collector + +Build the collector binary. Supports optional arguments: +- `debug` — Debug build with symbols +- `asan` — AddressSanitizer build +- `tsan` — ThreadSanitizer build +- `clean` — Clean build directory first +- (no args) — Release build + +## Steps + +1. Determine build environment: + - If inside the devcontainer (check: `DEVCONTAINER=true` env var), run cmake directly. + - If on the host (macOS), use `make start-builder && make collector`. + +2. If `clean` argument is provided, remove `cmake-build/` directory first. + +3. 
Set build variables based on arguments: + - `debug`: `CMAKE_BUILD_TYPE=Debug` + - `asan`: `CMAKE_BUILD_TYPE=Debug`, `ADDRESS_SANITIZER=ON` + - `tsan`: `CMAKE_BUILD_TYPE=Debug`, `THREAD_SANITIZER=ON` + - default: `CMAKE_BUILD_TYPE=Release` + +4. Run cmake configure (if `cmake-build/` doesn't exist or CMakeLists.txt changed): + ```bash + cmake -S . -B cmake-build \ + -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE \ + -DADDRESS_SANITIZER=$ADDRESS_SANITIZER \ + -DTHREAD_SANITIZER=$THREAD_SANITIZER \ + -DCOLLECTOR_VERSION=$(git describe --tags --abbrev=10 --long) + ``` + +5. Run cmake build: + ```bash + cmake --build cmake-build -- -j$(nproc) + ``` + +6. Report result: success with binary size, or failure with the first error and its file:line. diff --git a/.claude/plugins/collector-dev/skills/ci-status/SKILL.md b/.claude/plugins/collector-dev/skills/ci-status/SKILL.md new file mode 100644 index 0000000000..b9576aebe9 --- /dev/null +++ b/.claude/plugins/collector-dev/skills/ci-status/SKILL.md @@ -0,0 +1,34 @@ +--- +name: ci-status +description: Check CI status on current PR, fetch failure logs, diagnose issues +allowed-tools: Bash(git branch *), Bash(git log *), mcp__github__search_pull_requests, mcp__github__pull_request_read, mcp__github__actions_list, mcp__github__actions_get, mcp__github__get_job_logs, Read +--- + +# CI Status + +Check CI pipeline status for the current branch/PR and diagnose failures. + +## Steps + +1. Get the current branch name from git. + +2. Use `mcp__github__search_pull_requests` to find an open PR for this branch + in `stackrox/collector`. + +3. If a PR exists, use `mcp__github__pull_request_read` to get its check status. + +4. Use `mcp__github__actions_list` to get workflow runs for the branch. + +5. 
For any **failed runs**: + - Use `mcp__github__actions_get` to get the run details + - Use `mcp__github__get_job_logs` to fetch failure logs + - Identify which workflow failed (unit-tests, integration-tests, k8s-integration-tests, lint) + - For integration test failures, identify which VM type and test suite failed + +6. **Diagnose** the failure: + - Unit test failure: show the failing assertion and relevant source file + - Integration test failure: distinguish infra issues (VM creation, timeout) from test failures + - Lint failure: show which files need formatting + - Build failure: show the compiler error with file:line + +7. **Suggest next steps**: what code changes would fix the failure, or note if it's flaky/infra. diff --git a/.claude/plugins/collector-dev/skills/iterate/SKILL.md b/.claude/plugins/collector-dev/skills/iterate/SKILL.md new file mode 100644 index 0000000000..3aaf884f48 --- /dev/null +++ b/.claude/plugins/collector-dev/skills/iterate/SKILL.md @@ -0,0 +1,42 @@ +--- +name: iterate +description: Full development cycle — build, unit test, format check, commit, push to existing branch +allowed-tools: Bash(cmake *), Bash(make *), Bash(ctest *), Bash(nproc), Bash(git *), Bash(clang-format *), Read, Write, Edit, Glob, Grep, mcp__github__pull_request_read, mcp__github__actions_list, mcp__github__actions_get, mcp__github__get_job_logs +--- + +# Iterate + +Run the full development inner loop. The branch and PR already exist — just build, test, and push. +Stops at the first failure. + +## Steps + +1. **Build** the collector: + - Detect environment (devcontainer vs host) + - In devcontainer: `cmake -S . -B cmake-build -DCMAKE_BUILD_TYPE=Release -DCOLLECTOR_VERSION=$(git describe --tags --abbrev=10 --long) && cmake --build cmake-build -- -j$(nproc)` + - On host: `make collector` + - **Stop on failure** — report the compiler error with file:line. + +2. 
**Unit test**: + - In devcontainer: `ctest --no-tests=error -V --test-dir cmake-build` + - On host: `make unittest` + - **Stop on failure** — report which test failed and the assertion. + +3. **Format check** (C++ files changed in this branch only): + - Get changed C++ files: `git diff --name-only origin/master...HEAD | grep -E '\.(cpp|h)$'` + - Run: `clang-format --style=file -n --Werror ` + - If formatting issues found, auto-fix them: `clang-format --style=file -i ` + - Report what was fixed. + +4. **Commit**: + - Stage changed files (source + any format fixes) + - Create a commit with a descriptive message summarizing the changes + +5. **Push**: + - `git push` to the existing branch (branch and PR already created by run.sh) + - Do NOT create new branches or PRs + +6. **Check CI**: + - Use `mcp__github__actions_list` to see if CI has started + - Report the PR URL and note that CI is running + - Use `/collector-dev:ci-status` for detailed CI results once checks complete diff --git a/.claude/plugins/collector-dev/skills/task/SKILL.md b/.claude/plugins/collector-dev/skills/task/SKILL.md new file mode 100644 index 0000000000..bda662fc99 --- /dev/null +++ b/.claude/plugins/collector-dev/skills/task/SKILL.md @@ -0,0 +1,82 @@ +--- +name: task +description: End-to-end autonomous workflow — implement a task, push, monitor CI, fix failures until green +disable-model-invocation: true +allowed-tools: Bash(cmake *), Bash(make *), Bash(ctest *), Bash(nproc), Bash(git *), Bash(clang-format *), Bash(sleep *), Read, Write, Edit, Glob, Grep, Agent, mcp__github__pull_request_read, mcp__github__search_pull_requests, mcp__github__actions_list, mcp__github__actions_get, mcp__github__get_job_logs +--- + +# Task + +Complete a development task end-to-end: implement, build, test, push, and monitor CI until all checks pass. + +## Input + +The task description is provided via $ARGUMENTS or in the initial prompt context (branch name, PR URL, task). 
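For illustration (the task text below is purely hypothetical), the default mode of `.devcontainer/run.sh` is what populates this input — it creates the worktree and draft PR, then forwards the task description to this skill:

```bash
# Hypothetical task text; run.sh wraps it with the branch name and PR URL
# before invoking /collector-dev:task.
.devcontainer/run.sh "Add a unit test for ConnTracker state transitions"
```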
+ +## Workflow + +### Phase 1: Implement + +1. Read and understand the task +2. Explore relevant code in the repository +3. Implement the changes +4. Build the collector: + - In devcontainer: `cmake -S . -B cmake-build -DCMAKE_BUILD_TYPE=Release -DCOLLECTOR_VERSION=$(git describe --tags --abbrev=10 --long) && cmake --build cmake-build -- -j$(nproc)` + - On host: `make collector` + - If build fails, fix and retry +5. Run unit tests: + - In devcontainer: `ctest --no-tests=error -V --test-dir cmake-build` + - On host: `make unittest` + - If tests fail, fix and retry +6. Format check: + - `git diff --name-only origin/master...HEAD | grep -E '\.(cpp|h)$'` to find changed files + - `clang-format --style=file -i ` to fix formatting +7. Commit and push: + - `git add` the changed files + - `git commit` with a descriptive message + - `git push` + +### Phase 2: Monitor CI + +After pushing, enter a monitoring loop. CI typically takes 30-90 minutes. + +**Loop** (repeat until all checks pass or blocked): + +1. Wait 10 minutes: `sleep 600` +2. Check CI status: + - Get current branch: `git branch --show-current` + - Use `mcp__github__search_pull_requests` to find the PR + - Use `mcp__github__actions_list` to get workflow runs + - Use `mcp__github__pull_request_read` for check status + +3. Evaluate: + + **All checks passed** → report success and stop + + **Checks still running** → report progress ("X of Y complete"), continue loop + + **Checks failed** → + - Use `mcp__github__actions_get` and `mcp__github__get_job_logs` to get failure logs + - Diagnose the failure: + - Build failure: read error, fix code + - Unit test failure: read assertion, fix code + - Lint failure: run clang-format + - Integration test infra flake (VM timeout, network): report as flake, continue loop + - Integration test real failure: analyze and fix code + - If fixable: fix → build → unit test → commit → push → continue loop + - If not fixable: report diagnosis and stop + +4. 
Safety limits: + - Maximum 6 CI cycles (about 3 hours of monitoring) + - If exceeded, report status and stop + +### Completion + +End with a summary: +``` +STATUS: PASSED | BLOCKED | TIMEOUT +Branch: claude/agent-xxx +PR: +Cycles: N +Changes: list of files modified +``` diff --git a/.claude/plugins/collector-dev/skills/watch-ci/SKILL.md b/.claude/plugins/collector-dev/skills/watch-ci/SKILL.md new file mode 100644 index 0000000000..e73b843e14 --- /dev/null +++ b/.claude/plugins/collector-dev/skills/watch-ci/SKILL.md @@ -0,0 +1,54 @@ +--- +name: watch-ci +description: Check CI status and react to failures — diagnose, fix, rebuild, push. Designed to run in a loop. +allowed-tools: Bash(cmake *), Bash(make *), Bash(ctest *), Bash(nproc), Bash(git *), Bash(clang-format *), Read, Write, Edit, Glob, Grep, mcp__github__pull_request_read, mcp__github__search_pull_requests, mcp__github__actions_list, mcp__github__actions_get, mcp__github__get_job_logs +--- + +# Watch CI + +Monitor CI for the current branch's PR and react to failures. Designed to be run +with `/loop 30m /collector-dev:watch-ci`. + +## Steps + +1. **Find the PR** for the current branch: + - Get branch name: `git branch --show-current` + - Use `mcp__github__search_pull_requests` to find the open PR in `stackrox/collector` + - If no PR found, report and stop + +2. **Check CI status**: + - Use `mcp__github__pull_request_read` to get check status + - Use `mcp__github__actions_list` to get workflow runs + +3. **Evaluate state and act**: + + **If all checks pass:** + - Report: "All CI checks passed. PR is ready for review." + - Stop — no further action needed + + **If checks are still running:** + - Report: "CI still running (X of Y checks complete). Will check again next loop." 
+ - Stop — wait for next loop iteration + + **If checks failed:** + - Use `mcp__github__actions_get` and `mcp__github__get_job_logs` to get failure details + - Identify the failure type: + - **Build failure**: read compiler error, find the file:line, fix the code + - **Unit test failure**: read the assertion, find the test and source, fix the code + - **Integration test failure**: determine if it's a real failure or infra flake + - If infra flake (VM creation timeout, network issue): report and skip + - If real test failure: analyze the test expectation vs actual, fix the code + - **Lint failure**: run `clang-format --style=file -i` on the affected files + - After fixing: + - Build: `cmake --build cmake-build -- -j$(nproc)` + - Unit test: `ctest --no-tests=error -V --test-dir cmake-build` + - If build+test pass: `git add`, `git commit`, `git push` + - Report what was fixed and that a new CI run should start + - If the failure can't be fixed automatically, report the diagnosis and stop + +4. 
**Summary**: always end with a clear status line: + - `PASSED` — all checks green + - `PENDING` — checks still running, will retry + - `FIXED` — failure diagnosed and fix pushed, awaiting new CI run + - `FLAKE` — infra failure, not a code issue + - `BLOCKED` — failure requires human intervention diff --git a/.claude/settings.json b/.claude/settings.json new file mode 100644 index 0000000000..f9820826aa --- /dev/null +++ b/.claude/settings.json @@ -0,0 +1,12 @@ +{ + "permissions": { + "deny": [ + "Read(.devcontainer/**)", + "mcp__github__merge_pull_request", + "mcp__github__delete_file", + "mcp__github__fork_repository", + "mcp__github__create_repository", + "mcp__github__actions_run_trigger" + ] + } +} diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile new file mode 100644 index 0000000000..d5707e8ee9 --- /dev/null +++ b/.devcontainer/Dockerfile @@ -0,0 +1,80 @@ +# Collector development container +# Based on the collector-builder image which has all C++ dependencies pre-installed. +# Adds Claude Code, Go, and developer tooling for agent-driven development. +# +# Build environment: CentOS Stream 10 with clang, llvm, cmake, grpc, protobuf, +# libbpf, bpftool, and all other collector dependencies. 
+ +ARG COLLECTOR_BUILDER_TAG=master +FROM quay.io/stackrox-io/collector-builder:${COLLECTOR_BUILDER_TAG} + +# Install developer tooling not in the builder image +# Note: git, findutils, which, openssh-clients already in builder +# bubblewrap: Claude Code uses this for built-in command sandboxing +RUN dnf install -y \ + bubblewrap \ + jq \ + socat \ + zsh \ + procps-ng \ + sudo \ + python3-pip \ + iptables \ + ipset \ + && dnf clean all + +# Determine architecture strings used by various download URLs +# uname -m gives aarch64 or x86_64 +# Go uses arm64/amd64, ripgrep/fd use aarch64/x86_64 +RUN ARCH=$(uname -m) \ + && GOARCH=$([ "$ARCH" = "aarch64" ] && echo "arm64" || echo "amd64") \ + # Install Go + && curl -fsSL "https://go.dev/dl/go1.23.6.linux-${GOARCH}.tar.gz" | tar -C /usr/local -xzf - \ + # Install ripgrep + && curl -fsSL "https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-${ARCH}-unknown-linux-gnu.tar.gz" \ + | tar -xzf - --strip-components=1 -C /usr/local/bin "ripgrep-14.1.1-${ARCH}-unknown-linux-gnu/rg" \ + # Install fd + && curl -fsSL "https://github.com/sharkdp/fd/releases/download/v10.2.0/fd-v10.2.0-${ARCH}-unknown-linux-gnu.tar.gz" \ + | tar -xzf - --strip-components=1 -C /usr/local/bin "fd-v10.2.0-${ARCH}-unknown-linux-gnu/fd" + +ENV PATH="/usr/local/go/bin:${PATH}" +ENV GOPATH="/home/dev/go" +ENV PATH="${GOPATH}/bin:${PATH}" + +# Install Node.js (needed for Claude Code) +ARG NODE_VERSION=22 +RUN curl -fsSL https://rpm.nodesource.com/setup_${NODE_VERSION}.x | bash - \ + && dnf install -y nodejs \ + && dnf clean all + +# Install Claude Code +RUN npm install -g @anthropic-ai/claude-code + +# Install gcloud CLI (for Vertex AI auth and GCP VM management) +RUN curl -fsSL https://sdk.cloud.google.com > /tmp/install-gcloud.sh \ + && bash /tmp/install-gcloud.sh --disable-prompts --install-dir=/opt \ + && rm /tmp/install-gcloud.sh +ENV PATH="/opt/google-cloud-sdk/bin:${PATH}" + +# Pull GitHub MCP server image (used by Claude Code 
for GitHub operations) +# Configured in .claude/settings.json as an MCP server + +# Create non-root dev user with passwordless sudo +RUN useradd -m -s /bin/zsh dev \ + && echo "dev ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/dev + +# Install ansible for VM-based testing (optional, lightweight) +RUN pip3 install ansible-core + +# Firewall script for network isolation (optional, used with --dangerously-skip-permissions) +COPY init-firewall.sh /usr/local/bin/init-firewall.sh +RUN chmod +x /usr/local/bin/init-firewall.sh + +USER dev +WORKDIR /workspace + +# Persist shell history and Claude state across rebuilds (volumes in devcontainer.json) +ENV HISTFILE=/home/dev/.commandhistory/.zsh_history + +ENV SHELL=/bin/zsh +ENV DEVCONTAINER=true diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json new file mode 100644 index 0000000000..29c30ad638 --- /dev/null +++ b/.devcontainer/devcontainer.json @@ -0,0 +1,69 @@ +{ + "name": "collector-dev", + "build": { + "dockerfile": "Dockerfile", + "args": { + "COLLECTOR_BUILDER_TAG": "master" + } + }, + "containerUser": "dev", + "workspaceMount": "source=${localWorkspaceFolder},target=/workspace,type=bind,consistency=delegated", + "workspaceFolder": "/workspace", + + "mounts": [ + "source=collector-dev-history,target=/home/dev/.commandhistory,type=volume", + "source=collector-dev-claude,target=/home/dev/.claude,type=volume", + "source=${localEnv:HOME}/.gitconfig,target=/home/dev/.gitconfig,type=bind,readonly", + "source=${localEnv:HOME}/.config/gcloud,target=/home/dev/.config/gcloud,type=bind,readonly", + "source=${localEnv:HOME}/.ssh,target=/home/dev/.ssh,type=bind,readonly", + "source=${localWorkspaceFolder}/.devcontainer,target=/workspace/.devcontainer,type=bind,readonly" + ], + + "runArgs": [ + "--cap-add=SYS_PTRACE", + "--cap-add=NET_ADMIN", + "--cap-add=NET_RAW", + "--init" + ], + + "customizations": { + "vscode": { + "extensions": [ + "llvm-vs-code-extensions.vscode-clangd", + "ms-vscode.cmake-tools", + 
"golang.go", + "ms-python.python" + ], + "settings": { + "cmake.sourceDirectory": "${workspaceFolder}", + "cmake.buildDirectory": "${workspaceFolder}/cmake-build/${buildType}", + "clangd.path": "/usr/bin/clangd", + "files.associations": { + "*.bpf.c": "c", + "*.skel.h": "c" + } + } + } + }, + + "postCreateCommand": "/usr/local/bin/init-firewall.sh || true", + + "containerEnv": { + "DEVCONTAINER": "true", + "NPM_CONFIG_IGNORE_SCRIPTS": "true", + "NPM_CONFIG_AUDIT": "true", + "NPM_CONFIG_FUND": "false", + "PYTHONDONTWRITEBYTECODE": "1" + }, + + "remoteEnv": { + "COLLECTOR_BUILDER_TAG": "master", + "CMAKE_BUILD_TYPE": "Release", + "CLOUDSDK_CONFIG": "/home/dev/.config/gcloud", + "GOOGLE_APPLICATION_CREDENTIALS": "/home/dev/.config/gcloud/application_default_credentials.json", + "CLAUDE_CODE_USE_VERTEX": "${localEnv:CLAUDE_CODE_USE_VERTEX}", + "GOOGLE_CLOUD_PROJECT": "${localEnv:GOOGLE_CLOUD_PROJECT}", + "GOOGLE_CLOUD_LOCATION": "${localEnv:GOOGLE_CLOUD_LOCATION}", + "ANTHROPIC_VERTEX_PROJECT_ID": "${localEnv:ANTHROPIC_VERTEX_PROJECT_ID}" + } +} diff --git a/.devcontainer/init-firewall.sh b/.devcontainer/init-firewall.sh new file mode 100644 index 0000000000..391088536b --- /dev/null +++ b/.devcontainer/init-firewall.sh @@ -0,0 +1,75 @@ +#!/usr/bin/env bash +# Optional network firewall for use with --dangerously-skip-permissions mode. +# Restricts outbound traffic to only necessary services. +# Requires: --cap-add=NET_ADMIN on the container (not set by default). +# To enable: add "--cap-add=NET_ADMIN" to runArgs in devcontainer.json. + +set -euo pipefail + +if ! command -v iptables &>/dev/null; then + echo "iptables not available, skipping firewall setup" + exit 0 +fi + +if ! iptables -L &>/dev/null 2>&1; then + echo "No NET_ADMIN capability, skipping firewall setup" + exit 0 +fi + +echo "Configuring network firewall..." 
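
# Note: iptables resolves each hostname used with -d below to IP addresses
# once, at rule-insertion time. If an endpoint's IPs rotate later (common
# for CDN-backed services), a rule can go stale until this script is re-run.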
+ +# Allow loopback +iptables -A OUTPUT -o lo -j ACCEPT + +# Allow established connections +iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT + +# Allow DNS +iptables -A OUTPUT -p udp --dport 53 -j ACCEPT +iptables -A OUTPUT -p tcp --dport 53 -j ACCEPT + +# Allow GCP / Vertex AI (Claude Code backend + gcloud CLI + VM management) +# Vertex AI endpoints: https://{REGION}-aiplatform.googleapis.com +iptables -A OUTPUT -d oauth2.googleapis.com -j ACCEPT +iptables -A OUTPUT -d accounts.google.com -j ACCEPT +iptables -A OUTPUT -d www.googleapis.com -j ACCEPT +iptables -A OUTPUT -d storage.googleapis.com -j ACCEPT +iptables -A OUTPUT -d compute.googleapis.com -j ACCEPT +iptables -A OUTPUT -d cloudresourcemanager.googleapis.com -j ACCEPT +# Vertex AI regions (allow all *.googleapis.com via port 443) +iptables -A OUTPUT -p tcp --dport 443 -d us-central1-aiplatform.googleapis.com -j ACCEPT +iptables -A OUTPUT -p tcp --dport 443 -d us-east5-aiplatform.googleapis.com -j ACCEPT +iptables -A OUTPUT -p tcp --dport 443 -d europe-west1-aiplatform.googleapis.com -j ACCEPT +iptables -A OUTPUT -d metadata.google.internal -j ACCEPT + +# Allow Claude API (direct Anthropic, if used alongside or instead of Vertex) +iptables -A OUTPUT -d api.anthropic.com -j ACCEPT +iptables -A OUTPUT -d statsig.anthropic.com -j ACCEPT +iptables -A OUTPUT -d sentry.io -j ACCEPT + +# Allow GitHub (for git push, gh CLI, API) +iptables -A OUTPUT -d github.com -j ACCEPT +iptables -A OUTPUT -d api.github.com -j ACCEPT + +# Allow container registries +iptables -A OUTPUT -d quay.io -j ACCEPT +iptables -A OUTPUT -d cdn.quay.io -j ACCEPT +iptables -A OUTPUT -d cdn01.quay.io -j ACCEPT +iptables -A OUTPUT -d cdn02.quay.io -j ACCEPT +iptables -A OUTPUT -d cdn03.quay.io -j ACCEPT +iptables -A OUTPUT -d registry.access.redhat.com -j ACCEPT + +# Allow SSH (for GCP VM access during integration testing) +iptables -A OUTPUT -p tcp --dport 22 -j ACCEPT + +# Allow npm registry +iptables -A OUTPUT -d 
registry.npmjs.org -j ACCEPT + +# Allow Go module proxy +iptables -A OUTPUT -d proxy.golang.org -j ACCEPT +iptables -A OUTPUT -d sum.golang.org -j ACCEPT + +# Drop everything else +iptables -A OUTPUT -j DROP + +echo "Firewall configured." diff --git a/.devcontainer/run.sh b/.devcontainer/run.sh new file mode 100755 index 0000000000..ae716e967f --- /dev/null +++ b/.devcontainer/run.sh @@ -0,0 +1,204 @@ +#!/usr/bin/env bash +# Launch Claude Code in the collector devcontainer with a task. +# +# Usage: +# .devcontainer/run.sh "task description" Full: worktree + PR + CI loop +# .devcontainer/run.sh --headless "task description" Worktree + stream output, no PR +# .devcontainer/run.sh --interactive Worktree + TUI, no PR +# .devcontainer/run.sh --local ["task"] Edit working tree directly +# .devcontainer/run.sh --shell Shell into container +# +# Prerequisites: +# - Docker +# - gh (GitHub CLI, authenticated — only needed for default task mode) +# - gcloud auth login && gcloud auth application-default login +# - CLAUDE_CODE_USE_VERTEX=1 and related env vars (see CLAUDE.md) + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +IMAGE="${COLLECTOR_DEV_IMAGE:-collector-dev:test}" +PLUGIN_DIR="/workspace/.claude/plugins/collector-dev" + +CLAUDE_INTERACTIVE=(claude --dangerously-skip-permissions --plugin-dir "$PLUGIN_DIR") +CLAUDE_AUTONOMOUS=(claude --dangerously-skip-permissions --plugin-dir "$PLUGIN_DIR" --output-format stream-json --verbose) + +# --- Worktree isolation --- +setup_worktree() { + local task_id + task_id="agent-$(date +%s)-$$" + local branch="claude/${task_id}" + local worktree_dir="/tmp/collector-${task_id}" + + git -C "$REPO_ROOT" worktree add -b "$branch" "$worktree_dir" HEAD >/dev/null 2>&1 + + # Only init submodules needed for building collector (not builder/third_party) + echo "Initializing submodules..." 
>&2 + git -C "$worktree_dir" submodule update --init \ + falcosecurity-libs \ + collector/proto/third_party/stackrox \ + >/dev/null 2>&1 + + echo "$worktree_dir" +} + +cleanup_worktree() { + local worktree_dir="$1" + if [[ -d "$worktree_dir" ]]; then + local branch + branch=$(git -C "$worktree_dir" branch --show-current 2>/dev/null || true) + git -C "$REPO_ROOT" worktree remove --force "$worktree_dir" 2>/dev/null || true + if [[ -n "$branch" ]]; then + if ! git -C "$REPO_ROOT" config "branch.${branch}.remote" &>/dev/null; then + git -C "$REPO_ROOT" branch -D "$branch" 2>/dev/null || true + fi + fi + fi +} + +# --- Create branch + draft PR --- +setup_pr() { + local worktree_dir="$1" + local task="$2" + local branch + branch=$(git -C "$worktree_dir" branch --show-current) + + git -C "$worktree_dir" push -u origin "$branch" >/dev/null 2>&1 + + local pr_url + pr_url=$(gh pr create \ + --repo stackrox/collector \ + --head "$branch" \ + --draft \ + --title "claude: ${task:0:70}" \ + --body "$(cat <&1) || true + + echo "$pr_url" +} + +# --- Docker args --- +build_docker_args() { + local workspace="$1" + DOCKER_ARGS=( + --rm + -v "$workspace:/workspace" + -v "$HOME/.config/gcloud:/home/dev/.config/gcloud:ro" + -v "$HOME/.gitconfig:/home/dev/.gitconfig:ro" + -v "$HOME/.ssh:/home/dev/.ssh:ro" + -e CLOUDSDK_CONFIG=/home/dev/.config/gcloud + -e GOOGLE_APPLICATION_CREDENTIALS=/home/dev/.config/gcloud/application_default_credentials.json + -w /workspace + ) + + for var in CLAUDE_CODE_USE_VERTEX GOOGLE_CLOUD_PROJECT GOOGLE_CLOUD_LOCATION ANTHROPIC_VERTEX_PROJECT_ID; do + if [[ -n "${!var:-}" ]]; then + DOCKER_ARGS+=(-e "$var=${!var}") + fi + done +} + +# --- Main --- +case "${1:-}" in + --interactive|-i) + WORKTREE=$(setup_worktree) + trap "cleanup_worktree '$WORKTREE'" EXIT + BRANCH=$(git -C "$WORKTREE" branch --show-current) + echo "Working in isolated worktree: $WORKTREE" + echo "Branch: $BRANCH" + build_docker_args "$WORKTREE" + docker run -it "${DOCKER_ARGS[@]}" "$IMAGE" \ + 
"${CLAUDE_INTERACTIVE[@]}" + ;; + + --headless|-H) + shift + if [[ -z "${1:-}" ]]; then + echo "Usage: $0 --headless \"task description\"" >&2 + exit 1 + fi + WORKTREE=$(setup_worktree) + trap "cleanup_worktree '$WORKTREE'" EXIT + BRANCH=$(git -C "$WORKTREE" branch --show-current) + echo "Working in isolated worktree: $WORKTREE" >&2 + echo "Branch: $BRANCH" >&2 + build_docker_args "$WORKTREE" + docker run "${DOCKER_ARGS[@]}" "$IMAGE" \ + "${CLAUDE_AUTONOMOUS[@]}" -p "$*" + ;; + + --local|-l) + shift + build_docker_args "$REPO_ROOT" + if [[ -z "${1:-}" ]]; then + docker run -it "${DOCKER_ARGS[@]}" "$IMAGE" \ + "${CLAUDE_INTERACTIVE[@]}" + else + docker run -it "${DOCKER_ARGS[@]}" "$IMAGE" \ + "${CLAUDE_INTERACTIVE[@]}" -p "$*" + fi + ;; + + --shell|-s) + WORKTREE=$(setup_worktree) + trap "cleanup_worktree '$WORKTREE'" EXIT + echo "Working in isolated worktree: $WORKTREE" + build_docker_args "$WORKTREE" + docker run -it "${DOCKER_ARGS[@]}" "$IMAGE" zsh + ;; + + ""|--help|-h) + cat <&2 + echo "Branch: $BRANCH" >&2 + echo "Task: $TASK" >&2 + echo "---" >&2 + echo "Creating draft PR..." >&2 + PR_URL=$(setup_pr "$WORKTREE" "$TASK") + echo "PR: $PR_URL" >&2 + echo "---" >&2 + + trap "cleanup_worktree '$WORKTREE'" EXIT + + build_docker_args "$WORKTREE" + docker run "${DOCKER_ARGS[@]}" "$IMAGE" \ + "${CLAUDE_AUTONOMOUS[@]}" -p \ + "/collector-dev:task You are working on branch '$BRANCH'. A draft PR has been created at: $PR_URL + +Your task: $TASK + +The branch is already pushed. Do not create new branches or PRs. Commit and push with git." 
+ ;; +esac diff --git a/.gitignore b/.gitignore index 25842a918b..79b230ae66 100644 --- a/.gitignore +++ b/.gitignore @@ -22,8 +22,6 @@ cmake-build-*/ # vscode configuration files .vscode/ -.devcontainer/ -.devcontainer.json cmake-build/ out/ diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000000..d017ef3f7f --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,248 @@ +# Collector - Agent Development Guide + +Collector is a C++ eBPF-based runtime security agent that captures process, +network, and container events from Linux kernels. It uses CO-RE BPF +(Compile Once, Run Everywhere) via falcosecurity-libs and reports events +to StackRox Sensor via gRPC. + +## Devcontainer Setup + +### Prerequisites + +1. Docker (Docker Desktop, OrbStack, or Colima) +2. GCP access with Vertex AI enabled (for Claude Code) + +### Vertex AI Configuration + +Claude Code in the devcontainer authenticates to Vertex AI using your host's +gcloud credentials (mounted read-only). Set up once on your host: + +```bash +# Authenticate to GCP +gcloud auth login +gcloud auth application-default login + +# Set these environment variables (add to your shell profile) +export CLAUDE_CODE_USE_VERTEX=1 +export GOOGLE_CLOUD_PROJECT= +export GOOGLE_CLOUD_LOCATION= # e.g., us-east5 +export ANTHROPIC_VERTEX_PROJECT_ID= +``` + +The devcontainer.json forwards these env vars into the container +automatically via `${localEnv:*}`. + +### Launch + +**VSCode:** Open the repo, click "Reopen in Container" when prompted. + +**CLI:** +```bash +# Build and start the devcontainer +devcontainer up --workspace-folder . 
+ +# Or run Claude Code directly +docker run --rm \ + -v "$(pwd):/workspace" \ + -v "$HOME/.config/gcloud:/home/dev/.config/gcloud:ro" \ + -v "$HOME/.gitconfig:/home/dev/.gitconfig:ro" \ + -e CLAUDE_CODE_USE_VERTEX=1 \ + -e GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT \ + -e GOOGLE_CLOUD_LOCATION=$GOOGLE_CLOUD_LOCATION \ + -e ANTHROPIC_VERTEX_PROJECT_ID=$ANTHROPIC_VERTEX_PROJECT_ID \ + -e CLOUDSDK_CONFIG=/home/dev/.config/gcloud \ + -e GOOGLE_APPLICATION_CREDENTIALS=/home/dev/.config/gcloud/application_default_credentials.json \ + -w /workspace \ + collector-dev:latest \ + claude --dangerously-skip-permissions +``` + +### Building Inside the Devcontainer + +The devcontainer IS the builder image — no nested Docker needed: + +```bash +# Configure (first time or after CMakeLists.txt changes) +cmake -S . -B cmake-build \ + -DCMAKE_BUILD_TYPE=Release \ + -DCOLLECTOR_VERSION=$(git describe --tags --abbrev=10 --long) + +# Build (~30s incremental) +cmake --build cmake-build -- -j$(nproc) + +# Unit tests (17 tests, ~13s) +ctest --no-tests=error -V --test-dir cmake-build +``` + +### Building on the Host (without devcontainer) + +## Quick Reference + +```bash +# Build (uses builder container with all C++ deps) +make start-builder # Start builder container (first time / after reboot) +make collector # Compile collector binary (~30s incremental) +make image # Build container image +make image-dev # Build dev image (with package manager, gdb) + +# Test +make unittest # C++ unit tests via ctest (~1 min) +CMAKE_BUILD_TYPE=Debug make unittest # With debug symbols + +# Lint +make check-clang-format-all # Check C++ formatting +make check-flake8-all # Check Python + +# Integration tests (local, requires Docker + privileged) +cd integration-tests +make TestProcessNetwork # Single test suite +make ci-integration-tests # Full suite (2h timeout) +``` + +## Architecture + +``` +collector/ # Main C++ application +├── lib/ # Core library (~108 files) +│ ├── KernelDriver.h # eBPF probe 
lifecycle (Setup/Start/Stop) +│ ├── CollectorService.cpp # Main service loop +│ ├── ConnTracker.cpp # Connection state machine +│ ├── NetworkConnection.h # IP/port/protocol structures +│ └── ProcessSignalHandler.h # Process event formatting +├── test/ # Unit tests (GTest/GMock) +├── container/Dockerfile # Production container (UBI minimal) +└── collector.cpp # Main entrypoint + +falcosecurity-libs/ # Submodule: eBPF engine +└── driver/modern_bpf/ # CO-RE BPF programs + ├── programs/attached/ # Tracepoint handlers (sys_enter, sys_exit, sched_*) + └── maps/ # BPF maps (ring buffers, tail call tables) + +builder/ # Builder image (CentOS Stream 10) +├── Dockerfile +└── install/ # Dependency build scripts (grpc, protobuf, libbpf, etc.) + +integration-tests/ # Go test framework (testify/suite) +├── suites/ # 26 test suites +├── pkg/mock_sensor/ # Mock gRPC sensor +└── pkg/executor/ # Container runtime abstraction + +ansible/ # VM lifecycle and test orchestration +├── integration-tests.yml # Create VM → provision → test → destroy +├── dev/ # Developer inventory (acs-team-sandbox) +└── roles/ # create-vm, provision-vm, run-test-target +``` + +## Development Workflow + +### For C++ / library changes (non-eBPF) + +Changes to `collector/lib/` that don't touch kernel interaction: + +1. Edit source files +2. `make collector` — compile (~30s incremental) +3. `make unittest` — run unit tests +4. Push PR — CI validates across platforms + +Unit tests cover: config parsing, connection tracking, network structures, +process filtering, event formatting, host info detection. + +### For eBPF / kernel driver changes + +Changes to `falcosecurity-libs/driver/modern_bpf/` or `collector/lib/KernelDriver.h`: + +1. Edit source files +2. `make collector` — compile (eBPF compiles to skeleton header) +3. `make unittest` — validates C++ logic only +4. **Push PR** — CI runs integration tests on real kernels +5. 
Monitor CI: `.github/workflows/integration-tests.yml` runs on + rhel, ubuntu, cos, flatcar, fedora-coreos across amd64/arm64/s390x/ppc64le + +**Unit tests CANNOT validate eBPF changes.** The BPF programs must load into +a real kernel with BTF support. CI handles this across 10+ VM types. + +### For integration test changes + +Changes to `integration-tests/`: + +1. Edit Go test files +2. Build test binary: `cd integration-tests && make build` +3. Run locally if Docker available: `make TestProcessNetwork` +4. Push PR — CI runs full matrix + +### Build Variables + +| Variable | Default | Purpose | +|----------|---------|---------| +| CMAKE_BUILD_TYPE | Release | Release or Debug | +| ADDRESS_SANITIZER | OFF | Enable AddressSanitizer | +| THREAD_SANITIZER | OFF | Enable ThreadSanitizer | +| USE_VALGRIND | OFF | Valgrind profiling | +| BPF_DEBUG_MODE | OFF | BPF debug output | +| COLLECTOR_BUILDER_TAG | master | Builder image version | + +### Running collector locally + +```yaml +# docker-compose.dev.yml pattern: +services: + collector: + image: quay.io/stackrox-io/collector:${COLLECTOR_TAG} + privileged: true + network_mode: host + environment: + - GRPC_SERVER=localhost:9999 + - COLLECTION_METHOD=core-bpf + - COLLECTOR_HOST_ROOT=/host + volumes: + - /var/run/docker.sock:/host/var/run/docker.sock:ro + - /proc:/host/proc:ro + - /etc:/host/etc:ro + - /sys/:/host/sys/:ro +``` + +Standalone mode (no gRPC server, outputs JSON to stdout): +```bash +collector --grpc-server= +``` + +### Hotreload on local K8s + +For rapid iteration without rebuilding the container image: +```bash +# Deploy stackrox to a local cluster first, then: +./utilities/hotreload.sh +# Recompile with: make -C collector container/bin/collector +``` + +## Key Dependencies + +- gRPC v1.67.0, Protobuf v28.3 +- libbpf v1.3.4, CO-RE BPF (kernel >= 5.8 with BTF) +- falcosecurity-libs (submodule, scap + sinsp) +- Google Test v1.15.2 + +## CI Pipeline + +Push to PR triggers `.github/workflows/main.yml`: +1. 
**init** — set tags, determine what to build +2. **build-collector** — multi-arch compile +3. **unit-tests** — ctest (Release, ASAN, Valgrind) +4. **integration-tests** — VM matrix (rhel, ubuntu, cos, flatcar, etc.) +5. **k8s-integration-tests** — KinD cluster tests +6. **benchmarks** — performance (master only or `run-benchmark` label) + +### Triggering specific CI behavior + +- Add `build-builder-image` label to rebuild the builder +- Add `run-benchmark` label for performance tests +- Add `update-baseline` label to update benchmark baseline + +## File Conventions + +- C++17, compiled with clang +- Format: `clang-format` (check with `make check-clang-format-all`) +- Integration tests: Go with testify/suite +- Shell scripts: `shfmt` + `shellcheck` +- Python: `flake8` +- Git hooks: `pre-commit` (run `make init-githook`) diff --git a/README.md b/README.md index 558300a73c..7a340cf8e4 100644 --- a/README.md +++ b/README.md @@ -26,9 +26,8 @@ Here are few links to get more details: project, this is the best place to start. This section covers building and troubleshooting the project from scratch. -2. [Design overview](docs/design-overview.md): When your goal is to better - understand how Collector works, and it's place in the grand scheme of - things, you may want to look here. +2. [Architecture](docs/architecture.md): Comprehensive overview of how + Collector works — components, data flow, threading, configuration. 3. [Troubleshooting](docs/troubleshooting.md): For common startup errors, ways of identifying and fixing them. @@ -38,3 +37,30 @@ Here are few links to get more details: 5. [References](docs/references.md): Contains a comprehensive list of configuration options for the project. + +## Deep Dives + +6. [eBPF Architecture](docs/ebpf-architecture.md): CO-RE BPF kernel + instrumentation — tracepoints, tail calls, ring buffers, verifier. + +7. [Build System](docs/build.md): CMake/Docker build pipeline, dependencies, + multi-arch support. + +8. 
[Integration Tests](docs/integration-tests.md): Test framework, 26 suites, + mock sensor, CI integration. + +9. [Deployment](docs/deployment.md): Ansible automation, VM lifecycle, K8s + DaemonSet deployment. + +10. [Falcosecurity-libs](docs/falcosecurity-libs.md): BPF driver architecture + and StackRox fork customizations. + +11. [Falco Fork Update](docs/falco-update.md): How to rebase the + falcosecurity-libs fork on upstream. + +12. [C++ Library Internals](docs/lib/README.md): Code-level documentation of + collector/lib/ components. + +13. [Driver Builds](docs/driver-builds.md): CPaaS/OSCI driver build pipeline. + +14. [CI Labels](docs/labels.md): GitHub Actions labels for CI control. diff --git a/doc_map.md b/doc_map.md new file mode 100644 index 0000000000..2bdb51bafc --- /dev/null +++ b/doc_map.md @@ -0,0 +1,168 @@ +# Documentation Map: `doc/` vs `docs/` + +> **Status: MERGE COMPLETE.** All content from `doc/` has been copied into `docs/` with inaccuracies fixed. The `doc/` directory can be removed in a follow-up PR. The top-level `README.md` has been updated with links to all new documentation. + +## Overview (pre-merge analysis) + +The repository had **two** documentation directories with different origins and purposes: + +| Directory | Files | Origin | Style | +|-----------|-------|--------|-------| +| `docs/` | 8 files + 1 image | Original, human-written | Concise, operational guides | +| `doc/` | 16 files (6 top-level + 9 in `lib/` + 1 verification report) | Newer, AI-assisted deep dives | Comprehensive architecture docs | + +The top-level `README.md` only links to `docs/` files. The `doc/` directory is currently **unlinked** from any navigation. 
+ +--- + +## File Inventory + +### `docs/` (existing, linked from README.md) + +| File | Purpose | Length | Status | +|------|---------|--------|--------| +| `how-to-start.md` | Dev setup, building, debugging, hotreload | ~281 lines | Mostly current; some Docker Desktop bug refs may be stale | +| `design-overview.md` | High-level architecture stub | ~31 lines | **Incomplete** — several empty sections | +| `troubleshooting.md` | Log retrieval, startup errors, metrics, profiling, introspection | ~597 lines | Current; some profiler paths may be stale | +| `references.md` | Env vars, CLI args, JSON config, runtime config | ~191 lines | Current | +| `release.md` | Automated + manual release process | ~138 lines | tag-bumper.py noted as out of date | +| `falco-update.md` | How to rebase the falcosecurity-libs fork | ~72 lines | Current | +| `driver-builds.md` | CPaaS/OSCI driver pipeline diagrams | ~57 lines | Current (diagram-heavy) | +| `labels.md` | GHA labels for CI control | ~8 lines | Current | + +### `doc/` (new, NOT linked from README.md) + +| File | Purpose | Length | Status | +|------|---------|--------|--------| +| `README.md` | Full architecture overview: components, data flow, threading, config, testing, deployment, performance | ~800 lines | Has known inaccuracies (see below) | +| `lib.md` | Deep dive into collector/lib C++ codebase (~108 files) | ~70KB | Has hardcoded paths, stale line counts | +| `falcosecurity-libs.md` | BPF driver integration: layers, syscalls, ring buffers, fork patches | ~150 lines | Version 3.21.0-2 should be 3.21.6 | +| `integration-tests.md` | Test framework: 26 suites (doc says 27), mock sensor, CI | ~310 lines | Test count off by 1 | +| `build.md` | Multi-stage CMake/Docker build: deps, targets, workflows | ~420 lines | gRPC/Civetweb/Protobuf versions wrong | +| `deployment.md` | Ansible automation, VM lifecycle, K8s DaemonSet, CI | ~610 lines | Current | +| `ebpf-architecture.md` | CO-RE BPF deep dive: tracepoints, tail calls, 
ring buffers, verifier | ~1200 lines | Current | +| `VERIFICATION_REPORT.md` | QA report on doc accuracy vs source code | ~440 lines | Meta-document, dated 2026-03-13 | +| `lib/README.md` | Index of C++ library components | ~180 lines | Current | +| `lib/grpc.md` | gRPC bidirectional streaming layer | ~150 lines | Current | +| `lib/config.md` | Config system: args, env vars, YAML, hot reload | ~190 lines | Current | +| `lib/core.md` | Main loop, lifecycle, metrics, health endpoints | ~150 lines | Current | +| `lib/containers.md` | Container ID extraction, metadata caching | ~140 lines | Current | +| `lib/system.md` | SystemInspector ↔ BPF abstraction boundary | ~230 lines | Current | +| `lib/networking.md` | Network flow tracking, ConnTracker, Afterglow | ~130 lines | Current | +| `lib/process.md` | Process exec events, lineage tracking | ~70 lines | Current | + +--- + +## Topic Overlap Analysis + +| Topic | `docs/` file | `doc/` file | Overlap Level | Notes | +|-------|-------------|------------|---------------|-------| +| Architecture overview | `design-overview.md` (stub) | `README.md` (comprehensive) | **Replace** | `doc/README.md` is a full replacement for the incomplete `docs/design-overview.md` | +| Falcosecurity-libs | `falco-update.md` (how to rebase) | `falcosecurity-libs.md` (architecture) | **Complementary** | Different angles: one is process, the other is architecture | +| Build system | `how-to-start.md` (dev setup) | `build.md` (build internals) | **Complementary** | `how-to-start.md` = getting started; `build.md` = deep dive | +| Testing | — | `integration-tests.md` | **New coverage** | No equivalent in `docs/` | +| Deployment/Ansible | — | `deployment.md` | **New coverage** | No equivalent in `docs/` | +| eBPF internals | — | `ebpf-architecture.md` | **New coverage** | No equivalent in `docs/` | +| C++ library internals | — | `lib/*.md` (8 files) | **New coverage** | No equivalent in `docs/` | +| Troubleshooting | `troubleshooting.md` | (partial in 
`README.md`) | **Keep docs/** | `docs/troubleshooting.md` is the authoritative source | +| Config reference | `references.md` | `lib/config.md` | **Complementary** | `references.md` = user-facing ref; `lib/config.md` = implementation | +| Release process | `release.md` | — | **Keep docs/** | Only in `docs/` | +| CI labels | `labels.md` | — | **Keep docs/** | Only in `docs/` | +| Driver builds | `driver-builds.md` | — | **Keep docs/** | Only in `docs/` | + +--- + +## Known Inaccuracies to Fix + +These were identified in `doc/VERIFICATION_REPORT.md` (2026-03-13): + +| File | Issue | Doc Says | Actual | +|------|-------|----------|--------| +| `doc/README.md`, `doc/integration-tests.md` | Test suite count | 27 | 26 | +| `doc/README.md`, `doc/lib.md` | C++ line count | ~15,778 | ~16,521 | +| `doc/lib.md` | Class name | ConnectionTracker | ConnTracker | +| `doc/build.md` | gRPC version | v1.68.3 | v1.67.0 | +| `doc/build.md` | Civetweb version | v1.17 | v1.16 | +| `doc/build.md` | Protobuf version | v29.3 | v28.3 | +| `doc/falcosecurity-libs.md` | Libs version | 3.21.0-2 | 3.21.6 | +| `doc/lib.md` | Hardcoded path | `/Users/rc/go/src/...` | Should be relative | + +--- + +## Recommended Merge Strategy + +**Goal:** Unify under `docs/` without changing existing files, by adding new files and updating only the top-level `README.md` links. 
+ +### Step 1: Move `doc/` content into `docs/` + +Add the following new files to `docs/` (no existing files modified): + +| New `docs/` file | Source | Action | +|-------------------|--------|--------| +| `docs/architecture.md` | `doc/README.md` | Copy, fix inaccuracies, replace `docs/design-overview.md` link in README | +| `docs/integration-tests.md` | `doc/integration-tests.md` | Copy, fix test count | +| `docs/build.md` | `doc/build.md` | Copy, fix dependency versions | +| `docs/deployment.md` | `doc/deployment.md` | Copy as-is | +| `docs/ebpf-architecture.md` | `doc/ebpf-architecture.md` | Copy as-is | +| `docs/falcosecurity-libs.md` | `doc/falcosecurity-libs.md` | Copy, fix version, keep alongside `falco-update.md` | +| `docs/lib/` (entire dir) | `doc/lib/` | Copy directory, fix ConnTracker class name | + +### Step 2: Update top-level `README.md` + +Add links for the new documentation (minimal change): + +```markdown +## Useful links + +1. [How to start](docs/how-to-start.md): Building and contributing. +2. [Architecture](docs/architecture.md): How Collector works internally. +3. [Troubleshooting](docs/troubleshooting.md): Common errors and diagnostics. +4. [Release Process](docs/release.md): Release procedures. +5. [References](docs/references.md): Configuration options. + +## Deep Dives + +6. [eBPF Architecture](docs/ebpf-architecture.md): CO-RE BPF kernel instrumentation. +7. [Build System](docs/build.md): CMake/Docker build pipeline. +8. [Integration Tests](docs/integration-tests.md): Test framework and suites. +9. [Deployment](docs/deployment.md): Ansible automation and K8s deployment. +10. [Falcosecurity-libs](docs/falcosecurity-libs.md): BPF driver architecture. +11. [Falco Fork Update](docs/falco-update.md): How to rebase the fork. +12. [C++ Library Internals](docs/lib/README.md): Code-level documentation. +13. [Driver Builds](docs/driver-builds.md): CPaaS/OSCI pipeline. +14. [CI Labels](docs/labels.md): GitHub Actions labels. 
+``` + +### Step 3: Retire `doc/` + +After merging, `doc/` can be: +- Kept as-is for reference (no links point to it) +- Or deleted in a follow-up PR + +### Step 4: Deprecate `docs/design-overview.md` + +Add a note at the top redirecting to `docs/architecture.md`: +```markdown +> **Note:** This document has been superseded by [Architecture](architecture.md). +``` + +### What does NOT need to change + +- `docs/how-to-start.md` — keep as-is +- `docs/troubleshooting.md` — keep as-is +- `docs/references.md` — keep as-is +- `docs/release.md` — keep as-is +- `docs/labels.md` — keep as-is +- `docs/driver-builds.md` — keep as-is +- `docs/falco-update.md` — keep as-is + +### Files to skip during merge + +- `doc/lib.md` — superseded by `doc/lib/README.md` + individual `doc/lib/*.md` files; too large and has hardcoded paths +- `doc/VERIFICATION_REPORT.md` — meta-document, not user-facing; use it to fix inaccuracies then discard + +--- + +## Summary + +The `doc/` directory contains high-quality architecture documentation that **fills major gaps** in `docs/` (no testing, build, deployment, eBPF, or library internals docs existed before). The only real overlap is `design-overview.md` which is incomplete and should be superseded. The merge is low-risk: add new files to `docs/`, fix known inaccuracies, update the README links. diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000000..540e99f726 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,777 @@ +# StackRox Collector + +Runtime data collection agent for StackRox (Red Hat Advanced Cluster Security) platform. Runs on every Kubernetes node, gathering security data from Linux kernel via CO-RE BPF probes. 
+ +**Language:** C++ (core) + Go (integration tests) +**Driver:** CO-RE BPF only (kernel >= 5.8 with BTF) +**Communication:** gRPC bidirectional streaming to Sensor +**Architecture:** x86_64, aarch64, ppc64le, s390x + +## Documentation Index + +- **[architecture.md](architecture.md)** (this file) - Architecture, data collection, communication +- **[lib/README.md](lib/README.md)** - C++ library components index +- **[falcosecurity-libs.md](falcosecurity-libs.md)** - CO-RE BPF driver integration +- **[integration-tests.md](integration-tests.md)** - Test framework (26 suites) +- **[build.md](build.md)** - Build system (CMake, Docker, vcpkg) +- **[deployment.md](deployment.md)** - Ansible playbooks, K8s deployment +- **[ebpf-architecture.md](ebpf-architecture.md)** - CO-RE BPF deep dive + +## Architecture + +### Components + +**Collector Process** +Main binary running in privileged DaemonSet pod. Initializes kernel driver, starts event processing threads, manages gRPC connection to Sensor. + +**CO-RE BPF Probes** (falcosecurity-libs) +Kernel instrumentation via modern BPF. Attached to tracepoints (sys_enter, sys_exit, sched_process_exit). Writes events to per-CPU ring buffers. Compile-once-run-everywhere (no kernel headers at runtime). + +**SystemInspectorService** (collector/lib/system-inspector/Service.cpp) +Event loop consuming libsinsp inspector. Polls ring buffers, enriches events with container metadata, dispatches to NetworkSignalHandler and ProcessSignalHandler. + +**NetworkSignalHandler** (collector/lib/NetworkSignalHandler.cpp) +Processes network syscalls (connect, accept, send, recv, close). Extracts 5-tuples, feeds ConnTracker. Implements afterglow-based deduplication. + +**ProcessSignalHandler** (collector/lib/ProcessSignalFormatter.cpp) +Processes lifecycle events (execve, fork, clone, exit). Builds process lineage, formats signals, sends to Sensor. + +**ConnTracker** (collector/lib/ConnTracker.cpp) +Afterglow algorithm for network flow aggregation. 
Maintains active/inactive connection maps, deduplicates by 5-tuple, periodic scrubbing. Sends batches to gRPC. + +**ConnScraper** (collector/lib/ConnScraper.cpp) +Scans /proc/net/{tcp,udp} for pre-existing connections. Discovers listening endpoints, enriches with process info via /proc/[pid]/fd/. + +**CollectorService** (collector/lib/CollectorService.cpp) +gRPC service implementation. Bidirectional streaming with Sensor, handles PushSignals (outbound) and control commands (inbound). + +### Data Flow + +``` +┌─────────────────────────────────────────────────────────┐ +│ Kernel (CO-RE BPF) │ +│ Tracepoints: sys_enter, sys_exit, sched_process_exit │ +└─────────────────┬───────────────────────────────────────┘ + │ Per-CPU ring buffers + ↓ +┌─────────────────────────────────────────────────────────┐ +│ libscap (falcosecurity-libs) │ +│ Ring buffer polling, event parsing, /proc state │ +└─────────────────┬───────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────────┐ +│ libsinsp (falcosecurity-libs) │ +│ Event enrichment, container metadata, filtering │ +└─────────────────┬───────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────────┐ +│ SystemInspectorService (collector) │ +│ Event loop: inspector->next() → dispatch │ +└────┬────────────────────────────────────┬───────────────┘ + │ │ + ↓ ↓ +┌────────────────────┐ ┌─────────────────────────┐ +│NetworkSignalHandler│ │ ProcessSignalHandler │ +│ - Parse syscalls │ │ - Build lineage │ +│ - Extract tuples │ │ - Format signals │ +│ - Feed tracker │ │ - Send gRPC │ +└────┬───────────────┘ └─────────────────────────┘ + │ + ↓ +┌────────────────────┐ ┌─────────────────────────┐ +│ ConnTracker │ │ ConnScraper │ +│ - Afterglow │←───────┤ - /proc/net scan │ +│ - Deduplication │ │ - Endpoints │ +│ - Batch sends │ └─────────────────────────┘ +└────┬───────────────┘ + │ + ↓ +┌────────────────────────────────────────────────────────┐ +│ 
CollectorService (gRPC) │ +│ Bidirectional streaming to Sensor │ +└────────────────────────────────────────────────────────┘ +``` + +### Threading + +**Main Thread** +Initialization, gRPC server, signal handling. + +**SystemInspector Thread** +Runs event loop (`inspector->next()`), dispatches to handlers. + +**NetworkSignalHandler Thread** +Consumes network events, processes connections. + +**ProcessSignalHandler Thread** +Formats process signals, sends gRPC. + +**ConnScraper Thread** +Periodic /proc scanning (default: 15s interval). + +**Afterglow Scrubber** +Periodic cleanup of inactive connections. + +Synchronization via mutexes in ConnTracker, ProcessStore. Lock-free queues for gRPC sends. + +## Data Collection + +### Network Monitoring + +**Syscalls Tracked** +- Connection lifecycle: connect, accept, accept4, bind, listen +- Data transfer: send, sendto, sendmsg, sendmmsg, recv, recvfrom, recvmsg, recvmmsg +- Socket operations: socket, socketpair, shutdown, close +- Status: getsockopt (async connection status) + +**Connection Information** +- 5-tuple: src IP:port, dst IP:port, protocol (TCP/UDP) +- Role: client (initiated) vs server (accepted) +- Timestamps: first seen, last seen, close time +- Byte counts: sent, received +- Container context: ID, pod, namespace +- Process context: PID, name, path, cmdline, lineage + +**Afterglow Algorithm** +Deduplicates repeated connections within configurable period (default: 30s). When connection closed, moved to inactive map with expiration timestamp. Re-opened connections extend expiration, avoiding duplicate reports. Scrubber periodically removes expired entries. + +**Endpoints** +Listening sockets tracked separately. ConnScraper discovers pre-existing listeners via /proc/net/{tcp,udp}. Enriched with originator process (name, path, args, user) via /proc/[pid]/fd/. Reported as separate signal type. + +**Public IP Detection** +HostHeuristics determines if endpoint is cluster-internal or external. 
Uses configurable public IP ranges, cluster CIDR detection. + +### Process Monitoring + +**Lifecycle Events** +- Execution: execve, execveat +- Creation: fork, vfork, clone, clone3 +- Termination: exit (via sched_process_exit tracepoint) + +**Process Information** +- PID, PPID, UID, GID +- Executable path, name +- Command line arguments +- Working directory +- Capabilities +- Cgroup membership +- Container ID, pod, namespace +- Lineage (parent chain) + +**Lineage Tracking** +ProcessStore maintains parent-child relationships. Process signals include ancestor chain (process → parent → grandparent → ...). Enables attack path reconstruction. + +**Container Metadata** +libsinsp resolves container IDs via container runtime APIs (Docker, containerd, CRI-O, Podman). Extracts pod name, namespace, labels, annotations, image. + +### Procfs Scraping + +ConnScraper periodically reads /proc/net/{tcp,tcp6,udp,udp6,raw} to discover connections established before collector started. Cross-references /proc/[pid]/fd/ to identify process owners. Populates endpoints with originator info. + +Scraper also monitors listening sockets. Detects services opening ports, reports as endpoint signals. Handles ephemeral listeners (e.g., load balancers). + +### Event Filtering + +libsinsp filter engine applies declarative filters: +``` +evt.type=connect and container.id!=host and fd.type in (ipv4, ipv6) +``` + +Collector configures socket-only FD tracking (`SCAP_SOCKET_ONLY_FD`), reducing memory/CPU. Interesting cgroup subsystems limited to perf_event, cpu, cpuset, memory. + +Slim threadinfo mode (`SINSP_SLIM_THREADINFO`) omits env vars, reduces memory ~60%. + +## Communication Protocol + +### gRPC Service + +Collector implements bidirectional streaming gRPC service defined in collector.proto (StackRox central repo): + +**PushSignals (outbound)** +Collector → Sensor. 
Sends batches of signals: +- NetworkConnectionInfo: Connection events +- NetworkEndpointInfo: Listening endpoints +- ProcessSignal: Process lifecycle events + +**Control Commands (inbound)** +Sensor → Collector: +- Configuration updates +- Connection scraping requests +- Health checks + +**Connection Management** +Automatic reconnection with exponential backoff. TLS mutual authentication. Metadata includes collector version, hostname, container runtime. + +### Signal Batching + +Signals batched for efficiency. Network connections accumulated in ConnTracker, flushed periodically or when batch size threshold reached. Process signals sent individually or in small batches. + +Batching controlled by: +- afterglow period (default: 30s) +- connection stats aggregation window (default: 15s) +- batch size limits + +### Message Format + +Signals serialized as protobuf. NetworkConnectionInfo includes: +- Connection 5-tuple +- Role (client/server) +- Timestamps +- Byte counts +- Container info +- Process info (if available) + +ProcessSignal includes: +- Event type (exec, fork, exit) +- Process info +- Lineage chain +- Container info +- Exec args/env (configurable) + +## Configuration + +### Environment Variables + +**Collection** +- COLLECTION_METHOD: "EBPF" (CO-RE BPF) +- GRPC_SERVER: Sensor address (e.g., sensor.stackrox.svc:443) +- COLLECTOR_CONFIG: JSON/YAML config (overrides runtime config) + +**Host Access** +- COLLECTOR_HOST_ROOT: Host filesystem mount point (/host) + +**Logging** +- COLLECTOR_LOG_LEVEL: trace, debug, info, warning, error +- COLLECTOR_INTROSPECTION_ENABLE: Enable HTTP introspection endpoints + +**Performance** +- SINSP_CPU_PER_BUFFER: CPUs per ring buffer (0 = 1:1) +- SINSP_TOTAL_BUFFER_SIZE: Total ring buffer size (512 MB) +- SINSP_THREAD_CACHE_SIZE: Thread info cache (32768) + +### Runtime Configuration + +YAML file at /etc/stackrox/runtime_config.yaml (or path in COLLECTOR_CONFIG). Inotify-based hot reload without restart. 
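The inotify-based hot reload described above can be sketched as a small stand-alone helper. This is illustrative only, not collector's actual implementation; the helper names (`watch_config`, `config_changed`) and watched path are hypothetical:

```cpp
#include <sys/inotify.h>
#include <unistd.h>

#include <fstream>
#include <string>

// Hypothetical helper: returns an inotify fd watching `path` for writes,
// or -1 on error.
int watch_config(const std::string& path) {
  int fd = inotify_init1(IN_NONBLOCK);
  if (fd < 0) return -1;
  // IN_CLOSE_WRITE fires when a writer closes the file (e.g. after a
  // ConfigMap update is written out); IN_MODIFY fires on each write.
  if (inotify_add_watch(fd, path.c_str(), IN_MODIFY | IN_CLOSE_WRITE) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Drain pending events; true if the watched file changed since last call.
bool config_changed(int inotify_fd) {
  char buf[4096];
  return read(inotify_fd, buf, sizeof(buf)) > 0;
}
```

On a change, the service would re-parse the YAML and swap in the new config without restarting the process.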
+ +**Networking**: +```yaml +networking: + externalIps: + enable: true + connectionStats: + aggregationWindow: 15s + afterglow: + period: 30s +``` + +**Process Listening**: +```yaml +processesListening: + enable: true +``` + +**Scraping**: +```yaml +scrape: + interval: 15s + enableConnectionStats: true +``` + +**TLS**: +```yaml +tlsConfig: + caCertPath: /var/run/secrets/stackrox.io/certs/ca.pem + clientCertPath: /var/run/secrets/stackrox.io/certs/cert.pem + clientKeyPath: /var/run/secrets/stackrox.io/certs/key.pem +``` + +## Build System + +See [build.md](build.md) for details. + +**Builder Image** +CentOS Stream 10 container with build tools, third-party dependencies built from source. Multi-arch support (amd64, arm64, ppc64le, s390x). + +**Build Process** +1. Pull/build collector-builder image +2. CMake configure (find packages, set flags, configure falcosecurity-libs) +3. Compile C++ sources (parallel, ~8-12 min) +4. Strip binary (Release mode) +5. Build container image (UBI 10 Minimal) + +**Quick Build**: +```bash +make start-builder +make collector +make image +``` + +**Incremental**: +```bash +make collector # ~2-5 min +make image # ~30 sec +``` + +## Testing + +See [integration-tests.md](integration-tests.md) for details. + +**Framework** +Go-based testify/suite with 26 test suites. Validates collector across platforms, kernels, runtimes. + +**Test Categories** +- Process/execution: lifecycle, lineage, symlinks, threads +- Network: connections, endpoints, UDP, async, afterglow +- Procfs: scraping, pre-existing connections, listening ports +- Configuration: runtime reload, startup, log levels +- Performance: benchmarks, profiling, resource usage +- API: introspection endpoints, Prometheus metrics +- Kubernetes: namespaces, ConfigMap reload + +**Mock Sensor** +Simulated Sensor gRPC server. Receives signals, provides query APIs. Tests verify expected events received. 
+ +**Execution**: +```bash +cd integration-tests +make TestProcessNetwork # Single suite +make ci-integration-tests # All tests +make ci-benchmarks # Performance +``` + +**Remote Testing**: +```bash +REMOTE_HOST_TYPE=gcloud \ +VM_CONFIG=ubuntu.ubuntu-20.04 \ +COLLECTOR_TEST=TestProcessNetwork \ +ansible-playbook -i dev integration-tests.yml +``` + +## Deployment + +See [deployment.md](deployment.md) for details. + +**Kubernetes** +DaemonSet on every node. Privileged container with hostPID, hostNetwork. Mounts /host (host filesystem), /sys (sysfs), /sys/kernel/debug (debugfs for eBPF), /var/run/docker.sock (container introspection). + +**Resource Requirements** +- Requests: 50m CPU, 320Mi memory +- Limits: 2 CPU, 2Gi memory + +**Security Context** +Privileged required for BPF operations (CAP_BPF, CAP_PERFMON on kernel >= 5.8, otherwise CAP_SYS_ADMIN). + +**ConfigMap** +runtime_config.yaml mounted at /etc/collector/. Inotify watches for changes, reloads without restart. + +**TLS** +Mutual TLS with Sensor. Certificates from Secret mounted at /var/run/secrets/stackrox.io/certs/. + +**DaemonSet Example**: +```yaml +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: collector + namespace: stackrox +spec: + selector: + matchLabels: + app: collector + template: + spec: + hostPID: true + hostNetwork: true + containers: + - name: collector + image: quay.io/stackrox-io/collector:latest + env: + - {name: COLLECTION_METHOD, value: "EBPF"} + - {name: GRPC_SERVER, value: "sensor.stackrox.svc:443"} + securityContext: + privileged: true + volumeMounts: + - {name: host-root, mountPath: /host, readOnly: true} + - {name: sys, mountPath: /sys, readOnly: true} + - {name: certs, mountPath: /var/run/secrets/stackrox.io/certs/} + volumes: + - {name: host-root, hostPath: {path: /}} + - {name: sys, hostPath: {path: /sys}} + - {name: certs, secret: {secretName: collector-tls}} +``` + +## C++ Codebase + +See [lib/README.md](lib/README.md) for component index. 
+ +**Location:** `collector/lib/` +**Lines:** ~16,521 C++ across 108 files + +**Key Modules** + +Event Processing: +- SystemInspectorService: Main event loop +- EventExtractor: libsinsp wrapper +- SignalHandler: Handler interface + +Signal Handlers: +- NetworkSignalHandler: Network event consumer +- ProcessSignalFormatter: Process event formatter +- ProcessSignalHandler: Process handler wrapper + +Connection Management: +- ConnTracker: Afterglow aggregation +- ConnScraper: /proc/net scanner +- Afterglow: Generic expiration container +- NetworkConnection: 5-tuple representation + +Process Management: +- ProcessStore: Process cache +- ContainerMetadata: Container info extractor + +gRPC: +- CollectorService: gRPC implementation +- GRPCUtil: Connection helpers +- DuplexGRPC: Bidirectional wrapper + +Configuration: +- CollectorConfig: YAML parser +- HostInfo: Host metadata + +Kernel: +- KernelDriver: ModernBPFDriver (CO-RE BPF loader) + +Utilities: +- Logging, Utility, StoppableThread + +**Code Organization** + +Headers: Public interfaces in .h files +Implementation: Logic in .cpp files +Tests: *_test.cpp files with GoogleTest +Mocks: Mock* files for testing + +**Build Integration** + +collector/lib/CMakeLists.txt defines collector_lib target. Links against: +- libsinsp, libscap (falcosecurity-libs) +- gRPC, protobuf +- yaml-cpp, jsoncpp +- civetweb, prometheus-cpp +- gperftools (optional profiling) + +## falcosecurity-libs Integration + +See [falcosecurity-libs.md](falcosecurity-libs.md) for details. + +**Submodule** +git submodule at falcosecurity-libs/. StackRox fork of upstream Falco libs (github.com/stackrox/falcosecurity-libs). + +**Version** +3.21.6. Branch: module-version-2.10. 
+

**Customizations**
- BPF verifier fixes for clang > 19 (ROX-31971)
- Container-Optimized OS compatibility (ROX-24938)
- getsockopt syscall support (ROX-18856)
- UDP connectionless tracking
- PowerPC support (ppc64le)
- Kernel 6.x compatibility

**Architecture Layers**

libsinsp (userspace/libsinsp/):
Event enrichment, container metadata, filter engine. Provides sinsp inspector class. Tracks thread table, FD table, network connections.

libscap (userspace/libscap/):
Ring buffer management, event parsing. Reads per-CPU BPF ring buffers, parses ppm_evt_hdr structures.

Modern BPF Driver (driver/modern_bpf/):
CO-RE BPF programs. Attached to sys_enter, sys_exit, sched_process_exit tracepoints. Built as bpf_probe.skel.h embedded in binary.

**Collector Usage**

SystemInspectorService creates sinsp instance:
```cpp
inspector->open_modern_bpf();
while (running) {
  sinsp_evt* evt = nullptr;
  int32_t res = inspector->next(&evt);  // SCAP_SUCCESS, SCAP_TIMEOUT, ...
  // Dispatch to NetworkSignalHandler or ProcessSignalHandler
}
```

NetworkSignalHandler extracts connection tuples via EventExtractor wrapper.

ProcessSignalHandler builds lineage from sinsp thread table.

ContainerMetadata wraps `sinsp::get_container_manager()`.

ModernBPFDriver manages probe lifecycle, references g_syscall_table from libscap.

**Configuration**

From collector/CMakeLists.txt:
```cmake
set(BUILD_LIBSCAP_MODERN_BPF ON)
set(BUILD_DRIVER OFF) # No kernel module
set(SINSP_SLIM_THREADINFO ON)
set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep)$")

add_definitions(-DSCAP_SOCKET_ONLY_FD)
```

Tunables:
- sinsp_cpu_per_buffer_: CPUs per buffer
- sinsp_total_buffer_size_: Total ring buffer (512 MB)
- sinsp_thread_cache_size_: Thread cache (32768)

## Performance

**Ring Buffer Sizing**
Default 512 MB total. Per-CPU buffers sized automatically or via sinsp_cpu_per_buffer_. Too small causes event drops (SCAP_DROP counter), too large causes memory pressure.
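As a back-of-the-envelope illustration of this trade-off (hypothetical helper; assumes the total budget is split evenly across buffers and that `cpu_per_buffer = 0` means one buffer per CPU, per the tunables above):

```cpp
#include <cstdint>

constexpr uint64_t kMiB = 1024 * 1024;

// Illustrative arithmetic only, not libscap's exact allocation logic:
// split a total ring-buffer budget across buffers, one buffer per
// `cpu_per_buffer` CPUs (0 = one buffer per CPU).
uint64_t per_buffer_bytes(uint64_t total_bytes, unsigned ncpus,
                          unsigned cpu_per_buffer) {
  if (cpu_per_buffer == 0) cpu_per_buffer = 1;
  unsigned nbuffers = (ncpus + cpu_per_buffer - 1) / cpu_per_buffer;  // ceil
  return total_bytes / nbuffers;
}
```

With the 512 MB default on a 16-CPU node and one buffer per CPU, each buffer gets 32 MB; raising cpu_per_buffer reduces the buffer count and grows each buffer accordingly.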
+ +**Event Filtering** +Early kernel-side filtering reduces userspace load. SCAP_SOCKET_ONLY_FD processes socket FDs only. Limited cgroup subsystems tracked. + +**Slim Threadinfo** +Reduces memory ~60% by omitting env vars, full FD snapshots. Trade-off: less detail in process signals. + +**Afterglow Efficiency** +Avoids reporting duplicate connections. Default 30s period balances accuracy vs bandwidth. Longer period = more deduplication but stale data risk. + +**Procfs Scraping** +Default 15s interval. Lower = more current endpoint data, higher CPU. Higher = less overhead, delayed discovery. + +**Connection Aggregation** +Batch sends reduce gRPC overhead. Default 15s aggregation window. Configured via connectionStats.aggregationWindow. + +## Debugging + +**Log Levels** +Set COLLECTOR_LOG_LEVEL=trace for verbose output. Logs to stdout (captured by K8s). + +**Introspection Endpoints** +Enable via ROX_COLLECTOR_INTROSPECTION_ENABLE=true: +- /ready: Health check +- /state/network/connection: Active connections +- /state/network/endpoint: Listening endpoints +- /state/log-level: Dynamic log level (GET/POST) +- /metrics: Prometheus metrics + +**BPF Debug Mode** +Build with BPF_DEBUG_MODE=ON. Enables bpf_printk() in BPF programs. Logs to /sys/kernel/debug/tracing/trace_pipe. + +**Core Dumps** +Set kernel.core_pattern, enable coredumpctl. Debug with gdb using image-dev (unstripped binary). + +**gRPC Tracing** +Enable via GRPC_TRACE=all GRPC_VERBOSITY=debug. Logs gRPC internals. + +**Performance Profiling** +gperftools integration (x86_64): +- CPU: CPUPROFILE=/tmp/cpu.prof +- Heap: HEAPPROFILE=/tmp/heap.prof + +Endpoints (introspection enabled): +- /debug/pprof/profile?seconds=30 +- /debug/pprof/heap + +**Capture Files** +libscap can record events to .scap files for offline analysis. Useful for reproducing issues. + +## Security + +**Kernel Access** +Privileged container required. BPF operations need CAP_BPF/CAP_PERFMON (kernel >= 5.8) or CAP_SYS_ADMIN (older kernels). 
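+
+The capability rule above can be written as a small decision function. This is a hypothetical sketch for illustration only — the kernel enforces the rule, and no such function exists in collector:
+
+```python
+def required_bpf_caps(kernel: tuple) -> set:
+    """Capabilities needed for BPF and perf operations on a given kernel version."""
+    if kernel >= (5, 8):
+        # Kernel 5.8 split BPF privileges out of CAP_SYS_ADMIN.
+        return {"CAP_BPF", "CAP_PERFMON"}
+    return {"CAP_SYS_ADMIN"}
+
+print(sorted(required_bpf_caps((5, 14))))  # ['CAP_BPF', 'CAP_PERFMON']
+print(sorted(required_bpf_caps((4, 18))))  # ['CAP_SYS_ADMIN']
+```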
+
+**Event Data**
+Syscall arguments may contain sensitive data (file paths, network addresses, cmdline args). Collector does not filter sensitive data; redaction relies on Sensor/Central.
+
+**BPF Safety**
+Modern BPF verifier ensures programs cannot crash kernel. Probes read-only, cannot modify kernel state.
+
+**Ring Buffer Access**
+Restricted to collector process (privileged). Kernel ensures data integrity, prevents tampering.
+
+**TLS**
+Mutual TLS with Sensor. Certificates provided via K8s Secret. Validates Sensor identity, encrypts communication.
+
+## Known Limitations
+
+**Kernel Requirements**
+CO-RE BPF requires kernel >= 5.8 with BTF. Older kernels unsupported.
+
+**Container Runtime**
+Requires Docker socket or CRI runtime access for container metadata. Standalone processes on host have limited context.
+
+**Network Monitoring**
+Tracks syscall-level events only. Cannot see kernel-bypassed networking (e.g., DPDK, kernel TLS offload). Tracking of connectionless UDP traffic is limited.
+
+**Process Context**
+Short-lived processes may exit before collector reads /proc info. Lineage tracking depends on parent still running.
+
+**Resource Overhead**
+Privileged container, kernel instrumentation. Typical overhead: 50m-100m CPU, 320Mi-512Mi memory (varies by workload).
+
+**Event Loss**
+Under extreme load (e.g., 100k+ events/sec), ring buffers may overflow. SCAP_DROP counter tracks losses. Increase buffer size or reduce event volume.
+
+## Troubleshooting
+
+**Collector Not Starting**
+
+Check pod events:
+```bash
+kubectl -n stackrox describe pod collector-xxxxx
+```
+
+Check logs:
+```bash
+kubectl -n stackrox logs collector-xxxxx
+```
+
+Common issues:
+- Kernel too old (< 5.8): CO-RE BPF unavailable
+- BTF missing: Check /sys/kernel/btf/vmlinux
+- Permissions: Verify privileged: true, hostPID: true
+- gRPC server unreachable: Check GRPC_SERVER, network policies
+
+**Events Not Captured**
+
+Check collector logs for errors. 
Verify BPF probe loaded:
+```bash
+bpftool prog list | grep collector
+```
+
+Check ring buffer stats:
+```bash
+curl localhost:8080/state/stats # If introspection enabled
+```
+
+Look for SCAP_DROP counter. If non-zero, increase SINSP_TOTAL_BUFFER_SIZE.
+
+Verify events generated (run test workload, check logs for event counts).
+
+**High Memory Usage**
+
+Check thread cache size (SINSP_THREAD_CACHE_SIZE). Reduce if many processes.
+
+Enable slim threadinfo (SINSP_SLIM_THREADINFO=ON in build).
+
+Reduce ring buffer size if memory pressure (but may increase drops).
+
+**High CPU Usage**
+
+Check event rate (SCAP_EVTS counter in stats). If very high, consider filtering.
+
+Increase procfs scrape interval (scrape.interval).
+
+Increase the connection stats aggregation window (networking.connectionStats.aggregationWindow) to batch more updates per send.
+
+**Connection Deduplication Issues**
+
+Verify afterglow period (networking.afterglow.period). If too short, connections reported multiple times. If too long, stale data.
+
+Check connection logs (COLLECTOR_LOG_LEVEL=debug). Look for afterglow evictions, reinsertions.
+
+**Process Lineage Missing**
+
+Parent process may have exited before collector read /proc. Lineage truncated at missing parent.
+
+Increase thread cache size to retain more process info.
+
+**gRPC Connection Failures**
+
+Check TLS certificates:
+```bash
+kubectl -n stackrox get secret collector-tls -o yaml
+```
+
+Verify ca.pem, cert.pem, key.pem present.
+
+Check Sensor connectivity:
+```bash
+kubectl -n stackrox exec collector-xxxxx -- curl -k https://sensor.stackrox.svc:443
+```
+
+Review gRPC logs (GRPC_TRACE=all).
+
+## Version History
+
+**3.74.x** (2026)
+ROX-31971: Clang > 19 support, BPF verifier fixes. CentOS Stream 10 builder migration. Kernel 6.15 compatibility.
+
+**3.21.x** (2024)
+falcosecurity-libs 3.21.0 integration. Modern BPF excludes (openat2, ppoll, setsockopt). RHEL 9.3 fixes.
+
+**3.18.x** (2023)
+UDP connectionless tracking improvements. 
PowerPC support (ppc64le). Kernel 6.7 compatibility. + +**3.0.x** (2022) +ROX-7482: Migration from sysdig to falcosecurity-libs. CO-RE BPF enablement. Afterglow algorithm introduction. + +**2.x** (2021) +Original sysdig-based implementation. Classic eBPF probes. + +## Contributing + +**Code Style** +C++17 standard. Follow Google C++ Style Guide. Use clang-format (make -C collector format). + +**Testing** +Add unit tests (*_test.cpp) for new C++ code. Add integration tests for new features. Verify across platforms (RHEL, Ubuntu, COS). + +**Pull Requests** +Fork collector repo. Create feature branch. Add tests, update docs. Submit PR to stackrox/collector. + +**Build Locally**: +```bash +make start-builder +make collector CMAKE_BUILD_TYPE=Debug +make image-dev +``` + +**Run Tests**: +```bash +cd integration-tests +make TestProcessNetwork +``` + +## Support + +**Issues** +File Jira tickets in StackRox project (ROX-xxxxx). Include collector version, kernel version, platform, logs. + +**Logs** +Collect via: +```bash +kubectl -n stackrox logs collector-xxxxx > collector.log +``` + +Enable debug logging (COLLECTOR_LOG_LEVEL=debug). + +**Debugging** +Use image-dev for symbols. Attach gdb, analyze core dumps. Use introspection endpoints for live state. 
+ +## References + +**Documentation** +- [lib/README.md](lib/README.md) - C++ library components +- [falcosecurity-libs.md](falcosecurity-libs.md) - BPF driver details +- [integration-tests.md](integration-tests.md) - Test framework +- [build.md](build.md) - Build system +- [deployment.md](deployment.md) - Deployment automation +- [ebpf-architecture.md](ebpf-architecture.md) - CO-RE BPF deep dive + +**External** +- [Falco Documentation](https://falco.org/docs/) +- [eBPF Documentation](https://docs.kernel.org/bpf/) +- [CO-RE (Compile Once - Run Everywhere)](https://nakryiko.com/posts/bpf-portability-and-co-re/) +- [StackRox Documentation](https://docs.stackrox.com/) + +**Repositories** +- [Collector](https://github.com/stackrox/collector) +- [falcosecurity-libs fork](https://github.com/stackrox/falcosecurity-libs) +- [Upstream Falco libs](https://github.com/falcosecurity/libs) diff --git a/docs/build.md b/docs/build.md new file mode 100644 index 0000000000..6f803cbfcc --- /dev/null +++ b/docs/build.md @@ -0,0 +1,427 @@ +# Build System + +## Overview + +Multi-stage build combining CMake, Make, Docker, and vcpkg. Produces static C++ binary with CO-RE BPF probes. Supports amd64, arm64, ppc64le, s390x. + +**Environment:** CentOS Stream 10 container (collector-builder) +**Output:** Static binary + CO-RE BPF probes +**Build time:** ~30-45 min (full), ~5 min (incremental) + +## Build Flow + +``` +1. Builder Image (builder/Dockerfile) + ├── Base: CentOS Stream 10 + ├── System packages: clang, cmake, gcc, etc. + ├── Third-party deps from source: + │ abseil, gperftools, protobuf, yaml-cpp + │ grpc, civetweb, prometheus-cpp, jsoncpp + │ tbb, libbpf, bpftool, valijson, uthash + └── Workspace: /src + +2. CMake Configuration (CMakeLists.txt) + ├── Find packages (gRPC, yaml-cpp, civetweb, prometheus-cpp) + ├── Compiler flags (C++17, -fPIC, -pthread) + ├── Build types (Debug, Release) + └── falcosecurity-libs (CO-RE BPF) + +3. 
Collector Build (collector/Makefile) + ├── cmake-configure: Generate build files + ├── cmake-build: Compile C++ sources + ├── Strip binary (Release) + └── Copy to container/bin/ + +4. Container Image (collector/container/Dockerfile) + ├── Base: UBI 10 Minimal + ├── Runtime dependencies + ├── Collector binary + └── ENTRYPOINT: collector +``` + +## Builder Image + +**Location:** `builder/Dockerfile` +**Base:** `quay.io/centos/centos:stream10` + +System packages: clang, llvm, gcc, gcc-c++, cmake, make, autoconf, automake, libtool, glibc-devel, libcurl-devel, openssl-devel, binutils-devel, elfutils-libelf-devel, gdb, valgrind, libasan, libubsan, systemtap-sdt-devel. + +Third-party dependencies built in `builder/install/` (execution order by prefix): +- 05: abseil (Google base library) +- 10: gperftools, protobuf, yaml-cpp +- 20: googletest +- 30: c-ares (async DNS) +- 35: re2 (regex) +- 40: civetweb (HTTP server), grpc +- 50: libb64, prometheus-cpp +- 60: jsoncpp, tbb +- 70: valijson, uthash +- 80: libbpf +- 90: bpftool + +Versions in `builder/install/versions.sh`: +```bash +ABSEIL_VERSION="20240722.0" +GPERFTOOLS_VERSION="2.16" +PROTOBUF_VERSION="v28.3" +GRPC_VERSION="v1.67.0" +CIVETWEB_VERSION="v1.16" +``` + +Build debug builder (retains sources, debug symbols): +```bash +make builder BUILD_BUILDER_IMAGE=true COLLECTOR_BUILDER_DEBUG=true +``` + +## CMake Configuration + +**Root:** `CMakeLists.txt` → `add_subdirectory(collector)` +**Collector:** `collector/CMakeLists.txt` + +Package discovery: +```cmake +find_package(Threads) +find_package(CURL REQUIRED) +find_package(yaml-cpp REQUIRED) +find_package(gRPC CONFIG REQUIRED) +find_package(civetweb CONFIG REQUIRED) +find_package(prometheus-cpp CONFIG REQUIRED) +``` + +Compiler flags: +```cmake +set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} \ + -fPIC -Wall --std=c++17 -pthread \ + -Wno-deprecated-declarations \ + -fno-omit-frame-pointer -rdynamic") +``` + +Build types: +- Debug: `-g -ggdb -D_DEBUG` +- Release: `-O3 
-fno-strict-aliasing -DNDEBUG`
+
+Sanitizers:
+- ADDRESS_SANITIZER: `-fsanitize=address -fsanitize=undefined`
+- THREAD_SANITIZER: `-fsanitize=thread`
+
+Profiling: `DISABLE_PROFILING=OFF` enables `-DCOLLECTOR_PROFILING`.
+
+Version injection:
+```cmake
+configure_file(
+  lib/CollectorVersion.h.in
+  ${CMAKE_CURRENT_BINARY_DIR}/CollectorVersion.h)
+```
+
+## falcosecurity-libs Integration
+
+```cmake
+set(FALCO_DIR ${PROJECT_SOURCE_DIR}/../falcosecurity-libs)
+add_subdirectory(${FALCO_DIR} falco)
+
+set(BUILD_DRIVER OFF) # No kernel module
+set(USE_BUNDLED_DEPS OFF) # System deps
+set(BUILD_LIBSCAP_GVISOR OFF) # No gVisor
+set(SINSP_SLIM_THREADINFO ON) # Optimize memory
+set(BUILD_SHARED_LIBS OFF) # Static linking
+set(BUILD_LIBSCAP_MODERN_BPF ON) # CO-RE BPF
+set(MODERN_BPF_DEBUG_MODE ${BPF_DEBUG_MODE})
+set(MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep)$")
+```
+
+Definitions:
+```cmake
+add_definitions(-DSCAP_SOCKET_ONLY_FD)
+add_definitions("-DINTERESTING_SUBSYS=\"perf_event\", \"cpu\", \"cpuset\", \"memory\"")
+set(SCAP_HOST_ROOT_ENV_VAR_NAME "COLLECTOR_HOST_ROOT")
+```
+
+## Build Targets
+
+**collector:** Main binary (links collector_lib)
+**connscrape:** Connection scraping tool
+**self-checks:** Startup validation
+**collector_lib:** Core library from lib/
+
+## Makefile System
+
+**Root:** `Makefile` (includes `Makefile-constants.mk`)
+
+Key targets:
+- `tag`: Print git tag
+- `builder`: Pull/build collector-builder
+- `collector`: Build binary
+- `image`: Build container image
+- `image-dev`: Build with ubi (debugging)
+- `unittest`: Run C++ unit tests
+- `ci-integration-tests`: Run integration tests
+
+Variables (`Makefile-constants.mk`):
+```makefile
+COLLECTOR_BUILDER_TAG ?= master
+COLLECTOR_TAG ?= $(shell git describe --tags --abbrev=10 --dirty)
+HOST_ARCH := $(shell uname -m | sed -e 's/x86_64/amd64/' -e 's/aarch64/arm64/')
+PLATFORM ?= linux/$(HOST_ARCH)
+
+CMAKE_BUILD_TYPE ?= Release
+USE_VALGRIND ?= false
+ADDRESS_SANITIZER ?= 
false +THREAD_SANITIZER ?= false +BPF_DEBUG_MODE ?= false +``` + +**Collector Makefile:** `collector/Makefile` + +cmake-configure: +```makefile +cmake-configure/collector: + docker exec $(COLLECTOR_BUILDER_NAME) \ + cmake -S $(BASE_PATH) -B $(CMAKE_DIR) \ + -DCMAKE_BUILD_TYPE=$(CMAKE_BUILD_TYPE) \ + -DADDRESS_SANITIZER=$(ADDRESS_SANITIZER) \ + -DBPF_DEBUG_MODE=$(BPF_DEBUG_MODE) \ + -DCOLLECTOR_VERSION=$(COLLECTOR_VERSION) +``` + +cmake-build: +```makefile +cmake-build/collector: cmake-configure/collector + docker exec $(COLLECTOR_BUILDER_NAME) \ + cmake --build $(CMAKE_DIR) -- -j $(NPROCS) + docker exec $(COLLECTOR_BUILDER_NAME) \ + bash -c "[ $(CMAKE_BUILD_TYPE) == Release ] && \ + strip --strip-unneeded $(COLLECTOR_BIN_DIR)/collector || exit 0" +``` + +Binary extraction: +```makefile +container/bin/collector: cmake-build/collector + mkdir -p container/bin + cp "$(COLLECTOR_BIN_DIR)/collector" container/bin/collector + cp "$(COLLECTOR_BIN_DIR)/self-checks" container/bin/self-checks +``` + +## Container Image + +**Dockerfile:** `collector/container/Dockerfile` + +```dockerfile +FROM registry.access.redhat.com/ubi10/ubi-minimal:latest + +ARG BUILD_TYPE=rhel # or devel +ENV COLLECTOR_HOST_ROOT=/host + +COPY container/${BUILD_TYPE}/install.sh / +RUN ./install.sh && rm -f install.sh + +COPY container/THIRD_PARTY_NOTICES/ /THIRD_PARTY_NOTICES/ +COPY container/bin/collector /usr/local/bin/ +COPY container/bin/self-checks /usr/local/bin/self-checks +COPY container/status-check.sh /usr/local/bin/status-check.sh + +EXPOSE 8080 9090 + +HEALTHCHECK --start-period=5s --interval=5s \ + CMD /usr/local/bin/status-check.sh + +ENTRYPOINT ["collector"] +``` + +Build types: +- **rhel:** Production (UBI 10 Minimal, stripped binary) +- **devel:** Development (UBI 10 full, debugging tools, unstripped) + +Multi-arch: +```bash +PLATFORM=linux/amd64 make image +PLATFORM=linux/arm64 make image +PLATFORM=linux/ppc64le make image +PLATFORM=linux/s390x make image + +docker buildx build 
--platform linux/amd64,linux/arm64 \ + -t quay.io/stackrox-io/collector:$(COLLECTOR_TAG) \ + collector/container/ +``` + +## CO-RE BPF Probes + +Built-in to binary (no external files). Generated during falcosecurity-libs build: + +```cpp +// From bpf_probe.skel.h +static const unsigned char modern_bpf_probe[] = { + 0x7f, 0x45, 0x4c, 0x46, // ELF header + // ... BPF bytecode +}; +``` + +## Build Workflows + +Local development: +```bash +make start-builder # Start builder container +make collector # Build binary +make image # Build container +docker run --rm --privileged \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v /host:/host:ro \ + -e GRPC_SERVER=localhost:9999 \ + quay.io/stackrox-io/collector:$(COLLECTOR_TAG) +``` + +Incremental: +```bash +make collector # ~2-5 min +make image # ~30 sec +``` + +Clean: +```bash +make clean +make teardown-builder +make builder BUILD_BUILDER_IMAGE=true +make start-builder +make collector +``` + +CI multi-arch: +```bash +cd ansible +ansible-playbook ci-build-builder.yml \ + -e arch=amd64 \ + -e stackrox_io_username=$QUAY_USER \ + -e stackrox_io_password=$QUAY_PASS + +ansible-playbook ci-build-collector.yml \ + -e arch=amd64 \ + -e collector_image=quay.io/stackrox-io/collector:$(COLLECTOR_TAG) +``` + +Cross-compilation (amd64 → arm64): +```bash +docker buildx create --use +docker buildx build --platform linux/arm64 \ + -t quay.io/stackrox-io/collector-builder:master-arm64 \ + -f builder/Dockerfile --load builder/ + +PLATFORM=linux/arm64 HOST_ARCH=arm64 make collector +``` + +## Performance + +Timings (modern hardware): +| Task | Clean | Incremental | +|------|-------|-------------| +| Builder image | 30-45 min | N/A | +| CMake configure | 1-2 min | 5 sec | +| Collector compile | 8-12 min | 30 sec - 2 min | +| Container image | 1-2 min | 30 sec | + +Parallelism: +```bash +make collector NPROCS=$(nproc) # All cores +make collector NPROCS=4 # Limit (reduce memory) +``` + +Caching: Docker layer cache, CMake cache, incremental 
linking. + +## Troubleshooting + +Builder container: +```bash +docker ps -a | grep collector_builder # Check exists +make teardown-builder # Restart +make start-builder +docker system prune -a # Disk space +``` + +CMake errors: +```bash +docker exec collector_builder_amd64 \ + cmake -S /src -B /src/cmake-build --debug-find + +docker exec collector_builder_amd64 \ + cmake -S /src -B /src/cmake-build \ + -DCMAKE_C_COMPILER=clang \ + -DCMAKE_CXX_COMPILER=clang++ +``` + +Compilation: +- Undefined reference: check link order in `collector/lib/CMakeLists.txt`, verify `target_link_libraries()` +- Header not found: verify `include_directories()`, check `git submodule update --init --recursive` + +Binary issues: +```bash +make image-dev CMAKE_BUILD_TYPE=Debug # Debug symbols +docker exec -it collector gdb /usr/local/bin/collector +file container/bin/collector # Check stripped +ldd container/bin/collector # Check libs +``` + +Container build: +```bash +docker pull registry.access.redhat.com/ubi10/ubi-minimal:latest +ls -la collector/container/bin/collector # Verify exists +cat collector/container/.dockerignore # Check ignore +``` + +## Advanced Options + +Custom builder: +```bash +docker build -t my-collector-builder \ + --build-arg BASE_IMAGE=ubuntu:22.04 \ + -f builder/Dockerfile builder/ +``` + +Custom falcosecurity-libs: +```bash +cd falcosecurity-libs +git remote add myfork https://github.com/myuser/falcosecurity-libs +git fetch myfork +git checkout myfork/my-feature +cd .. +make collector +``` + +Static analysis: +```bash +make -C collector check # clang-format check +make -C collector format # clang-format fix +docker exec collector_builder_amd64 \ + clang-tidy collector/lib/*.cpp -- -I... 
+```
+
+Code coverage:
+```bash
+cmake -B build -DCMAKE_BUILD_TYPE=Debug \
+  -DCMAKE_CXX_FLAGS="-fprofile-arcs -ftest-coverage"
+cmake --build build
+ctest --test-dir build
+gcov build/collector/lib/*.o
+lcov --capture --directory build --output-file coverage.info
+genhtml coverage.info --output-directory coverage-html
+```
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| Makefile | Root build orchestration |
+| Makefile-constants.mk | Build variables |
+| CMakeLists.txt | Root CMake |
+| collector/Makefile | Collector build |
+| collector/CMakeLists.txt | Collector CMake |
+| builder/Dockerfile | Builder image |
+| builder/install/*.sh | Dependency builds |
+| builder/install/versions.sh | Version pinning |
+| collector/container/Dockerfile | Runtime image |
+| vcpkg.json | vcpkg manifest |
+
+## References
+
+- [CMake Documentation](https://cmake.org/documentation/)
+- [gRPC C++ Build](https://grpc.io/docs/languages/cpp/quickstart/)
+- [falcosecurity-libs](https://github.com/falcosecurity/libs)
+- [Collector Architecture](architecture.md)
+- [Integration Tests](integration-tests.md)
diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 0000000000..532b30c7f8
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,624 @@
+# Deployment
+
+## Overview
+
+Ansible playbooks orchestrate VM lifecycle, environment provisioning, and test execution across cloud platforms and operating systems.
+
+**Tool:** Ansible 2.9+
+**Platforms:** GCP, IBM Cloud (Power/Z), Local VMs
+**OS:** RHEL, CentOS, Ubuntu, SLES, Flatcar, Fedora CoreOS, Container-Optimized OS, Garden Linux
+**Runtimes:** Docker, Podman, CRI-O, containerd
+
+## Deployment Flow
+
+```
+Ansible Playbook
+    ↓
+1. create-all-vms (GCP/IBM Cloud API)
+    ├── Compute instances
+    ├── Networking (VPC, subnets)
+    └── SSH keys / Floating IPs
+    ↓
+2. 
provision-vm + ├── Install Python (Flatcar/CoreOS) + ├── Update packages + ├── Install Docker/Podman + ├── Configure SELinux + └── Pull collector images + ↓ +3. run-test-target + ├── Login to registry + ├── Execute tests + ├── Collect logs + └── Save results + ↓ +4. destroy-vm (cleanup) + ├── Delete instances + └── Release floating IPs +``` + +## Directory Structure + +``` +ansible/ +├── ci/ # CI inventory +│ ├── gcp.yml # GCP dynamic inventory +│ └── ibmcloud.yml # IBM Cloud dynamic +├── dev/ # Dev inventory +├── roles/ # Ansible roles +│ ├── create-vm/ +│ ├── create-all-vms/ +│ ├── provision-vm/ +│ ├── run-test-target/ +│ └── destroy-vm/ +├── group_vars/ +│ ├── all.yml # VM definitions +│ ├── platform_flatcar.yml +│ └── platform_*.yml +├── integration-tests.yml +├── k8s-integration-tests.yml +├── benchmarks.yml +├── vm-lifecycle.yml +├── ci-build-builder.yml +└── ci-build-collector.yml +``` + +## Playbooks + +### integration-tests.yml + +End-to-end: create VMs → provision → test → destroy. + +Tags: +- setup: Create VMs +- provision: Provision VMs +- run-tests: Execute tests +- teardown: Destroy VMs + +Usage: +```bash +VM_TYPE=rhel ansible-playbook -i dev integration-tests.yml +ansible-playbook -i dev integration-tests.yml --tags provision +ansible-playbook -i dev integration-tests.yml --tags run-tests +``` + +Variables: +- VM_TYPE: VM family (rhel, ubuntu-os, cos) +- JOB_ID: Unique identifier ($USER) +- COLLECTOR_TEST: Test target (ci-integration-tests) +- GCP_SSH_KEY_FILE: SSH key (~/.ssh/google_compute_engine) + +### k8s-integration-tests.yml + +Tests on Kubernetes (KinD or existing cluster). 
+
+Variables:
+- tester_image: Collector test image (required)
+- collector_image: Collector image (required)
+- collector_root: Repo path (required)
+- cluster_name: KinD cluster (collector-tests)
+- container_engine: Docker or podman (docker)
+
+Tags:
+- test-only: Skip KinD creation/deletion
+- cleanup: Clean K8s resources
+
+Usage:
+```bash
+cat > k8s-vars.yml <, platform_.
+```
+
+Usage:
+```yaml
+hosts: "job_id_{{ lookup('env', 'JOB_ID') }}"
+hosts: platform_rhel
+hosts: "job_id_{{ job_id }}:&platform_rhel"
+```
+
+## VM Definitions
+
+Location: `group_vars/all.yml`
+
+Structure:
+```yaml
+vm_definitions:
+  <vm-type>:
+    cloud: gcp | ibmcloud
+    project: <gcp-project> | <ibmcloud-resource-group>
+    families:
+      - name: <family-name>
+        image_family: <image-family>
+        machine_type: <machine-type>
+        zone: <zone>
+        runtime: docker | podman | crio
+```
+
+RHEL (GCP):
+```yaml
+rhel:
+  cloud: gcp
+  project: stackrox-collector-ci
+  families:
+    - name: rhel-7
+      image_family: rhel-7
+      image_project: rhel-cloud
+      machine_type: n1-standard-2
+      zone: us-central1-a
+      runtime: docker
+    - name: rhel-8
+      image_family: rhel-8
+      machine_type: n1-standard-2
+      runtime: docker
+```
+
+s390x (IBM Cloud):
+```yaml
+rhel-s390x:
+  cloud: ibmcloud
+  region: us-east
+  families:
+    - name: rhel-8-6-s390x
+      profile: bz2-1x4
+      image_id: r014-12345678-abcd-1234
+      vpc_id: r014-vpc-id
+      zone: us-east-1
+      runtime: podman
+```
+
+Naming: `<prefix>-<family>-<job-id>` (e.g., collector-dev-rhel-8-jdoe).
+
+## Container Configuration
+
+Runtime selection priority:
+1. VM family: vm_family.runtime
+2. VM type default: vm_type.default_runtime
+3. Platform group: runtime_command in platform_*.yml
+4. 
Global: docker
+
+Platform defaults:
+- cos: docker (pre-installed)
+- flatcar: docker (native)
+- fedora-coreos: podman (systemd)
+- rhel-8+: podman (default)
+- rhel-7: docker (compatibility)
+- ubuntu/sles/garden: docker
+
+Privileges:
+```yaml
+docker_run_args:
+  - --privileged
+  - --pid=host
+  - --network=host
+  - -v /var/run/docker.sock:/var/run/docker.sock
+  - -v /host:/host:ro
+  - -v /sys/kernel/debug:/sys/kernel/debug
+```
+
+Required for: host PID (process visibility), host network (monitoring), kernel debug FS (eBPF maps), Docker socket (introspection), host filesystem (module loading).
+
+## Kubernetes Deployment
+
+DaemonSet:
+```yaml
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: collector
+  namespace: stackrox
+spec:
+  selector:
+    matchLabels:
+      app: collector
+  template:
+    metadata:
+      labels:
+        app: collector
+    spec:
+      hostPID: true
+      hostNetwork: true
+      containers:
+        - name: collector
+          image: quay.io/stackrox-io/collector:latest
+          env:
+            - name: COLLECTION_METHOD
+              value: "EBPF"
+            - name: GRPC_SERVER
+              value: "sensor.stackrox.svc:443"
+          resources:
+            limits: {cpu: "2", memory: "2Gi"}
+            requests: {cpu: "50m", memory: "320Mi"}
+          securityContext:
+            privileged: true
+          volumeMounts:
+            - {name: host-root, mountPath: /host, readOnly: true}
+            - {name: sys, mountPath: /sys, readOnly: true}
+            - {name: certs, mountPath: /var/run/secrets/stackrox.io/certs/}
+      volumes:
+        - {name: host-root, hostPath: {path: /}}
+        - {name: sys, hostPath: {path: /sys}}
+        - {name: certs, secret: {secretName: collector-tls}}
+```
+
+ConfigMap:
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: runtime-config
+data:
+  runtime_config.yaml: |
+    networking:
+      connectionStats:
+        aggregationWindow: 15s
+      afterglow:
+        period: 30s
+      processesListening:
+        enable: true
+    scrape:
+      interval: 15s
+```
+
+Mount:
+```yaml
+volumeMounts:
+  - {name: runtime-config, mountPath: /etc/collector/}
+volumes:
+  - {name: runtime-config, configMap: {name: runtime-config}}
+```
+
+## CI/CD
+
+CircleCI (`.circleci/config.yml`): 
+```yaml +workflows: + test: + jobs: + - build-builder: + matrix: + parameters: + arch: [amd64, arm64, ppc64le, s390x] + - integration-tests: + requires: [build-collector] + matrix: + parameters: + vm_type: [rhel, ubuntu-os, cos, flatcar] +``` + +Job: +```yaml +integration-tests-rhel-ebpf: + docker: + - image: quay.io/ansible/ansible-runner:latest + environment: + VM_TYPE: rhel + COLLECTION_METHOD: ebpf + steps: + - checkout + - run: + name: Setup GCP + command: echo $GCP_KEY | base64 -d > /tmp/gcp-key.json + - run: cd ansible && ansible-playbook -i ci integration-tests.yml +``` + +GitHub Actions (IBM Z/Power): +```yaml +name: Integration Tests (ppc64le) +on: + push: + branches: [master, release-*] +jobs: + test-ppc64le: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - name: IBM Cloud CLI + run: | + curl -fsSL https://clis.cloud.ibm.com/install/linux | sh + ibmcloud login --apikey ${{ secrets.IC_API_KEY }} + - name: Run tests + run: cd ansible && ansible-playbook -i ci integration-tests.yml + env: + VM_TYPE: rhel-ppc64le +``` + +## Security + +Quay.io (development): +```bash +cat > ansible/secrets.yml < +export IC_REGION=us-east + +# RHEL Subscription +export REDHAT_USERNAME= +export REDHAT_PASSWORD= +``` + +## Troubleshooting + +Ansible connection: +```bash +export ANSIBLE_TIMEOUT=60 # SSH timeout +ansible-playbook -e ansible_python_interpreter=/usr/bin/python3 +``` + +VM creation: +```bash +gcloud compute project-info describe --project= # GCP quota +ibmcloud is floating-ips --resource-group collector-tests # IBM Cloud IPs +ansible-playbook -i ci vm-lifecycle.yml --tags teardown # Cleanup +``` + +Test execution: +```bash +docker login quay.io -u # Manual login +docker pull quay.io/stackrox-io/collector:$(git describe --tags) +ansible-playbook -i dev integration-tests.yml --tags run-tests -vvv +``` + +KinD: +```bash +systemctl status docker # Check runtime +kind version # Verify installed +kind create cluster --name collector-tests +kind load 
docker-image quay.io/stackrox-io/collector:latest --name collector-tests +``` + +## Utilities + +**hotreload.sh** (`utilities/hotreload.sh`): Live reload collector binary in running container. +```bash +./utilities/hotreload.sh /path/to/new/collector-binary +``` + +**release.py** (`utilities/release.py`): Create release branches/tags, version bumping. +```bash +./utilities/release.py 4.6 --push +``` + +**tag-bumper.py**, **driver-checksum.sh**, **gardenlinux-bumper/**: Automated updates. + +## Key Files + +| File | Purpose | +|------|---------| +| ansible/integration-tests.yml | Main test playbook | +| ansible/k8s-integration-tests.yml | K8s tests | +| ansible/vm-lifecycle.yml | VM management | +| ansible/group_vars/all.yml | VM definitions | +| ansible/roles/*/tasks/main.yml | Role logic | +| ansible/ci/gcp.yml | CI inventory | +| utilities/hotreload.sh | Dev helper | + +## References + +- [Ansible Documentation](https://docs.ansible.com/) +- [GCP Compute Module](https://docs.ansible.com/ansible/latest/collections/google/cloud/) +- [IBM Cloud Collection](https://galaxy.ansible.com/ibm/cloudcollection) +- [Kubernetes DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) +- [Integration Tests](integration-tests.md) +- [Build System](build.md) diff --git a/docs/design-overview.md b/docs/design-overview.md index b599d800d1..40efe113a6 100644 --- a/docs/design-overview.md +++ b/docs/design-overview.md @@ -1,3 +1,5 @@ +> **Note:** This document has been superseded by [Architecture](architecture.md), which provides a comprehensive overview of Collector's design, data flow, and internals. 
+ # Design overview ## Data gathering methods diff --git a/docs/ebpf-architecture.md b/docs/ebpf-architecture.md new file mode 100644 index 0000000000..2f1ad37565 --- /dev/null +++ b/docs/ebpf-architecture.md @@ -0,0 +1,1202 @@ +# eBPF Architecture + +This document explains how collector's CO-RE BPF driver works at the kernel level, including tracepoint architecture, tail call dispatch, syscall coverage, event format, and the complete path from kernel to userspace. + +**Audience:** Engineers adding syscall monitoring, debugging verifier failures, or understanding performance characteristics. + +**Prerequisites:** Basic BPF knowledge, familiarity with [falcosecurity-libs.md](./falcosecurity-libs.md) and [lib/system.md](./lib/system.md). + +## Table of Contents + +1. [CO-RE BPF Overview](#co-re-bpf-overview) +2. [Tracepoint Architecture](#tracepoint-architecture) +3. [Tail Call Dispatch](#tail-call-dispatch) +4. [Syscall Coverage](#syscall-coverage) +5. [Event Format and Ring Buffers](#event-format-and-ring-buffers) +6. [Event Flow: Kernel to Userspace](#event-flow-kernel-to-userspace) +7. [Adding New Syscall Monitoring](#adding-new-syscall-monitoring) +8. [Verifier Constraints and Pitfalls](#verifier-constraints-and-pitfalls) + +--- + +## CO-RE BPF Overview + +Collector uses **CO-RE (Compile Once, Run Everywhere) BPF**, the modern approach to kernel instrumentation. 
Unlike classic eBPF or kernel modules, CO-RE programs:
+
+- **Compile once**: BPF bytecode embedded in collector binary (`bpf_probe.skel.h` skeleton)
+- **Run everywhere**: libbpf relocates struct offsets at load time using BTF (BPF Type Format)
+- **No kernel headers**: Uses `vmlinux.h` generated from kernel's BTF at runtime
+- **Ring buffers**: Efficient per-CPU ring buffers (`BPF_MAP_TYPE_RINGBUF`) instead of perf buffers
+- **Tracing programs**: `BPF_PROG_TYPE_TRACING` attached to `tp_btf` (BTF-enabled tracepoints)
+
+**Requirements:**
+- Kernel >= 5.8 with BTF support (`CONFIG_DEBUG_INFO_BTF=y`)
+- BTF available at `/sys/kernel/btf/vmlinux` or `/boot/vmlinux-*`
+- CAP_BPF + CAP_PERFMON (or CAP_SYS_ADMIN on older kernels)
+
+**Location:** `falcosecurity-libs/driver/modern_bpf/`
+
+**Build output:** Compiled to `bpf_probe.skel.h`, embedded in collector, loaded by `ModernBPFDriver::Setup()` in `collector/lib/KernelDriver.h`.
+
+---
+
+## Tracepoint Architecture
+
+### Attached Programs
+
+The modern BPF driver attaches programs to syscall, scheduler, and other kernel tracepoints. 
These are the **entry points** from the kernel into BPF: + +**Syscall Dispatchers** (attach to all syscalls): +- `sys_enter` → `syscall_enter.bpf.c` → dispatches to per-syscall enter handlers +- `sys_exit` → `syscall_exit.bpf.c` → dispatches to per-syscall exit handlers + +**Scheduler Tracepoints** (process lifecycle): +- `sched_process_exit` → `sched_process_exit.bpf.c` → process termination (PPME_PROCEXIT_1_E) +- `sched_process_fork` → `sched_process_fork.bpf.c` → fork/clone tracking +- `sched_process_exec` → `sched_process_exec.bpf.c` → execve success path +- `sched_switch` → `sched_switch.bpf.c` → context switch (improves process tracking) + +**Other Tracepoints**: +- `signal_deliver` → `signal_deliver.bpf.c` → signal delivery +- `page_fault_user` / `page_fault_kernel` → page fault tracking + +**Attachment mechanism:** + +```c +SEC("tp_btf/sys_enter") +int BPF_PROG(sys_enter, struct pt_regs* regs, long syscall_id) { + // Dispatcher logic + bpf_tail_call(ctx, &syscall_enter_tail_table, syscall_id); + return 0; +} +``` + +`SEC("tp_btf/...")` tells libbpf to attach to a BTF-enabled tracepoint. `BPF_PROG()` macro provides type-safe arguments from the tracepoint's signature. + +**Source:** `falcosecurity-libs/driver/modern_bpf/programs/attached/` + +--- + +## Tail Call Dispatch + +### Why Tail Calls? + +BPF verifier limits programs to 1 million instructions. A single dispatcher handling 158 syscalls would exceed this. **Tail calls** (`bpf_tail_call()`) replace the current program with another, resetting the instruction count. + +### Tail Call Tables + +Three `BPF_MAP_TYPE_PROG_ARRAY` maps route execution: + +**1. syscall_enter_tail_table** +- Indexed by syscall ID (e.g., `__NR_connect = 42`) +- Maps syscall → enter event handler +- Populated by libscap during BPF skeleton load +- Example: `syscall_enter_tail_table[42] = connect_e` + +**2. 
syscall_exit_tail_table** +- Indexed by syscall ID +- Maps syscall → exit event handler +- Example: `syscall_exit_tail_table[42] = connect_x` + +**3. extra_syscall_calls** +- Indexed by predefined codes (T1_EXECVE_X, T2_EXECVE_X, T1_DROP_E, etc.) +- Used for multi-stage events (execve needs 3 programs due to verifier limits) +- Used for special events (hotplug, drop notifications) + +**Dispatch flow (sys_enter example):** + +```c +SEC("tp_btf/sys_enter") +int BPF_PROG(sys_enter, struct pt_regs* regs, long syscall_id) { + // 1. Filter: check if syscall is interesting + if (!syscalls_dispatcher__64bit_interesting_syscall(syscall_id)) { + return 0; // Drop uninteresting syscalls + } + + // 2. Sampling: drop events if in sampling mode + if (sampling_logic_enter(ctx, syscall_id)) { + return 0; + } + + // 3. Tail call to specific handler + bpf_tail_call(ctx, &syscall_enter_tail_table, syscall_id); + return 0; // Fallback if tail call fails +} +``` + +**Why this works:** +- Dispatcher: ~50 instructions (well within limits) +- Each syscall handler: separate program, separate instruction budget +- No shared stack between programs (tail call is a **replace**, not a call) + +**Limitations:** +- Max 33 tail calls deep (kernel limitation) +- No return from tail call (one-way jump) +- Complex events (execve) need multiple stages: `execve_x` → `t1_execve_x` → `t2_execve_x` + +**Source:** `falcosecurity-libs/driver/modern_bpf/maps/maps.h` (tail table definitions) + +--- + +## Syscall Coverage + +The modern BPF driver implements **158 syscall handlers** covering network, process, file I/O, permissions, and IPC. 
+ +### Network Syscalls (consumed by NetworkSignalHandler) + +| Syscall | Enter Event | Exit Event | Data Captured | Collector Handler | +|---------|-------------|------------|---------------|-------------------| +| `connect` | PPME_SOCKET_CONNECT_E | PPME_SOCKET_CONNECT_X | fd, sockaddr → socktuple (src/dst IP:port) | `NetworkSignalHandler::GetConnection()` | +| `accept` | PPME_SOCKET_ACCEPT_E | PPME_SOCKET_ACCEPT_X | listen_fd → new_fd, socktuple | `NetworkSignalHandler::GetConnection()` | +| `accept4` | PPME_SOCKET_ACCEPT4_E | PPME_SOCKET_ACCEPT4_X | listen_fd, flags → new_fd, socktuple | `NetworkSignalHandler::GetConnection()` | +| `bind` | PPME_SOCKET_BIND_E | PPME_SOCKET_BIND_X | fd, sockaddr → bind address | Connection tracking (server role) | +| `listen` | PPME_SOCKET_LISTEN_E | PPME_SOCKET_LISTEN_X | fd, backlog | Connection tracking (server role) | +| `close` | PPME_SOCKET_CLOSE_E | PPME_SOCKET_CLOSE_X | fd → connection closure | `ConnTracker::UpdateConnection()` | +| `shutdown` | PPME_SOCKET_SHUTDOWN_E | PPME_SOCKET_SHUTDOWN_X | fd, how (SHUT_RD/WR/RDWR) | Connection tracking | +| `sendto` | PPME_SOCKET_SENDTO_E | PPME_SOCKET_SENDTO_X | fd, size, dest_addr → bytes_sent | Byte tracking (if enabled) | +| `recvfrom` | PPME_SOCKET_RECVFROM_E | PPME_SOCKET_RECVFROM_X | fd, size → bytes_recv, src_addr | Byte tracking (if enabled) | +| `sendmsg` | PPME_SOCKET_SENDMSG_E | PPME_SOCKET_SENDMSG_X | fd, msghdr → bytes_sent | Byte tracking (if enabled) | +| `recvmsg` | PPME_SOCKET_RECVMSG_E | PPME_SOCKET_RECVMSG_X | fd, msghdr → bytes_recv | Byte tracking (if enabled) | +| `sendmmsg` | PPME_SOCKET_SENDMMSG_E | PPME_SOCKET_SENDMMSG_X | fd, vlen → messages_sent | Batch message tracking | +| `recvmmsg` | PPME_SOCKET_RECVMMSG_E | PPME_SOCKET_RECVMMSG_X | fd, vlen → messages_recv | Batch message tracking | +| `socket` | PPME_SOCKET_SOCKET_E | PPME_SOCKET_SOCKET_X | domain, type, protocol → fd | Socket creation | +| `socketpair` | PPME_SOCKET_SOCKETPAIR_E | 
PPME_SOCKET_SOCKETPAIR_X | domain, type, protocol → fd[2] | Unix socket pair | +| `getsockopt` | PPME_SOCKET_GETSOCKOPT_E | PPME_SOCKET_GETSOCKOPT_X | fd, level, optname → optval | Async connect status (ROX-18856) | +| `setsockopt` | PPME_SOCKET_SETSOCKOPT_E | PPME_SOCKET_SETSOCKOPT_X | fd, level, optname, optval | Socket configuration | +| `getsockname` | PPME_SOCKET_GETSOCKNAME_E | PPME_SOCKET_GETSOCKNAME_X | fd → local sockaddr | Local address lookup | +| `getpeername` | PPME_SOCKET_GETPEERNAME_E | PPME_SOCKET_GETPEERNAME_X | fd → remote sockaddr | Peer address lookup | + +**Example: connect syscall** + +```c +// falcosecurity-libs/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/connect.bpf.c + +SEC("tp_btf/sys_enter") +int BPF_PROG(connect_e, struct pt_regs *regs, long id) { + struct auxiliary_map *auxmap = auxmap__get(); + auxmap__preload_event_header(auxmap, PPME_SOCKET_CONNECT_E); + + unsigned long args[3] = {0}; + extract__network_args(args, 3, regs); // fd, sockaddr*, addrlen + + int32_t socket_fd = (int32_t)args[0]; + auxmap__store_s64_param(auxmap, (int64_t)socket_fd); + + unsigned long sockaddr_ptr = args[1]; + uint16_t addrlen = (uint16_t)args[2]; + auxmap__store_sockaddr_param(auxmap, sockaddr_ptr, addrlen); + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); // Push to ring buffer + return 0; +} + +SEC("tp_btf/sys_exit") +int BPF_PROG(connect_x, struct pt_regs *regs, long ret) { + struct auxiliary_map *auxmap = auxmap__get(); + auxmap__preload_event_header(auxmap, PPME_SOCKET_CONNECT_X); + + unsigned long socket_fd = 0; + extract__network_args(&socket_fd, 1, regs); + + // Return code (0 = success, -EINPROGRESS = async, <0 = error) + auxmap__store_s64_param(auxmap, ret); + + // Extract socktuple (src IP:port → dst IP:port) from kernel socket struct + if (ret == 0 || ret == -EINPROGRESS) { + auxmap__store_socktuple_param(auxmap, (int32_t)socket_fd, OUTBOUND, NULL); + } else { + 
auxmap__store_empty_param(auxmap); + } + + auxmap__store_s64_param(auxmap, (int64_t)(int32_t)socket_fd); + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); + return 0; +} +``` + +**Userspace consumption:** + +`NetworkSignalHandler::HandleSignal()` receives `PPME_SOCKET_CONNECT_X`, extracts socktuple via `EventExtractor`, creates `Connection` object, feeds `ConnTracker`. + +### Process Syscalls (consumed by ProcessSignalHandler) + +| Syscall | Enter Event | Exit Event | Data Captured | Collector Handler | +|---------|-------------|------------|---------------|-------------------| +| `execve` | PPME_SYSCALL_EXECVE_19_E | PPME_SYSCALL_EXECVE_19_X | filename → pid, tid, exe, args, env, cwd, cgroups, caps, ... (27 params) | `ProcessSignalFormatter` → gRPC to Sensor | +| `execveat` | PPME_SYSCALL_EXECVEAT_E | PPME_SYSCALL_EXECVEAT_X | dirfd, pathname, flags → (same as execve) | `ProcessSignalFormatter` | +| `clone` | PPME_SYSCALL_CLONE_E | PPME_SYSCALL_CLONE_X | flags, stack, ptid, ctid → child_tid, clone_flags | Process lineage tracking | +| `clone3` | PPME_SYSCALL_CLONE3_E | PPME_SYSCALL_CLONE3_X | clone_args → child_tid, clone_flags | Process lineage tracking | +| `fork` | PPME_SYSCALL_FORK_E | PPME_SYSCALL_FORK_X | None → child_pid | Process lineage tracking | +| `vfork` | PPME_SYSCALL_VFORK_E | PPME_SYSCALL_VFORK_X | None → child_pid | Process lineage tracking | + +**Special: sched_process_exit tracepoint** + +```c +// falcosecurity-libs/driver/modern_bpf/programs/attached/events/sched_process_exit.bpf.c + +SEC("tp_btf/sched_process_exit") +int BPF_PROG(sched_proc_exit, struct task_struct *task) { + struct auxiliary_map *auxmap = auxmap__get(); + auxmap__preload_event_header(auxmap, PPME_PROCEXIT_1_E); + + // Extract exit status from task_struct + int32_t exit_code = 0; + READ_TASK_FIELD_INTO(&exit_code, task, exit_code); + auxmap__store_s64_param(auxmap, (int64_t)exit_code); + + // Extract return code + int32_t ret = 
__WEXITSTATUS(exit_code); + auxmap__store_s64_param(auxmap, (int64_t)ret); + + // Extract termination signal (if any) + uint8_t sig = 0; + if (__WIFSIGNALED(exit_code) != 0) { + sig = __WTERMSIG(exit_code); + } + auxmap__store_u8_param(auxmap, sig); + + // Core dump flag + uint8_t core = __WCOREDUMP(exit_code) != 0; + auxmap__store_u8_param(auxmap, core); + + // Find reaper for orphaned children + int32_t reaper_pid = 0; + struct list_head *head = &(task->children); + struct list_head *next_child = BPF_CORE_READ(head, next); + if (next_child != head) { + reaper_pid = find_new_reaper_pid(task); // Complex logic, see below + } + auxmap__store_s64_param(auxmap, (int64_t)reaper_pid); + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); + return 0; +} +``` + +**Reaper logic:** When a process exits, its children are reparented to: +1. Another thread in the same thread group (if alive) +2. A sub-reaper (process with `prctl(PR_SET_CHILD_SUBREAPER)`) +3. PID 1 (init) in the current PID namespace + +This is implemented with `#pragma unroll` loops to satisfy verifier complexity limits (see ROX-31971). 
+ +### File I/O Syscalls + +| Syscall | Data Captured | Use Case | +|---------|---------------|----------| +| `open`, `openat`, `openat2` | path, flags, mode → fd | File access tracking | +| `read`, `readv`, `pread64`, `preadv` | fd, size → bytes_read | Data flow analysis | +| `write`, `writev`, `pwrite64`, `pwritev` | fd, size → bytes_written | Data flow analysis | +| `close` | fd → (fd closure) | Resource cleanup tracking | +| `dup`, `dup2`, `dup3` | oldfd → newfd | FD aliasing | + +### Permission/Security Syscalls + +| Syscall | Data Captured | Use Case | +|---------|---------------|----------| +| `setuid`, `setgid`, `setreuid`, `setregid`, `setresuid`, `setresgid` | uid/gid changes | Privilege escalation detection | +| `capset` | capability changes | Capability tracking | +| `prctl` | operation, args | Process behavior modification | +| `seccomp` | mode, filter | Sandboxing detection | +| `ptrace` | request, pid | Debugging/injection detection | + +### Full Syscall List + +158 syscalls instrumented. See complete list: +```bash +ls falcosecurity-libs/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/ +``` + +**Excluded syscalls** (configured in `CMakeLists.txt`): +```cmake +MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep)$" +``` +These syscalls have handlers but may be disabled to reduce overhead. 
+ +--- + +## Event Format and Ring Buffers + +### Event Header: ppm_evt_hdr + +Every event starts with a fixed header: + +```c +// falcosecurity-libs/driver/ppm_events_public.h + +struct ppm_evt_hdr { + uint64_t ts; // Timestamp (nanoseconds since boot, from bpf_ktime_get_boot_ns()) + uint64_t tid; // Thread ID that triggered the event + uint32_t len; // Total event length including header and parameters + uint16_t type; // ppm_event_code (e.g., PPME_SOCKET_CONNECT_X) + uint32_t nparams; // Number of parameters following header +} __attribute__((packed)); +``` + +### Event Parameters + +After the header, parameters are serialized as type-length-value (TLV): + +**Parameter types** (from `ppm_param_type`): +- `PT_FD`: file descriptor (int32) +- `PT_ERRNO`: error code / return value (int64) +- `PT_SOCKADDR`: socket address (family + IP + port) +- `PT_SOCKTUPLE`: full connection tuple (src IP:port → dst IP:port, protocol) +- `PT_CHARBUF`: null-terminated string +- `PT_CHARBUFARRAY`: array of strings +- `PT_PID`: process/thread ID +- `PT_UID`, `PT_GID`: user/group IDs +- `PT_FLAGS32`: bitmask flags + +**Variable-size encoding:** + +```c +struct auxiliary_map { + uint8_t data[AUXILIARY_MAP_SIZE]; // Raw event bytes (max 128KB) + uint64_t payload_pos; // Current write position + uint8_t lengths_pos; // Parameter count + uint16_t event_type; // PPME_* code +}; +``` + +Each BPF program uses a **per-CPU auxiliary map** to build the event: + +```c +// 1. Initialize header +auxmap__preload_event_header(auxmap, PPME_SOCKET_CONNECT_X); + +// 2. Store parameters +auxmap__store_s64_param(auxmap, return_code); // PT_ERRNO +auxmap__store_socktuple_param(auxmap, fd, OUTBOUND, NULL); // PT_SOCKTUPLE +auxmap__store_s64_param(auxmap, fd); // PT_FD + +// 3. Finalize header (sets len, nparams, ts, tid) +auxmap__finalize_event_header(auxmap); + +// 4. 
Submit to ring buffer +auxmap__submit_event(auxmap); +``` + +**Why auxiliary maps instead of direct ring buffer writes?** + +Ring buffers require reserving space upfront, but syscall events are variable-size (e.g., execve args can be 64KB). The verifier can't prove bounds for direct writes. Auxiliary maps work around this: +- Per-CPU map (no contention) +- Fixed size (128KB, verifier-friendly) +- Single `bpf_ringbuf_output()` call at the end + +### Ring Buffer Architecture + +**Per-CPU ring buffers** (`BPF_MAP_TYPE_RINGBUF`): + +```c +// falcosecurity-libs/driver/modern_bpf/maps/maps.h (simplified) + +struct { + __uint(type, BPF_MAP_TYPE_RINGBUF); + __uint(max_entries, 512 * 1024 * 1024); // 512MB total, divided across CPUs +} ringbuf __weak SEC(".maps"); +``` + +**Configuration:** +- Total size: `sinsp_total_buffer_size_` (default 512MB) +- CPUs per buffer: `sinsp_cpu_per_buffer_` (default 0 = 1:1) +- Example: 128 CPUs, 512MB total, 1:1 → 128 buffers of 4MB each + +**Why per-CPU?** +- No locking (each CPU writes to its own buffer) +- Cache-friendly (data stays local) +- Scalable to 100+ CPUs + +**Overflow handling:** +- If buffer full, `bpf_ringbuf_output()` fails silently +- Userspace detects drops via ring buffer headers +- `nDropsBuffer` stat incremented + +**Userspace polling:** +- libscap uses `epoll()` on ring buffer FDs +- `scap_next()` reads from next available buffer in round-robin + +--- + +## Event Flow: Kernel to Userspace + +Complete path from syscall to collector handler: + +### 1. Kernel Tracepoint Fires + +``` +Application calls connect(fd, &addr, addrlen) + ↓ +Kernel enters sys_connect() + ↓ +Tracepoint: trace_sys_enter(regs, __NR_connect) + ↓ +BPF program attached to tp_btf/sys_enter triggers +``` + +### 2. 
Dispatcher Filters and Tail Calls + +```c +// falcosecurity-libs/driver/modern_bpf/programs/attached/dispatchers/syscall_enter.bpf.c + +int BPF_PROG(sys_enter, struct pt_regs* regs, long syscall_id) { + // syscall_id = 42 (__NR_connect) + + // Check if interesting + if (!g_64bit_interesting_syscalls_table[42]) return 0; + + // Check sampling + if (sampling_logic_enter(ctx, 42)) return 0; + + // Tail call to connect_e + bpf_tail_call(ctx, &syscall_enter_tail_table, 42); +} +``` + +### 3. Syscall Handler Builds Event + +```c +// connect.bpf.c:connect_e (enter event) + +int BPF_PROG(connect_e, struct pt_regs *regs, long id) { + struct auxiliary_map *auxmap = maps__get_auxiliary_map(cpu_id); + + auxmap->event_type = PPME_SOCKET_CONNECT_E; + auxmap->payload_pos = sizeof(struct ppm_evt_hdr); + + // Extract syscall args from pt_regs + int fd = (int)regs->rdi; // arg 0 + struct sockaddr *addr = regs->rsi; // arg 1 + socklen_t addrlen = regs->rdx; // arg 2 + + // Store parameters + auxmap__store_s64_param(auxmap, fd); + auxmap__store_sockaddr_param(auxmap, addr, addrlen); + + // Fill header: ts, tid, len, nparams + auxmap__finalize_event_header(auxmap); + + // Submit to ring buffer + bpf_ringbuf_output(&ringbuf, auxmap->data, auxmap->payload_pos, 0); +} +``` + +### 4. Kernel Executes Syscall + +``` +connect_e submitted + ↓ +BPF program returns + ↓ +Kernel executes actual sys_connect() logic + ↓ +Kernel returns (success or error) + ↓ +Tracepoint: trace_sys_exit(regs, ret) + ↓ +BPF program attached to tp_btf/sys_exit triggers +``` + +### 5. 
Exit Handler Captures Result + +```c +// connect.bpf.c:connect_x (exit event) + +int BPF_PROG(connect_x, struct pt_regs *regs, long ret) { + // ret = 0 (success) or -EINPROGRESS or -errno + + struct auxiliary_map *auxmap = auxmap__get(); + auxmap__preload_event_header(auxmap, PPME_SOCKET_CONNECT_X); + + // Store return code + auxmap__store_s64_param(auxmap, ret); + + // On success/EINPROGRESS, extract connection tuple from kernel socket + if (ret == 0 || ret == -EINPROGRESS) { + struct socket *sock = sockfd_lookup(fd); + struct sock *sk = sock->sk; + + // Extract: src IP, src port, dst IP, dst port, protocol + uint32_t saddr = sk->__sk_common.skc_rcv_saddr; + uint16_t sport = sk->__sk_common.skc_num; + uint32_t daddr = sk->__sk_common.skc_daddr; + uint16_t dport = ntohs(sk->__sk_common.skc_dport); + + // Store as socktuple + auxmap__store_socktuple_param(auxmap, fd, OUTBOUND, NULL); + } else { + auxmap__store_empty_param(auxmap); + } + + auxmap__store_s64_param(auxmap, fd); + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); +} +``` + +### 6. Userspace Polls Ring Buffer + +```cpp +// falcosecurity-libs/userspace/libscap/scap.c (simplified) + +int32_t scap_next(scap_t* handle, scap_evt** pevent, uint16_t* pcpuid) { + // Poll ring buffers via epoll + int n = epoll_wait(handle->m_epollfd, events, handle->m_ndevs, timeout); + + for (int i = 0; i < n; i++) { + int cpu = events[i].data.fd; + struct ringbuf_map* rb = &handle->m_devs[cpu]; + + // Read event from ring buffer + struct ppm_evt_hdr* hdr = ringbuf_read(rb); + + *pevent = (scap_evt*)hdr; + *pcpuid = cpu; + return SCAP_SUCCESS; + } +} +``` + +### 7. 
libsinsp Enriches Event + +```cpp +// falcosecurity-libs/userspace/libsinsp/sinsp.cpp (simplified) + +int32_t sinsp::next(sinsp_evt** evt) { + scap_evt* scap_evt; + uint16_t cpuid; + + // Get raw event from libscap + int32_t res = scap_next(m_h, &scap_evt, &cpuid); + + // Wrap in sinsp_evt (adds thread/FD context) + m_evt.set_scap_evt(scap_evt); + + // Lookup thread info from cache + threadinfo* tinfo = m_thread_manager->find_thread(scap_evt->tid); + m_evt.set_threadinfo(tinfo); + + // For socket events, lookup FD info + if (scap_evt->type == PPME_SOCKET_CONNECT_X) { + sinsp_fdinfo* fdinfo = tinfo->get_fd(fd); + m_evt.set_fdinfo(fdinfo); + } + + // Resolve container metadata (if available) + if (tinfo->m_container_id != "") { + container_info* cinfo = m_container_manager->get_container(tinfo->m_container_id); + m_evt.set_container_info(cinfo); + } + + *evt = &m_evt; + return SCAP_SUCCESS; +} +``` + +### 8. SystemInspector Dispatches to Handlers + +```cpp +// collector/lib/system-inspector/Service.cpp (simplified) + +void SystemInspectorService::Run() { + while (running_) { + sinsp_evt* evt; + int32_t res = inspector_->next(&evt); + + if (res == SCAP_SUCCESS) { + stats_.nEvents++; + + // Dispatch to registered handlers + for (auto& handler : signal_handlers_) { + if (handler->IsInterested(evt->get_type())) { + handler->HandleSignal(evt); + } + } + } + } +} +``` + +### 9. 
NetworkSignalHandler Processes Event + +```cpp +// collector/lib/NetworkSignalHandler.cpp (simplified) + +SignalHandler::Result NetworkSignalHandler::HandleSignal(sinsp_evt* evt) { + if (evt->get_type() != PPME_SOCKET_CONNECT_X) { + return SignalHandler::IGNORED; + } + + // Extract connection tuple + std::optional<Connection> conn = GetConnection(evt); + if (!conn) return SignalHandler::IGNORED; + + // Feed to connection tracker + conn_tracker_->UpdateConnection(*conn); + + return SignalHandler::SUCCESS; +} + +std::optional<Connection> NetworkSignalHandler::GetConnection(sinsp_evt* evt) { + // Use EventExtractor to safely access event parameters + auto tuple = event_extractor_->get_socktuple(evt); + auto container_id = event_extractor_->get_container_id(evt); + auto pid = event_extractor_->get_tid(evt); + + return Connection{ + .tuple = tuple, + .container_id = container_id, + .pid = pid, + .timestamp = evt->get_ts() + }; +} +``` + +### 10. ConnTracker Aggregates + +```cpp +// collector/lib/ConnTracker.cpp (simplified) + +void ConnTracker::UpdateConnection(const Connection& conn) { + std::lock_guard<std::mutex> lock(mutex_); + + auto key = MakeKey(conn.tuple); + auto& agg = connections_[key]; + + agg.bytes_sent += conn.bytes_sent; + agg.bytes_recv += conn.bytes_recv; + agg.last_seen = conn.timestamp; + + // Periodically flush to Sensor via gRPC + if (ShouldFlush(agg)) { + SendToSensor(agg); + } +} +``` + +**Full flow diagram:** + +``` +Application syscall (connect) + ↓ +tp_btf/sys_enter tracepoint → sys_enter.bpf.c dispatcher + ↓ +bpf_tail_call → connect_e handler (PPME_SOCKET_CONNECT_E) + ↓ +Event → auxiliary_map → ring buffer + ↓ +Kernel executes sys_connect() + ↓ +tp_btf/sys_exit tracepoint → sys_exit.bpf.c dispatcher + ↓ +bpf_tail_call → connect_x handler (PPME_SOCKET_CONNECT_X) + ↓ +Event → auxiliary_map → ring buffer + ↓ +libscap: scap_next() polls ring buffer → scap_evt + ↓ +libsinsp: sinsp::next() enriches → threadinfo, fdinfo, container_info + ↓ +SystemInspectorService: dispatches to signal 
handlers + ↓ +NetworkSignalHandler: extracts Connection + ↓ +ConnTracker: aggregates, sends to Sensor via gRPC +``` + +--- + +## Adding New Syscall Monitoring + +Step-by-step guide to add monitoring for a new syscall (example: `openat`). + +### 1. Verify Driver Support + +Check if falcosecurity-libs already instruments the syscall: + +```bash +ls falcosecurity-libs/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/ | grep openat +``` + +If `openat.bpf.c` exists, the driver already captures it. Skip to step 5 (enable in collector config). + +If missing, you'll need to add it to falcosecurity-libs (upstream contribution or StackRox fork). + +### 2. Add BPF Handler (in falcosecurity-libs) + +Create `falcosecurity-libs/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/openat.bpf.c`: + +```c +// SPDX-License-Identifier: GPL-2.0-only OR MIT +#include + +/*=============================== ENTER EVENT ===========================*/ + +SEC("tp_btf/sys_enter") +int BPF_PROG(openat_e, struct pt_regs *regs, long id) { + struct auxiliary_map *auxmap = auxmap__get(); + if (!auxmap) return 0; + + auxmap__preload_event_header(auxmap, PPME_SYSCALL_OPENAT_E); + + // Extract syscall arguments + // openat(int dirfd, const char *pathname, int flags, mode_t mode) + int32_t dirfd = (int32_t)extract__syscall_argument(regs, 0); + unsigned long pathname_ptr = extract__syscall_argument(regs, 1); + int32_t flags = (int32_t)extract__syscall_argument(regs, 2); + uint32_t mode = (uint32_t)extract__syscall_argument(regs, 3); + + // Store parameters + auxmap__store_s64_param(auxmap, dirfd); + auxmap__store_charbuf_param(auxmap, pathname_ptr, MAX_PATH, USER); + auxmap__store_u32_param(auxmap, flags); + auxmap__store_u32_param(auxmap, mode); + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); + return 0; +} + +/*=============================== EXIT EVENT ===========================*/ + +SEC("tp_btf/sys_exit") +int BPF_PROG(openat_x, 
struct pt_regs *regs, long ret) { + struct auxiliary_map *auxmap = auxmap__get(); + if (!auxmap) return 0; + + auxmap__preload_event_header(auxmap, PPME_SYSCALL_OPENAT_X); + + // ret = fd (>= 0) or -errno + auxmap__store_s64_param(auxmap, ret); + + // Re-extract arguments (not preserved across syscall boundary) + int32_t dirfd = (int32_t)extract__syscall_argument(regs, 0); + unsigned long pathname_ptr = extract__syscall_argument(regs, 1); + int32_t flags = (int32_t)extract__syscall_argument(regs, 2); + uint32_t mode = (uint32_t)extract__syscall_argument(regs, 3); + + auxmap__store_s64_param(auxmap, dirfd); + auxmap__store_charbuf_param(auxmap, pathname_ptr, MAX_PATH, USER); + auxmap__store_u32_param(auxmap, flags); + auxmap__store_u32_param(auxmap, mode); + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); + return 0; +} +``` + +### 3. Define Event Codes (in falcosecurity-libs) + +Add to `falcosecurity-libs/driver/event_table.c`: + +```c +[PPME_SYSCALL_OPENAT_E] = {"openat", EC_SYSCALL | EC_FILE, EF_NONE, 4, { + {"dirfd", PT_FD, PF_DEC}, + {"pathname", PT_FSPATH, PF_NA}, + {"flags", PT_FLAGS32, PF_HEX}, + {"mode", PT_UINT32, PF_OCT} +}}, +[PPME_SYSCALL_OPENAT_X] = {"openat", EC_SYSCALL | EC_FILE, EF_NONE, 5, { + {"res", PT_FD, PF_DEC}, + {"dirfd", PT_FD, PF_DEC}, + {"pathname", PT_FSPATH, PF_NA}, + {"flags", PT_FLAGS32, PF_HEX}, + {"mode", PT_UINT32, PF_OCT} +}}, +``` + +Update `falcosecurity-libs/driver/ppm_events_public.h`: + +```c +enum ppm_event_code { + // ... existing events ... + PPME_SYSCALL_OPENAT_E = 300, + PPME_SYSCALL_OPENAT_X = 301, + // ... +}; +``` + +### 4. Register Tail Calls (in falcosecurity-libs) + +Tail calls are auto-registered by libbpf based on section names. No manual registration needed if you use `SEC("tp_btf/sys_enter")` / `SEC("tp_btf/sys_exit")`. + +### 5. 
Enable in Collector Config + +Add to `collector/container/config/collection-method.yaml`: + +```yaml +syscalls: + - connect + - accept + - close + - execve + - openat # <-- Add here +``` + +Or configure via environment variable: +```bash +COLLECTION_METHOD="syscalls=connect,accept,close,execve,openat" +``` + +### 6. Add Collector Handler (if needed) + +If you need custom processing (beyond generic event capture): + +**Create handler:** + +```cpp +// collector/lib/FileAccessHandler.h + +class FileAccessHandler : public SignalHandler { +public: + std::string GetName() override { return "FileAccessHandler"; } + + Result HandleSignal(sinsp_evt* evt) override { + if (evt->get_type() == PPME_SYSCALL_OPENAT_X) { + return HandleOpenat(evt); + } + return IGNORED; + } + + std::vector<std::string> GetRelevantEvents() override { + return {"openat"}; + } + +private: + Result HandleOpenat(sinsp_evt* evt) { + int64_t ret = evt->get_param(0)->as<int64_t>(); // return code + if (ret < 0) return IGNORED; // failed open + + int fd = ret; + std::string path = evt->get_param(2)->as<std::string>(); // pathname + uint32_t flags = evt->get_param(3)->as<uint32_t>(); + + // Process file access (e.g., track writes to sensitive paths) + if (flags & O_WRONLY || flags & O_RDWR) { + LogFileWrite(path, fd); + } + + return SUCCESS; + } +}; +``` + +**Register in SystemInspector:** + +```cpp +// collector/lib/CollectorService.cpp + +void CollectorService::InitSystemInspector() { + // ... existing handlers ... + + auto file_handler = std::make_unique<FileAccessHandler>(); + system_inspector_->AddSignalHandler(std::move(file_handler)); +} +``` + +### 7. 
Test + +**Build collector:** +```bash +make image +``` + +**Deploy and verify:** +```bash +kubectl logs -n stackrox collector-xxxxx | grep openat +``` + +**Check stats:** +```bash +curl localhost:8080/metrics | grep openat +``` + +Should see: +``` +rox_collector_event_times_us_total{event_type="openat",event_dir=">"} 1234 +rox_collector_event_times_us_total{event_type="openat",event_dir="<"} 5678 +``` + +--- + +## Verifier Constraints and Pitfalls + +BPF verifier enforces strict safety guarantees. Understanding its limitations prevents hours of debugging. + +### Instruction Limit + +**Limit:** 1 million instructions per program (kernel >= 5.2). + +**Symptom:** +``` +libbpf: load bpf program failed: Invalid argument +libbpf: -- BEGIN DUMP LOG -- +libbpf: processed 1000001 insns (limit 1000000) +libbpf: -- END DUMP LOG -- +``` + +**Solution:** Tail calls. + +**Example:** execve exit handler exceeded limit collecting 27 parameters. Split into 3 programs: + +```c +SEC("tp_btf/sys_exit") +int BPF_PROG(execve_x, struct pt_regs *regs, long ret) { + // Collect params 1-14 + auxmap__store_s64_param(auxmap, ret); + // ... 13 more params ... + + // Tail to continuation + bpf_tail_call(ctx, &extra_syscall_calls, T1_EXECVE_X); + return 0; +} + +SEC("tp_btf/sys_exit") +int BPF_PROG(t1_execve_x, struct pt_regs *regs, long ret) { + // Collect params 15-20 + // ... 6 params ... + + bpf_tail_call(ctx, &extra_syscall_calls, T2_EXECVE_X); + return 0; +} + +SEC("tp_btf/sys_exit") +int BPF_PROG(t2_execve_x, struct pt_regs *regs, long ret) { + // Collect params 21-27 + // ... 7 params ... + + auxmap__finalize_event_header(auxmap); + auxmap__submit_event(auxmap); + return 0; +} +``` + +### Loop Bounds + +**Limit:** Verifier must prove loops terminate. Unbounded loops rejected. + +**Symptom:** +``` +libbpf: back-edge from insn 342 to 256 +``` + +**Solution:** Bounded loops with `#pragma unroll` or explicit counters. 
+ +**Bad:** +```c +for (struct task_struct *p = task->parent; p != NULL; p = p->parent) { + // Verifier can't prove termination +} +``` + +**Good:** +```c +int cnt = 0; + +#pragma unroll +for (struct task_struct *p = task->parent; cnt < MAX_DEPTH; p = p->parent) { + cnt++; + if (p == NULL) break; + // Process p +} +``` + +**ROX-31971:** Some verifiers fail even with `#pragma unroll`, looping infinitely during verification. Workaround: reduce `MAX_DEPTH` or restructure logic. + +### Stack Size + +**Limit:** 512 bytes of BPF stack per program frame; the verifier also rejects bpf-to-bpf call chains whose combined stack usage is too large. + +**Symptom:** +``` +libbpf: combined stack size of 4 programs is 9216 bytes +``` + +**Solution:** Use per-CPU maps for large buffers. + +**Bad:** +```c +char buf[4096]; +bpf_probe_read_user(buf, sizeof(buf), ptr); +``` + +**Good:** +```c +struct auxiliary_map *auxmap = auxmap__get(); // Per-CPU map +bpf_probe_read_user(auxmap->data, MAX_SIZE, ptr); +``` + +### Helper Call Verification + +**Issue:** Verifier tracks pointer state. After certain operations, it may "forget" a pointer is valid. + +**Symptom:** +``` +R1 invalid mem access 'inv' +``` + +**Example:** Ring buffer pointer invalidated after complex logic. + +**Solution (ROX-31971):** Use auxiliary map approach instead of direct ring buffer reserve: + +```c +// Instead of: +void *data = bpf_ringbuf_reserve(&ringbuf, size, 0); +// ... complex logic ... +bpf_ringbuf_submit(data, 0); // Verifier may reject + +// Use: +struct auxiliary_map *auxmap = auxmap__get(); +// ... complex logic ... +bpf_ringbuf_output(&ringbuf, auxmap->data, auxmap->payload_pos, 0); +``` + +### CO-RE Relocations + +**Issue:** Field offsets differ across kernel versions. CO-RE handles this, but requires BTF. + +**Symptom:** +``` +libbpf: failed to find BTF for extern 'task_struct' [16] section +``` + +**Solution:** Ensure BTF available (`/sys/kernel/btf/vmlinux`). 
Use `bpf_core_field_exists()` for optional fields: + +```c +if (bpf_core_field_exists(inode->i_ctime)) { + BPF_CORE_READ_INTO(&time, inode, i_ctime); +} else { + // Kernel 6.6+ moved to __i_ctime + struct inode___v6_6 *inode_v6_6 = (void *)inode; + BPF_CORE_READ_INTO(&time, inode_v6_6, __i_ctime); +} +``` + +### Complexity Limits + +**Issue:** Verifier complexity analysis can fail even if instruction count OK. + +**Symptom:** +``` +libbpf: the BPF verifier is unhappy: verifier log exceeds buffer size +``` + +**ROX-24938:** Container-Optimized OS (COS) verifier rejected reaper logic. Solution: reduce `MAX_HIERARCHY_TRAVERSE` from 128 to 60. + +**ROX-31971:** Clang > 19 generates different code patterns, hitting new verifier limits. Solution: adjust tail call boundaries, simplify conditional logic. + +### Common Pitfalls + +**1. Reading user pointers without bounds:** +```c +// Bad: +char *user_str = (char *)extract__syscall_argument(regs, 0); +while (*user_str) { ... } // Unbounded + +// Good: +unsigned long user_str = extract__syscall_argument(regs, 0); +auxmap__store_charbuf_param(auxmap, user_str, MAX_PATH, USER); // Bounded +``` + +**2. Forgetting NULL checks:** +```c +// Bad: +struct task_struct *parent = BPF_CORE_READ(task, parent); +pid_t ppid = BPF_CORE_READ(parent, pid); // May crash if parent == NULL + +// Good: +struct task_struct *parent = BPF_CORE_READ(task, parent); +if (parent) { + pid_t ppid = BPF_CORE_READ(parent, pid); +} +``` + +**3. Mixing kernel and user pointers:** +```c +// Bad: +unsigned long ptr = extract__syscall_argument(regs, 0); +struct foo *f = (struct foo *)ptr; +int val = f->field; // WRONG: ptr is userspace + +// Good: +unsigned long ptr = extract__syscall_argument(regs, 0); +struct foo f; +bpf_probe_read_user(&f, sizeof(f), (void *)ptr); +int val = f.field; +``` + +**4. Large inline functions:** + +Inlining can explode instruction count. 
Use `__noinline` for large helpers (requires kernel >= 5.8): + +```c +__noinline static int parse_complex_struct(...) { + // ... 500 instructions ... +} +``` + +### Debugging Verifier Failures + +**1. Read the verifier log:** + +libbpf prints the verifier log to stderr when a program fails to load; raise the libbpf log level (e.g. via `libbpf_set_print()`) to capture it in full. + +**2. Check dmesg for related kernel messages:** +```bash +dmesg | grep bpf +``` + +**3. Use bpftool to inspect loaded programs:** +```bash +bpftool prog show +bpftool prog dump xlated id <ID> # Show translated instructions +bpftool prog dump jited id <ID> # Show JIT assembly +``` + +**4. Simplify incrementally:** + +Comment out sections until verifier passes, then re-enable to isolate issue. + +**5. Check kernel version:** + +Older kernels have stricter verifier. Collector requires >= 5.8 for CO-RE, but >= 5.13 recommended for better verifier. + +--- + +## Summary + +**CO-RE BPF driver architecture:** +- **Attached programs** on `tp_btf/sys_enter`, `tp_btf/sys_exit`, scheduler tracepoints +- **Tail call dispatch** via `BPF_MAP_TYPE_PROG_ARRAY` indexed by syscall ID +- **158 syscall handlers** capturing network, process, file, permission events +- **Variable-size events** built in per-CPU auxiliary maps, submitted to ring buffers +- **ppm_evt_hdr** format with TLV parameters consumed by libscap/libsinsp +- **Full event flow** from kernel tracepoint → BPF → ring buffer → libscap → libsinsp → SystemInspector → NetworkSignalHandler/ProcessSignalHandler → ConnTracker + +**Key takeaways:** +- Tail calls are essential for complex syscall monitoring (work around 1M instruction limit) +- Auxiliary maps solve variable-size event challenges with verifier +- CO-RE relocations enable single binary across kernel versions +- Verifier limits require careful loop bounds, stack usage, pointer tracking +- NetworkSignalHandler and ProcessSignalHandler bridge kernel events to collector's connection/process tracking + +**Related documentation:** +- [falcosecurity-libs.md](./falcosecurity-libs.md) - High-level overview and 
integration +- [lib/system.md](./lib/system.md) - SystemInspector abstraction boundary +- [ROX-31971](https://issues.redhat.com/browse/ROX-31971) - Verifier complexity fixes +- [ROX-24938](https://issues.redhat.com/browse/ROX-24938) - COS verifier workarounds +- [ROX-18856](https://issues.redhat.com/browse/ROX-18856) - getsockopt for async connect status + +**Source tree:** +- `falcosecurity-libs/driver/modern_bpf/programs/attached/` - Tracepoint entry points +- `falcosecurity-libs/driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/` - Per-syscall handlers +- `falcosecurity-libs/driver/modern_bpf/maps/maps.h` - Tail tables and global variables +- `falcosecurity-libs/driver/ppm_events_public.h` - Event codes and header format +- `falcosecurity-libs/userspace/libscap/` - Ring buffer polling and event parsing +- `falcosecurity-libs/userspace/libsinsp/` - Event enrichment and container metadata +- `collector/lib/NetworkSignalHandler.cpp` - Network event consumption +- `collector/lib/ProcessSignalHandler.cpp` - Process event consumption diff --git a/docs/falcosecurity-libs.md b/docs/falcosecurity-libs.md new file mode 100644 index 0000000000..af1ca0071e --- /dev/null +++ b/docs/falcosecurity-libs.md @@ -0,0 +1,149 @@ +# falcosecurity-libs + +## Overview + +The falcosecurity-libs submodule provides kernel instrumentation via CO-RE BPF, enabling syscall tracing and event capture. This is a StackRox fork of upstream Falco libs, customized for network flow tracking and process monitoring. + +**Version:** 3.21.6 +**Driver:** CO-RE BPF only (kernel >= 5.8 with BTF) +**Location:** `falcosecurity-libs/` submodule + +## Architecture Layers + +Collector consumes falcosecurity-libs through three layers: + +**libsinsp** (`userspace/libsinsp/`) +Event enrichment, filtering, and state tracking. Provides `sinsp` inspector class consumed by `collector/lib/system-inspector/Service.cpp:SystemInspectorService`. 
Maintains thread table, FD table, container metadata, and filter engine. + +**libscap** (`userspace/libscap/`) +Ring buffer management and event parsing. Reads from per-CPU BPF ring buffers, parses `ppm_evt_hdr` structures, extracts syscall parameters. Used via `scap_open()`, `scap_next()` in libsinsp. + +**Modern BPF Driver** (`driver/modern_bpf/`) +CO-RE BPF programs attached to tracepoints. Built as skeleton header `bpf_probe.skel.h` embedded in collector binary. Loaded by `collector/lib/KernelDriver.h:ModernBPFDriver`. + +## Syscalls Instrumented + +Network events consumed by `collector/lib/NetworkSignalHandler.cpp`: +- connect, accept, accept4, bind, listen +- sendto, recvfrom, sendmsg, recvmsg +- socket, socketpair, shutdown, close + +Process events consumed by `collector/lib/ProcessSignalFormatter.cpp`: +- execve, execveat (execution) +- clone, clone3, fork, vfork (creation) +- exit (via sched_process_exit tracepoint) + +File I/O: open, openat, read, write, close, dup +Memory: mmap, munmap, mprotect, brk +Permissions: setuid, setgid, capset + +Modern BPF excludes certain syscalls (configured in `CMakeLists.txt`): +```cmake +MODERN_BPF_EXCLUDE_PROGS "^(openat2|ppoll|setsockopt|io_uring_setup|nanosleep)$" +``` + +## Event Flow + +1. Kernel tracepoints trigger BPF programs (sys_enter/sys_exit, sched_process_exit) +2. BPF programs write `ppm_evt_hdr` structures to per-CPU ring buffers +3. libscap polls ring buffers, parses events via `scap_next()` +4. libsinsp enriches events with container metadata, thread info, FD state +5. sinsp filter engine applies event filters +6. SystemInspectorService dispatches to NetworkSignalHandler or ProcessSignalFormatter +7. 
Handlers feed ConnTracker or send gRPC signals to Sensor + +## Container Metadata + +libsinsp resolves container IDs to Kubernetes pod/namespace via container engine APIs: + +**Engines** (`userspace/libsinsp/container_engine/`): +- Docker +- containerd (CRI) +- CRI-O +- Podman + +**Metadata extracted:** +- Container ID, pod name/namespace +- Image name and digest +- Labels, annotations +- Network namespaces + +Used by `collector/lib/ContainerMetadata.cpp` wrapping `sinsp::get_container_manager()`. + +## StackRox Configuration + +From `collector/CMakeLists.txt`: + +```cmake +set(BUILD_LIBSCAP_MODERN_BPF ON) +set(BUILD_DRIVER OFF) # No kernel module +set(SINSP_SLIM_THREADINFO ON) # Reduce memory +set(MODERN_BPF_DEBUG_MODE ${BPF_DEBUG_MODE}) + +add_definitions(-DSCAP_SOCKET_ONLY_FD) +add_definitions("-DINTERESTING_SUBSYS=\"perf_event\", \"cpu\", \"cpuset\", \"memory\"") +set(SCAP_HOST_ROOT_ENV_VAR_NAME "COLLECTOR_HOST_ROOT") +``` + +Tunables in `collector/lib/CollectorConfig.h`: +- `sinsp_cpu_per_buffer_`: CPUs per ring buffer (default: 0 = 1:1) +- `sinsp_total_buffer_size_`: Total ring buffer size (default: 512 MB) +- `sinsp_thread_cache_size_`: Thread cache size (default: 32768) + +## StackRox Fork Differences + +ROX-31971 addressed BPF verifier issues with clang > 19. Multiple commits adjusted exec/fork tail call distribution to satisfy verifier complexity limits. ROX-24938 fixed Container-Optimized OS verifier failures. ROX-18856 enabled getsockopt syscall for async connection status tracking. + +Kernel compatibility fixes for 6.15, 6.7, RHEL 9.3. PowerPC support added via ppc64le architecture commits. UDP connectionless syscall tracking improved for network observability. 
+ +**Fork repository:** github.com/stackrox/falcosecurity-libs (branch: module-version-2.10) +**Upstream:** github.com/falcosecurity/libs + +Merge strategy: StackRox maintains versioned branches (0.17.3-stackrox, 0.18.1-stackrox, 0.21.0-stackrox, 0.23.1-stackrox), periodically rebases on upstream releases, applies custom patches. Update workflow requires testing across supported kernel versions, verifying BPF verifier compatibility, checking API/schema compatibility. See [falco-update.md](falco-update.md) for the rebase process. + +## Collector Integration + +**SystemInspectorService** (`collector/lib/system-inspector/Service.cpp`) +Creates sinsp instance, opens modern BPF engine via `inspector->open_modern_bpf()`, registers signal handlers, runs event loop with `inspector->next()`. + +**NetworkSignalHandler** (`collector/lib/NetworkSignalHandler.cpp`) +Consumes network events, extracts connection tuples (src IP:port → dst IP:port), feeds ConnTracker for aggregation. Uses `system_inspector::EventExtractor` wrapper around libsinsp APIs. + +**ProcessSignalHandler** (`collector/lib/ProcessSignalFormatter.cpp`) +Consumes process events, builds lineage (parent → child), extracts cmdline/env, sends to Sensor via gRPC. + +**ContainerMetadata** (`collector/lib/ContainerMetadata.cpp`) +Wraps `sinsp::get_container_manager()`, resolves container IDs to K8s pod/namespace. + +**KernelDriver** (`collector/lib/KernelDriver.h`) +ModernBPFDriver class manages probe lifecycle, references `g_syscall_table` from libscap, handles loading/teardown. + +## Debugging + +Enable BPF debug mode: +```cmake +set(MODERN_BPF_DEBUG_MODE ON) +``` +Logs to `/sys/kernel/debug/tracing/trace_pipe` (requires CONFIG_DEBUG_FS). + +Capture file replay: libscap records events to `.scap` files for offline reproduction. + +Verifier failures: check kernel version, review `/var/log/kern.log`, adjust tail call distribution or program complexity, may require clang/LLVM version changes. 
+ +## Performance + +Ring buffer sizing: default 512 MB total. Too small causes event drops, too large causes memory pressure. Per-CPU buffers sized via `sinsp_cpu_per_buffer_`. + +Event filtering: `SCAP_SOCKET_ONLY_FD` processes socket FDs only, reducing userspace load. Limited cgroup subsystems tracked. + +Slim threadinfo: `SINSP_SLIM_THREADINFO` reduces memory ~60% by omitting env vars and full FD snapshots. Trade-off: less detail in process signals. + +## Security + +Kernel instrumentation requires CAP_SYS_ADMIN or CAP_BPF/CAP_PERFMON (kernel >= 5.8). Syscall arguments may contain sensitive data. BPF program loading restricted to collector pod. Ring buffer access requires privileges. + +## History + +ROX-7482 (2022-01-31) migrated from deprecated sysdig to falcosecurity-libs, enabling CO-RE modern BPF. Version updates through 2022-2023 (0.21.0, 0.23.1, 2.11, 0.18.1). Kernel compatibility fixes for Linux 6.7, 9.4, RHEL 9.3, 6.11.4. Clang/verifier improvements 2024-2026 for modern toolchains. + +ROX-32740 explores migrating network flow collection to OpenShift Network Observability operator, potentially deprecating networking-specific Falco probes. diff --git a/docs/integration-tests.md b/docs/integration-tests.md new file mode 100644 index 0000000000..a9703ff3a9 --- /dev/null +++ b/docs/integration-tests.md @@ -0,0 +1,319 @@ +# Integration Tests + +## Overview + +Go-based test framework validating collector across platforms, kernels, and container runtimes. Tests simulate workloads with mock sensor (gRPC server) to verify process, network, and endpoint data capture. + +**Location:** `integration-tests/` +**Framework:** testify/suite (26 test suites) +**Runtimes:** Docker, Podman, CRI-O, containerd + +## Test Flow + +1. Start mock sensor (gRPC on port 9999) +2. Launch collector container +3. Wait for health check +4. Run workload containers +5. Execute commands, generate events +6. Collect events from mock sensor +7. Verify expected events +8. 
Collect stats, teardown + +## Core Test Suites + +### Process and Execution + +**ProcessNetworkTestSuite** (`process_network.go`) +Nginx container with curl client. Verifies process events (ls, nginx, sh, sleep), process lineage (awk→grep→sleep with parent info), network connections (server-side nginx, client-side curl TCP flows). + +**SymbolicLinkProcessTestSuite** (`symlink_process.go`) +Process detection when executable invoked via symlink. Verifies name/path resolution. + +**ThreadsTestSuite** (`threads.go`) +Multi-threaded process detection. Thread creation/termination events. + +### Network + +**ConnectionsAndEndpointsTestSuite** (`connections_and_endpoints.go`) +Normal ports (40, ephemeral), high/low mixing (40000, 10000), ephemeral server (60999), UDP with socket options (reuseaddr, fork). Verifies TCP/UDP tracking, endpoint detection, client/server roles, close timestamps. + +**RepeatedNetworkFlowTestSuite** (`repeated_network_flow.go`) +Afterglow period testing. With afterglow (10s): connection deduplication. Zero afterglow: all connections separate. No afterglow flag: no coalescing. Parameters: AfterglowPeriod, ScrapeInterval, NumIter, SleepBetweenCurlTime. + +**UdpNetworkFlow** (`udp_networkflow.go`) +UDP connection tracking. Skipped on fedora-coreos, rhel-8, rhcos, rhel-sap, s390x, ppc64le (ROX-27673). + +**AsyncConnectionTestSuite** (`async_connections.go`) +Async connection tracking. Blocked connections (not reported by default), successful connections (always reported), connection status tracking disabled (all reported). + +**SocatTestSuite** (`socat.go`) +Complex socket configurations using socat. + +**DuplicateEndpointsTestSuite** (`duplicate_endpoints.go`) +Duplicate endpoint deduplication logic. + +### Procfs and Scraping + +**ProcfsScraperTestSuite** (`procfs_scraper.go`) +Scrape enabled: detects nginx on port 80 before collector started. Scrape disabled: no pre-existing endpoints. 
Feature flag disabled: endpoint detected but no originator process info. Verifies procfs scraper detects endpoints opened before collector, populates originator (process name, path, args). + +**MissingProcScrapeTestSuite** (`missing_proc_scrape.go`) +Behavior when /proc entries missing/inaccessible. Local fake proc only (not K8s). + +**ProcessListeningOnPortTestSuite** (`listening_ports.go`) +Endpoint detection for listening sockets, process association. + +### Configuration and Runtime + +**RuntimeConfigFileTestSuite** (`runtime_config_file.go`) +Runtime configuration reload without restart. Configuration changes detected via inotify, settings applied without restart (afterglow, scrape interval, network connection config). Flow: start collector with initial config, modify runtime config file, verify new settings, test network behavior. + +**CollectorStartupTestSuite** (`collector_startup.go`) +Collector initialization. Health check endpoint available, self-check process executes. + +**LogLevelTestSuite** (`log_level_endpoint.go`) +Dynamic log level via `/state/log-level` endpoint. + +### Performance and Profiling + +**BenchmarkCollectorTestSuite / BenchmarkBaselineTestSuite** (`benchmark.go`) +Measures overhead under load. Workloads: berserker/processes (short-lived processes), berserker/endpoints (many connections). Baseline: workload without collector. Collector: workload with collector. Metrics: CPU/memory (mean, stddev), duration. + +Performance tools: +- COLLECTOR_PERF_COMMAND: Linux perf (e.g., `record -o /tmp/perf.data`) +- COLLECTOR_BPFTRACE_COMMAND: BPFtrace scripts (e.g., `/tools/collector-syscalls-count.bt`) +- COLLECTOR_BCC_COMMAND: BCC tools (e.g., `syscount --latency`) + +**GperftoolsTestSuite** (`gperftools.go`) +Gperftools heap profiling (x86_64 only). CPU/heap profiler endpoints. + +**PerfEventOpenTestSuite** (`perf_event_open.go`) +perf_event_open syscall handling, kernel perf event compatibility. 
+ +**PrometheusTestSuite** (`prometheus.go`) +Prometheus `/metrics` endpoint, collector statistics. + +### API and Introspection + +**HttpEndpointAvailabilityTestSuite** (`http_endpoint_availability.go`) +HTTP introspection API: `/ready`, `/state/network/connection`, `/state/network/endpoint`, `/state/log-level`. Requires ROX_COLLECTOR_INTROSPECTION_ENABLE=true. + +**ImageLabelJSONTestSuite** (`image_json.go`) +Container image metadata extraction, Docker image labels/JSON parsing. + +### Edge Cases + +**ProcessesAndEndpointsTestSuite** (`processes_and_endpoints.go`) +Process/endpoint event correlation. + +**RingBufferTestSuite** (`ringbuf.go`) +CO-RE BPF ring buffer functionality, event delivery. + +### Kubernetes + +**Location:** `suites/k8s/` +**Build tag:** `k8s` + +**K8sNamespaceTestSuite** (`namespace.go`) +Kubernetes namespace isolation, multi-container pod visibility, service mesh scenarios. + +**K8sConfigReloadTestSuite** (`config_reload.go`) +Configuration reload via ConfigMap updates, hot reload in K8s. 
+ +Running K8s tests: +```bash +cd integration-tests +CGO_ENABLED=0 GOOS=linux go test -tags k8s -c -o bin/collector-tests + +cd ../ansible +ansible-playbook -i -e '@k8s-tests.yml' k8s-integration-tests.yml +``` + +## Environment Variables + +| Variable | Description | Default | +|----------|-------------|---------| +| REMOTE_HOST_TYPE | Where tests run | local | +| VM_CONFIG | VM type/image (e.g., ubuntu.ubuntu-20.04) | - | +| COLLECTION_METHOD | Probe type | ebpf | +| COLLECTOR_IMAGE | Image to test | - | +| COLLECTOR_OFFLINE_MODE | Allow kernel object downloads | false | +| COLLECTOR_LOG_LEVEL | Log verbosity | debug | +| STOP_TIMEOUT | Container stop timeout (seconds) | 10 | +| COLLECTOR_PERF_COMMAND | Linux perf arguments | - | +| COLLECTOR_BPFTRACE_COMMAND | BPFtrace script/args | - | +| COLLECTOR_BCC_COMMAND | BCC tool/args | - | + +Remote host: +- REMOTE_HOST_USER: SSH username +- REMOTE_HOST_ADDRESS: SSH host or GCP instance +- REMOTE_HOST_OPTIONS: SSH key path or GCP options + +## VM Configurations + +| VM Type | Example VM_CONFIG | +|---------|-------------------| +| cos | cos.cos-stable | +| rhel | rhel.rhel-8 | +| suse | suse.sles-15 | +| ubuntu-os | ubuntu-os.ubuntu-2204-lts | +| flatcar | flatcar.flatcar-stable | +| fedora-coreos | fedora-coreos.fedora-coreos-stable | +| garden-linux | garden-linux.garden-linux | + +## Running Tests + +Local development: +```bash +cd integration-tests +make TestProcessNetwork # Single suite +make ci-integration-tests # All tests +make ci-benchmarks # Benchmarks +``` + +Dockerized: +```bash +make build-image # Build test image +make ci-integration-tests-dockerized # Run in container +make TestProcessNetwork-dockerized # Specific test +``` + +Remote VM: +```bash +cd ../ansible +VM_TYPE=rhel ansible-playbook -i dev integration-tests.yml + +REMOTE_HOST_TYPE=gcloud \ +VM_CONFIG=ubuntu.ubuntu-20.04 \ +COLLECTOR_TEST=TestProcessNetwork \ +ansible-playbook -i dev integration-tests.yml --tags run-tests +``` + +SSH remote: 
+```bash +REMOTE_HOST_TYPE=ssh \ +REMOTE_HOST_USER=ec2-user \ +REMOTE_HOST_ADDRESS=10.0.1.50 \ +REMOTE_HOST_OPTIONS="-i ~/.ssh/my-key.pem" \ +make TestProcessNetwork +``` + +## Mock Sensor + +Simulates StackRox Sensor component. Located in `pkg/mock_sensor/`, port 9999, gRPC protocol, BoltDB in-memory storage. + +API methods: +- PushSignals(): Receives process/network/endpoint events +- GetProcesses(): Query process events +- GetConnections(): Query connections +- GetEndpoints(): Query listening endpoints +- GetLineageInfo(): Query parent/child relationships + +Verification helpers in test suites: +- ExpectProcesses(t, containerID, timeout, ProcessInfo{...}) +- ExpectConnections(t, containerID, timeout, NetworkConnection{...}) +- ExpectEndpoints(t, containerID, timeout, EndpointInfo{...}) +- ExpectLineages(t, containerID, timeout, "bash", ProcessLineage{...}) + +## Test Infrastructure + +Base suite: `IntegrationTestSuiteBase` provides lifecycle (StartCollector, StopCollector, RegisterCleanup), utilities (Executor, Collector, Sensor, execContainer, waitForContainerToExit, getIPAddress, getPorts), performance (StartContainerStats, SnapshotContainerStats, PrintContainerStats, WritePerfResults). + +Container executor abstraction supports Docker, CRI-O, containerd via `Executor` interface (StartContainer, StopContainer, ExecContainer, GetContainerLogs, GetContainerStats, PullImage). Implementations: DockerExecutor (Docker CLI), CRIExecutor (CRI-O/containerd gRPC). + +## Artifacts + +Logs saved to `container-logs//`: +``` +container-logs/ +├── TestProcessNetwork/ +│ ├── collector.log +│ ├── grpc-server.log +│ ├── nginx.log +│ └── nginx-curl.log +``` + +Performance data in `perf.json`: +```json +{ + "TestName": "TestBenchmarkCollector", + "VmConfig": "ubuntu.ubuntu-20.04", + "CollectionMethod": "ebpf", + "Metrics": { + "collector_cpu_mean": 12.5, + "collector_mem_mean": 85.2 + } +} +``` + +JUnit reports: `make report` generates `integration-test-report.xml`. 
+ +## Troubleshooting + +Test timeout: check collector logs, verify mock sensor received events, increase timeout. + +Container fails: check image availability, review logs, verify kernel compatibility (eBPF >= 4.14). + +Network events not captured: verify collection method, check probe loaded (lsmod/bpftool), review procfs scraper config. + +Performance instability: increase berserker timeout, reduce concurrent connections, check resource contention. + +Debug mode: +```bash +COLLECTOR_LOG_LEVEL=trace make TestProcessNetwork +CLEANUP_ON_FAIL=false make TestProcessNetwork +dlv test -- -test.run TestProcessNetwork +``` + +K8s debugging: +```bash +kubectl -n collector-tests logs collector +kubectl -n collector-tests describe pod collector +kubectl -n collector-tests get configmap runtime-config -o yaml +``` + +## Key Files + +| File | Purpose | +|------|---------| +| integration_test.go | Main test registration | +| k8s_test.go | K8s test registration (tag: k8s) | +| benchmark_test.go | Benchmark registration (tag: bench) | +| Makefile | Test execution targets | +| suites/base.go | Base test suite utilities | +| pkg/mock_sensor/ | Mock sensor implementation | +| pkg/executor/ | Container runtime abstraction | +| pkg/collector/ | Collector container manager | + +## CI Integration + +CircleCI runs tests across platforms: +```yaml +integration-tests-rhel-8: + environment: + VM_TYPE: rhel + VM_CONFIG: rhel.rhel-8 + COLLECTION_METHOD: ebpf + steps: + - checkout + - run: cd ansible && make integration-tests +``` + +GitHub Actions for IBM Z/Power (ppc64le, s390x): +```yaml +jobs: + test-s390x: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - name: Run tests + run: cd ansible && ansible-playbook -i ci integration-tests.yml + env: + IC_API_KEY: ${{ secrets.IBM_CLOUD_API_KEY }} +``` + +## References + +- [Ansible Integration Tests](../ansible/README.md) +- [Collector Architecture](architecture.md) +- [Build System](build.md) diff --git a/docs/lib/README.md 
b/docs/lib/README.md new file mode 100644 index 0000000000..1f1e45ad53 --- /dev/null +++ b/docs/lib/README.md @@ -0,0 +1,171 @@ +# collector/lib/ Index + +Core C++ implementation of StackRox Collector runtime data collection engine. + +**Codebase:** ~16,521 lines C++ across 108 files +**Location:** `collector/lib/` + +## Component Overview + +**System Inspector** → Manages libsinsp event loop, dispatches to handlers +**Signal Handlers** → Process/network event consumers, feed gRPC senders +**Connection Tracking** → Afterglow-based aggregation, deduplication +**gRPC Communication** → Bidirectional streaming to Sensor +**Procfs Scraper** → Pre-existing connection discovery +**Configuration** → Runtime reload via inotify + +## Key Modules + +### Event Processing + +`system-inspector/Service.{h,cpp}` - SystemInspectorService +Main event loop. Creates sinsp inspector, opens modern BPF driver, registers NetworkSignalHandler and ProcessSignalHandler, runs `inspector->next()` dispatch loop. + +`EventExtractor.{h,cpp}` - system_inspector::EventExtractor +Wraps libsinsp event APIs. Extracts syscall parameters, connection tuples, process info. Used by NetworkSignalHandler. + +### Signal Handlers + +`NetworkSignalHandler.{h,cpp}` - NetworkSignalHandler +Consumes network syscalls (connect, accept, sendto, recvfrom, close). Extracts tuples, feeds ConnTracker, tracks async connection status. + +`ProcessSignalFormatter.{h,cpp}` - ProcessSignalFormatter +Consumes process events (execve, fork, clone, exit). Builds lineage, formats signals, sends via gRPC. + +`SignalHandler.h` - ISignalHandler interface +Base class for all handlers. Defines `Start()`, `Stop()`, `Run()` lifecycle. + +### Connection Management + +`ConnTracker.{h,cpp}` - ConnTracker +Afterglow-based flow aggregation. Maintains active/inactive connection maps, deduplication by 5-tuple, periodic scrubbing. Sends to gRPC queue. 
+ +`ConnScraper.{h,cpp}` - ConnScraper +Scans /proc/net/{tcp,udp,raw} for pre-existing connections. Populates endpoints, tracks listen sockets. Enriches with process info via /proc/[pid]/fd/. + +`Afterglow.{h,cpp}` - Afterglow +Generic afterglow container. Tracks active/inactive items with expiration. Template used by ConnTracker. + +### Network Infrastructure + +`NetworkConnection.{h,cpp}` - NetworkConnection +Connection 5-tuple representation (src/dst IP:port, protocol). Equality, hashing for map keys. + +`NetworkStatusNotifier.{h,cpp}` - NetworkStatusNotifier +Async connection status tracking via getsockopt. Monitors non-blocking connects. + +`HostHeuristics.{h,cpp}` - HostHeuristics +Public IP detection. Determines if connection endpoint is cluster-internal or external. + +### Process Management + +`ProcessSignalHandler.{h,cpp}` - ProcessSignalHandler +Wraps ProcessSignalFormatter. Implements ISignalHandler interface. + +`ProcessStore.{h,cpp}` - ProcessStore +Caches process info. Maps PID → ProcessInfo, handles lineage updates. + +`ContainerMetadata.{h,cpp}` - ContainerMetadata, IContainerMetadataExtractor +Extracts container ID, pod name, namespace via libsinsp container manager. + +### gRPC Communication + +`CollectorService.{h,cpp}` - CollectorService +gRPC service implementation. Bidirectional streaming, handles signals/commands from Sensor. + +`GRPCUtil.{h,cpp}` - grpc utilities +Connection management, channel creation, credentials. + +`CollectorStats.{h,cpp}` - CollectorStatsExporter +Prometheus metrics export, gRPC endpoint for stats. + +### Configuration + +`CollectorConfig.{h,cpp}` - CollectorConfig +Parses YAML runtime config. Networking, scraping, afterglow, TLS settings. + +`FileDownloader.{h,cpp}` - DummyFileDownloader +Stub for kernel object downloads (deprecated). + +### Kernel Interface + +`KernelDriver.{h,cpp}` - ModernBPFDriver +Loads CO-RE BPF probe. References libscap engine, manages driver lifecycle. 
+ +`SysdigService.{h,cpp}` - UserSysdigService (legacy) +Original falcosecurity-libs wrapper. Being phased out in favor of SystemInspectorService. + +### Utilities + +`Utility.{h,cpp}` - String, time, networking helpers +`Logging.{h,cpp}` - Logging infrastructure +`HostInfo.{h,cpp}` - Host metadata (OS, kernel version) +`DuplexGRPC.h` - Bidirectional gRPC wrapper +`StoppableThread.h` - RAII thread management +`GRPC.{h,cpp}` - gRPC async infrastructure + +### Testing + +`CollectorServiceMocks.h` - Mock gRPC services +`MockCollectorConfig.h` - Test configuration +`MockSysdigService.h` - Mock event source + +## Data Flow + +``` +Kernel (CO-RE BPF) + ↓ +libscap ring buffers + ↓ +libsinsp (enrichment) + ↓ +SystemInspectorService::next() + ↓ + ├→ NetworkSignalHandler + │ ├→ EventExtractor (parse syscalls) + │ ├→ ConnTracker (afterglow) + │ └→ CollectorService (gRPC send) + │ + └→ ProcessSignalHandler + ├→ ProcessSignalFormatter (build signals) + └→ CollectorService (gRPC send) + +Parallel: +ConnScraper (periodic) + ↓ /proc/net/{tcp,udp} +ConnTracker (endpoints) + ↓ +CollectorService (gRPC send) +``` + +## Threading Model + +Main thread: Initialization, gRPC server +SystemInspector thread: Event loop (`inspector->next()`) +NetworkSignalHandler thread: Connection processing +ProcessSignalHandler thread: Process signal formatting +ConnScraper thread: Periodic /proc scanning +Afterglow scrubber: Periodic inactive connection cleanup + +Synchronization via mutexes in ConnTracker, ProcessStore. Lock-free queues for gRPC sends. + +## Configuration + +Environment variables: GRPC_SERVER, COLLECTION_METHOD, COLLECTOR_CONFIG (JSON/YAML). +Runtime config: /etc/stackrox/runtime_config.yaml (inotify reload). +Key settings: afterglow period, scrape interval, connection stats aggregation, TLS config. + +## Historical Context + +ROX-7482 migrated from sysdig to falcosecurity-libs. Afterglow algorithm introduced for network flow deduplication. 
ConnScraper added for pre-existing endpoint discovery. Async connection tracking via getsockopt (ROX-18856). Process lineage handling evolved through multiple iterations. gRPC bidirectional streaming replaced unary calls. + +## Technical Debt + +A legacy connection-tracking path coexists with the current ConnTracker. SysdigService being replaced by SystemInspectorService. Multiple process signal paths (ProcessSignalHandler vs ProcessSignalFormatter). Inconsistent error handling patterns. Global state in some utilities. + +## References + +- [Architecture](../architecture.md) +- [falcosecurity-libs](../falcosecurity-libs.md) +- [Build System](../build.md) +- [Integration Tests](../integration-tests.md) diff --git a/docs/lib/config.md b/docs/lib/config.md new file mode 100644 index 0000000000..699b64f29f --- /dev/null +++ b/docs/lib/config.md @@ -0,0 +1,92 @@ +# Configuration System + +The configuration system handles static startup options, environment variables, runtime updates, and platform-specific heuristics. Configuration flows through command-line parsing, environment variable resolution, and YAML file watching for dynamic updates. + +## Static Configuration + +### Argument Parsing + +CollectorArgs is a singleton that uses optionparser for command-line arguments. The parse method processes --collector-config (JSON), --collection-method (ebpf/core_bpf), and --grpc-server. Each check* method validates inputs: checkCollectorConfig parses JSON via CollectorConfig::CheckConfiguration, checkCollectionMethod uses ParseCollectionMethod from CollectionMethod.h, and checkGRPCServer delegates to GRPC.cpp:CheckGrpcServer. + +The singleton pattern with getInstance() allows validator callbacks to access the instance without passing pointers through optionparser's C-style API. + +### Configuration Object + +CollectorConfig starts with compile-time defaults: kScrapeInterval=30, kCollectionMethod=CORE_BPF, kSyscalls defines the base set (accept, connect, execve, etc.). 
InitCollectorConfig merges command-line args, environment variables, and JSON config. Processing order: log level first (to enable early logging), then scrape interval, collection method, and TLS paths. + +Boolean flags like enable_processes_listening_on_ports_, import_users_, and track_send_recv_ read from BoolEnvVar instances. Network configuration includes ignored_l4proto_port_pairs_ (default: UDP:9), ignored_networks_ (169.254.0.0/16, fe80::/10), and non_aggregated_networks_. IPNet::parse converts CIDR strings to internal representation. + +Sinsp buffer configuration defaults to DEFAULT_DRIVER_BUFFER_BYTES_DIM and DEFAULT_CPU_FOR_EACH_BUFFER from libscap. GetSinspBufferSize adjusts buffer dimensions: given sinsp_total_buffer_size_ (default 512MB) and CPU count, it calculates n_buffers = ceil(num_cpus / sinsp_cpu_per_buffer_), then max_buffer_size = ceil(total / n_buffers). The result must be a power of two >= 2^14 (16KB, two pages minimum) for ringbuffer alignment. + +Connection stats use quantiles (default 0.50, 0.90, 0.95), error tolerance (0.01), and window size (60 seconds). These drive quantile estimation algorithms for network telemetry. + +### Environment Variables + +EnvVar.h templates provide lazy initialization with std::call_once. The ParseT functor converts strings: ParseBool accepts "true" case-insensitively, ParseStringList splits on commas, ParseInt/ParseFloat use standard library converters, ParsePath wraps std::filesystem::path. + +Environment variables override defaults but are superseded by command-line arguments. For example, grpc_server first checks args->GRPCServer(), then the GRPC_SERVER env var. TLS paths check tls_certs_path first for a base directory, then individual overrides like tls_ca_path. + +CollectorConfig:282-348 shows HandleAfterglowEnvVars and HandleConnectionStatsEnvVars parsing float/int/list types with try-catch for robustness. Invalid values log warnings and preserve defaults. 
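The buffer sizing arithmetic above can be sketched as a standalone helper. This is a hypothetical illustration, not the actual `CollectorConfig::GetSinspBufferSize`; rounding the per-buffer size *down* to a power of two (so the configured total is not exceeded) is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Minimum ringbuffer size: 2^14 bytes (two pages), per the constraint above.
constexpr uint64_t kMinBufferSize = 1ULL << 14;

// Round down to the nearest power of two, never below the minimum.
// (Assumption: the real code may round in the other direction.)
uint64_t RoundDownPow2(uint64_t v) {
  uint64_t p = kMinBufferSize;
  while (p * 2 <= v) p *= 2;
  return p;
}

// n_buffers = ceil(num_cpus / cpus_per_buffer);
// per_buffer = ceil(total_size / n_buffers), forced to a power of two >= 16KB.
uint64_t SinspBufferSize(uint64_t total_size, unsigned num_cpus, unsigned cpus_per_buffer) {
  if (cpus_per_buffer == 0) cpus_per_buffer = 1;  // 0 means one buffer per CPU
  unsigned n_buffers = (num_cpus + cpus_per_buffer - 1) / cpus_per_buffer;
  uint64_t per_buffer = (total_size + n_buffers - 1) / n_buffers;
  return RoundDownPow2(per_buffer);
}
```

The power-of-two constraint comes from the kernel's ringbuffer map, whose size must be a power-of-two multiple of the page size.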
+ +## Runtime Configuration + +### YAML Parsing + +ConfigLoader uses ParserYaml to convert YAML files into sensor::CollectorConfig protobufs. ParserYaml::Parse recursively walks protobuf descriptors, matching YAML fields to protobuf FieldDescriptors. read_camelcase_ mode converts between YAML camelCase and protobuf snake_case via SnakeCaseToCamel/CamelCaseToSnake. + +Validation modes: STRICT requires all fields, PERMISSIVE allows missing/unknown fields, UNKNOWN_FIELDS_ONLY rejects extra fields but allows omissions. FindUnknownFields recursively scans YAML nodes to detect typos or deprecated fields. + +ParseScalar handles primitive types via TryConvert, wrapping YAML::Node::as() to return std::variant instead of throwing. Enum parsing uppercases strings and uses EnumDescriptor::FindValueByName. ParseArray iterates sequences, calling ParseArrayInner for primitives or ParseArrayEnum for enums. + +### File Watching + +ConfigLoader::WatchFile uses Inotify to monitor /etc/stackrox/runtime_config.yaml (from ROX_COLLECTOR_CONFIG_PATH). Three watchers track: LOADER_PARENT_PATH for the parent directory (to catch file creation), LOADER_CONFIG_FILE for the config file itself, and LOADER_CONFIG_REALPATH for the symlink target if the config is a symlink. + +HandleConfigDirectoryEvent detects IN_CREATE/IN_MOVED_TO to add file watchers, and IN_DELETE/IN_MOVED_FROM to reset runtime config. HandleConfigFileEvent responds to IN_MODIFY by reloading, and IN_MOVE_SELF/IN_DELETE_SELF by checking if the file reappeared (atomic writes via rename). HandleConfigRealpathEvent handles symlink target changes, re-pointing the watcher to the new target. + +LoadConfiguration creates a default sensor::CollectorConfig with NewRuntimeConfig (setting max_connections_per_minute to kMaxConnectionsPerMinute), parses the YAML, and calls config_.SetRuntimeConfig with the result. The config is protected by a shared_mutex, with ReadLock for consumers and WriteLock for updates. 
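The shared_mutex-plus-optional pattern just described can be sketched as follows. `RuntimeConfig` and the member names here are placeholders, not the real `sensor::CollectorConfig` protobuf.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <utility>

// Placeholder for the runtime-config protobuf message.
struct RuntimeConfig {
  int max_connections_per_minute = 0;
};

class ConfigHolder {
 public:
  void SetRuntimeConfig(RuntimeConfig cfg) {
    std::unique_lock lock(mutex_);  // WriteLock: exclusive
    runtime_config_ = std::move(cfg);
  }

  void ResetRuntimeConfig() {
    std::unique_lock lock(mutex_);
    runtime_config_.reset();  // revert to env vars / static defaults
  }

  // Consumers take a shared (read) lock and fall back when no YAML was loaded.
  int MaxConnectionsPerMinute(int fallback) const {
    std::shared_lock lock(mutex_);  // ReadLock: concurrent readers allowed
    return runtime_config_ ? runtime_config_->max_connections_per_minute : fallback;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::optional<RuntimeConfig> runtime_config_;
};
```

Keeping the runtime config in a `std::optional` lets "no config file present" revert cleanly to static configuration without sentinel values.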
+ +### Runtime Access + +GetExternalIPsConf, MaxConnectionsPerMinute, and GetRuntimeConfigStr acquire ReadLock and check runtime_config_.has_value(). This pattern allows runtime_config_ to be absent (no YAML file) without affecting static configuration. ResetRuntimeConfig clears the optional, reverting to environment variables and static defaults. + +CollectorRuntimeConfigInspector exposes runtime config via HTTP at /state/runtime-config. The handleGet method calls configToJson, which uses protobuf::util::MessageToJsonString with always_print_fields_with_no_presence to include zero values. + +## Platform Heuristics + +### Host Configuration + +HostConfig provides per-host overrides discovered at runtime. SetCollectionMethod and SetNumPossibleCPUs populate values that supersede CollectorConfig when HasCollectionMethod returns true. HostHeuristics.cpp:ProcessHostHeuristics applies heuristics in order. + +CollectionHeuristic validates eBPF support via HostInfo::HasEBPFSupport, BTF symbols, BPF ringbuffer support, and BPF tracing support. CORE_BPF requires all four; missing any results in CLOG(FATAL). EBPF mode checks if CORE_BPF is available and logs an informational message suggesting the upgrade. + +DockerDesktopHeuristic fails fatally if HostInfo::IsDockerDesktop, since Docker Desktop doesn't support eBPF. PowerHeuristic checks for RHEL 8.6 on ppc64le with kernel <4.18.0-477, which lacks CORE_BPF support. CPUHeuristic queries HostInfo::NumPossibleCPU via libbpf_num_possible_cpus and stores the result for buffer size calculations. + +### Host Information + +HostInfo is a singleton providing lazy-initialized system data. GetKernelVersion parses uname.release (e.g., "4.18.0-372.el8.x86_64") into kernel/major/minor/build_id integers via regex. HasEBPFSupport returns true for kernel >=4.14 or RHEL 7.6 (3.10.0-957+). 
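The CORE_BPF gate described above amounts to four independent capability checks. A minimal sketch, with the probe results passed in as booleans — in collector they come from HostInfo (HasEBPFSupport, HasBTFSymbols, HasBPFRingBufferSupport, HasBPFTracingSupport) and a missing capability is a CLOG(FATAL), modeled here as an exception:

```cpp
#include <cassert>
#include <stdexcept>

// Results of the four host capability probes (placeholder struct).
struct HostCapabilities {
  bool ebpf = false;     // kernel >= 4.14 (or RHEL 7.6 backport)
  bool btf = false;      // BTF symbols found on libbpf search paths
  bool ringbuf = false;  // libbpf_probe_bpf_map_type(BPF_MAP_TYPE_RINGBUF)
  bool tracing = false;  // probe for BPF_PROG_TYPE_TRACING
};

bool CoreBpfAvailable(const HostCapabilities& caps) {
  return caps.ebpf && caps.btf && caps.ringbuf && caps.tracing;
}

// CORE_BPF requires all four capabilities; any missing one is fatal.
void CheckCoreBpfSupport(const HostCapabilities& caps) {
  if (!caps.ebpf) throw std::runtime_error("kernel lacks eBPF support");
  if (!caps.btf) throw std::runtime_error("no BTF symbols found");
  if (!caps.ringbuf) throw std::runtime_error("BPF ringbuffer not supported");
  if (!caps.tracing) throw std::runtime_error("BPF tracing programs not supported");
}
```

Separating the cheap boolean probes from the fatal check mirrors how the heuristics run before any attempt to load the CO-RE probe.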
+ +HasBTFSymbols checks paths from libbpf's search order: /sys/kernel/btf/vmlinux (kernel-provided), /boot/vmlinux-*, /lib/modules/*/vmlinux-*, /usr/lib/debug paths. Uses faccessat with AT_EACCESS to respect permissions. Paths with .mounted=true prepend GetHostPath for containerized operation. + +HasBPFRingBufferSupport calls libbpf_probe_bpf_map_type(BPF_MAP_TYPE_RINGBUF), checking the kernel's bpf() syscall response. HasBPFTracingSupport probes BPF_PROG_TYPE_TRACING. These probes execute quickly (single syscall) and avoid loading actual BPF programs. They run during heuristic evaluation, before attempting to load the CO-RE probe. + +GetSecureBootStatus reads UEFI variables from /sys/firmware/efi/efivars/SecureBoot-* (5 bytes: 4-byte attributes + 1-byte status) or boot_params at offset 0x1EC for older kernels. Returns ENABLED/DISABLED/NOT_DETERMINED. + +GetHostname checks NODE_HOSTNAME env var, then /etc/hostname and /proc/sys/kernel/hostname via GetHostnameFromFile. CLOG(FATAL) if none found. GetDistro/GetOSID/GetBuildID parse /etc/os-release or /usr/lib/os-release in KEY="VALUE" format via filterForKey. + +### TLS Configuration + +TlsConfig is a simple container for CA/cert/key paths. IsValid checks all three are non-empty. CollectorConfig:198-215 handles three sources: JSON tlsConfig with explicit paths, tls_certs_path base directory (defaults ca.pem/cert.pem/key.pem), or individual path env vars. + +GRPC.cpp:24 uses ReadFileContents to load PEM strings into grpc::SslCredentialsOptions. Files are read once at startup rather than watched for changes, so certificate rotation requires collector restart. + +## Configuration Lifecycle + +1. CollectorArgs::parse processes command-line flags, validating and storing in singleton +2. CollectorConfig::InitCollectorConfig merges args, env vars, and applies defaults +3. ProcessHostHeuristics inspects HostInfo and sets HostConfig overrides +4. ConfigLoader::Start begins watching runtime config YAML in background thread +5. 
Runtime updates arrive via LoadConfiguration, acquiring WriteLock and calling SetRuntimeConfig +6. Readers acquire ReadLock and check runtime_config_.has_value() for optional overrides + +This layering allows static configuration for kernel instrumentation while enabling dynamic network policy updates without restarting collector. diff --git a/docs/lib/containers.md b/docs/lib/containers.md new file mode 100644 index 0000000000..1775cba872 --- /dev/null +++ b/docs/lib/containers.md @@ -0,0 +1,71 @@ +# Container Abstraction + +The container layer maps processes to container IDs, extracts Kubernetes metadata, and provides HTTP introspection of container state. This bridges kernel events (which track PIDs) to orchestrator concepts (pods, namespaces). + +## Container Identification + +### Container Engine + +ContainerEngine (ContainerEngine.h:8) implements libsinsp::container_engine::container_engine_base. The resolve method receives a sinsp_threadinfo and queries_os_for_missing_info flag. It iterates tinfo->cgroups(), calling ExtractContainerIDFromCgroup on each cgroup path. + +ExtractContainerIDFromCgroup (defined elsewhere in container_engine code) parses cgroup paths like /kubepods/besteffort/pod.../...container_id. When a container ID is found, it's assigned to tinfo->m_container_id and resolve returns true. No container ID means the process isn't containerized, returning false. + +This engine integrates with libsinsp's container resolution pipeline. Multiple container engines can register; sinsp tries each until one succeeds. ContainerEngine runs early, using cgroup paths without querying container runtimes. + +## Metadata Extraction + +### Container Metadata + +ContainerMetadata wraps sinsp and EventExtractor to provide container information. The constructor (ContainerMetadata.cpp:9) creates EventExtractor and calls Init(inspector), which sets up internal libsinsp pointers. 
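The cgroup-path parsing performed by ExtractContainerIDFromCgroup can be sketched as below. The real implementation handles additional runtime layouts, so the prefix/suffix handling here (systemd `.scope` decoration, `<runtime>-<id>` prefixes) is illustrative.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Simplified sketch: take the last path segment of a cgroup path, strip
// runtime decoration, and accept a 64-hex-character container ID.
std::optional<std::string> ExtractContainerIDFromCgroup(const std::string& cgroup) {
  std::string seg = cgroup.substr(cgroup.find_last_of('/') + 1);
  if (auto pos = seg.rfind(".scope"); pos != std::string::npos) seg.erase(pos);
  if (auto pos = seg.find_last_of('-'); pos != std::string::npos) seg = seg.substr(pos + 1);
  if (seg.size() != 64) return std::nullopt;  // not a containerized process
  for (char c : seg) {
    if (!std::isxdigit(static_cast<unsigned char>(c))) return std::nullopt;
  }
  return seg.substr(0, 12);  // short container ID, as used by introspection
}
```

A non-container cgroup like `/system.slice/sshd.service` yields no ID, matching the "resolve returns false" behavior described above.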
+ +GetNamespace(sinsp_evt*) calls event_extractor_->get_k8s_namespace(event), which reads Kubernetes labels from the event's container. EventExtractor is defined in system-inspector/EventExtractor.h (not shown, but used here). The method returns the namespace string or empty if not available. + +GetNamespace(container_id) queries libsinsp's container cache directly. inspector_->m_container_manager.get_containers() returns a map of container IDs to container info structs. The method looks up container_id, searches its m_labels map for "io.kubernetes.pod.namespace", and returns the value or empty string. + +GetContainerLabel generalizes this pattern: given container_id and label key, it queries the container cache and returns the label value. This supports arbitrary label extraction beyond namespaces (e.g., pod name, deployment). + +### Event Extractor + +EventExtractor (used but not defined in shown files) provides accessors for event metadata. get_k8s_namespace reads labels added by libsinsp when it correlates events with container state. Init must be called with a sinsp instance to establish the connection. + +The extractor abstracts libsinsp's internal structures. Rather than navigating sinsp_evt->threadinfo->container->labels, callers use get_k8s_namespace directly. This insulates collector code from libsinsp API changes. + +## Introspection + +### Container Info Inspector + +ContainerInfoInspector handles HTTP GET requests at /state/containers/{id}. handleGet (ContainerInfoInspector.cpp:10) extracts the container ID from req_info->local_uri by finding the last slash and taking the substring after it. It validates length == 12 (standard short container ID). + +The response builds a Json::Value with container_id and namespace fields. GetNamespace(container_id) queries ContainerMetadata. Json::FastWriter::write serializes to a compact string. The HTTP response is 200 OK with application/json content type. 
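The URI handling in handleGet (last path segment plus length check) can be sketched as:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of ContainerInfoInspector's ID extraction: take the substring
// after the last '/', require the 12-character short container ID.
std::optional<std::string> ContainerIDFromURI(const std::string& uri) {
  std::string id = uri.substr(uri.find_last_of('/') + 1);
  if (id.size() != 12) return std::nullopt;  // real handler answers 400 here
  return id;
}
```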
+ +Invalid container IDs return ClientError (defined in CivetWrapper, sends 400 Bad Request). Server errors like failed request parsing return ServerError (500 Internal Server Error). This provides debugging visibility into container state without restarting collector. + +### CivetWrapper Integration + +ContainerInfoInspector extends CivetWrapper (not shown), which extends CivetHandler. GetBaseRoute returns kBaseRoute = "/state/containers/". CivetServer uses this prefix to route URLs starting with /state/containers/ to this handler. + +CollectorService registers the handler only when config.IsIntrospectionEnabled(). This protects internal state from exposure in production unless explicitly enabled. The handler is instantiated with container_metadata_inspector_, which is created from system_inspector_.GetContainerMetadataInspector(). + +## Container Lifecycle + +Containers appear in libsinsp's cache when processes spawn or when existing containers are discovered during initial scan. The cache updates as libsinsp processes clone/execve events and reads cgroup changes. + +ContainerMetadata reads this cache without modifying it. The cache is eventually consistent: a newly started container may not appear immediately in GetNamespace results. EventExtractor's event-driven path via get_k8s_namespace accesses fresh data directly from events. + +GetContainerLabel can query any label, not just Kubernetes-specific ones. Docker labels, OCI labels, and custom annotations are all accessible via the m_labels map. This supports future extensions like querying pod UIDs or deployment names. + +## Error Handling + +GetNamespace returns empty strings rather than throwing exceptions. Callers check for emptiness and handle missing metadata gracefully. This aligns with collector's philosophy of best-effort data collection. + +GetContainerLabel similarly returns empty for missing containers or labels. The container cache lookup uses find() and checks for end() before dereferencing. 
Missing labels also return empty. + +ContainerInfoInspector validates container ID length but doesn't verify the container exists. GetNamespace will return empty for invalid IDs, resulting in valid JSON with an empty namespace field. This prevents errors from stopping introspection queries. + +## Integration with System Inspector + +system_inspector::Service (defined in system-inspector/Service.h) owns the ContainerMetadata instance and exposes it via GetContainerMetadataInspector(). This ensures ContainerMetadata and EventExtractor lifecycle match the sinsp instance. + +ContainerMetadata's dependency on sinsp means it must be destroyed before sinsp. CollectorService's destruction order (system_inspector_.CleanUp last) ensures correct teardown. + +NetworkSignalHandler and other components may also use ContainerMetadata to enrich events with Kubernetes context. The shared access model allows multiple readers without synchronization, as libsinsp's container cache is internally synchronized. diff --git a/docs/lib/core.md b/docs/lib/core.md new file mode 100644 index 0000000000..589b9f71c7 --- /dev/null +++ b/docs/lib/core.md @@ -0,0 +1,109 @@ +# Core Service Infrastructure + +The core infrastructure manages collector's main loop, lifecycle, metrics collection, status reporting, and diagnostic endpoints. Components coordinate between configuration, kernel instrumentation, network monitoring, and sensor communication. + +## Service Management + +### Collector Service + +CollectorService orchestrates all subsystems. The constructor (CollectorService.cpp:27) initializes system_inspector_ with config_, creates ConnectionTracker unless network flows are disabled, and sets up NetworkStatusNotifier linking conn_tracker_ and system_inspector_. NetworkSignalHandler attaches to system_inspector_ to process network events. + +Civetweb handlers register at construction: GetStatus for /ready, LogLevelHandler for runtime log control, ProfilerHandler for CPU profiling. 
When introspection is enabled, ContainerInfoInspector (/state/containers/), NetworkStatusInspector, and CollectorConfigInspector expose internal state. Prometheus exposer binds to port 9090 and registers the exporter's registry. + +RunForever (CollectorService.cpp:94) starts ConfigLoader, NetworkStatusNotifier, and CollectorStatsExporter. system_inspector_.Start launches the kernel event processing thread. The main loop calls system_inspector_.Run(*control_) until control_ transitions to STOP_COLLECTOR. On shutdown, components stop in reverse order: ConfigLoader, NetworkStatusNotifier, exporter, then system_inspector_.CleanUp. + +InitKernel calls system_inspector_.InitKernel(config_), which selects and loads the CO-RE BPF probe. Startup diagnostics track driver availability and success for the diagnostic log in Diagnostics.h:StartupDiagnostics. + +WaitForGRPCServer blocks until config_.grpc_channel reaches READY state, using an interrupt lambda that checks control_->load(). This ensures sensor connectivity before collecting events. + +### Lifecycle Control + +Control.h defines ControlValue enum: RUN continues operation, STOP_COLLECTOR initiates shutdown. The main function passes a std::atomic<ControlValue> control flag and a std::atomic<int> signum to CollectorService, allowing signal handlers to trigger graceful termination. + +system_inspector_.Run receives the control atomic and polls it during event processing. When STOP_COLLECTOR is detected, Run returns, unwinding the RunForever loop. The signum atomic preserves which signal triggered shutdown for diagnostic logging. + +### Stoppable Threads + +StoppableThread wraps std::thread with cancellation via pipe and condition variable. prepareStart creates a pipe, storing descriptors in stop_pipe_[]. Start launches the thread and stores it in thread_. The stop_fd() accessor returns stop_pipe_[0] for select/poll integration.
+ +Stop sets should_stop_ atomic, notifies stop_cond_, closes the write end of the pipe (triggering POLLHUP on read end), then joins the thread. PauseUntil waits on stop_cond_ with a deadline, returning false if the deadline expires (continue operation) or true if should_stop() becomes true (shutdown). + +ConfigLoader, SignalServiceClient, CollectorStatsExporter, and NetworkStatusNotifier use StoppableThread for background tasks. Threads check should_stop() in their loops and include stop_fd() in poll sets to wake immediately on cancellation. + +## Metrics and Statistics + +### Collector Stats + +CollectorStats is a singleton with atomic counters and timers. TimerType enum includes net_scrape_read, net_scrape_update, process_info_wait. CounterType includes net_conn_updates, process_lineage_counts, procfs_* error counters. SCOPED_TIMER(index) creates an RAII timer that records duration on destruction. COUNTER_INC/COUNTER_SET macros modify atomics without locking. + +The singleton pattern with GetOrCreate ensures static initialization before main. Reset zeroes all counters, used in tests. Array indexing by enum provides O(1) access for high-frequency operations. + +### Stats Exporter + +CollectorStatsExporter runs a background thread that polls CollectorStats and system_inspector_ every 5 seconds. The run method creates Prometheus gauges for rox_collector_events (kernel events, drops, preemptions), rox_collector_timers (per-timer events/total/avg), and rox_collector_counters. + +When enable_detailed_metrics_ is true, it builds rox_collector_events_typed and rox_collector_event_times_* families with labels for event_type and event_dir (enter/exit). The loop checks config_->Syscalls() to filter events, only creating metrics for enabled syscalls. + +Process lineage statistics compute average, standard deviation, and string length from process_lineage_total/process_lineage_sqr_total/process_lineage_string_total counters. 
Variance calculation: std_dev = sqrt((sqr_avg) - (avg)^2). + +GetRegistry returns the prometheus::Registry shared_ptr for registration with the Exposer. The main loop creates prometheus::Exposer with port 9090 and calls RegisterCollectable. + +### Event Statistics + +system_inspector::Stats (from SystemInspector.h) provides kernel-level counters: nEvents, nDrops, nDropsBuffer, nDropsThreadCache, nPreemptions, nThreadCacheSize. GetStats populates this structure from libsinsp metrics. + +nUserspaceEvents[PPM_EVENT_MAX] counts events processed in userspace by type. event_parse_micros and event_process_micros track time spent parsing events from kernel and processing them through handlers. These arrays align with PPM_EVENT_MAX from ppm_events_public.h. + +Process event counters track nProcessSent, nProcessSendFailures, nProcessResolutionFailuresByEvt (missing process info during event), nProcessResolutionFailuresByTinfo (missing threadinfo). nGRPCSendFailures counts signal stream write errors. + +## Status and Diagnostics + +### Health Endpoint + +GetStatus implements /ready for Kubernetes liveness/readiness probes. handleGet calls system_inspector_->GetStats(&stats), returning 503 Service Unavailable if the inspector isn't ready. Success returns 200 with JSON: hostname from HostInfo::GetHostname, events/drops breakdown, preemptions count. + +The drops object separates total (nDrops), ringbuffer (nDropsBuffer), and threadcache (nDropsThreadCache). This granularity helps diagnose whether drops occur in kernel ringbuffer overflow or userspace cache eviction. + +### Introspection Endpoints + +ContainerInfoInspector parses container IDs from URLs like /state/containers/{id} (12 hex chars). It queries ContainerMetadata::GetNamespace, which uses EventExtractor to read Kubernetes labels from libsinsp's container cache. The response includes container_id and namespace as JSON. + +NetworkStatusInspector (referenced but not defined in provided files) exposes ConnTracker state. 
CollectorConfigInspector converts runtime_config_ to JSON via protobuf::util::MessageToJsonString, allowing runtime inspection of YAML-based configuration. + +CivetWrapper.h (not shown) provides a base class for HTTP handlers with GetBaseRoute. CollectorService::civet_endpoints_ stores unique_ptrs and calls server_.addHandler(endpoint->GetBaseRoute(), endpoint.get()) to register routes. + +### Startup Diagnostics + +StartupDiagnostics::GetInstance singleton accumulates initialization info. ConnectedToSensor records GRPC connection success. DriverAvailable, DriverSuccess, and DriverUnavailable track kernel driver loading attempts. Log dumps the accumulated state with connection status and driver candidates. + +The diagnostic log appears once during startup, providing a snapshot of configuration and initialization outcomes for troubleshooting. + +## Integration Patterns + +### Thread Coordination + +CollectorService owns all threads: system_inspector_ internal thread, ConfigLoader thread_, NetworkStatusNotifier threads, CollectorStatsExporter thread_. Destruction order matters: network components stop before system_inspector_.CleanUp to avoid using freed kernel state. + +The control atomic provides single-bit signaling. More complex coordination uses dedicated condition variables (e.g., SignalServiceClient::stream_interrupted_) or combines control checks with component-specific atomics. + +### Shared State Access + +config_ uses shared_mutex: readers acquire ReadLock, writers acquire WriteLock. This allows many threads to read static config while ConfigLoader updates runtime_config_ occasionally. GetExternalIPsConf, MaxConnectionsPerMinute, GetRuntimeConfigStr follow this pattern. + +ConnTracker (referenced in CollectorService) is shared via shared_ptr across NetworkStatusNotifier and NetworkSignalHandler. This shared ownership ensures the tracker outlives handler processing. 
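The ReadLock/WriteLock pattern around an optional runtime override can be sketched with std::shared_mutex. The class and field names below are illustrative, not the actual CollectorConfig layout:

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Sketch of the shared_mutex-guarded optional runtime config described
// above: readers fall back to the static default when no YAML override
// is present.
class RuntimeSettings {
 public:
  void SetRuntimeConfig(int max_conn) {
    std::unique_lock lock(mutex_);  // WriteLock
    runtime_max_connections_ = max_conn;
  }
  void ResetRuntimeConfig() {
    std::unique_lock lock(mutex_);
    runtime_max_connections_.reset();  // revert to static defaults
  }
  int MaxConnectionsPerMinute(int static_default) const {
    std::shared_lock lock(mutex_);  // ReadLock, many concurrent readers
    return runtime_max_connections_.value_or(static_default);
  }
 private:
  mutable std::shared_mutex mutex_;
  std::optional<int> runtime_max_connections_;
};
```

This mirrors how readers check runtime_config_.has_value() to distinguish static from dynamic values.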
+ +system_inspector_ is owned by CollectorService and accessed by GetStatus and CollectorStatsExporter via raw pointer. The service's lifetime guarantees these dependencies, so shared_ptr overhead is avoided. + +### Error Propagation + +Initialization failures use CLOG(FATAL), terminating the process. Runtime errors like GRPC stream failures log errors and set state (stream_active_=false) to trigger reconnection. The control atomic allows orderly shutdown rather than process exit on transient errors. + +StoppableThread::prepareStart returns false if already running, and doStart always succeeds. Callers check the return value to avoid starting multiple threads. Stop is idempotent: closing an already-closed pipe continues, join blocks until the thread exits. + +CollectorStatsExporter::start returns false if the thread is already running, true on success. This boolean return allows callers to handle the unlikely start failure without exceptions. + +### Configuration Flow + +Static config flows: CollectorArgs → CollectorConfig::InitCollectorConfig → HostHeuristics → CollectorService constructor. Runtime config flows: YAML file → ConfigLoader inotify → ParserYaml → CollectorConfig::SetRuntimeConfig. Readers check runtime_config_.has_value() to distinguish static vs. dynamic values. + +This separation allows kernel configuration (syscalls, buffer sizes) to remain static while network policies (max_connections_per_minute, external IPs) update dynamically. diff --git a/docs/lib/grpc.md b/docs/lib/grpc.md new file mode 100644 index 0000000000..26ca011730 --- /dev/null +++ b/docs/lib/grpc.md @@ -0,0 +1,53 @@ +# GRPC Communication Layer + +The GRPC layer manages bidirectional streaming connections between collector and sensor, handling TLS configuration, channel lifecycle, and message transmission with automatic reconnection. 
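The interruptible readiness-polling pattern used throughout this layer can be sketched generically. The gRPC channel is replaced here by an abstract probe, since the real WaitForChannelReady wraps grpc::Channel state checks; the function shape is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Generic sketch of polling-until-ready with an interrupt check between
// attempts, so shutdown can abort connection establishment.
bool WaitForReady(const std::function<bool()>& is_ready,
                  const std::function<bool()>& interrupted,
                  std::chrono::milliseconds poll_interval) {
  while (!is_ready()) {
    if (interrupted()) return false;  // graceful shutdown during connect
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

Collector passes interrupt lambdas that check the control atomic, so a SIGTERM during startup unblocks the wait instead of hanging until the sensor appears.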
+ +## Components + +### Channel Creation and Configuration + +GRPC.cpp:CreateChannel establishes channels with keepalive parameters for long-lived connections. The implementation sets HTTP2 BDP probing, ping intervals (10s keepalive time, 5s minimum receive interval), and allows pings without active calls to maintain connectivity. TLSCredentialsFromConfig reads PEM files from TlsConfig paths and constructs grpc::SslCredentials for mTLS connections. CheckGrpcServer validates server addresses in host:port format. + +### Bidirectional Streaming + +DuplexGRPC.h defines templates for async bidirectional GRPC streams without explicit multithreading or completion queue manipulation. The abstraction separates three usage patterns: full bidirectional control (DuplexClientReaderWriter), write-only with callback-based reads (CreateWithReadCallback), and write-only ignoring reads (CreateWithReadsIgnored). + +DuplexClient manages a grpc::CompletionQueue and tracks operation states through flags. Each async operation (READ, WRITE, WRITES_DONE, FINISH, SHUTDOWN) has pending and done bits in a uint32_t flags field. Operations map to completion queue tags via OpToTag/TagToOp pointer encoding. ProcessSingle polls the completion queue, updates flags, and dispatches event handlers. Synchronous methods like Write and Finish wrap async operations with deadline-based polling. + +DuplexClientReaderWriter combines reading and writing. Read messages buffer in read_buf_, with read_buf_valid tracking success. After processing each read, ReadNext immediately issues another async read to maintain the stream. Write operations block until the completion queue indicates success. The destructor calls Shutdown and drains remaining events to ensure clean termination. + +StdoutDuplexClientWriter provides a no-op implementation for testing and offline operation. All methods return success immediately, and Write calls LogProtobufMessage to emit protobufs to stdout. 
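The pending/done bookkeeping DuplexClient keeps in its flags word can be sketched as below; the bit positions are illustrative, not the real layout.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: each async operation gets a pending bit and a done bit packed
// into one uint32_t, so ProcessSingle can update state without locks.
enum Op : uint32_t { READ = 0, WRITE, WRITES_DONE, FINISH, SHUTDOWN };

constexpr uint32_t PendingBit(Op op) { return 1u << (2 * op); }
constexpr uint32_t DoneBit(Op op) { return 1u << (2 * op + 1); }

struct OpFlags {
  uint32_t flags = 0;
  void MarkPending(Op op) { flags |= PendingBit(op); }
  void MarkDone(Op op) { flags = (flags & ~PendingBit(op)) | DoneBit(op); }
  bool IsPending(Op op) const { return flags & PendingBit(op); }
  bool IsDone(Op op) const { return flags & DoneBit(op); }
};
```

A completion-queue event for a WRITE tag would clear the WRITE pending bit and set its done bit, which is what the synchronous wrappers poll for.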
+ +### Signal Service Client + +SignalServiceClient manages a persistent GRPC stream for process signals. The implementation runs EstablishGRPCStream in a StoppableThread, which repeatedly calls EstablishGRPCStreamSingle to create connections with automatic retry. WaitForChannelReady (from GRPCUtil) polls channel state with 1-second intervals, checking for GRPC_CHANNEL_SHUTDOWN as a fatal condition. + +Stream lifecycle: SignalServiceClient:30 creates the DuplexClient writer via CreateWithReadsIgnored since sensor doesn't send data on this stream. WaitUntilStarted blocks up to 30 seconds for stream readiness. The first_write_ flag triggers NEEDS_REFRESH to force a full process list resend after reconnection. Write failures invoke FinishNow, reset the writer, and set stream_active_ to false, triggering the background thread to reconnect. + +StoppableThread uses a pipe and condition variable for cancellation. The stream_interrupted_ condition variable wakes EstablishGRPCStream when stream_active_ transitions to false, minimizing reconnection latency. + +## Configuration + +CollectorConfig stores grpc_server_ addresses and tls_config_ objects. InitCollectorConfig processes command-line arguments via CollectorArgs or environment variables (GRPC_SERVER, ROX_COLLECTOR_TLS_*). TlsConfig validates that CA, client cert, and client key paths are all non-empty via IsValid. + +GRPC.cpp:24 reads complete file contents into SslCredentialsOptions strings. This approach loads certificates once at startup rather than maintaining file handles. + +## Connection Management + +GRPCUtil:WaitForChannelReady polls grpc::Channel::WaitForConnected with a timeout, checking an interrupt function between attempts. This allows graceful shutdown during connection establishment. The collector main loop passes lambdas that check control flags and signal state. + +SignalServiceClient maintains connection state with atomic stream_active_ and synchronizes reconnection through stream_interrupted_. 
When PushSignals detects a write failure, it notifies the background thread via stream_interrupted_.notify_one(), which immediately attempts reconnection without waiting for the next scheduled retry. + +## Error Handling + +CheckGrpcServer returns ARG_OK/ARG_ILLEGAL/ARG_IGNORE with error messages. Invalid formats (missing host/port, malformed address, length >255) return ARG_ILLEGAL. Missing configuration returns ARG_IGNORE with a message about reverting to stdout. + +DuplexClient tracks STREAM_ERROR via flag bit to propagate failures. When ok=false in completion queue events, ProcessEvent sets STREAM_ERROR and issues FinishAsync. Sync operations check Result::ok() and Result::IsTimeout() to distinguish success, errors, and timeouts. + +SignalServiceClient:PushSignals throttles error logs to once per 10 seconds when the stream is unavailable, preventing log spam during extended sensor outages. Write failures capture the grpc::Status error message before resetting the writer. + +## Integration Points + +CollectorService creates the grpc::Channel via CreateChannel and stores it in config_.grpc_channel for sharing between components. NetworkStatusNotifier and other components receive the channel to create their own GRPC clients. WaitForGRPCServer blocks during startup until the channel reaches READY state, ensuring sensor connectivity before beginning data collection. + +StdoutSignalServiceClient provides the same interface without GRPC dependencies, allowing collector to run without sensor by emitting signals as JSON to stdout. This supports debugging and offline analysis workflows. diff --git a/docs/lib/networking.md b/docs/lib/networking.md new file mode 100644 index 0000000000..c98664439f --- /dev/null +++ b/docs/lib/networking.md @@ -0,0 +1,133 @@ +# Network Flow Tracking + +Collector tracks TCP and UDP connections, aggregates them to reduce volume, and sends periodic deltas to Sensor over gRPC. 
The core flow is: kernel events → ConnTracker → normalization → afterglow → delta → Sensor. + +## Architecture + +``` +CO-RE BPF probes (connect, accept, close, sendto, etc.) + │ + ▼ +NetworkSignalHandler ← receives syscall events from SystemInspector + │ + ▼ +ConnTracker ← normalizes, deduplicates, applies afterglow + │ + ▼ +NetworkStatusNotifier ← scrapes /proc/net/tcp as fallback, computes deltas + │ + ▼ +gRPC stream → Sensor ← NetworkConnectionInfoMessage protos +``` + +## Event Reception + +NetworkSignalHandler (`NetworkSignalHandler.h`) is the abstraction boundary between collector and the kernel instrumentation layer. It receives syscall events and extracts connection metadata. + +**Monitored syscalls:** close, shutdown, connect, accept, accept4, getsockopt. When UDP tracking is enabled, also sendto/sendmsg/recvfrom/recvmsg variants. + +For each event, the handler: +- Extracts source/destination IPs and ports from the file descriptor info +- Determines connection role (client vs server) from socket flags +- Filters failed or pending async connections +- Builds a Connection object and passes it to ConnTracker + +**Key files:** NetworkSignalHandler.cpp (event processing), NetworkConnection.h (Connection type definition) + +## Connection Tracking + +ConnTracker (`ConnTracker.h`) maintains two state maps: +- **conn_state_** — active connections keyed by Connection (src/dst/port/protocol) +- **endpoint_state_** — listening sockets keyed by ContainerEndpoint + +Each entry stores a ConnStatus: a 63-bit microsecond timestamp packed with a 1-bit active flag. + +Updates are thread-safe. When a connection event arrives, ConnTracker either inserts it or updates the timestamp if newer. Statistics are tracked by direction (inbound/outbound) and address type (public/private). 
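The ConnStatus packing (63-bit microsecond timestamp plus a 1-bit active flag) can be sketched as a single uint64_t; the exact bit layout is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of ConnStatus: timestamp in the upper 63 bits, active flag in
// the lowest bit, so one word carries both fields per connection.
class ConnStatus {
 public:
  ConnStatus(int64_t micros, bool active)
      : data_((static_cast<uint64_t>(micros) << 1) | (active ? 1 : 0)) {}
  int64_t LastActiveTime() const { return static_cast<int64_t>(data_ >> 1); }
  bool IsActive() const { return data_ & 1; }

 private:
  uint64_t data_;
};
```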
+ +**Key files:** ConnTracker.cpp (state management), ConnTracker.h (data structures) + +## Normalization + +Before transmission, connections are normalized to reduce cardinality: + +| Role | Local endpoint | Remote endpoint | +|------|---------------|-----------------| +| Server | Cleared to port only | Address normalized, port set to 0 | +| Client | Entirely cleared | Address + port preserved after normalization | + +Address normalization depends on network classification: +- **Private IPs** (RFC1918) — preserved with CIDR from known networks +- **Known public IPs** (cluster nodes) — preserved as /32 or /128 +- **Unknown public IPs** — either preserved individually (when external IPs enabled) or collapsed to a sentinel address (255.255.255.255 or ffff:...:ffff) + +For UDP, server role detection is unreliable, so ConnTracker compares ephemeral port confidence scores to infer which side is the client. + +**Key files:** ConnTracker.cpp `NormalizeConnectionNoLock`, `NormalizeAddressNoLock` + +## External IPs + +ExternalIPsConfig (`ExternalIPsConfig.h`) controls whether unknown public IPs are preserved or aggregated, configurable per-direction (ingress/egress/both). The config arrives from Sensor via runtime collector config. + +When the external IPs setting changes, affected connections must be closed and re-reported in the new format. ConnTracker handles this in `CloseConnectionsOnExternalIPsConfigChange`. + +## Network Topology from Sensor + +Sensor sends network topology information that ConnTracker uses for normalization: +- **Known public IPs** — cluster node IPs, preserved without aggregation +- **Known IP networks** — CIDR ranges stored in the NRadix tree +- **Ignored L4 pairs** — protocol/port combinations to filter (e.g., metrics endpoints) +- **Ignored networks** — CIDR ranges to completely exclude +- **Non-aggregated networks** — CIDRs where each IP is preserved + +These arrive via `NetworkStatusNotifier.OnRecvControlMessage` from the Sensor gRPC stream. 
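The aggregation rule for unknown public addresses can be sketched for IPv4. Known-network lookups (the NRadix tree) are omitted for brevity, so this only shows the private-vs-sentinel decision; treat it as a simplification of NormalizeAddressNoLock.

```cpp
#include <cassert>
#include <cstdint>

// IPv4-only sketch: private (RFC1918) addresses are preserved, unknown
// public addresses collapse to the 255.255.255.255 sentinel unless
// external-IP reporting is enabled.
constexpr uint32_t kSentinel = 0xFFFFFFFFu;

bool IsPrivate(uint32_t ip) {
  return (ip >> 24) == 10 ||     // 10.0.0.0/8
         (ip >> 20) == 0xAC1 ||  // 172.16.0.0/12
         (ip >> 16) == 0xC0A8;   // 192.168.0.0/16
}

uint32_t NormalizeRemote(uint32_t ip, bool external_ips_enabled) {
  if (IsPrivate(ip) || external_ips_enabled) return ip;
  return kSentinel;
}
```

Collapsing all unknown public peers to one sentinel is what keeps connection cardinality bounded when external IPs are disabled.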
+ +## NRadix Tree + +NRadixTree (`NRadix.h`) implements a binary radix tree for fast CIDR lookups, based on the NGINX implementation. It supports both IPv4 and IPv6 with longest-prefix matching — an address matching both /8 and /16 returns the more specific /16. + +The tree also supports subset detection via `IsAnyIPNetSubset`, which walks two trees in parallel to determine containment. + +**Key files:** NRadix.cpp (insert, find, subset operations) + +## Afterglow + +Afterglow suppresses transient connection churn by treating recently-closed connections as still active for a configurable window (default 30s). This significantly reduces the volume of open/close pairs sent to Sensor. + +The algorithm works during delta computation: +- A connection that closed within the afterglow window is reported as **active** +- Only after the window expires is the **close** event reported +- If the connection reopens before the window expires, no close is ever sent + +`ComputeDeltaAfterglow` in ConnTracker.h compares old and new state snapshots, applying these rules to produce the minimal delta. + +## Periodic Scraping and Delta Reporting + +NetworkStatusNotifier (`NetworkStatusNotifier.h`) runs a periodic loop (default 30s): + +1. **Scrape** /proc/net/tcp and /proc/net/tcp6 via ProcfsScraper to catch connections missed by events +2. **Fetch** normalized connection and endpoint state from ConnTracker +3. **Compute delta** against previous state (with or without afterglow) +4. **Build** NetworkConnectionInfoMessage protobuf +5. **Send** over the gRPC stream to Sensor + +Active connections are rate-limited per container to prevent flooding. Close events are never rate-limited to avoid orphaned connections in Sensor. + +**Key files:** NetworkStatusNotifier.cpp (main loop, delta computation, message building) + +## Procfs Fallback + +ProcfsScraper (`ProcfsScraper.cpp`) reads /proc to discover connections that existed before collector started or were missed by event processing. 
It walks /proc looking for containerized processes, reads their network namespaces, parses /proc/PID/net/tcp, and joins socket inodes with file descriptors to build Connection objects. + +Listening sockets include an originator process when available. Process objects use lazy resolution — they register a callback with SystemInspector and block up to 30s for process metadata. + +**Key files:** ProcfsScraper.cpp (scraping logic), Process.h (lazy resolution) + +## gRPC Output + +NetworkConnectionInfoServiceComm (`NetworkConnectionInfoServiceComm.h`) manages the bidirectional gRPC stream to Sensor. It advertises capabilities including "public-ips" and "network-graph-external-srcs" via metadata headers. + +The stream carries both outbound NetworkConnectionInfoMessages and inbound control messages (IP lists, network configs) from Sensor. + +## Debug API + +NetworkStatusInspector exposes HTTP endpoints at `/state/network/connection` and `/state/network/endpoint` for debugging connection state. Results can be filtered by container and optionally normalized. diff --git a/docs/lib/process.md b/docs/lib/process.md new file mode 100644 index 0000000000..b4ebc72c3d --- /dev/null +++ b/docs/lib/process.md @@ -0,0 +1,68 @@ +# Process Monitoring + +Collector captures process exec events from the kernel and sends them to Sensor as ProcessSignal protobufs. The flow is: kernel exec event → ProcessSignalHandler → ProcessSignalFormatter → gRPC → Sensor. + +## Architecture + +``` +CO-RE BPF probes (execve syscall) + │ + ▼ +ProcessSignalHandler ← receives exec events from SystemInspector + │ + ▼ +ProcessSignalFormatter ← extracts process metadata, builds protobuf + │ + ▼ +SignalServiceClient → Sensor ← ProcessSignal messages +``` + +## Event Reception + +ProcessSignalHandler (`ProcessSignalHandler.h`) is the abstraction boundary with the kernel instrumentation layer. It monitors a single syscall: `execve`. Each exec event triggers signal extraction and transmission to Sensor. 
+ +For each event, the handler: +1. Calls ProcessSignalFormatter to extract process details and build a protobuf +2. Applies rate limiting — a key is computed from container ID, process name, args (truncated to 256 bytes), and exec path. Repeated identical execs are suppressed. +3. Sends accepted signals to Sensor via SignalServiceClient + +The handler also supports `HandleExistingProcess` for processing threadinfo structures from the system inspector, used during initial process discovery at startup. + +**Key files:** ProcessSignalHandler.cpp, ProcessSignalHandler.h + +## Signal Formatting + +ProcessSignalFormatter (`ProcessSignalFormatter.h`) converts kernel events into ProcessSignal protobufs. It handles two input types: +- **sinsp_evt** — live exec events from the BPF probe stream +- **sinsp_threadinfo** — process snapshots from discovery/scraping + +For each signal, the formatter extracts: + +| Field | Source | Notes | +|-------|--------|-------| +| name | comm or exepath | Falls back to exepath if comm is unavailable | +| exec_file_path | exepath or comm | Reverse fallback from name | +| args | proc_args | Sanitized for UTF-8 validity, optional via config | +| pid, uid, gid | Event or threadinfo | Direct extraction | +| container_id | Event extractor | Maps process to container | +| timestamp | evt->get_ts() or clone_ts | clone_ts for discovered processes | +| lineage | Parent chain walk | Up to 10 ancestors, collapsed duplicates | +| scraped | Boolean | True for discovered (non-exec) processes | + +**Process lineage** walks the parent chain using sinsp's `traverse_parent_state`. The traversal stops at container boundaries (checking vpid and container_id) to prevent host process info from leaking into container signals. Consecutive parents with identical exec paths are collapsed, and the chain is capped at 10 entries. 
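The collapse-and-cap rules can be sketched as follows. This is a simplified illustration under stated assumptions: LineageEntry and CollapseLineage are hypothetical names, and the real traversal walks sinsp threadinfo structures via traverse_parent_state with container-boundary checks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one ancestor in the parent chain.
struct LineageEntry {
  int pid;
  std::string exec_file_path;
};

// Collapse consecutive ancestors sharing an exec path and cap the chain
// length, mirroring the rules described in the text (sketch only).
std::vector<LineageEntry> CollapseLineage(const std::vector<LineageEntry>& chain,
                                          size_t max_entries = 10) {
  std::vector<LineageEntry> out;
  for (const auto& entry : chain) {
    if (!out.empty() && out.back().exec_file_path == entry.exec_file_path) {
      continue;  // consecutive duplicate exec path: keep only the first
    }
    if (out.size() == max_entries) {
      break;  // chain capped at max_entries ancestors
    }
    out.push_back(entry);
  }
  return out;
}
```

A shell re-execing itself repeatedly would thus contribute a single lineage entry rather than one per exec.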
+ +**Key files:** ProcessSignalFormatter.cpp, ProcessSignalFormatter.h + +## Fallback Discovery + +ProcfsScraper (`ProcfsScraper.h`) provides process discovery by reading /proc when event-based tracking needs supplementation. It reads process metadata from /proc/PID directories, determines container membership from /proc/PID/cgroup, and creates ProcessInfo structures. + +This ensures processes that existed before collector started are still reported to Sensor. Discovered signals are marked with `scraped=true` to distinguish them from live exec events. + +**Key files:** ProcfsScraper.cpp (scraping), Process.h (lazy resolution with 30s timeout) + +## Rate Limiting + +Process signals are rate-limited per unique key (container + name + args + path) to prevent flooding Sensor with repetitive process starts. The RateLimitCache tracks recently sent signals and suppresses duplicates within the rate window. + +Rate-limited events are counted via `nProcessRateLimitCount` for observability. diff --git a/docs/lib/system.md b/docs/lib/system.md new file mode 100644 index 0000000000..b6ea092094 --- /dev/null +++ b/docs/lib/system.md @@ -0,0 +1,129 @@ +# System Inspection Layer + +SystemInspector is the critical abstraction boundary between collector's userspace logic and CO-RE BPF kernel instrumentation. It provides event streaming, statistics, and lifecycle management while hiding all kernel interaction details. + +## Abstraction Boundary + +SystemInspector (defined in system-inspector/SystemInspector.h, not shown in full) encapsulates libsinsp and the BPF probe. Collector components interact solely through SystemInspector's interface: no direct access to sinsp objects, BPF maps, or kernel data structures. This boundary enforces separation between event processing (above the line) and kernel instrumentation (below the line). 
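A minimal sketch of such a boundary is shown below. The method names follow this document, but the signatures and the stats fields are assumptions for illustration, not the actual SystemInspector header.

```cpp
#include <cassert>
#include <functional>

// Illustrative snapshot of kernel-side metrics (field names assumed).
struct StatsSnapshot {
  unsigned long nEvents = 0;  // total kernel events seen
  unsigned long nDrops = 0;   // events lost before reaching userspace
};

// Sketch of the abstraction boundary: callers see only these methods,
// never sinsp objects, BPF maps, or kernel data structures.
class SystemInspectorBoundary {
 public:
  virtual ~SystemInspectorBoundary() = default;
  virtual void InitKernel() = 0;  // select and set up the BPF driver
  virtual void Start() = 0;       // launch the event loop thread
  virtual void Run() = 0;         // poll events, dispatch to handlers
  virtual void CleanUp() = 0;     // close inspector, detach BPF programs
  virtual bool GetStats(StatsSnapshot* stats) const = 0;
  virtual void AddSignalHandler(std::function<void(int)> handler) = 0;
};
```

Keeping the interface this narrow is what lets kernel instrumentation change underneath without touching event-processing code.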
+
+The interface exposes: event iteration via Run, statistics via GetStats, lifecycle control via Start/CleanUp/InitKernel, and handler registration via AddSignalHandler. All kernel complexity--CO-RE relocations, BPF program loading, ringbuffer management, threadinfo caching--remains hidden.
+
+## Kernel Driver Management
+
+### Driver Selection
+
+KernelDriver.h defines the IKernelDriver interface with Setup(config, inspector). KernelDriverCOREEBPF implements this for CO-RE BPF. Setup converts config.Syscalls() text names to ppm_sc_code syscall codes via EventNames::GetEventIDs and EventNames::GetEventSyscallID.
+
+GetSyscallList builds an unordered_set from the configured syscalls. For each syscall string (e.g., "connect"), it looks up enter and exit event codes, retrieves the syscall ID from g_syscall_table (external libscap array), and inserts the ppm_sc_code. PPM_SC_SCHED_PROCESS_EXIT and PPM_SC_SCHED_SWITCH are added unconditionally--procexit controls threadinfo cache size, sched_switch improves process tracking reliability.
+
+Setup calls inspector.open_modern_bpf(buffer_size, cpu_per_buffer, true, ppm_sc). The true parameter (online_only) restricts ringbuffer allocation to online CPUs. buffer_size comes from config.GetSinspBufferSize(), which may adjust based on total_buffer_size and CPU count. cpu_per_buffer determines how many CPUs share a ringbuffer (e.g., 2 means one buffer per 2 CPUs).
+
+open_modern_bpf loads the CO-RE BPF object, attaches tracepoints, allocates ringbuffers, and starts event capture. Failures throw sinsp_exception, propagating to InitKernel's caller.
+
+### Event Names
+
+The EventNames singleton maps between string names, event IDs (ppm_event_code), and syscall IDs. The constructor (EventNames.cpp:13) iterates g_event_info[PPM_EVENT_MAX] from libscap, populating names_by_id_ and events_by_name_. Each event has enter and exit variants; events_by_name_["connect"] includes both PPME_SOCKET_CONNECT_E and PPME_SOCKET_CONNECT_X. 
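The bidirectional mapping can be sketched like this, using illustrative integer event codes rather than real ppm_event_code values (the actual class also tracks syscall IDs):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

using EventID = int;

// Sketch of the name <-> event-ID mapping described above: each syscall
// name resolves to both its enter and exit event codes.
class EventNamesSketch {
 public:
  void Register(const std::string& name, EventID enter, EventID exit) {
    events_by_name_[name].insert(enter);
    events_by_name_[name].insert(exit);
    names_by_id_[enter] = name;
    names_by_id_[exit] = name;
  }
  const std::unordered_set<EventID>& GetEventIDs(const std::string& name) const {
    return events_by_name_.at(name);
  }
  const std::string& GetEventName(EventID id) const { return names_by_id_.at(id); }

 private:
  std::unordered_map<std::string, std::unordered_set<EventID>> events_by_name_;
  std::unordered_map<EventID, std::string> names_by_id_;
};
```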
+
+Event names support directionality: "connect>" maps only to enter, "connect<" only to exit. This allows config.Syscalls() to specify directional filtering if needed, though current usage includes both directions.
+
+syscall_by_id_ maps ppm_event_code to g_syscall_table index. This enables GetEventSyscallID to convert event IDs (which the kernel uses) to ppm_sc_code (which Setup needs). The indirection handles events that don't correspond to syscalls (e.g., procexit).
+
+## Event Processing
+
+### Event Extraction
+
+EventExtractor wraps sinsp_evt accessors. Init(inspector) stores a sinsp pointer. Methods like get_k8s_namespace read labels from the event's associated container. The extractor hides sinsp_evt internal structure, providing a stable API for event attribute access.
+
+Event iteration happens in system_inspector::Service::Run (not shown). It calls inspector_.next(&evt) in a loop, checks evt type, and dispatches to registered signal handlers. Each handler processes events of interest, extracting data via EventExtractor.
+
+### Event Map
+
+EventMap (EventMap.h:12) is a template providing per-event-type storage. It wraps std::array, allowing indexed access by ppm_event_code. The Set(name, value) method uses EventNames::GetEventIDs to find all event codes matching the name and sets their values.
+
+This supports configuring behavior per event type. For example, EventMap could enable/disable processing for specific syscalls. The template accepts any type T, including function pointers or strategy objects.
+
+Construction accepts an initializer_list of name/value pairs for declarative initialization: EventMap<int> map{{"connect", 42}, {"accept", 99}}. This sets both enter and exit events for each syscall.
+
+### Host Heuristics
+
+HostHeuristics.cpp:105 applies platform-specific configuration adjustments. ProcessHostHeuristics creates HostConfig and applies heuristics in sequence: CollectionHeuristic, DockerDesktopHeuristic, PowerHeuristic, CPUHeuristic. 
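The sequencing pattern can be sketched as below. The config fields, the bool return convention, and the lambda heuristics are assumptions for illustration; the real heuristics are classes with access to full HostInfo.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct HostConfigSketch {
  int num_possible_cpu = 0;  // filled in by the CPU heuristic
};

struct HostInfoSketch {
  bool is_docker_desktop = false;
  int num_cpu = 0;
};

// Each heuristic inspects the host and may adjust config; returning false
// models a fatal check (sketch of the chain described above).
using Heuristic = std::function<bool(const HostInfoSketch&, HostConfigSketch*)>;

bool ProcessHostHeuristicsSketch(const HostInfoSketch& host, HostConfigSketch* config,
                                 const std::vector<Heuristic>& heuristics) {
  for (const auto& h : heuristics) {
    if (!h(host, config)) {
      return false;  // fail fast with a clear error, not an obscure kernel error
    }
  }
  return true;
}
```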
+ +CollectionHeuristic validates prerequisites: HasEBPFSupport fails fatally if the kernel is too old (except RHEL 7.6). For CORE_BPF, it checks HasBTFSymbols (vmlinux or BTF sysfs), HasBPFRingBufferSupport (BPF_MAP_TYPE_RINGBUF), and HasBPFTracingSupport (BPF_PROG_TYPE_TRACING). Missing any requirement is fatal. For EBPF mode, it checks if CORE_BPF is available and logs a recommendation. + +DockerDesktopHeuristic detects Docker Desktop via HostInfo::IsDockerDesktop and fails--Docker Desktop doesn't support eBPF. PowerHeuristic checks ppc64le machines for RHEL 8.6 kernel <4.18.0-477, which lacks CORE_BPF. CPUHeuristic queries NumPossibleCPU and stores it in HostConfig for buffer calculations. + +Heuristics run during CollectorConfig::InitCollectorConfig:279, after static config is loaded but before kernel initialization. This allows failing fast with clear error messages rather than obscure kernel errors. + +## Host Information + +### Kernel Version + +HostInfo::GetKernelVersion checks the KERNEL_VERSION environment variable, then calls uname(2). KernelVersion::FromHost parses the release string with regex: `^(\d+)\.(\d+)\.(\d+)(-(\d+))?.*`. This extracts kernel (4), major (18), minor (0), and optional build_id (372 for "4.18.0-372.el8.x86_64"). + +HasEBPFSupport returns true for kernel >=4.14. Special case: RHEL 7.6 (3.10.0-957+) backports eBPF, detected via isRHEL76 which checks os_id=="rhel" or "centos", ".el7." substring, and build_id >= MIN_RHEL_BUILD_ID (957). + +HasSecureBootParam checks kernel >=4.11, when boot_params gained the secure_boot field (commit de8cb458625c). This determines whether to read UEFI variables or boot_params. + +### BTF Detection + +HasBTFSymbols (HostInfo.cpp:206) searches paths from libbpf: /sys/kernel/btf/vmlinux, /boot/vmlinux-{release}, /lib/modules/{release}/vmlinux-{release}, /usr/lib/debug paths. snprintf formats the release into the path template. Paths with .mounted=true call GetHostPath to handle containerized execution. 
+ +faccessat with AT_EACCESS checks read permission. ENOTDIR or ENOENT means the file doesn't exist; other errors log warnings. First accessible file returns true. This search order prioritizes kernel-provided BTF (sysfs) over vmlinux files, which may be compressed or debug-only. + +### BPF Capability Probing + +HasBPFRingBufferSupport calls libbpf_probe_bpf_map_type(BPF_MAP_TYPE_RINGBUF, NULL). This issues a bpf() syscall with BPF_MAP_CREATE to check if the kernel supports the map type. Return 0 means unsupported, <0 means probe failed (assume supported), >0 means supported. + +HasBPFTracingSupport probes BPF_PROG_TYPE_TRACING similarly. These probes execute quickly (single syscall) and avoid loading actual BPF programs. They run during heuristic evaluation, before attempting to load the CO-RE probe. + +### Secure Boot + +GetSecureBootStatus caches results in secure_boot_status_. For kernels >=4.11, GetSecureBootFromParams reads /sys/kernel/boot_params/data at SECURE_BOOT_OFFSET (0x1EC). The byte value maps to enum: 0=unset, 1=not determined, 2=disabled, 3=enabled. + +For older kernels, GetSecureBootFromVars scans /sys/firmware/efi/efivars for "SecureBoot-*" files. EFI variables have 4 bytes of attributes plus the value. Reading 5 bytes gets both; byte 4 is the status (0 or 1). The UEFI spec defines this format. + +IsUEFI checks for /sys/firmware/efi directory. If it exists, the system booted via UEFI; otherwise, legacy BIOS. This informs whether to attempt SecureBoot detection. + +### CPU Topology + +NumPossibleCPU wraps libbpf_num_possible_cpus, which reads /sys/devices/system/cpu/possible (e.g., "0-127"). This counts CPUs that can be onlined, including offline cores. The count determines ringbuffer allocation: n_buffers = ceil(num_cpus / cpu_per_buffer). + +### OS Detection + +GetOSReleaseValue reads /etc/os-release or /usr/lib/os-release (freedesktop standard). filterForKey parses KEY="VALUE" lines, stripping quotes. 
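The key lookup can be sketched as follows — an illustration of the freedesktop KEY="VALUE" format, not the collector's actual filterForKey implementation:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return the value for `key` in os-release style contents, stripping
// surrounding quotes; empty string if the key is absent (sketch).
std::string FilterForKey(const std::string& contents, const std::string& key) {
  std::istringstream stream(contents);
  std::string line;
  while (std::getline(stream, line)) {
    if (line.rfind(key + "=", 0) != 0) {
      continue;  // line does not start with KEY=
    }
    std::string value = line.substr(key.size() + 1);
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
      value = value.substr(1, value.size() - 2);  // strip surrounding quotes
    }
    return value;
  }
  return "";
}
```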
GetDistro reads PRETTY_NAME ("Ubuntu 22.04.1 LTS"). GetOSID reads ID ("ubuntu"). GetBuildID reads BUILD_ID (CentOS/RHEL version).
+
+IsDockerDesktop checks PRETTY_NAME == "Docker Desktop". IsUbuntu checks ID == "ubuntu". IsRHEL76 checks ID in {rhel, centos}, ".el7." substring, and build_id >=957. IsMinikube checks hostname == "minikube". GetMinikubeVersion parses /etc/VERSION with regex `v\d+\.\d+\.\d+`.
+
+## Statistics
+
+### Stats Structure
+
+system_inspector::Stats (referenced in GetStatus.cpp:18) aggregates metrics: nEvents (total kernel events), nDrops (lost events), nDropsBuffer (ringbuffer full), nDropsThreadCache (cache eviction), nPreemptions (userspace preempted during event read), nThreadCacheSize (current threadinfo count).
+
+nUserspaceEvents[PPM_EVENT_MAX] counts events by type once they reach userspace. event_parse_micros[PPM_EVENT_MAX] measures the time spent parsing each event from its binary representation. event_process_micros[PPM_EVENT_MAX] measures handler processing time. These arrays enable per-syscall performance analysis.
+
+nProcessSent, nProcessSendFailures, nProcessResolutionFailuresByEvt, nProcessResolutionFailuresByTinfo track process signal generation. nGRPCSendFailures counts stream write errors. These counters flow to CollectorStatsExporter for Prometheus metrics.
+
+### Event Timing
+
+When enable_detailed_metrics_ is true, CollectorStatsExporter creates rox_collector_event_times_us_total and _avg metrics for each enabled syscall. Labels distinguish event_type (connect, accept) and event_dir (>, <). This granularity identifies slow syscalls or handler bottlenecks.
+
+Parse time includes reading from the ringbuffer and deserializing ppm_event structures. Process time includes running handlers and updating internal state. High parse time suggests ringbuffer contention or large event payloads. High process time suggests handler inefficiency.
+
+## Lifecycle
+
+InitKernel selects KernelDriverCOREEBPF, calls Setup, and logs driver success to StartupDiagnostics. 
Start launches the event loop thread. Run polls events and dispatches to handlers until the control flag transitions to STOP_COLLECTOR. CleanUp closes the inspector, which unmaps ringbuffers and detaches BPF programs.
+
+The destructor order matters: handlers must finish before sinsp closes. system_inspector_ lives in CollectorService and is destroyed after the networking and exporter threads stop. This ensures no handler callbacks fire after subsystems are torn down.
+
+AddSignalHandler registers components that process specific event types. NetworkSignalHandler handles connect, accept, and close for network tracking. ProcessSignalHandler (not shown) handles execve and procexit for process lineage. Handlers implement a common interface and receive events via callbacks.
+
+## Integration Points
+
+GetStats provides a snapshot of kernel metrics for /ready health checks and Prometheus scraping. The method doesn't block event processing; it reads atomic counters or briefly locked structures.
+
+GetContainerMetadataInspector returns a shared_ptr to ContainerMetadata, allowing HTTP handlers to query container state. The metadata lives as long as system_inspector_, tying its lifecycle to sinsp.
+
+GetUserspaceStats (referenced in NetworkSignalHandler) exposes userspace-specific counters not available from libsinsp. This allows NetworkSignalHandler to track network-specific failures separately from generic event stats.
+
+The inspector thread owns the event loop. Other threads interact via thread-safe methods: GetStats uses atomics, AddSignalHandler mutates state before Start, and Run blocks the calling thread. This single-threaded event model simplifies handler implementation--no locks are needed within handler callbacks.
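The threading model can be sketched as a single dispatch loop; the types and names here are illustrative, not the collector's:

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct EventSketch {
  int type;
};

// One loop thread pulls events and invokes handlers sequentially, so
// handler callbacks need no internal locking (sketch of the model above).
class EventLoopSketch {
 public:
  // Register handlers before the loop runs, mirroring "AddSignalHandler
  // mutates before Start".
  void AddSignalHandler(std::function<void(const EventSketch&)> handler) {
    handlers_.push_back(std::move(handler));
  }
  void Run(const std::vector<EventSketch>& events) {
    for (const auto& evt : events) {
      for (const auto& handler : handlers_) {
        handler(evt);  // dispatched one at a time on the loop thread
      }
    }
  }

 private:
  std::vector<std::function<void(const EventSketch&)>> handlers_;
};
```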