Collector modernization proposal #3502

Description

@robbycochran

Problem

Collector is a critical C++ eBPF security agent deployed on every stackrox-monitored node. It works, but two things make it expensive to evolve:

  1. A vendored fork of falcosecurity-libs (753K lines). Upgrades are often stalled because sinsp internals are coupled into ~27 files.
  2. No automated dependency version tracking or CVE visibility for 19 source-compiled C++ libraries.

Assumptions

  • The C++ is solid. Collector's own code (15K lines) is modern C++17 with clean abstractions.
  • Falco updates are a primary pain. The coupling between collector and sinsp internals makes upgrades feel risky and expensive.
  • We want shared eBPF infrastructure. fact (Rust, Aya, LSM hooks) and collector monitor the same nodes. A shared C BPF process/network probe stack, validated under collector first as a RuntimeAgent implementation, could eventually serve both projects.
  • Security hardening is needed. Collector should drop privileges it no longer needs and harden itself.
  • Changes stay inside collector. The gRPC contract with Sensor and the two-stream model (ProcessSignal + NetworkConnectionInfo) are unchanged. This isolation means collector can iterate and validate independently without cross-repo coordination.

Feature 1: Dependency Management and Security Posture

Modernize how collector manages C++ dependencies, close the CVE visibility gap, and complete the self-hardening initiative.

Epic: Reduce privileges and drop capabilities

  • Launch with the minimum required capabilities.
  • Drop capabilities after BPF loading (CAP_BPF, CAP_PERFMON, CAP_SYS_RESOURCE)
  • OpenShift-specific SCC and SELinux configuration
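
A minimal sketch of the post-load capability drop, assuming the three capabilities listed above are the ones collector gives up once its BPF programs are attached. The helper names `WithoutLoadOnlyCaps` and `DropLoadOnlyCaps` are hypothetical, not existing collector functions, and the sketch uses the raw `capget`/`capset` syscalls rather than any particular capability library:

```cpp
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>

// Fallbacks for older kernel headers; values match the Linux UAPI.
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif
#ifndef CAP_BPF
#define CAP_BPF 39
#endif

// Assumption: these are only needed while loading and attaching BPF programs.
constexpr int kLoadOnlyCaps[] = {CAP_BPF, CAP_PERFMON, CAP_SYS_RESOURCE};

// Hypothetical helper: returns `mask` with the load-only capability bits cleared.
inline uint64_t WithoutLoadOnlyCaps(uint64_t mask) {
  for (int cap : kLoadOnlyCaps) {
    assert(cap >= 0 && cap < 64);
    mask &= ~(uint64_t{1} << cap);
  }
  return mask;
}

// Hypothetical helper: clears the load-only capabilities from the calling
// thread's effective, permitted and inheritable sets. Returns true on success.
// capset() affects only the calling thread, so this would need to run before
// worker threads are spawned (or once per thread).
inline bool DropLoadOnlyCaps() {
  __user_cap_header_struct hdr{};
  hdr.version = _LINUX_CAPABILITY_VERSION_3;
  hdr.pid = 0;  // 0 means the calling thread
  __user_cap_data_struct data[2] = {};
  if (syscall(SYS_capget, &hdr, data) != 0) return false;

  // Each set is 64 bits wide, split across data[0] (low) and data[1] (high).
  auto strip = [&data](__u32 __user_cap_data_struct::*field) {
    uint64_t mask = (uint64_t{data[1].*field} << 32) | data[0].*field;
    mask = WithoutLoadOnlyCaps(mask);
    data[0].*field = static_cast<__u32>(mask);
    data[1].*field = static_cast<__u32>(mask >> 32);
  };
  strip(&__user_cap_data_struct::effective);
  strip(&__user_cap_data_struct::permitted);
  strip(&__user_cap_data_struct::inheritable);

  return syscall(SYS_capset, &hdr, data) == 0;
}
```

Clearing the permitted set as well as the effective set means the thread cannot re-acquire these capabilities later. The bounding and ambient sets are not touched by this sketch.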

Epic: Dependency Lifecycle

  • Migrate submodule-compiled dependencies to vcpkg where feasible (yaml-cpp, googletest, civetweb, prometheus-cpp, gperftools, jsoncpp, valijson, uthash are candidates). vcpkg provides version pinning, baseline tracking, and a path to automated version bumps via Renovate's native vcpkg.json support.
  • Generate CycloneDX SBOM covering all source-compiled deps for CVE scanning
  • Automated CVE notification via osv-scanner in CI

Feature 2: RuntimeAgent Abstraction

Create a clean internal interface that decouples collector's event processing from falcosecurity-libs. This makes falco upgrades more mechanical and opens a substitution point for alternative backends.

Phase 1: Concentrate Falco Surface

Move all sinsp/scap/ppm includes and type usage from 27 files into 3-4 files within system-inspector/. Minimal to no behavior change intended. Today, sinsp types leak through public headers across the codebase. After this phase, only system-inspector/ internals know about falco.

Phase 2: Define and Implement RuntimeAgent

Introduce a typed event stream interface. One callback, events tagged by type (process exec, network connect, endpoint listen, credential change -- extensible). The runtime produces events; CollectorService routes them to the existing gRPC streams unchanged.
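
One way the typed event stream could look in C++17, as a sketch only. The event structs, their fields, `RouteEvent`, and `ReplayRuntime` are all hypothetical names chosen for illustration; in particular, routing credential changes to the process stream is an assumption, not something this proposal specifies:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Collector-owned event types (hypothetical fields). No backend types appear here.
struct ProcessExec { uint32_t pid; std::string exe_path; };
struct NetworkConnect { uint32_t pid; std::string remote_addr; uint16_t remote_port; };
struct EndpointListen { uint32_t pid; uint16_t port; };
struct CredentialChange { uint32_t pid; uint32_t new_uid; };

// Events are tagged by type; adding a new kind means adding an alternative.
using RuntimeEvent =
    std::variant<ProcessExec, NetworkConnect, EndpointListen, CredentialChange>;
using EventCallback = std::function<void(const RuntimeEvent&)>;

// The single interface a backend would implement.
class RuntimeAgent {
 public:
  virtual ~RuntimeAgent() = default;
  virtual void Subscribe(EventCallback callback) = 0;  // one callback
  virtual void Run() = 0;                              // produce events
};

// The two existing gRPC streams.
enum class Stream { kProcessSignal, kNetworkConnectionInfo };

// Decides which existing stream an event belongs to.
// Assumption: credential changes travel with process signals.
inline Stream RouteEvent(const RuntimeEvent& event) {
  return std::visit(
      [](const auto& e) {
        using T = std::decay_t<decltype(e)>;
        if constexpr (std::is_same_v<T, NetworkConnect> ||
                      std::is_same_v<T, EndpointListen>) {
          return Stream::kNetworkConnectionInfo;
        } else {
          return Stream::kProcessSignal;
        }
      },
      event);
}

// Toy backend that replays canned events; stands in for a real runtime.
class ReplayRuntime : public RuntimeAgent {
 public:
  explicit ReplayRuntime(std::vector<RuntimeEvent> events)
      : events_(std::move(events)) {}
  void Subscribe(EventCallback callback) override {
    callback_ = std::move(callback);
  }
  void Run() override {
    assert(callback_ && "Subscribe() must be called before Run()");
    for (const auto& event : events_) callback_(event);
  }

 private:
  std::vector<RuntimeEvent> events_;
  EventCallback callback_;
};
```

A `std::variant` keeps the set of event kinds closed and checked at compile time: a `std::visit` with a non-generic visitor fails to compile if a new alternative is added without a handler. A replay backend like the one above could also double as a test fixture for CollectorService.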

Wrap existing falco code behind this interface using two internal layers:

graph TD
    CS[CollectorService] -->|subscribes| RA[RuntimeAgent]

    subgraph SinspRuntime["SinspRuntime (implements RuntimeAgent)"]
        INS[SinspInspector — event loop, filtering, dispatch]
        FD[FalcoDriver — thin sinsp wrapper]
        INS --> FD
    end

    RA --> INS
    FD --> FALCO[falcosecurity-libs]

    RA -.->|"future"| MBR[ModernBpfRuntime — libbpf, LSM hooks]

SinspInspector contains collector's own logic: the event loop, event filtering, signal dispatch. This code survives any backend swap.

FalcoDriver is ~15 methods translating sinsp types to collector-owned types. It is the only file that #includes sinsp headers, so falco upgrades change this file only.
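
The containment idea can be sketched with the pimpl idiom. Everything below is illustrative: `ProcessInfo`, `NextProcess`, and the `stand_in_backend` namespace are hypothetical, and `stand_in_backend::thread_record` is a placeholder for whatever sinsp type the real driver would wrap, not a real falcosecurity-libs type:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// ---- Would live in a public, backend-neutral header ----

// Collector-owned type; the rest of the codebase sees only this.
struct ProcessInfo {
  uint64_t pid;
  std::string comm;
  std::string exe_path;
};

class FalcoDriver {
 public:
  FalcoDriver();
  ~FalcoDriver();
  // Returns the next translated process record, or nullopt when drained.
  std::optional<ProcessInfo> NextProcess();

 private:
  struct Impl;  // defined only in the .cpp, so backend headers never leak
  std::unique_ptr<Impl> impl_;
};

// ---- Would live in the one .cpp that includes backend headers ----

namespace stand_in_backend {
// Placeholder for a backend-owned record type.
struct thread_record {
  int64_t tid;
  const char* comm;
  const char* exepath;
};
}  // namespace stand_in_backend

struct FalcoDriver::Impl {
  // Canned data so the sketch runs without a real backend.
  std::vector<stand_in_backend::thread_record> pending = {{42, "sh", "/bin/sh"}};
  std::size_t next = 0;
};

FalcoDriver::FalcoDriver() : impl_(std::make_unique<Impl>()) {}
FalcoDriver::~FalcoDriver() = default;

std::optional<ProcessInfo> FalcoDriver::NextProcess() {
  if (impl_->next >= impl_->pending.size()) return std::nullopt;
  const auto& record = impl_->pending[impl_->next++];
  // The translation step: backend type in, collector-owned type out.
  return ProcessInfo{static_cast<uint64_t>(record.tid), record.comm,
                     record.exepath};
}
```

Because `Impl` is only forward-declared in the header, a change to the backend's types would force a recompile and edit of the driver's source file alone.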

Phase 3: Upgrade Falco

With RuntimeAgent in place, upgrade the fork from 0.18 to 0.25.


What This Enables

The RuntimeAgent interface is the strategic asset. It enables three future paths without committing to any:

  • Custom C BPF probes (libbpf + LSM/TCP hooks) as a ModernBpfRuntime, validated under collector's existing integration test suite, potentially shared with fact as a common process/network probe stack
  • fact convergence where fact's Rust runtime implements RuntimeAgent, enabling a single node agent for process + file + network monitoring
  • Easier falco upgrades where a version bump changes FalcoDriver instead of 27 files across every subsystem

Because all changes stay inside collector with no Sensor contract modifications, each phase can be validated independently through the existing integration test suite and shipped incrementally.

What's Not in Scope

  • Changing the gRPC contract with Sensor
  • Replacing falco with custom probes (enabled, not committed)
  • Merging collector and fact binaries (enabled, not committed)
  • Language migration to Rust (enabled, not committed)
  • Network policy or OVN monitoring
  • Config surface cleanup (can happen independently)
