RoleBasedGroup (RBG) 🚀

🎯 A Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.

🌐 Official Website: rolebasedgroup.github.io

🏗️ Architecture

📰 Latest News

Date	Release	Highlights
2026-04-22	v0.7.0-alpha.3	`v1alpha2` conversion webhooks, CLI multi-node LLM serving
2026-03-31	v0.7.0-alpha.2	Pod port allocator, CLI foundations
2026-03-18	v0.7.0-alpha.1	`v1alpha2` API, coordinated policies, gang scheduling
2026-02-18	v0.6.0	Coordinated scaling, stateful InstanceSet
2025-12-03	v0.5.0	Native InstanceSet, in-place updates, Mooncake integration
2025-09-23	v0.4.0	RBGS scaling, Volcano podgroup support

🤔 Why RBG?

Traditional Kubernetes primitives (StatefulSets / Deployments) struggle with LLM inference services, which pose the following challenges:

Challenge	Description
Multi-role topologies	gateway → router → prefill → decode
Performance-sensitive	GPU/network topology matters
Atomic operations	deploy, upgrade, scale, failover across roles

RBG treats an inference service as a role-based group: a topology-aware, stateful, coordinated multi-role organism managed as a single unit.

🎯 Key Concepts

Concept	Description
Role	Basic scheduling and rollout unit. Each role (e.g., prefill, decode) has its own spec, lifecycle, and policies.
RoleBasedGroup	A group of roles forming one logical service (e.g., one LLM inference deployment).
RoleInstance	A collection of Pods with a tightly bound lifecycle. Supports in-place updates and controls upgrades and status for the Pod group.
CoordinatedPolicy	A separate CRD for coordinating operations across roles. Controls `maxSkew` and `progression` during rolling updates and scaling.
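
The exact semantics of `maxSkew` are defined by the CoordinatedPolicy CRD and are not spelled out here. As a rough illustration only, the sketch below assumes one plausible reading: `maxSkew` bounds how far the rollout progress of one role may run ahead of the slowest other role. The function name `can_advance` and the fraction-based progress model are hypothetical and not part of the RBG API.

```python
def can_advance(progress, role, step, max_skew):
    """Return True if `role` may advance by `step` without exceeding max_skew.

    progress maps a role name to the fraction of its replicas already
    updated (0.0 to 1.0). This is an assumed model, not RBG's controller logic.
    """
    proposed = min(1.0, progress[role] + step)
    # Progress of the slowest role other than the one being advanced.
    slowest_other = min(
        (value for name, value in progress.items() if name != role),
        default=proposed,
    )
    # Small tolerance guards against floating-point rounding.
    return proposed - slowest_other <= max_skew + 1e-9


progress = {"prefill": 0.5, "decode": 0.25}
print(can_advance(progress, "prefill", 0.25, 0.25))  # prefill would run too far ahead
print(can_advance(progress, "decode", 0.25, 0.25))   # decode catches up to prefill
```

Under this assumed model, the role that is behind is always allowed to catch up, while the role that is ahead waits until the gap closes.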

✨ Key Features — SCOPE

Capability	Description
Stable	Topology-aware deterministic operations with unique RoleID injection
Coordination	Cross-role policy engine: deployment pairing, coordinated upgrades, linked recovery
Orchestration	Role dependencies, precise startup sequences, topology self-aware service discovery
Performance	Hardware affinity scheduling: GPU-NVLink → PCIe → RDMA → VPC
Extensible	Declarative APIs and plugin mechanisms for future architectures

🚀 Getting Started

📦 Installation

helm install rbg-controller oci://registry-1.docker.io/sglproject/rbg-controller-chart --version v0.7.0-alpha.3

For detailed instructions, see the Installation Guide.

🎮 Quick Start

Deploy a basic RoleBasedGroup with two roles and startup dependencies:

apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: nginx-cluster
spec:
  roles:
    - name: frontend
      replicas: 1
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 80

    - name: backend
      replicas: 3
      dependencies: ["frontend"]  # backend starts after frontend is ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 8080
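
The `dependencies` field in the manifest above describes a dependency graph between roles. A minimal sketch of how a startup order can be derived from such a graph, using a topological sort from the Python standard library, is shown below. It illustrates the ordering idea only and makes no claim about how the RBG controller implements it; the helper name `startup_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Role names and dependencies copied from the manifest above:
# each key is a role, each value lists the roles it waits for.
roles = {
    "frontend": [],
    "backend": ["frontend"],
}


def startup_order(dependencies):
    """Return role names so that every role comes after its dependencies."""
    # TopologicalSorter expects {node: predecessors} and raises
    # graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(dependencies).static_order())


print(startup_order(roles))  # ['frontend', 'backend']
```

A cyclic declaration (for example, two roles that depend on each other) has no valid order, which is why the sorter raises an error instead of returning a list.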

Deployment Patterns

Pattern	Used For	Description
standalonePattern	Single-node deployment	Single pod per instance
leaderWorkerPattern	Multi-node distributed deployment	Leader + workers for tensor parallelism

RoleTemplates

Reduce configuration duplication with reusable templates:

spec:
  roleTemplates:
    - name: base-template
      template:
        spec:
          containers:
            - name: nginx
              image: nginx:1.14.1

  roles:
    - name: frontend
      replicas: 2
      standalonePattern:
        templateRef:
          name: base-template

    - name: backend
      replicas: 3
      standalonePattern:
        templateRef:
          name: base-template
          patch:  # role-specific overrides
            spec:
              containers:
                - name: nginx
                  resources:
                    requests:
                      memory: "128Mi"

🖥️ CLI Tool

kubectl-rbg is a CLI tool for managing RBG resources and LLM deployments.

Installation

# Build from source
make build-cli
chmod +x bin/kubectl-rbg
sudo mv bin/kubectl-rbg /usr/local/bin/

LLM Quick Start

# Initialize configuration
kubectl rbg llm config init

# Pull a model
kubectl rbg llm model pull Qwen/Qwen3.5-0.8B

# Deploy as inference service
kubectl rbg llm svc run my-qwen Qwen/Qwen3.5-0.8B

# Chat with the service
kubectl rbg llm svc chat my-qwen

For detailed CLI documentation, see kubectl-rbg.

🧠 Inference Examples

Prefill/Decode Disaggregated

SGLang PD-disaggregated examples in examples/inference/:

Example	Pattern	Description
pd-disagg-standalone.yaml	standalonePattern	Single pod per role, suitable for single-GPU instances
pd-disagg-leader-worker.yaml	leaderWorkerPattern	Multi-GPU tensor parallelism for decode role

Aggregated Inference

SGLang aggregated examples:

Example	Pattern	Description
agg-standalone.yaml	standalonePattern	Single-GPU aggregated inference
agg-leader-worker.yaml	leaderWorkerPattern	Multi-GPU tensor parallelism

🔗 Ecosystem Integration

RBG integrates with ecosystem components for production LLM inference:

NVIDIA Dynamo

NVIDIA Dynamo is an open-source, datacenter-scale inference stack that orchestrates multi-node AI workloads above inference engines like vLLM and SGLang:

Example	Description
dynamo/pd-disagg.yaml	PD-disaggregated with Dynamo SGLang runtime
dynamo/pd-disagg-multi-nodes.yaml	Multi-node PD-disaggregated
dynamo/agg.yaml	Aggregated inference with Dynamo
dynamo/agg-multi-nodes.yaml	Multi-node aggregated

Mooncake

Mooncake is a disaggregated architecture for LLM serving, providing KV cache transfer and reuse across distributed inference:

Example	Description
mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml	PD-disaggregated with KV cache reuse
mooncake-store/agg-kvcache-reuse-with-mooncake.yaml	Aggregated with KV cache reuse
mooncake-transfer-engine/sgl-pd-disagg-with-mooncake-te.yaml	SGLang PD-disaggregated with transfer engine
mooncake-transfer-engine/vllm-pd-disagg-with-mooncake-te.yaml	vLLM PD-disaggregated with transfer engine

📂 Examples Directory

🧱 Basic Examples (`examples/basic/`)

Path	Description
`rbg/base.yaml`	Basic RoleBasedGroup with role dependencies
`rbg/dependency/`	Role dependency configurations
`rbg/patterns/`	Deployment patterns: standalone, leader-worker, custom-components
`rbg/scheduling/`	Gang scheduling: Volcano, scheduler-plugins
`rbg/update-strategy/`	Rolling update with partition support
`rbg/restart-policy/`	Restart policy configurations
`rbg/scaling/`	Scaling adapter with HPA integration
`rbg/role-template/`	RoleTemplates for reducing duplication
`coordinated-policy/`	Coordinated rollout and scaling policies
`engine-runtime/`	Engine runtime profile configurations

🧠 Inference Examples (`examples/inference/`)

Path	Description
`agg-standalone.yaml`	Aggregated SGLang (standalone pattern)
`agg-leader-worker.yaml`	Aggregated (leader-worker pattern)
`pd-disagg-standalone.yaml`	Prefill/Decode disaggregated (standalone)
`pd-disagg-leader-worker.yaml`	Prefill/Decode disaggregated (leader-worker)
`ecosystem/`	NATS, etcd, Dynamo, Mooncake integration
`ecosystem/dynamo/`	NVIDIA Dynamo runtime examples
`ecosystem/mooncake/`	Mooncake KV cache transfer engine

📚 Documentation

Source	Link
Official Docs	rolebasedgroup.github.io
Local Docs	doc/TOC.md

Version Compatibility

RBG Version	Kubernetes	LeaderWorkerSet
main / v0.7.0-alpha.x	>=v1.22.x	Not Required
v0.6.0	>=v1.28.x	>=v0.7.0
v0.5.0	>=v1.28.x	>=v0.6.0
v0.4.0	>=v1.28.x	>=v0.7.0
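
As a convenience, the Kubernetes column of the table above can be checked programmatically. The sketch below is a hypothetical helper, not part of RBG or kubectl-rbg; it copies the minimum versions from the table and compares only the major and minor components of a cluster version string.

```python
import re

# Minimum Kubernetes (major, minor) per RBG release, copied from the table above.
MIN_KUBERNETES = {
    "v0.7.0-alpha.x": (1, 22),
    "v0.6.0": (1, 28),
    "v0.5.0": (1, 28),
    "v0.4.0": (1, 28),
}


def parse_major_minor(version):
    """Extract (major, minor) from strings such as 'v1.28.3' or '1.30.1+k3s1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(rbg_release, kubernetes_version):
    """Return True if the cluster version meets the table's minimum."""
    return parse_major_minor(kubernetes_version) >= MIN_KUBERNETES[rbg_release]


print(is_supported("v0.6.0", "v1.27.9"))
print(is_supported("v0.6.0", "v1.28.0"))
```

The helper does not cover the LeaderWorkerSet column; that requirement would need a separate check against the installed LWS version.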

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Verify copyright headers
make copyright-check

# Add missing headers
make copyright-fix

💬 Community

Channel	Link
Slack	#rbg channel
Issues	GitHub Issues
Discussions	Community Discussions

📜 Code of Conduct

This project follows the Kubernetes Code of Conduct.

🙏 Acknowledgment

RBG is inspired by and reuses code from LeaderWorkerSet (LWS).
