Orleans.GpuBridge.Core


Overview

Orleans.GpuBridge.Core is a .NET 9 library that enables GPU-native distributed computing for Microsoft Orleans. It extends Orleans' actor model with GPU-resident actors that process messages entirely on the GPU, achieving sub-microsecond latencies on supported hardware. The library provides ring kernel infrastructure, temporal alignment (HLC and Vector Clocks), GPU-to-GPU messaging, and hypergraph actor support.

Key Capabilities

  • Ring Kernels - Persistent GPU dispatch loops that keep actors resident in GPU memory, avoiding repeated kernel launch overhead
  • Temporal Alignment - Hybrid Logical Clocks and Vector Clocks for distributed causal ordering, maintained on GPU
  • GPU-to-GPU Messaging - Direct P2P communication between GPUs (NvLink, PCIe, Infinity Fabric) with automatic CPU-routed fallback
  • Hypergraph Actors - Multi-way relationships with GPU-accelerated pattern matching
  • Queue-Depth Aware Placement - Adaptive load balancing across heterogeneous GPU resources
  • Resilience - Polly v8 integration with retry, circuit breaker, and rate limiting for GPU operations
  • CPU Fallback - All GPU operations have CPU fallback paths for development and graceful degradation
  • OpenTelemetry Integration - Per-grain GPU memory tracking, metrics, and distributed tracing
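
The temporal-alignment capability rests on the standard Hybrid Logical Clock algorithm: a physical timestamp paired with a logical counter, so causally related events order correctly even under clock skew. The update rules can be sketched as follows (an illustrative sketch only; the library's actual `HybridTimestamp` type lives in `Orleans.GpuBridge.Abstractions.Temporal` and may differ):

```csharp
// Illustrative HLC sketch -- not the library's implementation.
public readonly record struct HlcTimestamp(long PhysicalTime, int Logical)
{
    // Advance the clock for a local or send event.
    public HlcTimestamp Tick(long wallClockNow) =>
        wallClockNow > PhysicalTime
            ? new HlcTimestamp(wallClockNow, 0)            // wall clock moved forward
            : new HlcTimestamp(PhysicalTime, Logical + 1); // same instant: bump counter

    // Merge a remote timestamp on message receive.
    public HlcTimestamp Receive(HlcTimestamp remote, long wallClockNow)
    {
        var maxPhysical = Math.Max(wallClockNow, Math.Max(PhysicalTime, remote.PhysicalTime));
        var logical =
            maxPhysical == PhysicalTime && maxPhysical == remote.PhysicalTime
                ? Math.Max(Logical, remote.Logical) + 1
            : maxPhysical == PhysicalTime ? Logical + 1
            : maxPhysical == remote.PhysicalTime ? remote.Logical + 1
            : 0;
        return new HlcTimestamp(maxPhysical, logical);
    }
}
```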

Architecture

Orleans.GpuBridge supports two deployment models:

GPU-Offload Model: Actor logic runs on the CPU; compute kernels are dispatched to the GPU as needed. Best suited to batch processing and infrequent GPU usage.

GPU-Native Model: Actor state resides permanently in GPU memory. Ring kernels process messages entirely on GPU with zero kernel launch overhead. Requires GPUs with host-native atomic support (A100, H100, Grace Hopper) for persistent mode; partial-coherence GPUs (RTX series) use EventDriven mode.

+-----------------------------------------------------------+
|                   Orleans Application                     |
|  (User Services, Dashboards, Orchestration)               |
+-----------------------------------------------------------+
|     CPU Grains             GPU-Native Actor Ring Kernels  |
|  (Business Logic)  <-->   (Hypergraphs, Analytics)        |
+-----------------------------------------------------------+
|              Orleans.GpuBridge.Grains                     |
|         (GpuBatchGrain, GpuResidentGrain)                 |
+-----------------------------------------------------------+
|              Orleans.GpuBridge.Runtime                    |
|      (KernelCatalog, DeviceBroker, Placement)             |
+----------------------------+------------------------------+
|       DotCompute Backend   |        CPU Backend           |
+----------------------------+------------------------------+

Packages

| Package | Description |
|---------|-------------|
| Orleans.GpuBridge.Abstractions | Core interfaces and contracts |
| Orleans.GpuBridge.Runtime | Runtime implementation, placement strategies, temporal infrastructure |
| Orleans.GpuBridge.Grains | GPU-accelerated grain base classes |
| Orleans.GpuBridge.Backends.DotCompute | DotCompute GPU backend (CUDA, Metal, CPU) |
| Orleans.GpuBridge.BridgeFX | High-level pipeline API |
| Orleans.GpuBridge.Resilience | Resilience patterns (Polly v8) |
| Orleans.GpuBridge.Diagnostics | Metrics and OpenTelemetry integration |
| Orleans.GpuBridge.HealthChecks | ASP.NET Core health check integrations |
| Orleans.GpuBridge.Generators | Source generators for GPU actors |
| Orleans.GpuBridge.Logging | Structured delegate-based logging |

Getting Started

Installation

dotnet add package Orleans.GpuBridge.Runtime
dotnet add package Orleans.GpuBridge.Grains
dotnet add package Orleans.GpuBridge.Backends.DotCompute

Minimal Configuration

using Microsoft.Extensions.Hosting;
using Orleans.GpuBridge.Runtime.Extensions;

var builder = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddGpuBridge(options =>
        {
            options.PreferGpu = true;
            options.FallbackToCpu = true;
            options.MaxConcurrentKernels = 100;
        });

        services.AddRingKernelSupport(options =>
        {
            options.DefaultGridSize = 1;
            options.DefaultBlockSize = 256;
            options.DefaultQueueCapacity = 256;
        });

        services.AddK2KSupport(enableP2P: true);
        services.AddGpuTelemetry();
    })
    .UseOrleans(siloBuilder =>
    {
        siloBuilder.UseLocalhostClustering();
    });

await builder.Build().RunAsync();

Usage Examples

GPU-Accelerated Grain

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Orleans.GpuBridge.Grains.Base;
using Orleans.GpuBridge.Abstractions.Kernels;

public class VectorProcessingGrain : GpuGrainBase<VectorState>
{
    private IGpuKernel<float[], float[]>? _vectorAddKernel;

    public VectorProcessingGrain(IGrainContext grainContext, ILogger<VectorProcessingGrain> logger)
        : base(grainContext, logger) { }

    protected override async Task ConfigureGpuResourcesAsync(CancellationToken ct)
    {
        var kernelFactory = ServiceProvider.GetRequiredService<IKernelFactory>();
        _vectorAddKernel = await kernelFactory.CreateKernelAsync<float[], float[]>("vector-add", ct);
        await _vectorAddKernel.InitializeAsync(ct);
    }

    public async Task<float[]> AddVectorsAsync(float[] a, float[] b)
    {
        return await ExecuteKernelWithFallbackAsync(
            _vectorAddKernel!,
            a,
            cpuFallback: input => Task.FromResult(CpuVectorAdd(input, b)));
    }

    // CPU fallback path: plain element-wise addition on the host.
    private static float[] CpuVectorAdd(float[] a, float[] b)
    {
        var result = new float[a.Length];
        for (var i = 0; i < a.Length; i++)
            result[i] = a[i] + b[i];
        return result;
    }
}

GPU-Native Actor (Ring Kernel)

using Microsoft.Extensions.Logging;
using Orleans.GpuBridge.Grains.Base;
using Orleans.GpuBridge.Abstractions.Temporal;

public class TemporalActorGrain : RingKernelGrainBase<ActorState, ActorMessage>
{
    public TemporalActorGrain(IGrainContext grainContext, ILogger<TemporalActorGrain> logger)
        : base(grainContext, logger) { }

    protected override Task<RingKernelConfig> ConfigureRingKernelAsync(CancellationToken ct)
    {
        return Task.FromResult(new RingKernelConfig
        {
            QueueDepth = 256,
            EnableHLC = true,
            EnableVectorClock = false,
            MaxStateSizeBytes = 1024
        });
    }

    protected override void ProcessMessageOnGpu(
        ref ActorState state,
        in ActorMessage message,
        ref HybridTimestamp hlc)
    {
        state.Counter += message.Value;
        state.LastUpdate = hlc.PhysicalTime;
    }

    public async Task SendEventAsync(int value)
    {
        await SendMessageAsync(new ActorMessage { Value = value });
    }
}
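
From a regular Orleans client or another grain, the actor above is resolved and invoked like any other grain. The sketch below assumes a hypothetical grain interface `ITemporalActor` that `TemporalActorGrain` would implement; the interface name and key type are illustrative, not part of the library:

```csharp
// Hypothetical grain interface for the actor above.
public interface ITemporalActor : IGrainWithGuidKey
{
    Task SendEventAsync(int value);
}

// Anywhere with access to an IGrainFactory (a client or another grain):
var actor = grainFactory.GetGrain<ITemporalActor>(Guid.NewGuid());
await actor.SendEventAsync(42); // enqueued to the GPU-resident ring kernel
```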

GPU Hardware Requirements

GPU Tiers

| Tier | Examples | Ring Kernel Mode | Message Latency |
|------|----------|------------------|-----------------|
| Full Coherence | A100, H100, Grace Hopper | Persistent | 100-500 ns |
| Partial Coherence | RTX 2000/3000/4000 series | EventDriven | 1-10 ms (batched) |
| WSL2 / Limited | Any GPU under WSL2 | EventDriven | ~5 s (development only) |

Full-coherence GPUs report hostNativeAtomicSupported=1, enabling persistent ring kernels with real-time CPU-GPU memory visibility. Partial-coherence GPUs (concurrentManagedAccess=1 only) must use EventDriven mode, where kernels terminate after processing a batch and are relaunched for new ones.
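
A backend can use those device attributes to choose a ring kernel mode. A minimal selection sketch (the enum and method here are illustrative, not the library's API):

```csharp
// Illustrative mode selection based on device coherence capabilities.
public enum RingKernelMode { Persistent, EventDriven }

public static RingKernelMode SelectMode(bool hostNativeAtomicSupported, bool isWsl2)
{
    if (isWsl2)
        return RingKernelMode.EventDriven;  // WSL2: no reliable system-scope atomics
    if (hostNativeAtomicSupported)
        return RingKernelMode.Persistent;   // full coherence: kernel stays resident
    return RingKernelMode.EventDriven;      // partial coherence: relaunch per batch
}
```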

WSL2 Limitations

WSL2's GPU virtualization layer does not support reliable system-scope atomics or CPU-GPU memory coherence. Ring kernels under WSL2 therefore use the EventDriven workaround with the Start-Active pattern. All functionality works, but only at development-grade performance; use native Linux for production GPU-native workloads.

Monitoring and Diagnostics

Health Checks

services.AddHealthChecks()
    .AddGpuBridgeHealthCheck("gpu-health")
    .AddGpuMemoryHealthCheck("gpu-memory", failureThreshold: 0.9f);

OpenTelemetry

services.AddOpenTelemetry()
    .WithMetrics(builder => builder.AddGpuBridgeInstrumentation())
    .WithTracing(builder => builder.AddGpuBridgeInstrumentation());

Available metrics include gpu.grain.allocations, gpu.grain.memory.allocated, gpu.memory.pool.utilization, and gpu.memory.pool.fragmentation.
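
These instruments can also be observed without an OpenTelemetry exporter by using .NET's built-in MeterListener, which is handy for local debugging. The instrument names come from the list above; matching on the `gpu.` prefix (rather than a specific meter name) is an assumption here:

```csharp
using System.Diagnostics.Metrics;

// Print every gpu.* measurement as it is recorded (local debugging aid).
var listener = new MeterListener
{
    InstrumentPublished = (instrument, l) =>
    {
        if (instrument.Name.StartsWith("gpu.", StringComparison.Ordinal))
            l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
    Console.WriteLine($"{instrument.Name} = {value}"));
listener.Start();
```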

Building from Source

# Build
dotnet build

# Run all tests
dotnet test

# Run a specific test project
dotnet test tests/Orleans.GpuBridge.Runtime.Tests

# Create NuGet packages
dotnet pack -c Release -o artifacts/packages

Project Status

| Component | Status |
|-----------|--------|
| Core Abstractions | Stable |
| Runtime (Placement, Temporal, Ring Kernels) | Stable |
| DotCompute Backend | Stable |
| Resilience (Polly v8) | Stable |
| K2K Messaging / P2P | Stable |
| GPU Memory Telemetry | Stable |
| Health Checks | Stable |
| GPUDirect Storage | Planned |

Test Suite (v0.3.0)

| Project | Passed | Skipped | Total |
|---------|-------:|--------:|------:|
| Abstractions.Tests | 242 | 0 | 242 |
| Runtime.Tests | 255 | 0 | 255 |
| Temporal.Tests | 290 | 1 | 292 |
| Grains.Tests | 98 | 0 | 98 |
| Generators.Tests | 22 | 0 | 22 |
| Hardware.Tests | 34 | 3 | 37 |
| Backends.DotCompute.Tests | 56 | 0 | 56 |
| RingKernelTests | 85 | 6 | 92 |
| Performance.Tests | 15 | 5 | 20 |
| Integration.Tests | 32 | 3 | 35 |
| Resilience.Tests | 53 | 0 | 53 |
| Diagnostics.Tests | 70 | 0 | 70 |
| Total | 1,252 | 18 | 1,272 |

Skipped tests require GPU hardware with specific capabilities (hostNativeAtomicSupported), full Orleans silo infrastructure, or are deferred pending lock-free data structure implementation.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Commercial licensing, support, and domain-specific kernel blueprints are available - contact the author for details.

Copyright (c) 2025-2026 Michael Ivertowski
