Deployment-aware workflow orchestration for self-hosted Kubernetes environments.
Platformatic World solves the version-pinning problem for Workflow DevKit: when new code deploys, in-flight workflow runs must continue executing on the code version that started them. The Vercel world handles this via Vercel's infrastructure. Platformatic World provides the same guarantees for self-hosted environments by routing queue messages through a central service that pins each run to its originating deployment version.
```mermaid
graph LR
  PodV1["Pod v1"]
  PodV2["Pod v2"]
  ICC["ICC"]
  WF["Workflow Service<br/>Fastify"]
  PG[("PostgreSQL")]
  PodV1 <-->|"HTTP/REST"| WF
  PodV2 <-->|"HTTP/REST"| WF
  ICC -->|"Admin API"| WF
  WF -->|"SQL"| PG
  style WF fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a5f
  style PG fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a5f
  style PodV1 fill:#d1fae5,stroke:#16a34a,stroke-width:2px,color:#14532d
  style PodV2 fill:#d1fae5,stroke:#16a34a,stroke-width:2px,color:#14532d
  style ICC fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f
```
- `@platformatic/workflow` (`packages/workflow/`) -- Fastify REST API that owns storage, queue routing, and deployment lifecycle. Multi-tenant with per-app isolation.
- `@platformatic/world` (`packages/world/`) -- Thin HTTP client implementing the `@workflow/world` `World` interface. Drop-in replacement for other world implementations.
- `@platformatic/workflow-fastify` (`packages/workflow-fastify/`) -- Optional Fastify plugin that mounts standalone-build handlers and registers the queue handler on boot. For non-Next.js hosts.
@platformatic/world and the Workflow Service run in two distinct modes.
The service auto-detects which one based on the presence of a Kubernetes
service-account token. Apps just point PLT_WORLD_SERVICE_URL at the
service URL and use the same SDK in both modes.
| Aspect | Local mode (single-tenant) | Kubernetes mode (with ICC) |
|---|---|---|
| Triggered by | No K8s service-account token detected | K8s service-account token present at runtime |
| Authentication | None | K8s TokenReview per request |
| Apps | One implicit app (`default`) auto-provisioned | One app per K8s ServiceAccount binding, provisioned by ICC |
| Pod-to-handler registration | App calls `world.start()` on boot | ICC registers handlers via the admin API; `world.start()` is a no-op |
| Deployment version | Defaults to `local` (or `PLT_WORLD_DEPLOYMENT_VERSION`) | Auto-detected from the pod's `plt.dev/version` label |
| Admin API | Open (no auth) | Restricted to the configured admin ServiceAccount (e.g. `platformatic:icc`) |
| Run-pinning across deploys | Yes (every run records the version that started it) | Yes (same mechanism; ICC drives version lifecycle) |
Local mode is what you use for development, CI, and the e2e tests in this repo. It runs the same code paths as production -- only the auth and handler-registration entry points differ.
Kubernetes mode is the production deployment under ICC. ICC is the control plane: it provisions apps, binds K8s ServiceAccounts to apps, registers pod handler endpoints, and drives version lifecycle (drain / expire). The service itself is identical between the two modes.
The diagram at the top shows the K8s-with-ICC mode. In local mode, replace the ICC box with nothing -- the service runs standalone against PostgreSQL and accepts unauthenticated traffic from apps on the same machine.
- Node.js >= 22.19.0
- PostgreSQL 17
- pnpm >= 10
The fastest path: run the service via npx, point a Next.js app at it.
```bash
# Start PostgreSQL
docker run -d --name workflow-pg \
  -e POSTGRES_USER=wf -e POSTGRES_PASSWORD=wf -e POSTGRES_DB=workflow \
  -p 5434:5432 postgres:17-alpine

# Start the workflow service
npx @platformatic/workflow postgresql://wf:wf@localhost:5434/workflow
```

Alternatively, run the service from a clone of the repository:

```bash
git clone https://github.com/platformatic/platformatic-world.git
cd platformatic-world
pnpm install
docker compose up -d  # PostgreSQL on port 5434
cd packages/workflow
npx wattpm start      # single-tenant mode (no auth)
```

The service starts on http://localhost:3042. Migrations run automatically
on first start. Single-tenant mode is detected automatically because no K8s
service-account token is present.
In local mode the app must call world.start() once on boot to register
its callback endpoints with the service. (In K8s + ICC, ICC handles this
and world.start() becomes a no-op.)
For Next.js, create instrumentation.ts in your project root:
```typescript
// instrumentation.ts
export async function register () {
  if (process.env.PLT_WORLD_SERVICE_URL) {
    const { createWorld } = await import('@platformatic/world')
    const world = createWorld()
    await world.start?.()
  }
}
```

Then start your app:
```bash
WORKFLOW_TARGET_WORLD=@platformatic/world \
PLT_WORLD_SERVICE_URL=http://localhost:3042 \
PORT=3000 \
npx next start -p 3000
```

For other frameworks (Express, Koa, Fastify, plain Node), call
`world.start()` from your server bootstrap. For Fastify specifically,
`@platformatic/workflow-fastify` does this and also mounts the
standalone-build handlers.
For low-level scripting or tests, skip the SDK and drive the world client directly:
```typescript
import { createPlatformaticWorld } from '@platformatic/world'

const world = createPlatformaticWorld({
  serviceUrl: 'http://localhost:3042',
  appId: 'default',
  deploymentVersion: 'v1',
})

// Create a workflow run
const { run } = await world.events.create(null, {
  eventType: 'run_created',
  eventData: {
    workflowName: 'my-workflow',
    deploymentId: 'v1',
    input: { key: 'value' },
  },
})

// Queue a message (routed to the correct deployment version)
await world.queue('__wkf_workflow_my-workflow', { runId: run.runId })

await world.close()
```

In production the Workflow Service runs as a Kubernetes Deployment, fronted by ICC (the Platformatic control plane). The setup differs from local mode in three ways:
- Service detects K8s automatically. When the pod has a projected service-account token at `/var/run/secrets/kubernetes.io/serviceaccount/token`, the service starts in multi-tenant mode. No flag is needed.
- ICC provisions and binds apps. Apps are created via `POST /api/v1/apps` and bound to a K8s ServiceAccount via `POST /api/v1/apps/:appId/k8s-binding`. Pods then authenticate with their projected token; the service validates it via the K8s `TokenReview` API and maps it to the bound app.
- ICC registers pod handlers and drives the version lifecycle. Pods do not call `world.start()` themselves -- ICC posts to `POST /api/v1/apps/:appId/handlers` when a pod is ready, and drains / expires versions via the admin API as deploys roll out.
Required environment for the service in K8s mode:
| Variable | Description |
|---|---|
| `DATABASE_URL` | PostgreSQL connection string |
| `K8S_ADMIN_SERVICE_ACCOUNT` | The ICC ServiceAccount, e.g. `platformatic:icc` |
| `K8S_API_SERVER` | Defaults to `https://kubernetes.default.svc` |
| `K8S_CA_CERT` | Defaults to the in-pod CA path |
App pods only need:
```bash
WORKFLOW_TARGET_WORLD=@platformatic/world
PLT_WORLD_SERVICE_URL=http://workflow-service.<namespace>.svc.cluster.local:3042
```
PLT_WORLD_APP_ID is resolved from the pod's ServiceAccount binding;
PLT_WORLD_DEPLOYMENT_VERSION is read from the pod's plt.dev/version
label. Both can be overridden explicitly if needed.
Production deployment is owned by ICC (see the ICC repo and the Deployment Lifecycle section below for the contract between the service and ICC). This repo does not ship K8s manifests.
| Environment Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://wf:wf@localhost:5434/workflow` | PostgreSQL connection string |
| `PORT` | `3042` | HTTP listen port |
| `HOST` | `0.0.0.0` | HTTP listen host |
| `K8S_API_SERVER` | `https://kubernetes.default.svc` | Kubernetes API server URL (multi-tenant only) |
| `K8S_CA_CERT` | `/var/run/secrets/kubernetes.io/serviceaccount/ca.crt` | Path to K8s CA certificate (multi-tenant only) |
| `K8S_ADMIN_SERVICE_ACCOUNT` | | K8s service account with admin access, format `namespace:name` (e.g. `platformatic:icc`) |
```typescript
interface PlatformaticWorldConfig {
  serviceUrl: string        // Workflow Service base URL
  appId: string             // Application ID
  deploymentVersion: string // Current deployment version
}
```

The service auto-detects its operating mode. If a K8s service account token is present, it starts in multi-tenant mode with authentication. Otherwise, it starts in single-tenant mode with no auth.
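The detection rule can be sketched as a single file-existence check. This is a minimal illustration, not the service's actual code: `detectMode` and `ServiceMode` are hypothetical names, and the real service may inspect more than the presence of the token file.

```typescript
import { existsSync } from 'node:fs'

type ServiceMode = 'single-tenant' | 'multi-tenant'

// Projected service-account token path, as given in the Kubernetes section of this README.
const DEFAULT_TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token'

// Hypothetical helper: multi-tenant when the token file exists, single-tenant otherwise.
function detectMode (tokenPath: string = DEFAULT_TOKEN_PATH): ServiceMode {
  return existsSync(tokenPath) ? 'multi-tenant' : 'single-tenant'
}
```

On a developer machine the token path normally does not exist, so `detectMode()` would report `'single-tenant'` without any flag.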
In single-tenant mode there is no authentication. A single implicit application is auto-provisioned, so you only need to set `PLT_WORLD_SERVICE_URL`.

In multi-tenant mode, pods authenticate with their projected K8s ServiceAccount tokens. The service validates them via the K8s TokenReview API and maps the ServiceAccount identity to an application.
Admin endpoints (app provisioning, draining, version management) require a K8s identity configured as the admin service account via K8S_ADMIN_SERVICE_ACCOUNT (e.g. platformatic:icc).
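To make the `namespace:name` comparison concrete: Kubernetes TokenReview reports a ServiceAccount as the username `system:serviceaccount:<namespace>:<name>`. The sketch below converts that username to the `namespace:name` form and compares it with the configured admin account. The helper names are hypothetical and the service's own check may differ.

```typescript
// Username prefix Kubernetes uses for ServiceAccount identities.
const SA_PREFIX = 'system:serviceaccount:'

// Convert a TokenReview username to "namespace:name".
// Returns null for identities that are not ServiceAccounts.
function toServiceAccountId (username: string): string | null {
  if (!username.startsWith(SA_PREFIX)) return null
  const parts = username.slice(SA_PREFIX.length).split(':')
  if (parts.length !== 2 || parts[0] === '' || parts[1] === '') return null
  return `${parts[0]}:${parts[1]}`
}

// Hypothetical admin check against the value of K8S_ADMIN_SERVICE_ACCOUNT.
function isAdmin (username: string, adminServiceAccount: string): boolean {
  return toServiceAccountId(username) === adminServiceAccount
}
```

For example, `isAdmin('system:serviceaccount:platformatic:icc', 'platformatic:icc')` is `true`, while any other ServiceAccount is treated as a regular app identity.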
Every authenticated request resolves to an application_id. All SQL queries include WHERE application_id = $appId, enforcing row-level isolation between tenants.
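As an illustration of that isolation rule, a query builder can make the tenant filter impossible to omit by always binding `application_id` as the first parameter. This is a sketch under assumptions: the table name `runs`, the `status` column, and the helper name are hypothetical, and `$1`-style placeholders are the form node-postgres expects.

```typescript
interface ScopedQuery {
  text: string
  values: unknown[]
}

// Hypothetical builder: every query starts with the tenant filter,
// optional filters are appended as further numbered parameters.
function listRunsQuery (applicationId: string, status?: string): ScopedQuery {
  const values: unknown[] = [applicationId]
  let text = 'SELECT * FROM runs WHERE application_id = $1'
  if (status !== undefined) {
    values.push(status)
    text += ` AND status = $${values.length}`
  }
  return { text, values }
}
```

The returned object has the `{ text, values }` shape that `pg`'s `query()` accepts, so user input never reaches the SQL string itself.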
All app-scoped endpoints are prefixed with /api/v1/apps/:appId.
| Method | Path | Description |
|---|---|---|
| `POST` | `/runs/:runId/events` | Create an event (main write path) |
| `GET` | `/runs/:runId/events` | List events for a run |
| `GET` | `/events/by-correlation` | List events by correlation ID |
Supported event types: run_created, run_started, run_completed, run_failed, run_cancelled, run_expired, step_created, step_started, step_completed, step_failed, step_retrying, hook_created, hook_received, hook_disposed, wait_created, wait_completed.
| Method | Path | Description |
|---|---|---|
| `GET` | `/runs/:runId` | Get run by ID |
| `GET` | `/runs` | List runs (filters: `status`, `workflowName`, `deploymentId`) |
| `POST` | `/runs/:runId/replay` | Replay a completed run (creates new run with same input, targets original version) |
| `POST` | `/runs/:runId/cancel` | Cancel a running run |
| `POST` | `/runs/:runId/wake-up` | Cancel active sleeps for a run |
| `GET` | `/workflows/:workflowName/template` | Get step template from most recent completed run (query: `deploymentId`) |
| Method | Path | Description |
|---|---|---|
| `GET` | `/runs/:runId/steps/:stepId` | Get step by ID |
| `GET` | `/runs/:runId/steps` | List steps for a run |
| Method | Path | Description |
|---|---|---|
| `GET` | `/hooks/:hookId` | Get hook by ID |
| `GET` | `/hooks/by-token/:token` | Get hook by token |
| `GET` | `/hooks` | List hooks (filter: `runId`) |
| Method | Path | Description |
|---|---|---|
| `PUT` | `/runs/:runId/streams/:name` | Write chunk(s) to a stream |
| `GET` | `/streams/:name` | Read stream chunks |
| `GET` | `/runs/:runId/streams` | List stream names for a run |
| Method | Path | Description |
|---|---|---|
| `POST` | `/queue` | Enqueue a message |
Supports delaySeconds for deferred delivery and idempotencyKey for deduplication.
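A request to this endpoint might be assembled as follows. Only the path prefix, `delaySeconds`, and `idempotencyKey` come from this README; the `queueName` and `message` body fields, the JSON encoding, and the helper name are assumptions made for illustration, so check the service schema before relying on them.

```typescript
interface EnqueueOptions {
  delaySeconds?: number
  idempotencyKey?: string
}

interface EnqueueRequest {
  url: string
  init: { method: string, headers: Record<string, string>, body: string }
}

// Hypothetical builder for POST /api/v1/apps/:appId/queue.
function buildEnqueueRequest (
  serviceUrl: string,
  appId: string,
  queueName: string,
  message: unknown,
  opts: EnqueueOptions = {}
): EnqueueRequest {
  return {
    url: `${serviceUrl}/api/v1/apps/${encodeURIComponent(appId)}/queue`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      // Body field names are assumed, not taken from the service schema.
      body: JSON.stringify({ queueName, message, ...opts })
    }
  }
}
```

The result can be passed straight to `fetch(url, init)`. In application code the `@platformatic/world` client's `world.queue()` is the supported path; a hand-built request like this is only useful for scripting or debugging.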
| Method | Path | Description |
|---|---|---|
| `POST` | `/handlers` | Register a pod's queue handler endpoints |
| `DELETE` | `/handlers/:podId` | Deregister a pod |
| Method | Path | Description |
|---|---|---|
| `GET` | `/encryption-key` | Get per-run encryption key (HKDF-derived) |
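For readers unfamiliar with HKDF, the sketch below shows how a per-run key can be derived from one master secret using Node's built-in `hkdfSync`. The choice of SHA-256, the app ID as salt, the `run:<runId>` info string, and the 32-byte length are all assumptions; the README only states that the key is HKDF-derived.

```typescript
import { hkdfSync } from 'node:crypto'

// Hypothetical derivation: one master secret, a distinct 32-byte key per (app, run).
function deriveRunKey (masterKey: Buffer, appId: string, runId: string): Buffer {
  const salt = Buffer.from(appId, 'utf8')          // assumed: scopes keys to the app
  const info = Buffer.from(`run:${runId}`, 'utf8') // assumed: scopes keys to the run
  return Buffer.from(hkdfSync('sha256', masterKey, salt, info, 32))
}
```

The derivation is deterministic, so the same inputs always yield the same key and the service would not need to store per-run keys.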
| Method | Path | Description |
|---|---|---|
| `GET` | `/dead-letters` | List dead-lettered messages |
| `POST` | `/dead-letters/:messageId/retry` | Retry a dead-lettered message |
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/apps` | Provision application |
| `POST` | `/api/v1/apps/:appId/k8s-binding` | Create K8s ServiceAccount binding |
| `DELETE` | `/api/v1/apps/:appId/k8s-binding` | Remove K8s binding |
| `GET` | `/api/v1/apps/:appId/versions/:deploymentId/status` | Get version draining status |
| `POST` | `/api/v1/apps/:appId/versions/:deploymentId/expire` | Force-expire a deployment version |
| `POST` | `/api/v1/versions/notify` | Notify version status change |
| `GET` | `/api/v1/apps/:appId/quotas` | Get quotas for an app (returns defaults if none set) |
| `PUT` | `/api/v1/apps/:appId/quotas` | Set/update quotas (`maxRuns`, `maxEventsPerRun`, `maxQueuePerMinute`) |
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/ready` | No | Database connectivity check |
| `GET` | `/status` | No | Service status |
| `GET` | `/metrics` | No | Prometheus metrics |
The queue router pins messages to deployment versions:
- Each message carries a `deployment_version` from the run that created it
- The router looks up registered handlers for that version
- Messages are dispatched via HTTP POST to the correct pod
- If a version is expired, messages are rejected
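The routing rules above can be sketched with an in-memory router. This is an illustration only: `VersionRouter` and its methods are hypothetical, the real router is backed by PostgreSQL, and round-robin pod selection is an assumption the README does not state.

```typescript
type VersionStatus = 'active' | 'draining' | 'expired'

interface Handler {
  podId: string
  url: string
}

class VersionRouter {
  private readonly handlers = new Map<string, Handler[]>()
  private readonly statuses = new Map<string, VersionStatus>()
  private readonly cursors = new Map<string, number>()

  // Record a pod as able to serve messages for `version`.
  register (version: string, handler: Handler): void {
    const list = this.handlers.get(version) ?? []
    list.push(handler)
    this.handlers.set(version, list)
    if (!this.statuses.has(version)) this.statuses.set(version, 'active')
  }

  setStatus (version: string, status: VersionStatus): void {
    this.statuses.set(version, status)
  }

  // Pick a pod for a message pinned to `version`; expired versions are rejected.
  route (version: string): Handler {
    if (this.statuses.get(version) === 'expired') {
      throw new Error(`version ${version} is expired`)
    }
    const list = this.handlers.get(version) ?? []
    if (list.length === 0) throw new Error(`no handlers registered for version ${version}`)
    const i = (this.cursors.get(version) ?? 0) % list.length
    this.cursors.set(version, i + 1)
    return list[i]
  }
}
```

A message created by a v1 run is only ever routed to a pod registered for v1, even while v2 pods are serving new runs; a draining version still accepts messages so in-flight runs can finish.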
Messages with delaySeconds > 0 are stored with status='deferred' and a deliver_at timestamp. A background poller promotes them to pending when due.
Failed dispatches use exponential backoff: min(1000ms * 2^attempt, 60000ms), up to 10 attempts. After max attempts, messages move to dead status.
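The backoff formula translates directly into code. In this sketch the constants come from the formula above, while the function names and the assumption that `attempt` starts at 0 for the first retry are illustrative.

```typescript
const BASE_DELAY_MS = 1000
const MAX_DELAY_MS = 60000
const MAX_ATTEMPTS = 10

// min(1000ms * 2^attempt, 60000ms)
function retryDelayMs (attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS)
}

// A message that has used up all attempts moves to `dead` instead of being retried.
function shouldDeadLetter (attemptsMade: number): boolean {
  return attemptsMade >= MAX_ATTEMPTS
}
```

Under the 0-based assumption the delays run 1 s, 2 s, 4 s, 8 s, 16 s, 32 s and then stay capped at 60 s, because `1000 * 2^6 = 64000` exceeds the cap.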
The poller detects runs stuck in running for over 15 minutes with no queued messages, marking them as failed with an ORPHANED error code.
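The orphan rule can be expressed as a predicate over a run snapshot. The field names below are hypothetical, and measuring the 15 minutes from a "last activity" timestamp is an assumption; the README does not say which timestamp the poller uses.

```typescript
const ORPHAN_THRESHOLD_MS = 15 * 60 * 1000

// Hypothetical snapshot of what the poller would need to know about a run.
interface RunSnapshot {
  status: string
  lastActivityAt: Date
  queuedMessages: number
}

// True when a run is `running`, has nothing queued, and has been idle past the threshold.
function isOrphaned (run: RunSnapshot, now: Date = new Date()): boolean {
  return run.status === 'running' &&
    run.queuedMessages === 0 &&
    now.getTime() - run.lastActivityAt.getTime() > ORPHAN_THRESHOLD_MS
}
```

A run with a queued message is never flagged, since that message may still drive it forward.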
The service provides APIs for ICC to manage deployment lifecycle:
- Version notification -- ICC notifies the service when a deployment version changes status (`active`, `draining`, `expired`)
- Draining status -- ICC queries the service for authoritative counts of active runs, pending hooks, pending waits, and queued messages for a version
- Force-expire -- ICC can force-expire a version, which cancels all in-flight runs, dead-letters queued messages, and deregisters handlers
This gives ICC a single authoritative source for "are there any non-terminal workflow runs for version X?" -- something that cannot be determined from pod heartbeats or queue depth alone, because hooks and waits are invisible at the infrastructure level.
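On the ICC side, the decision to retire a version reduces to checking that all four counts are zero. The sketch below assumes a response shape with hypothetical field names; the actual payload of the draining-status endpoint may differ.

```typescript
// Hypothetical shape of the draining-status counts for one version.
interface DrainCounts {
  activeRuns: number
  pendingHooks: number
  pendingWaits: number
  queuedMessages: number
}

// A version is safe to retire only when nothing non-terminal remains.
function isDrained (counts: DrainCounts): boolean {
  return counts.activeRuns === 0 &&
    counts.pendingHooks === 0 &&
    counts.pendingWaits === 0 &&
    counts.queuedMessages === 0
}
```

A version with idle pods and an empty queue can still report a pending hook, in which case `isDrained` stays `false` and the pods must be kept.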
Per-application quotas (configurable via the admin API or the workflow_app_quotas table):
| Quota | Default | Description |
|---|---|---|
| `max_runs` | 10,000 | Maximum concurrent active runs |
| `max_events_per_run` | 10,000 | Maximum events per run |
| `max_queue_per_minute` | 1,000 | Queue message rate limit per minute |
Exceeding a quota returns HTTP 429.
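A quota check of this kind can be sketched as a comparison against per-app limits with the documented defaults as fallback. The `maxRuns`, `maxEventsPerRun`, and `maxQueuePerMinute` names match the admin API fields listed earlier; the helper and its result type are hypothetical.

```typescript
interface Quotas {
  maxRuns: number
  maxEventsPerRun: number
  maxQueuePerMinute: number
}

// Defaults from the quota table above.
const DEFAULT_QUOTAS: Quotas = {
  maxRuns: 10000,
  maxEventsPerRun: 10000,
  maxQueuePerMinute: 1000
}

type QuotaResult = { ok: true } | { ok: false, statusCode: 429, message: string }

// Hypothetical check run before a new run is created.
function checkRunQuota (activeRuns: number, quotas: Quotas = DEFAULT_QUOTAS): QuotaResult {
  if (activeRuns >= quotas.maxRuns) {
    return { ok: false, statusCode: 429, message: `max_runs quota of ${quotas.maxRuns} reached` }
  }
  return { ok: true }
}
```

Clients that receive the 429 should back off rather than retry immediately, since the limit clears only as active runs complete.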
The /metrics endpoint returns Prometheus-compatible metrics provided by the Platformatic runtime (HTTP request duration, status codes, Node.js runtime stats).
All tests run against the local mode of the Workflow Service (no K8s,
no ICC) backed by PostgreSQL on port 5434 from docker-compose.yml. K8s
mode is exercised in the ICC integration suite, not here.
```bash
# Start PostgreSQL (port 5434)
docker compose up -d

# Install dependencies
pnpm install

# Run unit + integration tests across the three packages
pnpm test
```

`pnpm test` runs:

- `@platformatic/workflow` -- service unit and integration tests (database, auth, queue routing, draining, quotas, etc.)
- `@platformatic/world` -- world client unit tests
- `@platformatic/workflow-fastify` -- Fastify plugin tests
The repo ships two Next.js workbenches that exercise the full stack --
real Next.js app, real Workflow SDK, real @platformatic/world adapter,
real Workflow Service over HTTP -- pinned to different SDK versions:
| Workbench | SDK version | Purpose |
|---|---|---|
| `e2e-v5/` | `workflow@5.0.0-beta.12` | Mirrors Vercel's main-branch CI. Hosts our Vercel-compat suite (61 ports of upstream tests) and the CBOR-specific assertions. |
| `e2e-v4/` | `workflow@4.5.0` (stable) | Guards the v4 runtime path so stable-SDK users keep working. |
Both workbenches share the same workflow sources and helper; only the
`workflow` / `@workflow/*` versions in `package.json` differ.
Each test run:

- Boots the Workflow Service via `wattpm start` (random port per run).
- Boots the Next.js app via `next start` (random port per run).
- Truncates the database between suites for isolation.
Run from the repo root:
```bash
# Local-suite e2e against v5 beta SDK
pnpm test:e2e:v5

# Local-suite e2e against v4 stable SDK
pnpm test:e2e:v4

# Vercel-compat suite (v5 workbench, ports of upstream tests)
pnpm test:e2e:vercel
```

Or run a single file directly:

```bash
cd e2e-v5
node --test --test-force-exit test/workflow.test.ts
node --test --test-force-exit test/vercel-e2e.test.ts
```

CI runs all three suites (`pnpm test:e2e:v4`, `pnpm test:e2e:v5`,
`pnpm test:e2e:vercel`) on every PR.
- `workflow.test.ts` (both workbenches) -- our own e2e cases: hook resume, retries, replay, cancel, streams, multi-step pipelines.
- `cbor-e2e.test.ts` (both workbenches) -- CBOR queue-transport assertions; verifies `Uint8Array` survives a queue round-trip without base64 wrapping.
- `vercel-e2e.test.ts` (v5 only) -- 61 ports of Vercel's community-world upstream tests; proves the Platformatic world is a drop-in for Vercel's own integration matrix.
- `compat.test.ts` (v4 only) -- compatibility guards specific to the v4 stable line (older streamer API shape, etc.).
See e2e-v5/README.md for the
WORKFLOW_PUBLIC_MANIFEST=1 build flag and other workbench-specific
notes.
```text
packages/
workflow/
cli.js # CLI entrypoint (npx @platformatic/workflow)
watt.json # Platformatic Service configuration (dist/plugins for production)
watt-dev.json # Dev configuration (./plugins for local development)
lib/
db.ts # pg.Pool + Postgrator migrations
errors.ts # Typed HTTP errors (@fastify/error)
quotas.ts # Quota checks + rate limiting
auth/
index.ts # Auth plugin (onRequest hook)
k8s-token.ts # K8s ServiceAccount token validation
plugins/
db.ts # Database + auth initialization
auth.ts # Auth wiring
apps.ts # App provisioning + K8s bindings
events.ts # Event creation (main write path)
runs.ts # Run queries + workflow template API
run-actions.ts # Replay, cancel, wake-up
steps.ts # Step queries
hooks.ts # Hook queries
streams.ts # Stream read/write
queue.ts # Queue message ingestion
poller.ts # Poller lifecycle management
encryption.ts # Per-run encryption keys
handlers.ts # Pod handler registration
draining.ts # Version draining status + force-expire
versions.ts # Version status notifications
dead-letters.ts # Dead-letter management
quotas.ts # Quota admin API (GET/PUT)
queue/
router.ts # Deployment-aware message routing
dispatcher.ts # HTTP dispatch to pods
poller.ts # Deferred delivery + retry + orphan detection
retry.ts # Exponential backoff
migrations/
001.do.sql # Full schema (auth, core, queue, encryption, quotas)
test/ # 87 tests across 19 suites
world/
src/
index.ts # createPlatformaticWorld() + createWorld() factories
lib/
client.ts # undici Pool HTTP client
storage.ts # Storage interface (runs, events, steps, hooks)
queue.ts # Queue + handler registration
streamer.ts # Stream read/write
encryption.ts # Encryption key fetching
test/ # 12 tests
```
See PLATFORMATIC-WORLD-DESIGN.md for the full design rationale, including:
- Why all operations go through a central service (hooks and waits are invisible at the infrastructure level)
- Deployment-aware routing semantics
- Upgrade safety guarantees
- Database schema design
See UPGRADE-SEMANTICS.md for the analysis of Workflow DevKit's deterministic replay and why version pinning is required.