A distributed, event-driven job execution platform built on Spring Boot 4, Apache Kafka, and PostgreSQL. JobWeaver accepts simulation job requests via a REST API, schedules them with configurable retry logic, dispatches them to a pool of worker threads via Kafka, and tracks execution outcomes across isolated databases.
- Architecture
- Technology Stack
- Project Structure
- Modules
- Key Design Decisions
- API Reference
- Local Setup
- Running Tests
- Documentation
- References
JobWeaver follows a microservices architecture with event-driven communication through Apache Kafka. Each service owns its database and communicates exclusively via Kafka topics, enforcing strict data ownership boundaries.
For the complete architecture breakdown, see:
- Architecture Analysis -- detailed textual description of every component, data model, and design pattern.
- Architecture Diagrams -- Mermaid diagrams covering the system topology, request flow, state machine, thread model, and infrastructure layout.
Client --> API (validate, persist, publish) --> Kafka [job-created]
--> Scheduler (ingest, poll, dispatch) --> Kafka [run-job]
--> Worker (execute, report) --> Kafka [job-completed / job-failed]
--> Scheduler (complete or retry with backoff / dead-letter)
| Component | Technology | Version |
|---|---|---|
| Language | Java | 21 |
| Framework | Spring Boot | 4.0.2 |
| Messaging | Apache Kafka (Confluent Platform) | 7.6.1 |
| Database | PostgreSQL | 16 |
| Migrations | Flyway | Managed by Spring Boot |
| Serialisation | Jackson (JSON / JSONB) | 2.18.3 |
| Build | Maven (multi-module) | 3.x |
| Containers | Docker, Docker Compose | v3.9 |
| JVM Runtime | Eclipse Temurin (Alpine) | 21 |
| Frontend | React, Vite | 19.2, 7.2 |
| Testing | JUnit 5, Mockito, AssertJ | -- |
| Code Generation | Lombok | -- |
jobweaver/
├── docker-compose.yml # Full infrastructure + application stack
├── pom.xml # Parent POM (module declarations, dependency management)
├── ARCHITECTURE.md # Architecture analysis document
├── ARCHITECTURE_DIAGRAM.md # Mermaid architecture diagrams
├── FUTURE_SCOPE.md # Roadmap, limitations, and Phase 3 plans
├── jobweaver-common/ # Shared library (events, exceptions, simulation model)
│ ├── pom.xml
│ └── src/main/java/com/jobweaver/common/
│ ├── events/ # Kafka event records
│ ├── exceptions/ # Base exception classes
│ └── messaging/ # SimulationInstruction, step types, enums
├── jobweaver-api/ # REST API gateway (port 8080)
│ ├── Dockerfile
│ ├── pom.xml
│ └── src/
│ ├── main/java/com/jobweaver/api/
│ │ ├── config/ # JPA auditing config
│ │ ├── controller/ # JobController (REST endpoints)
│ │ ├── dto/ # Request/response records
│ │ ├── entity/ # Job entity (JPA + JSONB)
│ │ ├── exceptions/ # API exception hierarchy + global handler
│ │ ├── filter/ # MDC filter (traceId)
│ │ ├── kafka/ # Producer config, admin, event publisher
│ │ ├── repository/ # JPA repository
│ │ └── service/ # JobService
│ └── main/resources/
│ ├── application.yaml
│ ├── logback-spring.xml
│ └── db/migration/ # Flyway SQL migrations
├── jobweaver-scheduler/ # Scheduling + retry engine (port 8081)
│ ├── Dockerfile
│ ├── pom.xml
│ └── src/main/java/com/jobweaver/jobweaverscheduler/
│ ├── entity/ # JobExecution entity + JobStatus enum
│ ├── exceptions/ # Scheduler exception hierarchy
│ ├── kafka/ # Consumers, producers, admin config
│ ├── repository/ # JobExecutionRepository (SKIP LOCKED)
│ └── service/ # IngestionService, SchedulerService, DispatchScheduler
├── jobweaver-worker/ # Job execution engine (port 8082)
│ ├── Dockerfile
│ ├── pom.xml
│ └── src/main/java/com/jobweaver/worker/
│ ├── config/ # Thread pool configuration
│ ├── entity/ # ExecutionAttempt entity
│ ├── exceptions/ # Worker exception hierarchy
│ ├── kafka/ # Consumer, producer, async offset management
│ ├── repository/ # ExecutionAttemptRepository
│ └── service/ # WorkerService, SimulationExecutor, AttemptProcessor
└── jobweaver-dashboard/ # React frontend (scaffold)
└── my-app/
├── package.json
├── vite.config.js
└── src/
Shared library containing the domain contracts consumed by all three services:
- Event records:
JobCreatedEvent,RunJobEvent,JobCompletedEvent,JobFailedEvent,DeadLetterEvent. - Simulation model: Sealed
SimulationStepinterface with Jackson polymorphic deserialization. Step types:SleepStep,LogStep,ComputeStep,HttpCallStep,FailStep. - Exception base:
BaseDomainExceptionandDomainErrorCodeinterface for namespaced error codes.
REST gateway responsible for accepting job requests, persisting them, and publishing creation events. Exposes three endpoints under /api/jobs. Includes an MDC filter for trace ID propagation and a global exception handler for standardised error responses.
Consumes job creation events, persists execution records, and dispatches ready jobs on a 10-second polling interval using SELECT ... FOR UPDATE SKIP LOCKED. Manages retry logic with exponential backoff (capped at 300 seconds) and routes exhausted jobs to a dead-letter topic.
Consumes dispatch events across 3 Kafka consumer threads, submits execution to a 12-thread pool, and runs simulation instructions step-by-step. Implements custom asynchronous offset management to handle out-of-order job completion. Uses CallerRunsPolicy for natural back-pressure when the thread pool is saturated.
React/Vite scaffold. Planned for Phase 3 development as a full monitoring and control interface.
| Concern | Approach |
|---|---|
| Event-driven decoupling | Services communicate exclusively via Kafka; no synchronous inter-service calls. |
| Database-per-service | Three isolated PostgreSQL instances enforce strict data ownership. |
| Idempotent processing | Duplicate events are detected via primary key constraints (job ID at scheduler, event ID at worker). |
| Optimistic locking | @Version on scheduler entities prevents concurrent state transition conflicts. |
| Pessimistic dispatch | FOR UPDATE SKIP LOCKED enables concurrent scheduler instances without double-dispatching. |
| Async offset commits | Custom PartitionState / OffsetCommitCoordinator allows out-of-order completion with correct watermark-based commits. |
| Back-pressure | CallerRunsPolicy on the worker thread pool slows Kafka polling when processing capacity is exceeded. |
| Exponential backoff | min(5 * 2^retryCount, 300) seconds between retry attempts. |
| Structured logging | Logback with [service=...] [traceId=...] [jobId=...] across all services. |
| Sealed types | Java 21 sealed interfaces + pattern matching for type-safe simulation step dispatch. |
POST /api/jobs
Content-Type: application/json
Request body:
{
"jobType": "SIMULATION",
"payload": {
"steps": [
{ "action": "LOG", "message": "Starting job" },
{ "action": "SLEEP", "durationMs": 2000 },
{ "action": "COMPUTE", "iterations": 500000 },
{ "action": "HTTP_CALL", "url": "https://example.com", "latencyMs": 1500 }
]
},
"maxRetryCount": 3
}Response (202 Accepted):
{
"jobId": "a1b2c3d4-...",
"traceId": "e5f6g7h8-..."
}GET /api/jobs/{id}
Response (200 OK):
{
"id": "a1b2c3d4-...",
"type": "SIMULATION",
"traceId": "e5f6g7h8-...",
"createdAt": "2026-02-28T10:00:00Z",
"updatedAt": "2026-02-28T10:00:00Z"
}GET /api/jobs?type=SIMULATION&page=0&size=20
Response (200 OK):
{
"jobs": [ ... ],
"page": 0,
"size": 20,
"totalElements": 42,
"totalPages": 3
}| Step | Fields | Behaviour |
|---|---|---|
SLEEP |
durationMs (int) |
Pauses execution for the specified duration |
LOG |
message (String) |
Logs the message at INFO level |
COMPUTE |
iterations (int) |
CPU-bound loop (sum accumulation) |
HTTP_CALL |
url (String), latencyMs (int) |
Simulates HTTP latency via sleep |
FAIL |
message (String) |
Throws a failure exception, halting execution |
These endpoints are exposed directly by the scheduler and worker for observing the full job lifecycle. They are not part of the public API gateway. List endpoints support pagination via page and size query parameters (defaults: page=0, size=20).
| Method | Endpoint | Description |
|---|---|---|
GET |
/internal/jobs/{id}/status |
Execution status for a single job |
GET |
/internal/jobs?page=0&size=20 |
List all job executions (paginated) |
GET |
/internal/jobs?status=PENDING&page=0&size=20 |
Filter executions by status (PENDING, RUNNING, COMPLETED, FAILED) |
Example response (GET /internal/jobs/{id}/status):
{
"jobId": "a1b2c3d4-...",
"traceId": "e5f6g7h8-...",
"jobStatus": "COMPLETED",
"retryCount": 0,
"maxRetries": 3,
"nextRunAt": null,
"updatedAt": "2026-03-01T06:17:00Z",
"lastError": null
}Example paginated response (GET /internal/jobs):
{
"content": [ ... ],
"page": { "size": 20, "number": 0, "totalElements": 42, "totalPages": 3 }
}| Method | Endpoint | Description |
|---|---|---|
GET |
/internal/executions/{jobId}?page=0&size=20 |
Execution attempt history for a job (paginated, newest first) |
Example response:
{
"content": [
{
"eventId": "f1e2d3c4-...",
"jobId": "a1b2c3d4-...",
"traceId": "e5f6g7h8-...",
"outcome": "SUCCESS",
"startedAt": "2026-03-01T06:16:55Z",
"finishedAt": "2026-03-01T06:17:00Z",
"errorMessage": null
}
],
"page": { "size": 20, "number": 0, "totalElements": 1, "totalPages": 1 }
}- Java 21 (JDK)
- Maven 3.x
- Docker and Docker Compose
- Postman (optional, for the included test collection)
Build all modules from the project root. This must be done before starting Docker, as the Dockerfiles copy the built JARs from each module's target/ directory.
mvn clean package -DskipTestsThis compiles jobweaver-common, jobweaver-api, jobweaver-scheduler, and jobweaver-worker, producing fat JARs in each module's target/ directory.
Launch all services using Docker Compose:
docker compose up -d --buildThis starts:
| Service | Port | Purpose |
|---|---|---|
postgres-api |
5432 | API database (jobweaver_api) |
postgres-scheduler |
5433 | Scheduler database (jobweaver_scheduler) |
postgres-worker |
5434 | Worker database (jobweaver_worker) |
zookeeper |
2181 | Kafka coordination |
kafka |
9092 | Message broker |
jobweaver-api |
8080 | REST API |
jobweaver-scheduler |
8081 | Scheduler service (+ internal status endpoints) |
jobweaver-worker |
8082 | Worker service (+ internal execution history endpoint) |
Docker Compose activates the docker Spring profile, which overrides database hosts and Kafka bootstrap servers to use Docker network hostnames.
Check that all services are healthy:
curl http://localhost:8080/actuator/health
curl http://localhost:8081/actuator/health
curl http://localhost:8082/actuator/healthYou can use the included Postman collection (JobWeaver.postman_collection.json) which has pre-built requests for job submission, lifecycle tracking, and health checks. Import it into Postman and the collection variables (api_url, scheduler_url, worker_url) are pre-configured.
Or submit a test job manually:
curl -X POST http://localhost:8080/api/jobs \
-H "Content-Type: application/json" \
-d '{
"jobType": "SIMULATION",
"payload": {
"steps": [
{ "action": "LOG", "message": "Hello from JobWeaver" },
{ "action": "SLEEP", "durationMs": 3000 },
{ "action": "COMPUTE", "iterations": 100000 }
]
},
"maxRetryCount": 2
}'Then track the lifecycle using the internal endpoints:
# Check execution status on the scheduler
curl http://localhost:8081/internal/jobs/{jobId}/status
# View execution attempts on the worker
curl http://localhost:8082/internal/executions/{jobId}Observe logs across all services:
docker compose logs -f jobweaver-api jobweaver-scheduler jobweaver-workerThe structured logging output includes traceId and jobId for tracing the job across services.
docker compose down -vThe -v flag removes named volumes, clearing all database state.
Run the full test suite across all modules:
mvn testRun tests for a specific module:
mvn test -pl jobweaver-api
mvn test -pl jobweaver-scheduler
mvn test -pl jobweaver-worker
mvn test -pl jobweaver-commonAll tests are unit-level and require no external infrastructure. The test suite includes 143+ tests across 32 test classes.
| Document | Description |
|---|---|
| ARCHITECTURE.md | Complete architecture analysis covering module decomposition, data model, Kafka topology, scheduling engine, worker execution model, retry strategy, concurrency, offset management, idempotency, exception architecture, and observability. |
| ARCHITECTURE_DIAGRAM.md | Mermaid diagrams: high-level system architecture, end-to-end sequence flow, Kafka topic flow, job state machine, worker thread model, retry/backoff flow, database layout, and Docker Compose infrastructure. |
| FUTURE_SCOPE.md | Current feature inventory, known limitations, and the Phase 3 development roadmap including round-robin scheduling, frontend dashboard, inter-service REST communication, async execution, integration testing, observability, and security. |
-
Kafka Consumer Multi-Threaded Messaging -- Confluent. The primary reference for the worker module's multi-threaded consumer architecture, covering consumer-per-thread vs. decoupled processing patterns, and the offset management challenges that arise from asynchronous job execution. https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging
-
Erta, B. (2019). Optimistic Locking in JPA -- Baeldung. Reference for the
@Version-based optimistic locking strategy used on theJobExecutionentity to prevent concurrent state transition conflicts between the dispatch thread and Kafka listener threads. https://www.baeldung.com/jpa-optimistic-locking -
PostgreSQL SELECT FOR UPDATE SKIP LOCKED -- PostgreSQL Documentation. Foundation for the scheduler's concurrent dispatch strategy, enabling multiple scheduler instances to poll for ready jobs without double-dispatching through row-level advisory locking. https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE