
Commit 7431613

update: removed all schema registry stuff (#139)
* update: removed all schema registry stuff
* fix: removed outdated text from testing docs
1 parent 535de70 commit 7431613

20 files changed

Lines changed: 42 additions & 268 deletions


.github/actions/e2e-boot/action.yml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ runs:
 IMAGE_TAG='"$IMAGE_TAG"' docker compose pull --quiet 2>&1
 echo "--- pull done, starting infra ---"
 docker compose up -d --no-build \
-  mongo redis shared-ca zookeeper-certgen zookeeper kafka schema-registry 2>&1
+  mongo redis shared-ca zookeeper-certgen zookeeper kafka 2>&1
 echo $? > /tmp/infra-pull.exit
 ' > /tmp/infra-pull.log 2>&1 &
 echo $! > /tmp/infra-pull.pid

.github/workflows/stack-tests.yml

Lines changed: 1 addition & 2 deletions
@@ -30,7 +30,6 @@ env:
   REDIS_IMAGE: redis:7-alpine
   KAFKA_IMAGE: confluentinc/cp-kafka:7.8.2
   ZOOKEEPER_IMAGE: confluentinc/cp-zookeeper:7.8.2
-  SCHEMA_REGISTRY_IMAGE: confluentinc/cp-schema-registry:7.8.2
   K3S_VERSION: v1.32.11+k3s1
   K3S_INSTALL_SHA256: d75e014f2d2ab5d30a318efa5c326f3b0b7596f194afcff90fa7a7a91166d5f7

@@ -314,7 +313,7 @@ jobs:
   run: |
     mkdir -p logs
     docker compose logs --timestamps > logs/docker-compose.log 2>&1
-    for svc in backend mongo redis kafka zookeeper schema-registry \
+    for svc in backend mongo redis kafka zookeeper \
       coordinator k8s-worker pod-monitor result-processor \
       saga-orchestrator event-replay dlq-processor; do
       docker compose logs --timestamps "$svc" > "logs/$svc.log" 2>&1 || true

deploy.sh

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ cmd_infra() {

   # Start only infrastructure services (no app, no workers, no observability)
   # zookeeper-certgen is needed for kafka to start
-  docker compose up -d zookeeper-certgen mongo redis zookeeper kafka schema-registry $WAIT_FLAG $WAIT_TIMEOUT_FLAG
+  docker compose up -d zookeeper-certgen mongo redis zookeeper kafka $WAIT_FLAG $WAIT_TIMEOUT_FLAG

   print_success "Infrastructure services started"
   docker compose ps

docker-compose.yaml

Lines changed: 0 additions & 28 deletions
@@ -100,8 +100,6 @@ services:
         condition: service_healthy
       kafka:
         condition: service_healthy
-      schema-registry:
-        condition: service_healthy
     volumes:
       - ./backend/app:/app/app
       - ./backend/workers:/app/workers

@@ -327,40 +325,17 @@ services:
       retries: 15
       start_period: 3s

-  schema-registry:
-    image: confluentinc/cp-schema-registry:7.8.2
-    container_name: schema-registry
-    depends_on:
-      kafka:
-        condition: service_healthy
-    ports:
-      - "8081:8081"
-    environment:
-      SCHEMA_REGISTRY_HOST_NAME: schema-registry
-      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092
-      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
-    networks:
-      - app-network
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8081/config"]
-      interval: 3s
-      timeout: 5s
-      retries: 15
-      start_period: 5s
-
   kafdrop:
     image: obsidiandynamics/kafdrop:3.31.0
     container_name: kafdrop
     profiles: ["debug"]
     depends_on:
       - kafka
-      - schema-registry
     ports:
       - "9000:9000"
     environment:
       KAFKA_BROKERCONNECT: kafka:29092
       JVM_OPTS: "-Xms256M -Xmx512M"
-      SCHEMAREGISTRY_CONNECT: http://schema-registry:8081
     networks:
       - app-network

@@ -378,11 +353,8 @@ services:
         condition: service_completed_successfully
       kafka:
         condition: service_healthy
-      schema-registry:
-        condition: service_healthy
     environment:
       - KAFKA_BOOTSTRAP_SERVERS=kafka:29092
-      - SCHEMA_REGISTRY_URL=http://schema-registry:8081
     volumes:
       - ./backend/config.toml:/app/config.toml:ro
       - ./backend/secrets.toml:/app/secrets.toml:ro

docs/architecture/event-system-design.md

Lines changed: 12 additions & 15 deletions
@@ -4,7 +4,7 @@ This document explains how events flow through the system and how domain events

 ## The unified event model

-Events in Integr8sCode use a unified design where domain events are directly Avro-serializable:
+Events in Integr8sCode use a unified design where domain events are plain Pydantic models serialized as JSON:

 ```mermaid
 graph LR
@@ -13,7 +13,7 @@ graph LR
   end

   subgraph "Domain Layer"
-    DE[Domain Events<br/>typed.py<br/>extends AvroBase]
+    DE[Domain Events<br/>typed.py<br/>extends BaseModel]
   end

   subgraph "Infrastructure"
@@ -26,7 +26,7 @@ graph LR
   DE --> MongoDB[(MongoDB)]
 ```

-The `EventType` enum defines all possible event types as strings. Domain events are Pydantic models that extend `AvroBase` (from `pydantic-avro`), making them both usable for MongoDB storage and Avro-serializable for Kafka. The mappings module routes events to the correct Kafka topics.
+The `EventType` enum defines all possible event types as strings. Domain events are Pydantic `BaseModel` subclasses, making them usable for both MongoDB storage and Kafka transport. FastStream handles JSON serialization natively when publishing and deserializing when consuming. The mappings module routes events to the correct Kafka topics.

 This design eliminates duplication between "domain events" and "Kafka events" by making the domain event the single source of truth.

@@ -42,7 +42,7 @@ Earlier designs maintained separate domain and Kafka event classes, arguing that
 The unified approach addresses these issues:

 - **Single definition**: Each event is defined once in `domain/events/typed.py`
-- **Avro-compatible**: `BaseEvent` extends `AvroBase`, enabling automatic schema generation
+- **JSON-native**: `BaseEvent` extends Pydantic `BaseModel`; FastStream serializes to JSON automatically
 - **Storage-ready**: Events include storage fields (`stored_at`, `ttl_expires_at`) that MongoDB uses
 - **Topic routing**: The `EVENT_TYPE_TO_TOPIC` mapping in `infrastructure/kafka/mappings.py` handles routing

@@ -92,12 +92,12 @@ sequenceDiagram

 This approach is more performant than trying each union member until one validates. The discriminator tells Pydantic exactly which class to use.

-## BaseEvent and AvroBase
+## BaseEvent

-The `BaseEvent` class provides common fields for all events and inherits from `AvroBase` for Avro schema generation:
+The `BaseEvent` class provides common fields for all events:

 ```python
-class BaseEvent(AvroBase):
+class BaseEvent(BaseModel):
     """Base fields for all domain events."""
     model_config = ConfigDict(from_attributes=True)

@@ -111,10 +111,7 @@ class BaseEvent(AvroBase):
     ttl_expires_at: datetime = Field(default_factory=...)
 ```

-The `AvroBase` inheritance enables:
-- Automatic Avro schema generation via `BaseEvent.avro_schema()`
-- Serialization through the Schema Registry
-- Forward compatibility checking
+Since `BaseEvent` is a plain Pydantic model, FastStream handles serialization and deserialization transparently — publishing calls `model.model_dump_json()` under the hood, and subscribers receive typed model instances from the incoming JSON.

 ## Topic routing

@@ -180,9 +177,9 @@ graph TB
 ```

 When publishing events, the `UnifiedProducer`:
-1. Looks up the topic via `EVENT_TYPE_TO_TOPIC`
-2. Serializes the event using the Schema Registry
-3. Publishes to Kafka
+1. Persists the event to MongoDB via `EventRepository`
+2. Looks up the topic via `EVENT_TYPE_TO_TOPIC`
+3. Publishes the Pydantic model to Kafka through `broker.publish()` (FastStream handles JSON serialization)

 The producer handles both storage in MongoDB and publishing to Kafka in a single flow.

@@ -193,7 +190,7 @@ The producer handles both storage in MongoDB and publishing to Kafka in a single
 | [`domain/enums/events.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/domain/enums/events.py) | `EventType` enum with all event type values |
 | [`domain/events/typed.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/domain/events/typed.py) | All domain event classes and `DomainEvent` union |
 | [`infrastructure/kafka/mappings.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/infrastructure/kafka/mappings.py) | Event-to-topic routing and helper functions |
-| [`services/kafka_event_service.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/services/kafka_event_service.py) | Publishes events to both MongoDB and Kafka |
+| [`events/core/producer.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/events/core/producer.py) | UnifiedProducer — persists to MongoDB then publishes to Kafka |
 | [`tests/unit/domain/events/test_event_schema_coverage.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/tests/unit/domain/events/test_event_schema_coverage.py) | Validates correspondence between enum and event classes |

 ## Related docs
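
To make the post-change publishing flow easier to picture, here is a minimal sketch assuming FastStream's `KafkaBroker`. The concrete event class, topic name, and broker address below are illustrative placeholders, not the project's actual definitions:

```python
# Sketch only: BaseEvent as a plain Pydantic model plus FastStream JSON publishing.
# ExampleExecutionRequested, the topic name, and event_id are assumptions.
from datetime import datetime, timezone
from uuid import uuid4

from faststream.kafka import KafkaBroker
from pydantic import BaseModel, ConfigDict, Field


class BaseEvent(BaseModel):
    """Base fields shared by all domain events (abridged from the doc above)."""
    model_config = ConfigDict(from_attributes=True)

    event_id: str = Field(default_factory=lambda: str(uuid4()))
    stored_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))


class ExampleExecutionRequested(BaseEvent):
    """Hypothetical concrete event, for illustration only."""
    execution_id: str
    script: str


broker = KafkaBroker("kafka:29092")  # bootstrap address taken from the compose file above


async def publish_example(event: ExampleExecutionRequested) -> None:
    # FastStream dumps the Pydantic model to JSON when publishing; the broker
    # must already be started (await broker.start()) or used as a context manager.
    await broker.publish(event, topic="execution-requested")
```

In the real `UnifiedProducer`, the event is first persisted via `EventRepository` and the topic is resolved from `EVENT_TYPE_TO_TOPIC` before `broker.publish()` is called.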

docs/architecture/kafka-topic-architecture.md

Lines changed: 4 additions & 4 deletions
@@ -147,9 +147,9 @@ Admins can:

 Key files:

-- `domain/events/typed.py` — all Pydantic event models (extends `AvroBase` for Avro serialization)
+- `domain/events/typed.py` — all Pydantic event models (plain `BaseModel` subclasses)
 - `infrastructure/kafka/mappings.py` — event-to-topic routing and helper functions
-- `events/schema/schema_registry.py` — schema manager
-- `events/core/{producer,consumer,dispatcher}.py` — unified Kafka plumbing
+- `events/core/producer.py` — UnifiedProducer (persists to MongoDB, publishes to Kafka)
+- `events/handlers.py` — FastStream subscriber registrations for all workers

-All events are Pydantic models with *strict typing* that extend `AvroBase` for Avro schema generation. The mappings module routes each event type to its destination topic via `EVENT_TYPE_TO_TOPIC`. Schema Registry integration ensures producers and consumers agree on structure, catching incompatible changes *before* runtime failures. The unified producer and consumer classes handle serialization, error handling, and observability.
+All events are Pydantic models with strict typing. FastStream handles JSON serialization natively — the producer publishes Pydantic instances directly via `broker.publish()`, and subscribers receive typed model instances. The mappings module routes each event type to its destination topic via `EVENT_TYPE_TO_TOPIC`. Pydantic validation on both ends ensures structural agreement between producers and consumers.
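
As a consumer-side companion to the bullets above, a minimal FastStream subscriber sketch is shown below. The topic, consumer group, and event model are assumptions; the project's actual registrations live in `events/handlers.py`:

```python
# Sketch only: a FastStream subscriber that receives a typed Pydantic event.
# Topic name, group id, and the event model are illustrative assumptions.
from faststream import FastStream
from faststream.kafka import KafkaBroker
from pydantic import BaseModel

broker = KafkaBroker("kafka:29092")  # bootstrap address taken from the compose file above
app = FastStream(broker)


class ExampleResultStored(BaseModel):
    """Hypothetical event payload."""
    execution_id: str
    status: str


@broker.subscriber("execution-results", group_id="result-processor")
async def handle_result(event: ExampleResultStored) -> None:
    # FastStream validates the incoming JSON against the annotated model,
    # so the handler receives a typed instance rather than a raw dict.
    print(f"{event.execution_id}: {event.status}")
```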

docs/architecture/overview.md

Lines changed: 3 additions & 5 deletions
@@ -17,7 +17,7 @@ For details on specific components, see:
 ![System diagram](/assets/images/system_diagram.png)

 The SPA hits the frontend, which proxies to the API over HTTPS; the API
-serves both REST and SSE. Kafka carries events with Schema Registry and Zookeeper backing it; kafka-
+serves both REST and SSE. Kafka carries events as JSON (serialized by FastStream) with Zookeeper backing it; kafka-
 init seeds topics. All workers are separate containers subscribed to Kafka; the k8s-worker talks to the
 Kubernetes API to run code, the pod-monitor watches pods, the result-processor writes results to Mongo
 and nudges Redis for SSE fanout, and the saga-orchestrator coordinates long flows with Mongo and Redis.
@@ -44,7 +44,6 @@ graph LR
   Mongo[(MongoDB)]
   Redis[(Redis)]
   Kafka[Kafka]
-  Schema["Schema Registry"]
   K8s["Kubernetes API"]
   OTel["OTel Collector"]
   VM["VictoriaMetrics"]
@@ -63,7 +62,6 @@ graph LR
   Repos --> Mongo
   Services <-->|"keys + SSE bus"| Redis
   Events <-->|"produce/consume"| Kafka
-  Events ---|"subjects/IDs"| Schema
   Services -->|"read limits"| K8s

   %% Telemetry edges
@@ -76,11 +74,11 @@ graph LR
 - **Routers**: REST + SSE endpoints
 - **DI (Dishka)**: Dependency injection & providers
 - **Services**: Execution, Events, SSE, Idempotency, Notifications, User Settings, Rate Limit, Saved Scripts, Replay, Saga API
-- **Kafka Layer**: Producer, Consumer, Dispatcher, EventStore, SchemaRegistryManager
+- **Kafka Layer**: UnifiedProducer, FastStream subscribers, EventStore

 FastAPI under Uvicorn exposes REST and SSE routes, with middleware and DI wiring the core services.
 Those services use Mongo-backed repositories for state and a unified Kafka layer to publish and react
-to events, with the schema registry ensuring compatibility. Redis handles rate limiting and SSE fanout.
+to events. FastStream handles Pydantic JSON serialization for all Kafka messages. Redis handles rate limiting and SSE fanout.
 Telemetry flows through the OpenTelemetry Collector—metrics to VictoriaMetrics for Grafana and traces
 to Jaeger. Kubernetes interactions are read via the API. This view focuses on the app’s building blocks;
 event workers live in the system diagram.

docs/architecture/user-settings-events.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ notifications, or editor settings. This eliminates branching in both publishing
 ```

 The `changed_fields` list identifies which settings changed. Typed fields (`theme`, `notifications`, `editor`, etc.)
-contain the new values in Avro-compatible form.
+contain the new values as Pydantic model fields.

 ## TypeAdapter pattern

docs/components/schema-manager.md

Lines changed: 6 additions & 17 deletions
@@ -1,6 +1,8 @@
 # Schema management

-The backend manages two kinds of schemas: MongoDB collections with indexes and validators, and Kafka event schemas in Avro format with a Confluent Schema Registry. Both are initialized at process start, whether the process is the main API or a standalone worker.
+The backend manages MongoDB collection schemas — indexes, validators, and TTL policies. These are initialized at process start, whether the process is the main API or a standalone worker.
+
+Kafka event serialization is handled entirely by FastStream with Pydantic JSON; there is no schema registry involved. See [Event System Design](../architecture/event-system-design.md) for details on event serialization.

 ## MongoDB schema

@@ -14,32 +16,19 @@ Other migrations create indexes for user settings snapshots, replay sessions, no

 Repositories don't create their own indexes — they only read and write. This separation keeps startup behavior predictable and prevents the same index being created from multiple code paths.

-## Kafka schema registry
-
-The `SchemaRegistryManager` class in `app/events/schema/schema_registry.py` handles Avro serialization for Kafka events. All registry operations are async and must be awaited. The manager connects to a Confluent Schema Registry and registers schemas for all event types at startup via `await initialize_schemas()`.
-
-All event classes in `domain/events/typed.py` extend `AvroBase` (from `pydantic-avro`), enabling automatic Avro schema generation. The manager registers these schemas with subjects named after the class (like `ExecutionRequestedEvent-value`) and sets FORWARD compatibility, meaning new schemas can add fields but not remove required ones. This allows producers to be upgraded before consumers without breaking deserialization.
-
-Serialization and deserialization are async — `await serialize_event(event)` and `await deserialize_event(data, topic)` must be awaited. The wire format follows Confluent conventions: a magic byte, four-byte schema id, then the Avro binary payload. The underlying `python-schema-registry-client` library handles schema registration caching internally. The manager maintains a bidirectional cache between schema ids and Python event classes for deserialization. When deserializing, it reads the schema id from the message header, looks up the corresponding event class, deserializes the Avro payload to a dict, and hydrates the Pydantic model.
-
-For test isolation, the manager supports an optional `SCHEMA_SUBJECT_PREFIX` environment variable. Setting this to something like `test.session123.` prefixes all subject names, preventing test runs from polluting production schemas or interfering with each other.
-
 ## Startup sequence

-During API startup, the `lifespan` function in `dishka_lifespan.py` gets the database from the DI container, creates a `SchemaManager`, and calls `await apply_all()`. It does the same for `SchemaRegistryManager`, calling `await initialize_schemas()` to register all event types (async — must be awaited). Workers like the saga orchestrator and event replay service follow the same pattern — they connect to MongoDB, run schema migrations, and await schema registry initialization before starting their main loops.
+During API startup, the `lifespan` function in `dishka_lifespan.py` initializes Beanie with the MongoDB client, then resolves the `KafkaBroker` from DI, registers FastStream subscribers, sets up Dishka integration, and starts the broker. Workers follow the same pattern — they connect to MongoDB, initialize Beanie, register their subscribers on the broker, and start consuming.

 ## Local development

 To force a specific MongoDB migration to run again, delete its document from `schema_versions`. To start fresh, point the app at a new database. Migrations are designed to be additive; the system doesn't support automatic rollbacks. If you need to undo a migration in production, you'll have to drop indexes or modify validators manually.

-For Kafka schemas, the registry keeps all versions. If you break compatibility and need to start over, delete the subject from the registry (either via REST API or the registry's UI if available) and let the app re-register on next startup.
-
 ## Key files

 | File | Purpose |
 |--------------------------------------------------------------------------------------------------------------------------------|----------------------------|
 | [`schema_manager.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/db/schema/schema_manager.py) | MongoDB migrations |
-| [`schema_registry.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/events/schema/schema_registry.py) | Kafka Avro serialization |
-| [`typed.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/domain/events/typed.py) | Domain events (extend AvroBase) |
+| [`typed.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/domain/events/typed.py) | Domain events (Pydantic BaseModel) |
 | [`mappings.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/infrastructure/kafka/mappings.py) | Event-to-topic routing |
-| [`dishka_lifespan.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/dishka_lifespan.py) | Startup initialization |
+| [`dishka_lifespan.py`](https://github.com/HardMax71/Integr8sCode/blob/main/backend/app/core/dishka_lifespan.py) | Startup initialization |
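
To make the rewritten startup sequence concrete, here is a minimal lifespan sketch assuming Beanie, Motor, and FastStream. The database name and empty document-model list are placeholders, and the Dishka wiring mentioned in the doc is omitted:

```python
# Sketch only: the startup order described in the doc change above.
# Connection strings, database name, and document models are assumptions.
from contextlib import asynccontextmanager

from beanie import init_beanie
from fastapi import FastAPI
from faststream.kafka import KafkaBroker
from motor.motor_asyncio import AsyncIOMotorClient

broker = KafkaBroker("kafka:29092")  # bootstrap address taken from the compose file above


@asynccontextmanager
async def lifespan(app: FastAPI):
    # 1. Initialize Beanie with the MongoDB client (real document models go here).
    client = AsyncIOMotorClient("mongodb://mongo:27017")
    await init_beanie(database=client["integr8scode"], document_models=[])

    # 2. Subscribers must be registered on the broker before it starts
    #    (the real app does this by importing its handlers module), then
    #    the broker is started so publishing and consuming can begin.
    await broker.start()
    try:
        yield
    finally:
        await broker.close()
        client.close()


app = FastAPI(lifespan=lifespan)
```

In the real app the broker comes from the Dishka container and the subscriber registrations live in the events handlers module, as described in the other docs touched by this commit.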

docs/getting-started.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ This guide walks you through running Integr8sCode locally and executing your fir

 ## What you're deploying

-The full stack includes a Svelte frontend, FastAPI backend, MongoDB, Redis, Kafka with Schema Registry, and seven background workers. Sounds like a lot, but `docker compose` handles all of it. First startup takes a few minutes to pull images and initialize services; subsequent starts are much faster.
+The full stack includes a Svelte frontend, FastAPI backend, MongoDB, Redis, Kafka, and seven background workers. Sounds like a lot, but `docker compose` handles all of it. First startup takes a few minutes to pull images and initialize services; subsequent starts are much faster.

 ## Start the stack