Conversation


@beezz beezz commented Mar 31, 2025

https://github.com/getsentry/streaming-planning/issues/100

Using deployment_config from #77, this PR adds a command that generates k8s resources:

  • a Deployment for each segment of the pipeline configuration
  • a ConfigMap for the configuration

To generate k8s resources using default templates:

python -m sentry_streams.k8s.generate  \
        --config sentry_streams/examples/sample_configs/deployment_config.yaml  \
        --image arroyo:latest  \
        --namespace pipeline-test
Open questions / not yet covered:

  • docker image ?
  • (default/required) resources requests/limits
  • liveness and readiness probes
  • update strategy
  • deployment update on config change
  • providing additional environment variables
  • generic resources attributes overrides

Via CLI support or custom templates:

  • resources name templating
  • resources labels templating
  • annotations
  • sidecars


beezz commented Mar 31, 2025

$ python -m sentry_streams.k8s.generate --config sentry_streams/examples/sample_configs/deployment_config.yaml --image "rebelthor/sleep:latest" --namespace default

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    pipeline: example-pipeline
    segment: '0'
  name: example-pipeline-0
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      pipeline: example-pipeline
      segment: '0'
  template:
    metadata:
      labels:
        pipeline: example-pipeline
        segment: '0'
    spec:
      containers:
      - args:
        - --config
        - /etc/example-pipeline-config
        command:
        - python
        - -m
        - sentry_streams.runner
        env:
        - name: SEGMENT_ID
          value: '0'
        image: rebelthor/sleep:latest
        name: segment
        volumeMounts:
        - mountPath: /etc/example-pipeline-config
          name: example-pipeline-config
          readOnly: true
          subPath: config
      volumes:
      - configMap:
          name: example-pipeline
        name: example-pipeline-config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    pipeline: example-pipeline
    segment: '1'
  name: example-pipeline-1
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      pipeline: example-pipeline
      segment: '1'
  template:
    metadata:
      labels:
        pipeline: example-pipeline
        segment: '1'
    spec:
      containers:
      - args:
        - --config
        - /etc/example-pipeline-config
        command:
        - python
        - -m
        - sentry_streams.runner
        env:
        - name: SEGMENT_ID
          value: '1'
        image: rebelthor/sleep:latest
        name: segment
        volumeMounts:
        - mountPath: /etc/example-pipeline-config
          name: example-pipeline-config
          readOnly: true
          subPath: config
      volumes:
      - configMap:
          name: example-pipeline
        name: example-pipeline-config
---
apiVersion: v1
data:
  config: "env:\n  topics:\n    events: events\n    transformed-events: transformed-events\n\
    \    transformed-events-2: transformed-events-2\npipeline:\n  name: example-pipeline\n\
    \  segments:\n  - parallelism: 2\n    steps_config:\n      myinput:\n        bootstrap_servers:\
    \ kafka:9093\n        starts_segment: true\n  - parallelism: 3\n    steps_config:\n\
    \      kafkasink:\n        bootstrap_servers: kafka:9093\n"
kind: ConfigMap
metadata:
  labels:
    pipeline: example-pipeline
  name: example-pipeline
  namespace: default
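The per-segment expansion shown above (one Deployment per segment, plus one shared ConfigMap) can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; the helper name and config keys are assumptions based on the output above:

```python
import copy

def generate_manifests(config, deployment_template, namespace, image):
    """Produce one Deployment dict per pipeline segment (hypothetical helper)."""
    pipeline_name = config["pipeline"]["name"]
    manifests = []
    for idx, segment in enumerate(config["pipeline"]["segments"]):
        dep = copy.deepcopy(deployment_template)
        labels = {"pipeline": pipeline_name, "segment": str(idx)}
        dep["metadata"] = {
            "name": f"{pipeline_name}-{idx}",
            "namespace": namespace,
            "labels": labels,
        }
        # Each segment's parallelism maps directly to the replica count.
        dep["spec"]["replicas"] = segment["parallelism"]
        dep["spec"]["selector"] = {"matchLabels": labels}
        dep["spec"]["template"]["metadata"] = {"labels": labels}
        container = dep["spec"]["template"]["spec"]["containers"][0]
        container["image"] = image
        container["env"] = [{"name": "SEGMENT_ID", "value": str(idx)}]
        manifests.append(dep)
    return manifests

# Minimal example mirroring the sample config above.
template = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {},
    "spec": {"template": {"spec": {"containers": [{"name": "segment"}]}}},
}
config = {
    "pipeline": {
        "name": "example-pipeline",
        "segments": [{"parallelism": 2}, {"parallelism": 3}],
    }
}
manifests = generate_manifests(config, template, "default", "arroyo:latest")
```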


beezz commented Mar 31, 2025

python -m sentry_streams.k8s.generate --help
usage: generate.py [-h] --config CONFIG [--deployment-template DEPLOYMENT_TEMPLATE] [--configmap-template CONFIGMAP_TEMPLATE] [--output OUTPUT] [--container-name CONTAINER_NAME] --image IMAGE
                   [--namespace NAMESPACE]

Generate k8s resources from a sentry_streams deployment config.

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to a deployment config file.
  --deployment-template DEPLOYMENT_TEMPLATE
                        Path to a deployment template file.
  --configmap-template CONFIGMAP_TEMPLATE
                        Path to a configmap template file.
  --output OUTPUT       Output target file. Defaults to stdout.
  --container-name CONTAINER_NAME
                        Streams segment application container name.
  --image IMAGE         Segment container image.
  --namespace NAMESPACE
                        Namespace for deployment and configmap.

Comment on lines 110 to 118
parser.add_argument(
    "--deployment-template",
    type=argparse.FileType("r"),
    help="Path to a deployment template file.",
    default=open(
        importlib.resources.files("sentry_streams") / "k8s/templates/deployment.yaml",
        "r",
    ),
)
like

@fpacifici

One high-level question we will need at least a rough answer to:
How does the generation of these resources fit into the SRE process for deploying Kubernetes resources?

I see two options, considering that many pipelines will be running in getsentry and, as of now, it seems SRE is pushing to keep the monolith intact in terms of k8s resource deployment:

  • sentry-kube becomes capable of deploying streaming pipelines. Imagine a sentry-infra-tool importing a package that contains the logic you provide here (I would rather this code did not move into sentry-kube, though we can publish a package sentry-kube can import). The pipelines are value files. We write a sentry-kube macro that triggers this code.
  • Streaming pipelines are deployed with a different tool that directly generates manifests with what you are building and deploys via GoCD. Basically, streaming pipelines behave as if they were their own services during deployment.

Doing it with an operator changes the problem a bit, but not entirely. A resource still has to be deployed that references the getsentry pod template.

Did you already put any thought on this ?


beezz commented Apr 2, 2025

I was thinking about it, and IMO the best option would be to go with an operator from the start. This way we can deploy the operator with sentry-kube, possibly even as part of the getsentry monolith.

  • sentry-kube becomes capable of deploying streaming pipelines. Imagine a sentry-infra-tool importing a package that contains the logic you provide here (I would rather this code did not move into sentry-kube, though we can publish a package sentry-kube can import). The pipelines are value files. We write a sentry-kube macro that triggers this code.

This could be a stopgap solution, as it sounds quite easy to implement, but on the other hand a similar effort might get a minimal operator into production.

@fpacifici left a comment


I think this goes in the right direction.
Please see some comments inline. I think the key aspects to work on at this stage are:

  • Use a CRD rather than the configmap, as we discussed offline
  • Move the operator code into its own package so sentry_streams does not have to depend on kopf and kubernetes.


I think we should move the whole operator code, examples and Dockerfile outside of the sentry_streams directory. Let's move everything into k8s.
This is because sentry_streams is the package that contains the streaming platform runtime. It is imported as a library in every code base that runs streaming applications, so it should stay relatively light. kopf and kubernetes are not needed to run sentry, so we should not import them transitively.

labels:
  service: streams-operator
data:
  events.yaml: |

Adding the yaml content inside the configmap is not ideal as it is plain text.
I think the CRD will be important for that as well.

Comment on lines 179 to 184
) + [
    generate_configmap(
        config=config,
        configmap_template=configmap_template,
    )
]

I think you may have to generate and apply the configmaps before the deployments.
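One simple way to honor that ordering (an illustrative sketch, not the PR's code) is to sort the generated manifest list by `kind` so ConfigMaps are applied before the Deployments that mount them:

```python
# ConfigMaps must exist before the Deployments that mount them as volumes.
APPLY_ORDER = {"ConfigMap": 0, "Deployment": 1}

def order_for_apply(manifests):
    """Sort manifests so ConfigMaps are applied first; unknown kinds go last."""
    return sorted(manifests, key=lambda m: APPLY_ORDER.get(m["kind"], 99))
```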

) -> K8sConfigMapManifest:
    configmap = copy.deepcopy(configmap_template)
    pipeline_name = config["pipeline"]["name"]
    configmap["metadata"]["name"] = pipeline_name

I think we may want to call the configmap pipeline_<PIPELINE_NAME> rather than PIPELINE_NAME as it has to be a unique name in the namespace.

Comment on lines 135 to 136
streams_operator = StreamsOperator()
streams_operator.register_handlers()

Does kopf require the operator to be defined at module level?
If not, can we have it instantiated in the main function (or in a function called by main), passing the config parameters into the constructor rather than transparently taking them from env vars in the constructor itself? It makes the modularization a bit better and unit tests easier.
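A sketch of the constructor-injection shape being suggested (names are illustrative; it assumes kopf handlers can be registered from an instance method rather than via module-level decorators):

```python
class StreamsOperator:
    """Sketch: config is passed in explicitly instead of being read from
    environment variables inside __init__, which makes unit testing easier."""

    def __init__(self, namespace: str, container_name: str = "segment"):
        self.namespace = namespace
        self.container_name = container_name

    def register_handlers(self) -> None:
        # kopf decorators are plain callables, so (assuming kopf allows it)
        # handlers could be registered here instead of at module import time,
        # e.g. kopf.on.create(...)(self.on_create).
        pass

def main(namespace: str) -> StreamsOperator:
    # Instantiate in main with explicit parameters, not at module level.
    operator = StreamsOperator(namespace=namespace)
    operator.register_handlers()
    return operator
```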

else:
    raise
# config map already exists, patch it
self.core_v1_client.patch_namespaced_config_map(
    name=name, namespace=namespace, body=body
)

I'd add logs before the operator does any write operation.
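For example (a sketch with a hypothetical wrapper, using a fake-friendly client parameter), logging the intent before each mutation keeps operator actions auditable:

```python
import logging

logger = logging.getLogger("streams_operator")

def apply_config_map(client, name, namespace, body):
    """Log before the write so every mutation the operator makes is auditable."""
    logger.info("patching ConfigMap %s in namespace %s", name, namespace)
    client.patch_namespaced_config_map(name=name, namespace=namespace, body=body)
```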

    self.apps_v1_client.create_namespaced_deployment(namespace=namespace, body=body)
else:
    raise
# do not patch the image

Is this because of GoCD?

beezz (author) replied:

Yes, correct, so that the current way we deploy new images keeps working even for deployments managed by the operator.
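One way to get that behavior (a sketch, not the actual operator code) is to strip the container image from the patch body before calling `patch_namespaced_deployment`: with a strategic merge patch, an absent `image` field is simply left unchanged, so the image pushed by the existing deploy pipeline survives:

```python
import copy

def strip_image(deployment_body: dict) -> dict:
    """Return a copy of a Deployment body without container images, so a
    strategic merge patch leaves the currently deployed image untouched."""
    body = copy.deepcopy(deployment_body)
    for container in body["spec"]["template"]["spec"]["containers"]:
        container.pop("image", None)
    return body
```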
