Conversation


@beezz beezz commented Mar 31, 2025

https://github.com/getsentry/streaming-planning/issues/100

Using deployment_config from #77, this PR adds a command that generates k8s resources:

  • a Deployment for each segment of the pipeline configuration
  • a ConfigMap for the configuration

To generate k8s resources using default templates:

python -m sentry_streams.k8s.generate  \
        --config sentry_streams/examples/sample_configs/deployment_config.yaml  \
        --image arroyo:latest  \
        --namespace pipeline-test
Open questions / not yet covered:

  • docker image ?
  • (default/required) resources requests/limits
  • liveness and readiness probes
  • update strategy
  • deployment update on config change
  • providing additional environment variables
  • generic resources attributes overrides

Via CLI support or custom templates:

  • resources name templating
  • resources labels templating
  • annotations
  • sidecars


beezz commented Mar 31, 2025

$ python -m sentry_streams.k8s.generate --config sentry_streams/examples/sample_configs/deployment_config.yaml --image "rebelthor/sleep:latest" --namespace default

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    pipeline: example-pipeline
    segment: '0'
  name: example-pipeline-0
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      pipeline: example-pipeline
      segment: '0'
  template:
    metadata:
      labels:
        pipeline: example-pipeline
        segment: '0'
    spec:
      containers:
      - args:
        - --config
        - /etc/example-pipeline-config
        command:
        - python
        - -m
        - sentry_streams.runner
        env:
        - name: SEGMENT_ID
          value: '0'
        image: rebelthor/sleep:latest
        name: segment
        volumeMounts:
        - mountPath: /etc/example-pipeline-config
          name: example-pipeline-config
          readOnly: true
          subPath: config
      volumes:
      - configMap:
          name: example-pipeline
        name: example-pipeline-config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    pipeline: example-pipeline
    segment: '1'
  name: example-pipeline-1
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      pipeline: example-pipeline
      segment: '1'
  template:
    metadata:
      labels:
        pipeline: example-pipeline
        segment: '1'
    spec:
      containers:
      - args:
        - --config
        - /etc/example-pipeline-config
        command:
        - python
        - -m
        - sentry_streams.runner
        env:
        - name: SEGMENT_ID
          value: '1'
        image: rebelthor/sleep:latest
        name: segment
        volumeMounts:
        - mountPath: /etc/example-pipeline-config
          name: example-pipeline-config
          readOnly: true
          subPath: config
      volumes:
      - configMap:
          name: example-pipeline
        name: example-pipeline-config
---
apiVersion: v1
data:
  config: "env:\n  topics:\n    events: events\n    transformed-events: transformed-events\n\
    \    transformed-events-2: transformed-events-2\npipeline:\n  name: example-pipeline\n\
    \  segments:\n  - parallelism: 2\n    steps_config:\n      myinput:\n        bootstrap_servers:\
    \ kafka:9093\n        starts_segment: true\n  - parallelism: 3\n    steps_config:\n\
    \      kafkasink:\n        bootstrap_servers: kafka:9093\n"
kind: ConfigMap
metadata:
  labels:
    pipeline: example-pipeline
  name: example-pipeline
  namespace: default
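The per-segment expansion shown above (one Deployment per segment, plus one shared ConfigMap) can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; the helper name and config keys are assumptions based on the output above:

```python
import copy

def generate_manifests(config, deployment_template, namespace, image):
    """Produce one Deployment dict per pipeline segment (hypothetical helper)."""
    pipeline_name = config["pipeline"]["name"]
    manifests = []
    for idx, segment in enumerate(config["pipeline"]["segments"]):
        dep = copy.deepcopy(deployment_template)
        labels = {"pipeline": pipeline_name, "segment": str(idx)}
        dep["metadata"] = {
            "name": f"{pipeline_name}-{idx}",
            "namespace": namespace,
            "labels": labels,
        }
        # Each segment's parallelism maps directly to the replica count.
        dep["spec"]["replicas"] = segment["parallelism"]
        dep["spec"]["selector"] = {"matchLabels": labels}
        dep["spec"]["template"]["metadata"] = {"labels": labels}
        container = dep["spec"]["template"]["spec"]["containers"][0]
        container["image"] = image
        container["env"] = [{"name": "SEGMENT_ID", "value": str(idx)}]
        manifests.append(dep)
    return manifests

# Minimal example mirroring the sample config above.
template = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {},
    "spec": {"template": {"spec": {"containers": [{"name": "segment"}]}}},
}
config = {
    "pipeline": {
        "name": "example-pipeline",
        "segments": [{"parallelism": 2}, {"parallelism": 3}],
    }
}
manifests = generate_manifests(config, template, "default", "arroyo:latest")
```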


beezz commented Mar 31, 2025

python -m sentry_streams.k8s.generate --help
usage: generate.py [-h] --config CONFIG [--deployment-template DEPLOYMENT_TEMPLATE] [--configmap-template CONFIGMAP_TEMPLATE] [--output OUTPUT] [--container-name CONTAINER_NAME] --image IMAGE
                   [--namespace NAMESPACE]

Generate k8s resources from a sentry_streams deployment config.

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to a deployment config file.
  --deployment-template DEPLOYMENT_TEMPLATE
                        Path to a deployment template file.
  --configmap-template CONFIGMAP_TEMPLATE
                        Path to a configmap template file.
  --output OUTPUT       Output target file. Defaults to stdout.
  --container-name CONTAINER_NAME
                        Streams segment application container name.
  --image IMAGE         Segment container image.
  --namespace NAMESPACE
                        Namespace for deployment and configmap.

Comment on lines 110 to 118
parser.add_argument(
    "--deployment-template",
    type=argparse.FileType("r"),
    help="Path to a deployment template file.",
    default=open(
        importlib.resources.files("sentry_streams") / "k8s/templates/deployment.yaml",
        "r",
    ),
)
like

@fpacifici

One high-level question we will need at least a rough answer to:
How does the generation of these resources fit into the SRE process for deploying Kubernetes resources?

I see two options, considering that many pipelines will be running in getsentry and, as of now, it seems SRE is pushing to keep the monolith intact in terms of k8s resource deployment:

  • sentry-kube becomes capable of deploying streaming pipelines. Imagine a sentry-infra-tool importing a package that contains the logic you provide here (I would rather this code did not move into sentry-kube, though we can publish a package sentry-kube can import). The pipelines are value files. We write a sentry-kube macro that triggers this code.
  • Streaming pipelines are deployed with a different tool that directly generates manifests with what you are building and deploys via GoCD. Basically, streaming pipelines behave as if they were their own services during deployment.

Doing it with an operator changes the problem a bit, but not entirely. A resource still has to be deployed that references the getsentry pod template.

Did you already put any thought on this ?


beezz commented Apr 2, 2025

I was thinking about it, and IMO the best option would be to go with an operator from the start. This way we can deploy the operator with sentry-kube, possibly even as part of the getsentry monolith.

  • sentry-kube becomes capable of deploying streaming pipelines. Imagine a sentry-infra-tool importing a package that contains the logic you provide here (I would rather this code did not move into sentry-kube, though we can publish a package sentry-kube can import). The pipelines are value files. We write a sentry-kube macro that triggers this code.

This could be a stopgap solution, as it sounds quite easy to implement, but on the other hand a similar effort might get a minimal operator into production.

@fpacifici left a comment


I think this goes in the right direction.
Please see some comments inline. I think the key aspects to work on at this stage are:

  • Use a CRD rather than the configmap, as we discussed offline
  • Move the operator code into its own package so sentry_streams does not have to depend on kopf and kubernetes.


I think we should move the whole operator code, examples and Dockerfile outside of the sentry_streams directory. Let's move everything into k8s.
This is because sentry_streams is the package that contains the streaming platform runtime. It is imported as a library in every code base that runs streaming applications, so it should stay relatively light. kopf and kubernetes are not needed to run sentry, so we should not import them transitively.

labels:
  service: streams-operator
data:
  events.yaml: |

Adding the yaml content inside the configmap is not ideal as it is plain text.
I think the CRD will be important for that as well.

Comment on lines 179 to 184
) + [
    generate_configmap(
        config=config,
        configmap_template=configmap_template,
    )
]

I think you may have to generate and apply the configmaps before the deployments.
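One simple way to honor that ordering (an illustrative sketch, not the PR's code) is to sort the generated manifest list by `kind` so ConfigMaps are applied before the Deployments that mount them:

```python
# ConfigMaps must exist before the Deployments that mount them as volumes.
APPLY_ORDER = {"ConfigMap": 0, "Deployment": 1}

def order_for_apply(manifests):
    """Sort manifests so ConfigMaps are applied first; unknown kinds go last."""
    return sorted(manifests, key=lambda m: APPLY_ORDER.get(m["kind"], 99))
```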

) -> K8sConfigMapManifest:
    configmap = copy.deepcopy(configmap_template)
    pipeline_name = config["pipeline"]["name"]
    configmap["metadata"]["name"] = pipeline_name

I think we may want to call the configmap pipeline_<PIPELINE_NAME> rather than PIPELINE_NAME as it has to be a unique name in the namespace.

Comment on lines 135 to 136
streams_operator = StreamsOperator()
streams_operator.register_handlers()

Does kopf require the operator to be defined at module level?
If not, can we have it instantiated in the main function (or in a function called by main), passing the config parameters into the constructor rather than transparently taking them from env vars in the constructor itself? It makes the modularization a bit better and unit tests easier.
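A sketch of the constructor-injection shape being suggested (names are illustrative; it assumes kopf handlers can be registered from an instance method rather than via module-level decorators):

```python
class StreamsOperator:
    """Sketch: config is passed in explicitly instead of being read from
    environment variables inside __init__, which makes unit testing easier."""

    def __init__(self, namespace: str, container_name: str = "segment"):
        self.namespace = namespace
        self.container_name = container_name

    def register_handlers(self) -> None:
        # kopf decorators are plain callables, so (assuming kopf allows it)
        # handlers could be registered here instead of at module import time,
        # e.g. kopf.on.create(...)(self.on_create).
        pass

def main(namespace: str) -> StreamsOperator:
    # Instantiate in main with explicit parameters, not at module level.
    operator = StreamsOperator(namespace=namespace)
    operator.register_handlers()
    return operator
```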

else:
    raise
# config map already exists, patch it
self.core_v1_client.patch_namespaced_config_map(
    name=name, namespace=namespace, body=body
)

I'd add logs before the operator does any write operation.
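For example (a sketch with a hypothetical wrapper, using a fake-friendly client parameter), logging the intent before each mutation keeps operator actions auditable:

```python
import logging

logger = logging.getLogger("streams_operator")

def apply_config_map(client, name, namespace, body):
    """Log before the write so every mutation the operator makes is auditable."""
    logger.info("patching ConfigMap %s in namespace %s", name, namespace)
    client.patch_namespaced_config_map(name=name, namespace=namespace, body=body)
```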

    self.apps_v1_client.create_namespaced_deployment(namespace=namespace, body=body)
else:
    raise
# do not patch the image

Is this because of GoCD?

beezz (author) replied:

Yes, correct, so that the current way we deploy new images keeps working even for deployments managed by the operator.
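One way to get that behavior (a sketch, not the actual operator code) is to strip the container image from the patch body before calling `patch_namespaced_deployment`: with a strategic merge patch, an absent `image` field is simply left unchanged, so the image pushed by the existing deploy pipeline survives:

```python
import copy

def strip_image(deployment_body: dict) -> dict:
    """Return a copy of a Deployment body without container images, so a
    strategic merge patch leaves the currently deployed image untouched."""
    body = copy.deepcopy(deployment_body)
    for container in body["spec"]["template"]["spec"]["containers"]:
        container.pop("image", None)
    return body
```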
