From 717e3b394c8ec7cffb16006c706182981a890040 Mon Sep 17 00:00:00 2001 From: kpwrennbu Date: Thu, 14 May 2026 13:14:27 -0400 Subject: [PATCH] Add Tekton Pipelines guide for NERC OpenShift Documents how to install and use Tekton Pipelines on NERC OpenShift, using the DS-551 Data Engineering at Scale course (Boston University, Spring 2026) as a working reference deployment. Covers operator install, ServiceAccount/RBAC, Task/Pipeline/PipelineRun definitions, CronJob scheduling, observability commands, lessons learned from running 12 concurrent student pipelines, and a reference architecture diagram of the full cluster layout. Co-Authored-By: Claude Sonnet 4.6 --- .../applications/tekton-pipelines.md | 365 ++++++++++++++++++ mkdocs.yaml | 2 + 2 files changed, 367 insertions(+) create mode 100644 docs/openshift/applications/tekton-pipelines.md diff --git a/docs/openshift/applications/tekton-pipelines.md b/docs/openshift/applications/tekton-pipelines.md new file mode 100644 index 000000000..3536a6299 --- /dev/null +++ b/docs/openshift/applications/tekton-pipelines.md @@ -0,0 +1,365 @@ +# Using Tekton Pipelines on NERC OpenShift + +**Contributed by:** DS-551 Data Engineering at Scale, Boston University +**Tested on:** NERC OpenShift, Spring 2026 + +--- + +## Overview + +This page documents how to install and use [Tekton Pipelines](https://tekton.dev) on NERC OpenShift, using the DS-551 course infrastructure as a working reference deployment. In Spring 2026, 12 student teams each built a multi-phase data pipeline on NERC using Tekton as their orchestration engine, running concurrently in isolated namespaces on shared cluster infrastructure. + +The pipeline each team built: + +``` +Kafka (raw events) + → Tekton Task: validate-events (schema check, discard invalids) + → Tekton Task: route-and-enrich (type routing, metadata enrichment) + → Kafka (typed topics: hospital_admissions, clinic_visits, ...) + → Spark analytics job + → Database (PostgreSQL / TimescaleDB / InfluxDB / ClickHouse / MySQL) + → Phase 3 features (alerting, ML, dashboards) +``` + +--- + +## Prerequisites + +- Access to a NERC OpenShift project (namespace) +- `oc` CLI installed and logged in +- Cluster-admin or operator-install permissions for the Tekton operator (one-time, done by NERC staff or PI) + +--- + +## Step 1: Install the Tekton Operator + +Tekton is installed cluster-wide via OperatorHub. If it is already installed on your NERC cluster, skip this step. + +```yaml +# operator-subscription.yaml +apiVersion: operators.coreos.com/v1alpha1 +kind: Subscription +metadata: + name: tekton-pipelines-operator + namespace: openshift-operators +spec: + channel: latest + installPlanApproval: Automatic + name: tektoncd-operator + source: community-operators + sourceNamespace: openshift-marketplace +``` + +```bash +oc apply -f operator-subscription.yaml +# Verify the operator is running +oc get pods -n tekton-pipelines +``` + +Once installed, all namespaces on the cluster can create `Task`, `Pipeline`, and `PipelineRun` resources without any additional setup. + +--- + +## Step 2: Create a ServiceAccount with RBAC + +Each team (or project) needs a ServiceAccount that Tekton uses to run tasks. The example below scopes permissions to what a typical data pipeline task needs: reading Secrets, creating Pods, and managing ConfigMaps. + +```yaml +# serviceaccount.yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: pipeline-sa + namespace: my-namespace +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: pipeline-role + namespace: my-namespace +rules: + - apiGroups: [""] + resources: ["pods", "pods/log", "secrets", "configmaps"] + verbs: ["get", "list", "watch", "create", "delete"] + - apiGroups: ["tekton.dev"] + resources: ["taskruns", "pipelineruns"] + verbs: ["get", "list", "watch", "create", "delete"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: pipeline-rolebinding + namespace: my-namespace +subjects: + - kind: ServiceAccount + name: pipeline-sa + namespace: my-namespace +roleRef: + kind: Role + name: pipeline-role + apiGroup: rbac.authorization.k8s.io +``` + +```bash +oc apply -f serviceaccount.yaml +``` + +--- + +## Step 3: Define Tasks + +A `Task` is a single unit of work — a script that runs inside a container. In the DS-551 pipeline, each task reads from a Kafka topic, processes events, and publishes results to another topic. + +```yaml +# task-validate.yaml +apiVersion: tekton.dev/v1 +kind: Task +metadata: + name: validate-events + namespace: my-namespace +spec: + params: + - name: kafka-bootstrap + type: string + - name: input-topic + type: string + - name: output-topic + type: string + steps: + - name: validate + image: python:3.11-slim + script: | + #!/usr/bin/env sh + pip install kafka-python-ng --target=/tmp/pkgs --quiet + PYTHONPATH=/tmp/pkgs python3 - <<'PYEOF' + import sys; sys.path.insert(0, '/tmp/pkgs') + from kafka import KafkaConsumer, KafkaProducer + import json, logging + + logging.basicConfig(level=logging.INFO) + + consumer = KafkaConsumer( + '$(params.input-topic)', + bootstrap_servers='$(params.kafka-bootstrap)', + group_id='my-pipeline-consumer', + auto_offset_reset='earliest', + value_deserializer=lambda m: json.loads(m.decode('utf-8')) + ) + producer = KafkaProducer( + bootstrap_servers='$(params.kafka-bootstrap)', + value_serializer=lambda m: json.dumps(m).encode('utf-8') + ) + + for msg in consumer: + event = msg.value + if all(k in event for k in ['id', 'timestamp', 'type']): + producer.send('$(params.output-topic)', event) + else: + logging.warning(f"Invalid event discarded: {event}") + + producer.flush() + PYEOF +``` + +```bash +oc apply -f task-validate.yaml +``` + +--- + +## Step 4: Define a Pipeline + +A `Pipeline` chains tasks together using `runAfter` to express dependencies. Parameters flow from the PipelineRun down into each task. + +```yaml +# pipeline.yaml +apiVersion: tekton.dev/v1 +kind: Pipeline +metadata: + name: data-pipeline + namespace: my-namespace +spec: + params: + - name: kafka-bootstrap + - name: raw-topic + - name: valid-topic + tasks: + - name: validate + taskRef: + name: validate-events + params: + - name: kafka-bootstrap + value: $(params.kafka-bootstrap) + - name: input-topic + value: $(params.raw-topic) + - name: output-topic + value: $(params.valid-topic) + - name: enrich + taskRef: + name: route-and-enrich + runAfter: + - validate + params: + - name: kafka-bootstrap + value: $(params.kafka-bootstrap) + - name: input-topic + value: $(params.valid-topic) +``` + +```bash +oc apply -f pipeline.yaml +``` + +--- + +## Step 5: Run the Pipeline + +A `PipelineRun` is a concrete execution of a pipeline with specific parameter values. You can trigger it manually or via a CronJob. + +```yaml +# pipelinerun.yaml +apiVersion: tekton.dev/v1 +kind: PipelineRun +metadata: + name: data-pipeline-run-001 + namespace: my-namespace +spec: + pipelineRef: + name: data-pipeline + taskRunSpecs: + - pipelineTaskName: validate + serviceAccountName: pipeline-sa + params: + - name: kafka-bootstrap + value: kafka.my-namespace.svc.cluster.local:9092 + - name: raw-topic + value: my-project.raw + - name: valid-topic + value: my-project.valid +``` + +```bash +oc apply -f pipelinerun.yaml + +# Watch execution +oc get pipelineruns -n my-namespace -w + +# View logs from a specific task step +oc logs -f -n my-namespace +``` + +--- + +## Step 6: Schedule Recurring Runs (Optional) + +In DS-551, student pipelines ran on a schedule using a Kubernetes CronJob that applied a new PipelineRun manifest on each tick. + +```yaml +apiVersion: batch/v1 +kind: CronJob +metadata: + name: pipeline-scheduler + namespace: my-namespace +spec: + schedule: "0 */6 * * *" # every 6 hours + jobTemplate: + spec: + template: + spec: + serviceAccountName: pipeline-sa + containers: + - name: trigger + image: bitnami/kubectl:latest + command: + - /bin/sh + - -c + - | + kubectl apply -f - < -n my-namespace + +# Optional: forward Tekton dashboard locally +oc port-forward -n tekton-pipelines svc/tekton-dashboard 9097:9097 +# Open http://localhost:9097 +``` + +--- + +## Lessons Learned on NERC (Spring 2026) + +These are practical notes from running 12 concurrent student pipelines on NERC OpenShift for a full semester. + +**What worked well** +- The Tekton operator installed cleanly cluster-wide; teams had zero operator-level setup to do in their own namespaces +- Isolating each team in their own namespace meant a broken pipeline in one team had no blast radius on others +- Passing Kafka bootstrap servers and topic names as Pipeline params (not hardcoded in Tasks) made the same Task reusable across all 12 teams +- CronJob-triggered PipelineRuns worked reliably for batch analytics that needed to run every 6 hours + +**Watch out for** +- **ServiceAccount scope:** Tasks run as the ServiceAccount specified in the PipelineRun — if the SA lacks permissions to read a Secret or create a Pod, the task fails silently or with a confusing RBAC error. Define the Role carefully upfront. +- **Consumer group collisions:** If two PipelineRuns share the same Kafka consumer group ID, the second run will compete for offsets with the first. Use `generateName` on PipelineRuns and include a run ID in the consumer group name if you need concurrent runs. +- **Image pull time:** Tasks using `python:3.11-slim` and installing packages at runtime (via `pip install`) add 30–60 seconds of cold start per run. For production-frequency pipelines, bake dependencies into a custom image. +- **Task timeouts:** The default Tekton task timeout is 1 hour. Long-running Spark jobs or large Kafka backlogs can exceed this. Set `timeout` explicitly in the Pipeline spec. +- **PipelineRun accumulation:** Old PipelineRun objects are not garbage collected automatically. Add a cleanup CronJob or set `ttlSecondsAfterCompletion` to avoid namespace clutter. + +--- + +## Reference Architecture (DS-551 Spring 2026) + +``` +NERC OpenShift Cluster +├── ds551-2026-spring-9ab13b ← shared infrastructure namespace +│ ├── Kafka (Strimzi, per-team brokers) +│ └── Event Generator (Deployment — emits synthetic health events) +│ +├── ds551-2026-spring- ← one namespace per student team (×12) +│ ├── ServiceAccount + RBAC +│ ├── Tekton Tasks (validate-events, route-and-enrich) +│ ├── Pipeline (validate → enrich) +│ ├── CronJob (triggers PipelineRun every 6h) +│ ├── Spark analytics CronJob +│ ├── Database (PostgreSQL / TimescaleDB / InfluxDB / MySQL / ClickHouse) +│ └── Phase 3 services (alerting, ML CronJobs, dashboards) +``` + +12 teams ran this architecture concurrently on NERC for the full Spring 2026 semester, processing a continuous stream of synthetic epidemiological events end-to-end from Kafka through Tekton into persistent storage and Phase 3 ML features. + +--- + +## Further Reading + +- [Tekton Pipelines documentation](https://tekton.dev/docs/pipelines/) +- [Strimzi Kafka Operator on OpenShift](https://strimzi.io/docs/operators/latest/overview.html) +- [NERC OpenShift documentation](https://nerc-project.github.io/nerc-docs/) +- DS-551 course repository: `github.com/langd0n-classes/data-eng-at-scale` diff --git a/mkdocs.yaml b/mkdocs.yaml index 742a43761..f9b3fb3ab 100644 --- a/mkdocs.yaml +++ b/mkdocs.yaml @@ -113,6 +113,8 @@ nav: - Serverless Computing: - Understanding OpenShift Serverless: openshift/applications/understanding-openshift-serverless.md - "Knative Serving User Guide: Deploying and Managing Serverless Workloads": openshift/applications/knative-serving.md + - Pipeline Orchestration: + - Using Tekton Pipelines on NERC OpenShift: openshift/applications/tekton-pipelines.md - Set up Domain Name for Your Application: openshift/domain-name-system/domain-name-for-your-application.md - GPUs: - Introduction to GPUs on NERC OpenShift: openshift/gpus/intro-to-gpus-on-nerc-ocp.md