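This document describes the deployment architecture of the DataJump platform. The platform is deployed as containers on Kubernetes and supports enterprise features such as multi-environment management, high availability, and elastic scaling.
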
┌───────────────────────────────────────────────────────────────────────────────┐
│                               User Access Layer                               │
│                    ┌─────────────────────────────┐                            │
│                    │     Load Balancer (SLB)     │                            │
│                    └─────────────┬───────────────┘                            │
└──────────────────────────────────┼────────────────────────────────────────────┘
                                   │
┌──────────────────────────────────▼────────────────────────────────────────────┐
│                              Kubernetes Cluster                               │
│ ┌──────────────────────────────────────────────────────────────────────────┐  │
│ │                            Ingress Controller                            │  │
│ │                        (Nginx Ingress / Traefik)                         │  │
│ └────────────────────────────────┬─────────────────────────────────────────┘  │
│                                  │                                            │
│ ┌────────────────────────────────▼─────────────────────────────────────────┐  │
│ │                               API Gateway                                │  │
│ │                      (Kong / Spring Cloud Gateway)                       │  │
│ └────────────────────────────────┬─────────────────────────────────────────┘  │
│                                  │                                            │
│ ┌──────────┬───────────┬─────────▼───┬─────────────┬──────────┬────────────┐  │
│ │ Web      │ Scheduler │ Integration │ Development │ Metadata │ Ops        │  │
│ │ Frontend │ Service   │ Service     │ Service     │ Service  │ Monitoring │  │
│ └──────────┴───────────┴─────────────┴─────────────┴──────────┴────────────┘  │
│                                                                               │
│ ┌──────────────────────────────────────────────────────────────────────────┐  │
│ │                             Worker Node Pool                             │  │
│ │        ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │  │
│ │        │ Worker-1 │   │ Worker-2 │   │ Worker-3 │   │ Worker-N │         │  │
│ │        └──────────┘   └──────────┘   └──────────┘   └──────────┘         │  │
│ └──────────────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────────────┐
│                             Infrastructure Layer                              │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐    │
│   │  MySQL   │   │  Redis   │   │  Kafka   │   │    ES    │   │  MinIO   │    │
│   │ Cluster  │   │ Cluster  │   │ Cluster  │   │ Cluster  │   │ Cluster  │    │
│   └──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘    │
└───────────────────────────────────────────────────────────────────────────────┘

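| Environment | Purpose | Resource configuration | Data |
|---|---|---|---|
| DEV | Development and debugging | Minimal | Mock data |
| TEST | Functional testing | Medium | Test data |
| STAGING | Pre-release validation | Close to production | Subset of production data |
| PROD | Production | Full | Real data |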
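
The resource tiers in the table could be enforced per namespace with a ResourceQuota. The following is a minimal sketch for the DEV tier only; the quota name and every figure in it are illustrative assumptions, not values taken from the platform's sizing.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: datajump-dev-quota   # hypothetical name
  namespace: datajump-dev
spec:
  hard:
    # Illustrative caps for a "minimal" environment; tune to the real cluster.
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "30"
```

A quota that sets `requests.*` or `limits.*` causes the API server to reject Pods in that namespace that omit the corresponding resource fields, so every workload deployed there needs explicit requests and limits.
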
# dev namespace
apiVersion: v1
kind: Namespace
metadata:
  name: datajump-dev
  labels:
    env: dev
---
# test namespace
apiVersion: v1
kind: Namespace
metadata:
  name: datajump-test
  labels:
    env: test
---
# staging namespace
apiVersion: v1
kind: Namespace
metadata:
  name: datajump-staging
  labels:
    env: staging
---
# prod namespace
apiVersion: v1
kind: Namespace
metadata:
  name: datajump-prod
  labels:
    env: prod

# ConfigMap: application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: datajump-config
  namespace: datajump-prod
data:
  application.yml: |
    spring:
      profiles:
        active: prod
      datasource:
        url: jdbc:mysql://mysql-cluster:3306/datajump
        hikari:
          maximum-pool-size: 50
      redis:
        cluster:
          nodes: redis-0:6379,redis-1:6379,redis-2:6379
      kafka:
        bootstrap-servers: kafka-0:9092,kafka-1:9092,kafka-2:9092
    scheduler:
      master:
        count: 3
      worker:
        min-count: 5
        max-count: 50
---
# Secret: sensitive configuration
# The ${...} placeholders must be substituted (for example by the CI pipeline) before this manifest is applied.
apiVersion: v1
kind: Secret
metadata:
  name: datajump-secrets
  namespace: datajump-prod
type: Opaque
stringData:
  db-password: ${DB_PASSWORD}
  redis-password: ${REDIS_PASSWORD}
  jwt-secret: ${JWT_SECRET}

# Scheduler service: Master
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scheduler-master
  namespace: datajump-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scheduler-master
  template:
    metadata:
      labels:
        app: scheduler-master
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: scheduler-master
              topologyKey: kubernetes.io/hostname
      containers:
        - name: scheduler-master
          image: datajump/scheduler-master:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: prod
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: datajump-secrets
                  key: db-password
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/config
      volumes:
        - name: config
          configMap:
            name: datajump-config
---
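# PodDisruptionBudget for scheduler-master (illustrative sketch, not part of the original manifests).
# Assumption: at least 2 of the 3 Master replicas should stay available during voluntary
# disruptions such as node drains; adjust minAvailable to the scheduler's real quorum needs.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scheduler-master-pdb  # hypothetical name
  namespace: datajump-prod
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: scheduler-master
---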
# Scheduler service: Worker (autoscaled by HPA)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scheduler-worker
  namespace: datajump-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: scheduler-worker
  template:
    metadata:
      labels:
        app: scheduler-worker
    spec:
      containers:
        - name: scheduler-worker
          image: datajump/scheduler-worker:v1.0.0
          ports:
            - containerPort: 8081
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: prod
          resources:
            requests:
              memory: "4Gi"
              cpu: "2000m"
            limits:
              memory: "8Gi"
              cpu: "4000m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            initialDelaySeconds: 30
            periodSeconds: 10
---
# Worker HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scheduler-worker-hpa
  namespace: datajump-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scheduler-worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

# Scheduler service
apiVersion: v1
kind: Service
metadata:
  name: scheduler-master
  namespace: datajump-prod
spec:
  selector:
    app: scheduler-master
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP
---
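# API gateway Service (illustrative sketch, not part of the original manifests).
# The Ingress in this document routes /api to a Service named api-gateway on port 8080, and the
# NetworkPolicy selects Pods labelled app: api-gateway. The targetPort is an assumption.
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: datajump-prod
spec:
  selector:
    app: api-gateway
  ports:
    - port: 8080
      targetPort: 8080  # assumed container port of the gateway
  type: ClusterIP
---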
# Frontend service
apiVersion: v1
kind: Service
metadata:
  name: datajump-web
  namespace: datajump-prod
spec:
  selector:
    app: datajump-web
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datajump-ingress
  namespace: datajump-prod
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation.
  ingressClassName: nginx
  tls:
    - hosts:
        - datajump.example.com
      secretName: datajump-tls
  rules:
    - host: datajump.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datajump-web
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080

# Uses MySQL Operator
apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
  name: mysql-cluster
  namespace: datajump-prod
spec:
  instances: 3
  router:
    instances: 2
  tlsUseSelfSigned: true
  secretName: mysql-root-secret
  datadirVolumeClaimTemplate:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: fast-ssd

# Uses Redis Operator
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: redis-cluster
  namespace: datajump-prod
spec:
  clusterSize: 3
  clusterVersion: v7
  persistenceEnabled: true
  kubernetesConfig:
    image: redis:7.0-alpine
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi

# Uses Strimzi Operator
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: datajump-prod
spec:
  kafka:
    version: 3.5.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          class: fast-ssd
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      class: fast-ssd

# Uses ECK Operator
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-cluster
  namespace: datajump-prod
spec:
  version: 8.10.0
  nodeSets:
    - name: master
      count: 3
      config:
        node.roles: ["master"]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 50Gi
            storageClassName: fast-ssd
    - name: data
      count: 5
      config:
        node.roles: ["data", "ingest"]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 500Gi
            storageClassName: fast-ssd

# ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: datajump-monitor
  namespace: datajump-prod
spec:
  selector:
    matchLabels:
      app.kubernetes.io/part-of: datajump
  endpoints:
    - port: metrics
      interval: 15s
      path: /actuator/prometheus

{
  "dashboard": {
    "title": "DataJump Overview",
    "panels": [
      {
        "title": "Task Success Rate",
        "type": "gauge",
        "targets": [
          {
            "expr": "sum(rate(task_success_total[5m])) / sum(rate(task_total[5m])) * 100"
          }
        ]
      },
      {
        "title": "Running Tasks",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(scheduler_running_tasks)"
          }
        ]
      },
      {
        "title": "Worker CPU Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "avg(rate(container_cpu_usage_seconds_total{pod=~\"scheduler-worker.*\"}[5m])) * 100"
          }
        ]
      }
    ]
  }
}

# Fluent Bit DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: datajump-prod
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.1
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluent-bit-config

# .gitlab-ci.yml
stages:
  - build
  - test
  - security
  - deploy

variables:
  DOCKER_REGISTRY: registry.example.com
  IMAGE_TAG: $CI_COMMIT_SHORT_SHA
  # SERVICE_NAME is not defined here; it must be supplied per service (for example as a CI/CD variable).

build:
  stage: build
  # The Docker CLI runs in the job container; the daemon runs in the dind service
  # (this requires a runner that allows privileged mode).
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t $DOCKER_REGISTRY/datajump/$SERVICE_NAME:$IMAGE_TAG .
    - docker push $DOCKER_REGISTRY/datajump/$SERVICE_NAME:$IMAGE_TAG

test:
  stage: test
  image: maven:3.9-eclipse-temurin-17
  script:
    - mvn test
  artifacts:
    reports:
      junit: target/surefire-reports/*.xml

security-scan:
  stage: security
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]  # the image's default entrypoint is the trivy binary, not a shell
  script:
    - trivy image $DOCKER_REGISTRY/datajump/$SERVICE_NAME:$IMAGE_TAG

deploy-dev:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]  # the image's default entrypoint is kubectl, not a shell
  environment:
    name: dev
  script:
    - kubectl set image deployment/$SERVICE_NAME $SERVICE_NAME=$DOCKER_REGISTRY/datajump/$SERVICE_NAME:$IMAGE_TAG -n datajump-dev
  only:
    - develop

deploy-prod:
  stage: deploy
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  environment:
    name: production
  script:
    - kubectl set image deployment/$SERVICE_NAME $SERVICE_NAME=$DOCKER_REGISTRY/datajump/$SERVICE_NAME:$IMAGE_TAG -n datajump-prod
  when: manual
  only:
    - main

# Chart.yaml
apiVersion: v2
name: datajump
description: DataJump - One-stop Big Data Platform
version: 1.0.0
appVersion: "1.0.0"

# values.yaml
replicaCount:
  master: 3
  worker: 5
  api: 3
image:
  registry: registry.example.com
  pullPolicy: IfNotPresent
resources:
  master:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"
  worker:
    requests:
      memory: "4Gi"
      cpu: "2000m"
    limits:
      memory: "8Gi"
      cpu: "4000m"
autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 50
  targetCPUUtilizationPercentage: 70
mysql:
  enabled: true
  architecture: replication
  primary:
    persistence:
      size: 100Gi
redis:
  enabled: true
  architecture: cluster
  cluster:
    nodes: 6
kafka:
  enabled: true
  replicaCount: 3

# Scheduled backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: datajump-prod
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: mysql:8.0
              command:
                - /bin/sh
                - -c
                - |
                  mysqldump -h mysql-cluster -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases | \
                    gzip > /backup/datajump-$(date +%Y%m%d).sql.gz
              env:
                - name: MYSQL_ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mysql-root-secret
                      # MySQL Operator root secrets store the password under rootPassword.
                      key: rootPassword
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
          restartPolicy: OnFailure

# Schedule Pods across availability zones
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: scheduler-master

apiVersion: networking.k8s.io/v1
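kind: NetworkPolicy
metadata:
  name: datajump-network-policy
  namespace: datajump-prod
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: datajump
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # kubernetes.io/metadata.name is set automatically on every namespace (Kubernetes v1.22+);
        # the namespaces defined in this document carry no "name" label.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: datajump-prod
        - podSelector:
            matchLabels:
              app: api-gateway
  egress:
    # Egress is limited to this namespace, which also blocks DNS lookups against
    # kube-system unless a separate rule allows them.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: datajump-prod

apiVersion: policy/v1beta1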
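# Note: PodSecurityPolicy was removed in Kubernetes v1.25; this manifest applies only to older clusters.
kind: PodSecurityPolicy
metadata:
  name: datajump-restricted
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:  # required field in policy/v1beta1
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim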
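
On clusters running Kubernetes v1.25 or later, where the PodSecurityPolicy API no longer exists, comparable restrictions are usually applied through Pod Security Admission labels on the namespace. The following is a minimal sketch, assuming the built-in `restricted` level is an acceptable substitute for the policy above. It would also reject the Fluent Bit DaemonSet's `hostPath` volume, so that workload would need to run in a separate namespace with a more permissive level.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: datajump-prod
  labels:
    env: prod
    # Reject Pods that violate the restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    # Also surface violations as client warnings and audit annotations.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with only the `warn` and `audit` labels and adding `enforce` once existing workloads are clean is a lower-risk way to roll this out.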