Commit 34bd316

feat (chart): add Langfuse retention management with ClickHouse TTL and hard delete (#259)
This pull request introduces support for automated data retention for Langfuse traces in ClickHouse, targeting setups that do not use Langfuse Enterprise's built-in data retention. It adds Helm chart options, Kubernetes CronJobs for both TTL enforcement and hard deletes, as well as shared environment configuration for ClickHouse access. The documentation is also updated to explain these new features.

**Langfuse Trace Retention Automation**

* Added a new `langfuseRetention` configuration section in `values.yaml` to enable and customize automatic trace retention, including TTL and optional hard deletion, with customizable schedules and ClickHouse table/column mapping.
* Introduced two new Kubernetes CronJobs:
  * One to apply idempotent `ALTER TABLE ... MODIFY TTL` statements for automatic data expiration.
  * Another (optional) CronJob for deterministic hard deletes using `ALTER TABLE ... DELETE WHERE ...`, with mutation sync settings.
* Created a shared Helm template (`rag.langfuseRetentionClickhouseEnv`) to inject consistent ClickHouse connection and retention settings into the CronJobs' environment.

**Documentation**

* Updated the `README.md` to document the new trace retention options, configuration examples, and operational notes for non-Enterprise users.
1 parent 3f9b6ea commit 34bd316

5 files changed

Lines changed: 297 additions & 0 deletions

infrastructure/README.md

Lines changed: 27 additions & 0 deletions
@@ -188,6 +188,33 @@ For local development you can let Tilt generate Langfuse init secrets automatica
 - Tilt runs Kustomize on `infrastructure/kustomize/langfuse` and applies the resulting `langfuse-init-secrets` (hash disabled) before Helm resources.
 - This is dev-only. For production, create/manage secrets with your secret manager and set `secretKeyRef.name` in `values.yaml` to your managed secret.

+**Langfuse Trace Retention via ClickHouse TTL (without Enterprise)**
+If you want automatic deletion (for example after 1 year) without Langfuse Enterprise data-retention management, enable the chart-level retention CronJob:
+
+```yaml
+langfuseRetention:
+  enabled: true
+  retentionDays: 365
+  schedule: "15 */6 * * *"
+  hardDelete:
+    enabled: true
+    schedule: "30 3 * * *"
+    mutationSync: 0
+  clickhouse:
+    database: "default" # set this to the same DB your Langfuse deployment uses
+    onCluster: false # true only for clustered ClickHouse setups
+    clusterName: "default"
+```
+
+Notes:
+- ClickHouse connection/auth for retention jobs is taken from `langfuse.clickhouse.*` (same source as Langfuse itself).
+- Make sure `langfuseRetention.clickhouse.database` matches your Langfuse ClickHouse database, not just the chart default.
+- Set `langfuseRetention.clickhouse.onCluster=true` only when your ClickHouse deployment is clustered and `clusterName` exists.
+- The CronJob applies idempotent `ALTER TABLE ... MODIFY TTL` statements on Langfuse tables (`traces`, `observations`, `scores`).
+- If `hardDelete.enabled=true`, an additional CronJob executes deterministic `ALTER TABLE ... DELETE WHERE ...` mutations.
+- Deletion is then handled by ClickHouse background merges (not instant at the exact cutoff timestamp).
+- Avoid applying TTL blindly to every table. Some tables are views/metadata and should not be retention-trimmed.
+
 ### 1.2 Qdrant

 The deployment of the Qdrant can be disabled by setting the following value in the helm-chart:
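The retention notes added to the README in this diff say that TTL-based deletion happens during background merges and that hard deletes run as mutations (asynchronously when `mutationSync: 0`). A minimal sketch of how an operator might inspect both follows. The function names `ttl_check_query` and `pending_mutations_query` are hypothetical helpers, not part of the chart, and the database name `default` is only the chart default. The script prints the queries rather than running them, so it needs no ClickHouse connection.

```shell
#!/usr/bin/env bash
# Sketch only: builds inspection queries for the retention-managed tables.
# Helper names are hypothetical; nothing here is shipped with the chart.
set -euo pipefail

# Query that shows the table definition, including any TTL clause.
ttl_check_query() {
  local database="$1" table="$2"
  printf 'SHOW CREATE TABLE %s.%s' "${database}" "${table}"
}

# Query that lists mutations which have not finished yet for one table.
pending_mutations_query() {
  local database="$1" table="$2"
  printf "SELECT mutation_id, command, create_time, latest_fail_reason FROM system.mutations WHERE database = '%s' AND table = '%s' AND is_done = 0" "${database}" "${table}"
}

# Assumption: the three tables named in the README notes, in database "default".
for table in traces observations scores; do
  ttl_check_query default "${table}"
  echo
  pending_mutations_query default "${table}"
  echo
done
```

Each printed line can be passed to `clickhouse-client --query "..."` with the same host, port, and user settings the CronJobs use.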

infrastructure/rag/templates/_helpers.tpl

Lines changed: 31 additions & 0 deletions
@@ -78,3 +78,34 @@
 {{- toYaml $data -}}
 {{- end }}
 {{- end -}}
+
+{{/* Shared ClickHouse env for Langfuse retention CronJobs. */}}
+{{- define "rag.langfuseRetentionClickhouseEnv" -}}
+{{- $chHost := default (printf "%s-clickhouse" .Release.Name | trunc 63 | trimSuffix "-") .Values.langfuse.clickhouse.host -}}
+{{- $chUser := default "default" .Values.langfuse.clickhouse.auth.username -}}
+{{- $chPasswordSecretName := default (printf "%s-clickhouse" .Release.Name | trunc 63 | trimSuffix "-") .Values.langfuse.clickhouse.auth.existingSecret -}}
+{{- $chPasswordKey := .Values.langfuse.clickhouse.auth.existingSecretKey -}}
+{{- $chNativePort := default 9000 .Values.langfuse.clickhouse.nativePort -}}
+- name: CLICKHOUSE_HOST
+  value: {{ $chHost | quote }}
+- name: CLICKHOUSE_PORT
+  value: {{ $chNativePort | quote }}
+- name: CLICKHOUSE_USER
+  value: {{ $chUser | quote }}
+- name: CLICKHOUSE_DATABASE
+  value: {{ .Values.langfuseRetention.clickhouse.database | quote }}
+- name: CLICKHOUSE_ON_CLUSTER
+  value: {{ ternary "true" "false" .Values.langfuseRetention.clickhouse.onCluster | quote }}
+- name: CLICKHOUSE_CLUSTER_NAME
+  value: {{ .Values.langfuseRetention.clickhouse.clusterName | quote }}
+- name: RETENTION_DAYS
+  value: {{ .Values.langfuseRetention.retentionDays | quote }}
+- name: CLICKHOUSE_PASSWORD_LITERAL
+  value: {{ .Values.langfuse.clickhouse.auth.password | quote }}
+- name: CLICKHOUSE_PASSWORD
+  valueFrom:
+    secretKeyRef:
+      name: {{ $chPasswordSecretName | quote }}
+      key: {{ default "CLICKHOUSE_PASSWORD" $chPasswordKey | quote }}
+      optional: true
+{{- end -}}
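The helper above exposes two possible password sources: `CLICKHOUSE_PASSWORD` from an optional secret and `CLICKHOUSE_PASSWORD_LITERAL` from `langfuse.clickhouse.auth.password`. The CronJob scripts in this commit prefer the secret value and fall back to the literal. The following is a small standalone sketch of that precedence; `resolve_password` is a hypothetical function name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the secret-first, literal-second password fallback.
# resolve_password is a hypothetical helper, not part of the chart.
set -euo pipefail

# resolve_password SECRET_VALUE LITERAL_VALUE
# Prints the secret value when it is non-empty, otherwise the literal value.
# Returns 1 when both are empty.
resolve_password() {
  local from_secret="${1:-}" literal="${2:-}"
  if [ -z "${from_secret}" ] && [ -z "${literal}" ]; then
    echo "No ClickHouse password found." >&2
    return 1
  fi
  printf '%s' "${from_secret:-${literal}}"
}

resolve_password "from-secret" "from-values"; echo
resolve_password "" "from-values"; echo
if ! resolve_password "" "" 2>/dev/null; then
  echo "no password configured"
fi
```

Because the secret reference is marked `optional: true`, a missing secret does not block pod startup; the script-level check is what surfaces a missing password.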
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+{{- if and .Values.features.langfuse.enabled .Values.langfuseRetention.enabled }}
+{{- $retentionImage := printf "%s:%s" .Values.langfuseRetention.image.repository .Values.langfuseRetention.image.tag -}}
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: {{ printf "%s-langfuse-retention" .Release.Name | trunc 63 | trimSuffix "-" }}
+  labels:
+    app.kubernetes.io/name: rag
+    app.kubernetes.io/instance: {{ .Release.Name }}
+spec:
+  schedule: {{ .Values.langfuseRetention.schedule | quote }}
+  concurrencyPolicy: Forbid
+  successfulJobsHistoryLimit: 1
+  failedJobsHistoryLimit: 3
+  jobTemplate:
+    spec:
+      template:
+        metadata:
+          labels:
+            app.kubernetes.io/name: rag
+            app.kubernetes.io/instance: {{ .Release.Name }}
+        spec:
+          securityContext:
+            runAsUser: {{ .Values.langfuseRetention.podSecurityContext.runAsUser }}
+            runAsNonRoot: {{ .Values.langfuseRetention.podSecurityContext.runAsNonRoot }}
+          {{- if .Values.shared.imagePullSecret }}
+          imagePullSecrets:
+            - name: {{ .Values.shared.imagePullSecret.name }}
+          {{- end }}
+          restartPolicy: OnFailure
+          containers:
+            - name: apply-clickhouse-ttl
+              image: {{ $retentionImage | quote }}
+              imagePullPolicy: {{ .Values.langfuseRetention.image.pullPolicy | quote }}
+              securityContext:
+                allowPrivilegeEscalation: {{ .Values.langfuseRetention.securityContext.allowPrivilegeEscalation }}
+              {{- with .Values.langfuseRetention.resources }}
+              resources:
+              {{ toYaml . | nindent 16 }}
+              {{- end }}
+              command:
+                - /bin/bash
+                - -ec
+              args:
+                - |
+                  set -euo pipefail
+
+                  if [ -z "${CLICKHOUSE_PASSWORD:-}" ] && [ -z "${CLICKHOUSE_PASSWORD_LITERAL:-}" ]; then
+                    echo "No ClickHouse password found. Check langfuse.clickhouse.auth settings and secret."
+                    exit 1
+                  fi
+                  export CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-${CLICKHOUSE_PASSWORD_LITERAL:-}}"
+                  unset CLICKHOUSE_PASSWORD_LITERAL
+
+                  ON_CLUSTER_CLAUSE=""
+                  if [ "${CLICKHOUSE_ON_CLUSTER}" = "true" ]; then
+                    ON_CLUSTER_CLAUSE=" ON CLUSTER ${CLICKHOUSE_CLUSTER_NAME}"
+                  fi
+
+                  TABLE_ROWS="$(cat <<'EOF_TABLES'
+                  {{- range .Values.langfuseRetention.clickhouse.tables }}
+                  {{ printf "%s\t%s" .name .timestampColumn }}
+                  {{- end }}
+                  EOF_TABLES
+                  )"
+
+                  IDENTIFIER_REGEX='^[A-Za-z_][A-Za-z0-9_]*$'
+
+                  while IFS=$'\t' read -r table ts_col; do
+                    [ -z "${table}" ] && continue
+
+                    if ! [[ "${table}" =~ ${IDENTIFIER_REGEX} ]]; then
+                      echo "Invalid table identifier: ${table}"
+                      exit 1
+                    fi
+                    if ! [[ "${ts_col}" =~ ${IDENTIFIER_REGEX} ]]; then
+                      echo "Invalid timestamp column identifier: ${ts_col}"
+                      exit 1
+                    fi
+
+                    echo "Applying TTL=${RETENTION_DAYS}d to ${CLICKHOUSE_DATABASE}.${table} (${ts_col})"
+                    if ! clickhouse-client \
+                      --host "${CLICKHOUSE_HOST}" \
+                      --port "${CLICKHOUSE_PORT}" \
+                      --user "${CLICKHOUSE_USER}" \
+                      --query "ALTER TABLE ${CLICKHOUSE_DATABASE}.${table}${ON_CLUSTER_CLAUSE} MODIFY TTL ${ts_col} + toIntervalDay(${RETENTION_DAYS})"; then
+                      echo "Failed applying TTL on ${CLICKHOUSE_DATABASE}.${table}"
+                      exit 1
+                    fi
+                  done <<< "${TABLE_ROWS}"
+              env:
+              {{ include "rag.langfuseRetentionClickhouseEnv" . | nindent 16 }}
+{{- end }}
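The TTL CronJob above reads tab-separated `table<TAB>timestampColumn` rows, validates both identifiers against a strict regex, and only then interpolates them into an `ALTER TABLE ... MODIFY TTL` statement. A dry-run sketch of that loop, which prints the statements instead of sending them to ClickHouse, might look like this. `build_ttl_statement` is a hypothetical helper, and the hard-coded rows and `default` database simply repeat the chart defaults.

```shell
#!/usr/bin/env bash
# Sketch only: dry run of the TTL loop; prints SQL instead of executing it.
# build_ttl_statement is a hypothetical helper, not part of the chart.
set -euo pipefail

IDENTIFIER_REGEX='^[A-Za-z_][A-Za-z0-9_]*$'

# build_ttl_statement DATABASE TABLE TS_COLUMN RETENTION_DAYS [ON_CLUSTER_CLAUSE]
# Rejects table/column names that are not plain identifiers, then prints the SQL.
build_ttl_statement() {
  local database="$1" table="$2" ts_col="$3" days="$4" on_cluster="${5:-}"
  local name
  for name in "${table}" "${ts_col}"; do
    if ! [[ "${name}" =~ ${IDENTIFIER_REGEX} ]]; then
      echo "Invalid identifier: ${name}" >&2
      return 1
    fi
  done
  printf 'ALTER TABLE %s.%s%s MODIFY TTL %s + toIntervalDay(%s)' \
    "${database}" "${table}" "${on_cluster}" "${ts_col}" "${days}"
}

# Assumption: the same rows the chart defaults would render, tab-separated.
TABLE_ROWS=$'traces\ttimestamp\nobservations\tevent_ts\nscores\ttimestamp'

while IFS=$'\t' read -r table ts_col; do
  if [ -z "${table}" ]; then
    continue
  fi
  build_ttl_statement default "${table}" "${ts_col}" 365
  echo
done <<< "${TABLE_ROWS}"
```

The identifier check matters because the values come from `values.yaml` and end up inside a SQL string; anything containing spaces, quotes, or semicolons is refused before a query is built. This sketch validates only the table and column names, as the CronJob does; the database name and retention days are passed through unchecked.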
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+{{- if and .Values.features.langfuse.enabled .Values.langfuseRetention.hardDelete.enabled }}
+{{- $retentionImage := printf "%s:%s" .Values.langfuseRetention.image.repository .Values.langfuseRetention.image.tag -}}
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: {{ printf "%s-langfuse-retention-delete" .Release.Name | trunc 63 | trimSuffix "-" }}
+  labels:
+    app.kubernetes.io/name: rag
+    app.kubernetes.io/instance: {{ .Release.Name }}
+spec:
+  schedule: {{ .Values.langfuseRetention.hardDelete.schedule | quote }}
+  concurrencyPolicy: Forbid
+  successfulJobsHistoryLimit: 1
+  failedJobsHistoryLimit: 3
+  jobTemplate:
+    spec:
+      template:
+        metadata:
+          labels:
+            app.kubernetes.io/name: rag
+            app.kubernetes.io/instance: {{ .Release.Name }}
+        spec:
+          securityContext:
+            runAsUser: {{ .Values.langfuseRetention.podSecurityContext.runAsUser }}
+            runAsNonRoot: {{ .Values.langfuseRetention.podSecurityContext.runAsNonRoot }}
+          {{- if .Values.shared.imagePullSecret }}
+          imagePullSecrets:
+            - name: {{ .Values.shared.imagePullSecret.name }}
+          {{- end }}
+          restartPolicy: OnFailure
+          containers:
+            - name: delete-expired-rows
+              image: {{ $retentionImage | quote }}
+              imagePullPolicy: {{ .Values.langfuseRetention.image.pullPolicy | quote }}
+              securityContext:
+                allowPrivilegeEscalation: {{ .Values.langfuseRetention.securityContext.allowPrivilegeEscalation }}
+              {{- with .Values.langfuseRetention.resources }}
+              resources:
+              {{ toYaml . | nindent 16 }}
+              {{- end }}
+              command:
+                - /bin/bash
+                - -ec
+              args:
+                - |
+                  set -euo pipefail
+
+                  if [ -z "${CLICKHOUSE_PASSWORD:-}" ] && [ -z "${CLICKHOUSE_PASSWORD_LITERAL:-}" ]; then
+                    echo "No ClickHouse password found. Check langfuse.clickhouse.auth settings and secret."
+                    exit 1
+                  fi
+                  export CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-${CLICKHOUSE_PASSWORD_LITERAL:-}}"
+                  unset CLICKHOUSE_PASSWORD_LITERAL
+
+                  ON_CLUSTER_CLAUSE=""
+                  if [ "${CLICKHOUSE_ON_CLUSTER}" = "true" ]; then
+                    ON_CLUSTER_CLAUSE=" ON CLUSTER ${CLICKHOUSE_CLUSTER_NAME}"
+                  fi
+
+                  TABLE_ROWS="$(cat <<'EOF_TABLES'
+                  {{- range .Values.langfuseRetention.clickhouse.tables }}
+                  {{ printf "%s\t%s" .name .timestampColumn }}
+                  {{- end }}
+                  EOF_TABLES
+                  )"
+
+                  CUTOFF_UNIX="$(( $(date -u +%s) - RETENTION_DAYS * 86400 ))"
+                  IDENTIFIER_REGEX='^[A-Za-z_][A-Za-z0-9_]*$'
+
+                  while IFS=$'\t' read -r table ts_col; do
+                    [ -z "${table}" ] && continue
+
+                    if ! [[ "${table}" =~ ${IDENTIFIER_REGEX} ]]; then
+                      echo "Invalid table identifier: ${table}"
+                      exit 1
+                    fi
+                    if ! [[ "${ts_col}" =~ ${IDENTIFIER_REGEX} ]]; then
+                      echo "Invalid timestamp column identifier: ${ts_col}"
+                      exit 1
+                    fi
+
+                    echo "Deleting rows older than ${RETENTION_DAYS}d from ${CLICKHOUSE_DATABASE}.${table} (${ts_col})"
+                    if ! clickhouse-client \
+                      --host "${CLICKHOUSE_HOST}" \
+                      --port "${CLICKHOUSE_PORT}" \
+                      --user "${CLICKHOUSE_USER}" \
+                      --query "ALTER TABLE ${CLICKHOUSE_DATABASE}.${table}${ON_CLUSTER_CLAUSE} DELETE WHERE ${ts_col} < toDateTime(${CUTOFF_UNIX}) SETTINGS mutations_sync = ${MUTATION_SYNC}"; then
+                      echo "Failed deleting expired rows from ${CLICKHOUSE_DATABASE}.${table}"
+                      exit 1
+                    fi
+                  done <<< "${TABLE_ROWS}"
+              env:
+                - name: MUTATION_SYNC
+                  value: {{ .Values.langfuseRetention.hardDelete.mutationSync | quote }}
+              {{ include "rag.langfuseRetentionClickhouseEnv" . | nindent 16 }}
+{{- end }}
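The hard-delete CronJob above computes its cutoff inside the pod (current UTC epoch minus `RETENTION_DAYS * 86400`) and passes it to ClickHouse as `toDateTime(<epoch>)`, so the boundary depends on the pod's clock rather than on `now()` in ClickHouse. A small sketch of that arithmetic and the resulting statement follows. `cutoff_unix` and `build_delete_statement` are hypothetical helper names, and identifier validation is omitted here for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: cutoff arithmetic and DELETE statement, printed rather than run.
# cutoff_unix and build_delete_statement are hypothetical helpers.
set -euo pipefail

# cutoff_unix NOW_EPOCH RETENTION_DAYS
# Prints the epoch second before which rows count as expired.
cutoff_unix() {
  local now="$1" days="$2"
  echo "$(( now - days * 86400 ))"
}

# build_delete_statement DATABASE TABLE TS_COLUMN CUTOFF_EPOCH MUTATION_SYNC
build_delete_statement() {
  local database="$1" table="$2" ts_col="$3" cutoff="$4" mutation_sync="$5"
  printf 'ALTER TABLE %s.%s DELETE WHERE %s < toDateTime(%s) SETTINGS mutations_sync = %s' \
    "${database}" "${table}" "${ts_col}" "${cutoff}" "${mutation_sync}"
}

now="$(date -u +%s)"
cutoff="$(cutoff_unix "${now}" 365)"
build_delete_statement default traces timestamp "${cutoff}" 0
echo
```

With `mutations_sync = 0` the statement returns as soon as the mutation is queued, so a successful Job run means the delete was accepted, not that the rows are already gone.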

infrastructure/rag/values.yaml

Lines changed: 50 additions & 0 deletions
@@ -734,6 +734,56 @@ langfuse:
       name: ""
       key: ""

+# Optional: enforce a ClickHouse TTL for Langfuse traces without Enterprise data retention management.
+# This runs as a CronJob and applies idempotent ALTER TABLE ... MODIFY TTL commands.
+langfuseRetention:
+  enabled: false
+  retentionDays: 365
+  schedule: "15 */6 * * *"
+  podSecurityContext:
+    runAsUser: 1001
+    runAsNonRoot: true
+  securityContext:
+    allowPrivilegeEscalation: false
+  # Optional resources for both retention CronJobs.
+  # Example:
+  # resources:
+  #   requests:
+  #     cpu: 100m
+  #     memory: 128Mi
+  #   limits:
+  #     cpu: 500m
+  #     memory: 512Mi
+  resources: {}
+  # Optional deterministic deletion in addition to TTL.
+  # Uses ALTER TABLE ... DELETE WHERE ... and can run nightly.
+  hardDelete:
+    enabled: false
+    schedule: "30 3 * * *"
+    # ClickHouse mutations_sync setting:
+    # 0 = async (default), 1 = wait for local completion, 2 = wait for all replicas.
+    mutationSync: 0
+  image:
+    repository: "bitnamilegacy/clickhouse"
+    tag: "25.2.1-debian-12-r0"
+    pullPolicy: IfNotPresent
+  clickhouse:
+    # Connection/auth are taken from langfuse.clickhouse.*.
+    # Align this with the database Langfuse actually uses in ClickHouse.
+    database: "default"
+    # Set to true only for clustered ClickHouse deployments where clusterName exists.
+    # Keep false for single-node/non-clustered deployments.
+    onCluster: false
+    clusterName: "default"
+    tables:
+      # timestampColumn should be a Date/DateTime/DateTime64 column in the target table.
+      - name: "traces"
+        timestampColumn: "timestamp"
+      - name: "observations"
+        timestampColumn: "event_ts"
+      - name: "scores"
+        timestampColumn: "timestamp"
+
 minio:
   image:
     repository: bitnamilegacy/minio
