Sensitive Data Filtering

v0.4.0 — Automatic classification and filtering of sensitive data before LLM calls.

Problem

When using cloud-based large LLMs, environment variables containing API keys, tokens, passwords, and database URLs must never be sent to the model. At the same time, safe context (locale, shell, hostname) is critical for accurate responses.

Solution

SensitiveDataFilter assigns every piece of data one of three levels:

| Level | Action | Examples |
|-------|--------|----------|
| SAFE | Pass through | `LANG`, `SHELL`, `HOME`, `USER`, `PWD`, `HOSTNAME` |
| MASKED | Partially shown (first/last 2 chars) | `DATABASE_URL`, `REDIS_URL`, `SMTP_*` |
| BLOCKED | Completely removed | `API_KEY`, `SECRET`, `TOKEN`, `PASSWORD` |
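
The table can be read as a two-step rule: match the variable name against the rule lists, then apply the action for the resulting level. The following is a minimal sketch of that idea, not prellm's actual implementation. The `Level` enum, the substring matching, the blocked-before-masked precedence, the SAFE default for unknown names, and the `*` masking character are all assumptions made for illustration.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in for prellm's SensitivityLevel."""
    SAFE = "safe"
    MASKED = "masked"
    BLOCKED = "blocked"


# Rule fragments taken from the table above.
BLOCKED_PARTS = ("API_KEY", "SECRET", "TOKEN", "PASSWORD")
MASKED_PARTS = ("DATABASE_URL", "REDIS_URL", "SMTP_")


def classify_key(name: str) -> Level:
    """Classify a variable name by substring match (assumed matching rule)."""
    upper = name.upper()
    if any(part in upper for part in BLOCKED_PARTS):
        return Level.BLOCKED  # assumption: BLOCKED wins over MASKED
    if any(part in upper for part in MASKED_PARTS):
        return Level.MASKED
    return Level.SAFE  # assumption: unknown names default to SAFE


def mask(value: str, keep: int = 2) -> str:
    """Show only the first and last `keep` characters of a value."""
    if len(value) <= 2 * keep:
        return "*" * len(value)  # too short to reveal anything safely
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]
```

Under these assumptions, a name such as `SMTP_PASSWORD` is BLOCKED rather than MASKED, because the blocked list is checked first. Whether the real filter resolves such overlaps the same way is not stated in this document.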

Value-based detection catches tokens even in non-standard variable names:

| Pattern | Provider |
|---------|----------|
| `sk-[a-zA-Z0-9]{20,}` | OpenAI |
| `sk-ant-[a-zA-Z0-9]{20,}` | Anthropic |
| `ghp_[a-zA-Z0-9]{36}` | GitHub PAT |
| `gsk_[a-zA-Z0-9]{20,}` | Groq |
| `sk-or-v1-[a-zA-Z0-9]{20,}` | OpenRouter |
| `xox[bpsa]-[a-zA-Z0-9-]{20,}` | Slack |
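
To make the value-based detection concrete, here is a small sketch that runs the patterns from the table with Python's standard `re` module. The `detect_provider` helper and the use of `search` (rather than a full match) are assumptions for illustration. The filter's real matching logic may differ.

```python
import re
from typing import Optional

# Patterns copied from the table above; provider names are used as labels.
PATTERNS = {
    "OpenAI": r"sk-[a-zA-Z0-9]{20,}",
    "Anthropic": r"sk-ant-[a-zA-Z0-9]{20,}",
    "GitHub PAT": r"ghp_[a-zA-Z0-9]{36}",
    "Groq": r"gsk_[a-zA-Z0-9]{20,}",
    "OpenRouter": r"sk-or-v1-[a-zA-Z0-9]{20,}",
    "Slack": r"xox[bpsa]-[a-zA-Z0-9-]{20,}",
}
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def detect_provider(value: str) -> Optional[str]:
    """Return the first provider whose pattern occurs in `value`, else None."""
    for name, regex in COMPILED.items():
        if regex.search(value):
            return name
    return None
```

Note that the OpenAI pattern does not accept hyphens after `sk-`, so keys starting with `sk-ant-` or `sk-or-v1-` are only caught by their own, more specific patterns. A value with fewer than 20 characters after `sk-` is not matched at all.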

Automatic (default in v0.4)

import asyncio

from prellm import preprocess_and_execute


async def main():
    # sanitize=True is the default, so secrets are filtered automatically
    return await preprocess_and_execute(
        query="Deploy to production",
        small_llm="ollama/bielik:7b",
        large_llm="openrouter/google/gemini-3-flash-preview",
        # sanitize=True,  # default
    )


result = asyncio.run(main())

Disable for Development

result = await preprocess_and_execute(
    query="Debug local issue",
    sanitize=False,  # dev mode — nothing filtered
)

Custom Rules

result = await preprocess_and_execute(
    query="Deploy",
    sensitive_rules="my_rules.yaml",  # custom YAML
)

Direct API

from prellm.context.sensitive_filter import SensitiveDataFilter
from prellm.models import SensitivityLevel

filt = SensitiveDataFilter()

# Classify individual keys
filt.classify_key("OPENAI_API_KEY")    # → SensitivityLevel.BLOCKED
filt.classify_key("DATABASE_URL")      # → SensitivityLevel.MASKED
filt.classify_key("LANG")              # → SensitivityLevel.SAFE

# Classify values
filt.classify_value("sk-abc123def456ghi789jkl012")  # → BLOCKED (OpenAI pattern, 20+ chars after "sk-")
filt.classify_value("hello world")                   # → SAFE

# Filter a dict
data = {"LANG": "pl_PL", "OPENAI_API_KEY": "sk-secret", "HOME": "/home/user"}
safe = filt.filter_dict(data)

# Sanitize free text
text = "Use key sk-1234567890abcdefghijklmnop for auth"
clean = filt.sanitize_text(text)

# Get report
report = filt.get_filter_report()
print(report.blocked_keys)   # ["OPENAI_API_KEY"]
print(report.safe_keys)      # ["LANG", "HOME"]
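
For readers who want to see how dict filtering and the report fit together, the sketch below reproduces the behaviour described above in plain Python. It is a toy model, not the library code: the `FilterReport` dataclass, its `masked_keys` field, the tuple return value, and the `***` mask are hypothetical. Only `blocked_keys` and `safe_keys` appear in the example above.

```python
from dataclasses import dataclass, field

BLOCKED_PARTS = ("API_KEY", "SECRET", "TOKEN", "PASSWORD")
MASKED_PARTS = ("DATABASE_URL", "REDIS_URL", "SMTP_")


@dataclass
class FilterReport:
    """Hypothetical report object; masked_keys is an assumed field."""
    blocked_keys: list = field(default_factory=list)
    masked_keys: list = field(default_factory=list)
    safe_keys: list = field(default_factory=list)


def filter_dict(data: dict):
    """Drop BLOCKED keys, mask MASKED values, pass everything else through."""
    report = FilterReport()
    filtered = {}
    for key, value in data.items():
        if any(part in key for part in BLOCKED_PARTS):
            report.blocked_keys.append(key)  # removed entirely
        elif any(part in key for part in MASKED_PARTS):
            text = str(value)
            # Keep the first/last 2 characters only when the value is long enough.
            filtered[key] = text[:2] + "***" + text[-2:] if len(text) > 4 else "***"
            report.masked_keys.append(key)
        else:
            filtered[key] = value
            report.safe_keys.append(key)
    return filtered, report
```

Applied to the `data` dict from the example, this sketch keeps `LANG` and `HOME`, drops `OPENAI_API_KEY`, and records the same `blocked_keys` and `safe_keys` lists shown in the report output.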

Configuration YAML

Default rules are in configs/sensitive_rules.yaml:

sensitive_keys:
  blocked:
    - "API_KEY"
    - "SECRET"
    - "TOKEN"
    - "PASSWORD"
    - "PRIVATE_KEY"
    - "CREDENTIAL"
    - "AUTH_KEY"
  masked:
    - "DATABASE_URL"
    - "REDIS_URL"
    - "SMTP_"
    - "MONGO_URI"
  safe:
    - "LANG"
    - "TERM"
    - "SHELL"
    - "HOME"
    - "USER"
    - "PWD"
    - "PATH"
    - "EDITOR"
    - "HOSTNAME"

sensitive_value_patterns:
  - "sk-[a-zA-Z0-9]{20,}"        # OpenAI
  - "sk-ant-[a-zA-Z0-9]{20,}"    # Anthropic
  - "ghp_[a-zA-Z0-9]{36}"        # GitHub PAT
  - "gsk_[a-zA-Z0-9]{20,}"       # Groq
  - "sk-or-v1-[a-zA-Z0-9]{20,}"  # OpenRouter
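
Because key rules are plain strings and value patterns are regular expressions, a rules file can be sanity-checked once it has been parsed. The sketch below assumes the YAML has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`). `validate_rules` is a hypothetical helper and not part of prellm.

```python
import re


def validate_rules(rules: dict) -> list:
    """Return a list of human-readable problems found in a parsed rules dict."""
    problems = []
    keys = rules.get("sensitive_keys", {})
    for level in ("blocked", "masked", "safe"):
        entries = keys.get(level, [])
        # Each level must be a list of strings, as in the YAML above.
        if not isinstance(entries, list) or not all(isinstance(e, str) for e in entries):
            problems.append(f"sensitive_keys.{level} must be a list of strings")
    for pattern in rules.get("sensitive_value_patterns", []):
        try:
            re.compile(pattern)  # every value pattern must be a valid regex
        except re.error as exc:
            problems.append(f"invalid pattern {pattern!r}: {exc}")
    return problems
```

An empty result means the structure looks as expected. It does not prove that the patterns catch every secret.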

Custom Rules File

Create your own YAML to extend the defaults:

# my_rules.yaml
sensitive_keys:
  blocked:
    - "INTERNAL_SECRET"
    - "CORP_TOKEN"
  masked:
    - "LDAP_URL"
  safe:
    - "MY_SAFE_VAR"

sensitive_value_patterns:
  - "corp-[a-zA-Z0-9]{32}"  # corporate token format

result = await preprocess_and_execute(
    query="Deploy",
    sensitive_rules="my_rules.yaml",
)
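
One plausible reading of "extend the defaults" is that custom entries are appended to the default lists rather than replacing them. The sketch below illustrates that reading on already-parsed dicts. The `merge_rules` helper and the append-and-deduplicate semantics are assumptions, since this document does not spell out how prellm combines the two files.

```python
def merge_rules(defaults: dict, custom: dict) -> dict:
    """Append custom rules to the defaults, dropping duplicates, keeping order."""
    merged = {"sensitive_keys": {}, "sensitive_value_patterns": []}
    for level in ("blocked", "masked", "safe"):
        base = defaults.get("sensitive_keys", {}).get(level, [])
        extra = custom.get("sensitive_keys", {}).get(level, [])
        # dict.fromkeys keeps first-seen order while removing duplicates
        merged["sensitive_keys"][level] = list(dict.fromkeys([*base, *extra]))
    merged["sensitive_value_patterns"] = list(dict.fromkeys([
        *defaults.get("sensitive_value_patterns", []),
        *custom.get("sensitive_value_patterns", []),
    ]))
    return merged
```

With the `my_rules.yaml` content above, the merged blocked list would contain both the default entries such as `API_KEY` and the custom `INTERNAL_SECRET` and `CORP_TOKEN`.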

Integration Points

The filter is applied at two points in the pipeline:

1. Context preparation (_prepare_context in core.py): filters the extra_context dict before it reaches the preprocessor.
2. Executor input (ExecutorAgent.execute): runs sanitize_text() on the final prompt before the large-LLM call.

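This means the small LLM (local) sees more context than the large LLM (cloud): its input has only passed the dict filter, while the final prompt sent to the cloud is additionally scrubbed by sanitize_text().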
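
The two stages can be pictured with a toy pipeline. Everything in this sketch is hypothetical: `filter_context`, `sanitize_text`, `run_pipeline`, the `[REDACTED]` placeholder, and the fake LLM callables are stand-ins for illustration and do not reflect prellm's internals.

```python
import re

BLOCKED_PARTS = ("API_KEY", "SECRET", "TOKEN", "PASSWORD")
TOKEN_RE = re.compile(
    r"sk-[a-zA-Z0-9]{20,}|ghp_[a-zA-Z0-9]{36}|gsk_[a-zA-Z0-9]{20,}"
)


def filter_context(extra_context: dict) -> dict:
    """Stage 1: drop context entries whose key looks like a secret."""
    return {
        key: value
        for key, value in extra_context.items()
        if not any(part in key.upper() for part in BLOCKED_PARTS)
    }


def sanitize_text(text: str) -> str:
    """Stage 2: replace token-shaped substrings in free text."""
    return TOKEN_RE.sub("[REDACTED]", text)


def run_pipeline(query, extra_context, small_llm, large_llm):
    context = filter_context(extra_context)   # before the preprocessor
    prompt = small_llm(query, context)        # local model builds the prompt
    return large_llm(sanitize_text(prompt))   # cloud model sees scrubbed text


def fake_small_llm(query, context):
    # Simulates a prompt that accidentally contains a token-shaped string.
    return f"{query} | context={context} | note: key sk-{'a' * 24}"


def fake_large_llm(prompt):
    return prompt  # echo, so the scrubbed prompt can be inspected


result = run_pipeline(
    "Deploy",
    {"LANG": "pl_PL", "OPENAI_API_KEY": "sk-secret"},
    fake_small_llm,
    fake_large_llm,
)
```

Here the blocked key never reaches the small model, and the token-shaped string that appears in the intermediate prompt is replaced before the large model is called.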

Related Docs

Persistent Context — full context layer
Session Persistence — export/import
CHANGELOG — v0.4.0 details
