feat: mask sensitive data inside objects and URLs in code variables #688

Open
ablaszkiewicz wants to merge 3 commits into main from feat/code-variables-object-and-url-masking

ablaszkiewicz commented Jun 19, 2026

What changed

Hardens exception code_variables masking so secrets can no longer leak through untraversed objects, the repr() fallback, or URLs/DSNs.

  • Traverse custom objects — redact sensitive fields by their real attribute name instead of dumping the whole object.
  • Fail-closed serialization — nothing reaches the payload via a raw repr(); anything we can't safely decompose becomes a placeholder.
  • Scrub URL/DSN credentials from string values regardless of the surrounding name; added connection_string / dsn to the default patterns.
  • New toggle code_variables_mask_url_credentials (default True), wired through the constructor, module global, and per-context override (mirrors code_variables_mask_patterns).
  • Bounded everywhere — object traversal now honours the same depth/size caps as collections; URL scrubbing is a single linear regex (no catastrophic backtracking).
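
The traversal and bounding behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in posthog/exception_utils.py: the names `serialize`, `placeholder`, and `SENSITIVE` are hypothetical, the pattern list is a stand-in for the SDK defaults, and the exact text of the type-name and circular-reference placeholders is assumed. The caps (depth 25, 100 items) and the two `$$…$$` marker strings are taken from this PR description.

```python
import re
from dataclasses import dataclass

REDACTED = "$$_posthog_redacted_based_on_masking_rules_$$"
TOO_LONG = "$$_posthog_value_too_long_$$"
CIRCULAR = "<circular ref>"  # exact placeholder text is an assumption
MAX_DEPTH = 25   # depth cap quoted in the PR description
MAX_ITEMS = 100  # size cap quoted in the PR description
# Hypothetical name patterns; the SDK's real default list is not shown here.
SENSITIVE = re.compile(r"password|secret|token|dsn|connection_string", re.IGNORECASE)


def placeholder(value):
    """Type-name placeholder used instead of a raw repr()."""
    return f"<{type(value).__name__}>"


def serialize(value, depth=0, seen=frozenset()):
    """Decompose a value into JSON-safe data, redacting by real field name."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if id(value) in seen:  # the object is already on the current path
        return CIRCULAR
    if depth >= MAX_DEPTH:
        return placeholder(value)
    seen = seen | {id(value)}
    if isinstance(value, (list, tuple)):
        if len(value) > MAX_ITEMS:
            return TOO_LONG
        return [serialize(item, depth + 1, seen) for item in value]
    if isinstance(value, dict):
        fields, extra = value, {}
    elif getattr(value, "__dict__", None):  # custom object with attributes
        fields, extra = vars(value), {"__class__": type(value).__name__}
    else:
        return placeholder(value)  # fail closed: nothing we cannot decompose
    if len(fields) > MAX_ITEMS:
        return TOO_LONG
    out = {}
    for key, item in fields.items():
        name = str(key)
        out[name] = REDACTED if SENSITIVE.search(name) else serialize(item, depth + 1, seen)
    out.update(extra)
    return out


@dataclass
class PostgresSourceConfig:
    host: str
    password: str


# "host" is kept, "password" is replaced by REDACTED, "__class__" is added.
example = serialize(PostgresSourceConfig(host="db", password="hunter2"))
```

Tracking `seen` per path (rather than globally) means an object referenced twice in sibling fields is serialized twice, while a true cycle is cut off.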

Before / after

| Captured code variable | Before | After |
| --- | --- | --- |
| `PostgresSourceConfig(host="db", password="hunter2")` (custom object) | `…password='hunter2'…` (whole object dumped via `repr()`) | `{"host": "db", "password": "‹redacted›", "__class__": "PostgresSourceConfig"}` |
| `"postgresql://user:hunter2@db:5432/app"` | emitted verbatim | `"postgresql://‹redacted›@db:5432/app"` |
| `"ssh://git@github.com/repo"` (username, no password) | emitted verbatim | unchanged (not a credential) |
| object whose `__repr__` raises or isn't JSON-serializable | raw `repr()` (only length-truncated) | `"‹TypeName›"` placeholder |
| object with 10 000 attributes | fully serialized | `"‹value too long›"` (capped at 100, same as dict/list) |
| deeply nested / self-referential object | could recurse unbounded | capped at depth 25; cycles → `"‹circular ref›"` |
| `mask_patterns=[]` and URL scrubbing on | n/a | URLs still scrubbed; the two toggles are independent |

‹redacted› = $$_posthog_redacted_based_on_masking_rules_$$, ‹value too long› = $$_posthog_value_too_long_$$.
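
The URL rows in the table can be reproduced with a single substitution. The sketch below is an assumption about how such a scrubber might look, not the regex shipped in this PR; `scrub_url_credentials` and `_URL_CREDENTIALS` are hypothetical names. It only redacts when the userinfo part contains a `:` (that is, a password), so a bare username is left alone.

```python
import re

REDACTED = "$$_posthog_redacted_based_on_masking_rules_$$"

# scheme://user:password@  (the user part may be empty; the password may contain "@")
_URL_CREDENTIALS = re.compile(
    r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.\-]*://)"  # scheme, e.g. "postgresql://"
    r"[^\s/?#:@]*"                               # username, stops at ":" or "@"
    r":[^\s/?#]*"                                # ":" then password, up to the path
    r"@"                                         # last "@" before the host
)


def scrub_url_credentials(text, enabled=True):
    """Replace user:password in every URL/DSN found in text."""
    if not enabled:
        return text
    # A function replacement avoids any escape handling in the marker string.
    return _URL_CREDENTIALS.sub(lambda m: m.group("scheme") + REDACTED + "@", text)


scrubbed = scrub_url_credentials("postgresql://user:hunter2@db:5432/app")
untouched = scrub_url_credentials("ssh://git@github.com/repo")
```

Because the password class excludes `/`, `?`, and `#`, an `@` that appears later in a path or query string is never treated as the end of the credentials.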

Limitation

Blocklist masking can't catch a secret stored under an unrecognised name with no detectable shape (e.g. a bare password in a local named pw). Source context lines are intentionally left untouched.

Tests

Object traversal + nested context preservation, _safe_repr (redact-on-match / clean passthrough / broken __repr__ / too-long), URL scrubbing (multi-URL, @-in-password, IPv6, bare username, other schemes), the size/depth caps, the independent-toggle behaviour, the per-context override, and an end-to-end test mirroring the original leaked event.

🤖 Generated with Claude Code

Code variable masking previously only inspected dicts/lists/tuples/strings,
and fell back to a raw repr() on serialization failure. As a result, secrets
held as attributes of custom objects (e.g. a PostgresSourceConfig with a
`password` field) were emitted verbatim via the unmasked repr() path.

This hardens masking to be fail-closed:

- Traverse custom objects (dataclasses / objects with a populated __dict__)
  so sensitive fields are redacted by their real attribute name. This is both
  safer (a custom __repr__ can't relabel a field out of the mask) and
  higher-fidelity (only the sensitive field is redacted, surrounding context
  is kept).
- Replace the leaky repr() fallback with a fail-closed _safe_repr() that
  redacts the whole value when any masking rule matches, redacts when the
  repr is too long to scan, and emits a type-name placeholder when __repr__
  raises. json.dumps gets a default= net so no raw object can slip through.
- Scrub credentials embedded in URLs/DSNs (postgresql://user:pass@host) from
  string values regardless of the surrounding key name. Add `connection_string`
  to the default mask patterns.

Adds a `code_variables_mask_url_credentials` config option (default True),
wired through the constructor, module-level global, and per-context override,
mirroring code_variables_mask_patterns.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
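
The fail-closed fallback described in the commit message above could look roughly like this. It is a sketch under assumptions: `safe_repr_sketch`, `dumps_fail_closed`, `MASK`, and the 1024-character scan limit are all hypothetical, and the real `_safe_repr()` may differ in signature and thresholds.

```python
import json
import re

REDACTED = "$$_posthog_redacted_based_on_masking_rules_$$"
MAX_SCAN_LENGTH = 1024  # hypothetical limit; the PR's value is not shown
MASK = re.compile(r"password|secret|token", re.IGNORECASE)  # hypothetical patterns


def safe_repr_sketch(value, mask=MASK, max_len=MAX_SCAN_LENGTH):
    """Return repr(value) only when it is short and matches no masking rule."""
    try:
        text = repr(value)
    except Exception:
        # A broken __repr__ yields a type-name placeholder, never an error.
        return f"<{type(value).__name__}>"
    if len(text) > max_len:
        return REDACTED  # too long to scan safely, so redact the whole value
    if mask.search(text):
        return REDACTED  # any rule match redacts the whole value
    return text


def dumps_fail_closed(payload):
    """json.dumps with a default= net so no raw object reaches the payload."""
    return json.dumps(payload, default=safe_repr_sketch)
```

`json.dumps` calls `default` only for objects it cannot encode itself, so plain dicts, lists, and strings are unaffected by the safety net.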
greptile-apps Bot commented Jun 19, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
posthog/exception_utils.py:1083-1096
**`mask_url_credentials` silently inert when `mask_patterns` is empty**

The early return `if not compiled_mask: return value` means URL credential scrubbing is bypassed entirely whenever `compiled_mask` is `None` — which happens when `mask_patterns=[]`. The same guard appears in `_serialize_variable_value` (`elif compiled_mask and mask_url_credentials:`), so a user who explicitly disables name-based masks but still expects URL credentials to be scrubbed gets no protection. The two features are advertised as independent toggles but share a single gate.
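
One way to decouple the two gates is to evaluate each toggle on its own instead of returning early when `compiled_mask` is empty. The sketch below is illustrative only: `mask_string_value` and `_URL_CREDENTIALS` are hypothetical names, the regex is a stand-in, and the parameter names `compiled_mask` and `mask_url_credentials` are borrowed from the review text above.

```python
import re

REDACTED = "$$_posthog_redacted_based_on_masking_rules_$$"
# Stand-in pattern: scheme://user:password@ (not the PR's actual regex).
_URL_CREDENTIALS = re.compile(r"([a-zA-Z][a-zA-Z0-9+.\-]*://)[^\s/?#:@]*:[^\s/?#]*@")


def mask_string_value(name, value, compiled_mask, mask_url_credentials):
    """Apply name-based masking and URL scrubbing as independent steps."""
    # Gate 1: name-based masking, skipped when no patterns are configured.
    if compiled_mask is not None and compiled_mask.search(name):
        return REDACTED
    # Gate 2: URL scrubbing depends only on its own flag, not on compiled_mask.
    if mask_url_credentials:
        return _URL_CREDENTIALS.sub(lambda m: m.group(1) + REDACTED + "@", value)
    return value
```

With this shape, `mask_patterns=[]` (no compiled mask) combined with URL scrubbing enabled still removes embedded credentials.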

### Issue 2 of 2
posthog/test/test_exception_capture.py:984-1002
**Prefer `@pytest.mark.parametrize` for multi-case unit tests**

`test_redact_url_credentials` bundles four distinct input/output assertions in a single test body. Per the team convention, these cases should be expressed as separate parametrize entries so each case gets its own pass/fail signal and name. The same applies to `test_mask_url_credentials_can_be_toggled` (two cases: enabled vs disabled) and the inline assertions inside `test_compile_patterns_fast_path_and_regex_fallback`.

Reviews (1): Last reviewed commit: "feat: mask sensitive data inside objects..."

github-actions Bot commented Jun 19, 2026


posthog-python Compliance Report

Date: 2026-06-19 21:10:25 UTC
Duration: 540113ms

✅ All Tests Passed!

45/45 tests passed


Capture Tests

29/29 tests passed

| Test | Status | Duration |
| --- | --- | --- |
| Format Validation.Event Has Required Fields | passed | 517ms |
| Format Validation.Event Has Uuid | passed | 10007ms |
| Format Validation.Event Has Lib Properties | passed | 10007ms |
| Format Validation.Distinct Id Is String | passed | 10007ms |
| Format Validation.Token Is Present | passed | 10006ms |
| Format Validation.Custom Properties Preserved | passed | 10007ms |
| Format Validation.Event Has Timestamp | passed | 10007ms |
| Retry Behavior.Retries On 503 | passed | 18019ms |
| Retry Behavior.Does Not Retry On 400 | passed | 12002ms |
| Retry Behavior.Does Not Retry On 401 | passed | 10008ms |
| Retry Behavior.Respects Retry After Header | passed | 16013ms |
| Retry Behavior.Implements Backoff | passed | 32028ms |
| Retry Behavior.Retries On 500 | passed | 16000ms |
| Retry Behavior.Retries On 502 | passed | 16010ms |
| Retry Behavior.Retries On 504 | passed | 16010ms |
| Retry Behavior.Max Retries Respected | passed | 32028ms |
| Deduplication.Generates Unique Uuids | passed | 9992ms |
| Deduplication.Preserves Uuid On Retry | passed | 16016ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | passed | 23019ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | passed | 16004ms |
| Deduplication.No Duplicate Events In Batch | passed | 10002ms |
| Deduplication.Different Events Have Different Uuids | passed | 10006ms |
| Compression.Sends Gzip When Enabled | passed | 10007ms |
| Batch Format.Uses Proper Batch Structure | passed | 10007ms |
| Batch Format.Flush With No Events Sends Nothing | passed | 5005ms |
| Batch Format.Multiple Events Batched Together | passed | 10005ms |
| Error Handling.Does Not Retry On 403 | passed | 12008ms |
| Error Handling.Does Not Retry On 413 | passed | 10006ms |
| Error Handling.Retries On 408 | passed | 14015ms |

Feature_Flags Tests

16/16 tests passed

| Test | Status | Duration |
| --- | --- | --- |
| Request Payload.Request With Person Properties Device Id | passed | 9501ms |
| Request Payload.Flags Request Uses V2 Query Param | passed | 10006ms |
| Request Payload.Flags Request Hits Flags Path Not Decide | passed | 10007ms |
| Request Payload.Flags Request Omits Authorization Header | passed | 10006ms |
| Request Payload.Token In Flags Body Matches Init | passed | 10007ms |
| Request Payload.Groups Round Trip | passed | 10007ms |
| Request Payload.Groups Default To Empty Object | passed | 10006ms |
| Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | passed | 10007ms |
| Request Payload.Disable Geoip False Propagates As Geoip Disable False | passed | 10007ms |
| Request Payload.Disable Geoip Omitted Defaults To False | passed | 10006ms |
| Request Payload.Flag Keys To Evaluate Contains Only Requested Key | passed | 10006ms |
| Request Lifecycle.No Flags Request On Init Alone | passed | 5003ms |
| Request Lifecycle.No Flags Request On Normal Capture | passed | 10507ms |
| Request Lifecycle.Two Flag Calls Produce Two Remote Requests | passed | 9511ms |
| Request Lifecycle.Mock Response Value Is Returned To Caller | passed | 10002ms |
| Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | passed | 10510ms |

ablaszkiewicz marked this pull request as ready for review June 19, 2026 21:00
ablaszkiewicz requested a review from a team as a code owner June 19, 2026 21:00
greptile-apps Bot commented Jun 19, 2026


Reviews (2): Last reviewed commit: "fix: comments"

ablaszkiewicz requested review from a team, cat-ph and hpouillot June 19, 2026 21:15