DryRun captures PostgreSQL planner statistics so EXPLAIN plans are
realistic without a connection to production. Those statistics include
most_common_vals and histogram_bounds, which are literal column
values lifted from your tables. For a users.email or users.phone
column, that is PII landing in .dryrun/history.db.
The masking policy is a small YAML file that names sensitive columns. DryRun reads it at capture time and writes NULL into the offending stats before anything touches disk.
Masking runs once, inside init or snapshot take. The masked form is
what lands in history.db. push and pull move bytes and do not
re-mask, so a missing or wrong policy at capture is a permanent leak
and recapture is the only fix.
By default a missing policy is a warning, so fresh checkouts work.
Projects with real PII should set require_masks = true in
dryrun.toml to turn that warning into a hard error.
The file is shared with fixturize.
fixturize uses the expr field to rewrite rows on extract. DryRun
ignores expr and uses the column set to NULL planner stats.
The rest of this document shows the small case first, two columns
hand-written, and then the automated path with fixturize analyze.
Suppose users.email and users.phone are the only sensitive columns
you care about. Drop this file at the repo root as
data-masking-policy.yml:
version: 1
databases:
dev:
columns:
users.email: { expr: "'user_' || id || '@masked.test'", tags: [pii] }
users.phone: { expr: "'+1-000-000-0000'", tags: [pii] }The dev key must match your DryRun database_id (see below). The
version field must be 1. expr is only read by fixturize, but it
is part of the shared schema, so leave it in even if you never run
fixturize.
Column keys can be table.column to match in any schema, or
schema.table.column to qualify.
DryRun resolves the policy file in three ways, highest priority first:
--masks-file <path>ondryrun init.masks_filein the activedryrun.tomlprofile.- Auto-discovery, walking up from the working directory looking for
data-masking-policy.ymland stopping at.git.
If the file sits at the repo root, auto-discovery is enough. To be explicit:
require_masks = true
[project]
id = "appdb"
[profiles.dev]
db_url = "${DATABASE_URL}"
masks_file = "data-masking-policy.yml"The profile's database_id defaults to the profile name (dev) and is
what DryRun uses to select a block inside the YAML. The block name in
the policy file must match it, so databases.dev rather than
databases.default.
dryrun --db "$DATABASE_URL" --profile dev initThe summary line names how many columns got blanked:
Captured schema: 24 tables, 3 views, 8 functions
Schema: .dryrun/schema.json
Planner: 24 tables, 41 indexes, 312 columns
Masked: 2 planner-stats columns
Activity: node=db-primary, 24 tables, 41 indexes
Confirm it on the history store directly:
sqlite3 .dryrun/history.db \
"SELECT column_name, most_common_vals FROM planner_column_stats
WHERE column_name IN ('email','phone');"Masked columns show NULL most_common_vals. Non-sensitive columns keep
their statistics, so plans on those columns stay realistic.
Regardless of policy, DryRun strips jsonb MCV values at capture as an
always-on backstop. histogram_bounds is not auto-stripped, so list
sensitive jsonb columns in the policy explicitly.
For a one-off opt-out, dryrun init --no-masks writes raw planner
stats to history.db. It is refused when require_masks = true. To
select a subset of columns by tag, group them under policies in the
YAML and pass --mask-policy <name>. Without it, every listed column
is masked.
Two columns are easy to type. A real schema has dozens, and the names
drift over time. fixturize analyze scans the live schema, flags
columns that look like PII by name and type, and emits a policy in the
same format.
fixturize analyze \
--connection "$DATABASE_URL" \
--yaml \
--output data-masking-policy.ymlUseful flags:
| Flag | Purpose |
|---|---|
--min-confidence low|medium|high |
Drop low-confidence guesses. Default low. |
--root users --depth 2 |
Limit the scan to tables reachable from a root table. |
--schema public |
Default schema for unqualified names. |
Run it once without --yaml to eyeball the report and confidence
levels:
fixturize analyze --connection "$DATABASE_URL" --min-confidence mediumThe generated file looks identical to the hand-written one above, with
one caveat. fixturize analyze always writes the block under
databases.default. DryRun looks up the block by your profile's
database_id and fails with an error if it does not find a match.
Rename default to your actual database_id. Multi-DB projects need
one entry per captured database, even if empty.
Automated heuristics miss domain-specific PII like a notes text
column or a metadata jsonb blob, and over-flag harmless lookups, so
the output is a starting point rather than ground truth.
Because both tools key off schema.table.column, the same file
protects test-data extraction in fixturize (via expr) and local
planner-stats capture in DryRun (via the column set).
| Symptom | Fix |
|---|---|
masks file has no entry for database_id "X" |
Rename the YAML block to match the profile's database_id. Multi-DB projects need one entry per captured database. |
no data-masking-policy.yml resolved warning |
Expected on fresh checkouts. To enforce, set require_masks = true or pass --masks-file. |
require_masks=true ... must exist |
Provide a file via any of the three resolution paths, or unset require_masks. |
SECURITY.mdcovers DryRun's security model and where masking sits in it.dryrun-readonly-role.sqlis the minimum-privilege role for capture.