Skip to content

fix: reduce false positives and add global rule configuration#49

Open
satoridev01 wants to merge 1 commit intoParzivalHack:mainfrom
satoridev01:fix/reduce-false-positives-django
Open

fix: reduce false positives and add global rule configuration#49
satoridev01 wants to merge 1 commit intoParzivalHack:mainfrom
satoridev01:fix/reduce-false-positives-django

Conversation

@satoridev01
Copy link
Copy Markdown
Contributor

Summary

Introduces two new mechanisms for controlling rule behavior globally, then applies them to fix a class of false positives that surface when scanning mature Python codebases.

Problem

Scanning a large Django-based project produced 864 findings with -s HIGH. Manual verification showed that ~720 were false positives:

  • 453 from PY515/SHELL645/SHELL670 — flagging every re.compile() call as "code compilation"
  • 101 from PY102 — the taint engine firing on generic callable invocations
  • 78 from PY511/JSON612json.loads() flagged as dangerous deserialization
  • dozens more from AUTH711, CSRF747, SESS744, PATH813, etc. — firing on test fixtures and well-known safe patterns

Without filtering, scanning also revealed ~9,000 additional Low-severity findings from rules that flag every Python built-in (isinstance, super, len, f-strings, getattr, etc.).

Solution

1. [defaults] section in rules TOML

[defaults]
exclude_file_patterns = ["*tests*", "*fixtures*", "*testdata*", "*conftest*"]
disabled_rule_ids     = ["ISINSTANCE855", "LEN1101", "SUPER1128", ...]
  • exclude_file_patterns — glob list applied to all rules; eliminates the need to repeat exclude_file_pattern on every individual rule
  • disabled_rule_ids — completely disables noisy rules without deleting their definitions (easy to re-enable per project)

2. Per-rule exclude_pattern

Regex matched against the flagged line; suppresses the finding if it matches. Fixes rules that are correct in general but have well-known safe variants:

[[rule]]
id = "PY107"
ast_match = "Call(func.value.id=yaml, func.attr=load)"
# Safe when Loader=SafeLoader is explicitly passed
exclude_pattern = "Loader\\s*=\\s*(yaml\\.)?(Safe|Base)Loader"

Results on Django 6.1-alpha

Before After
Total findings (all severities) 864 (-s HIGH) 383
re.compile() false positives 453 0
yaml.load(SafeLoader) false positives 2 0
Test fixture credential false positives 9 0
Session fixation false positives 1 0
Python built-in noise rules ~9,000 (hidden by -s HIGH) 0
pickle.loads() true positives 57 → 8 (test files excluded) ✓ present
exec() true positives 3 ✓ 3

Files changed

  • src/pyspector/_rust_core/src/rules.rsDefaults struct, Rule::is_file_excluded(), exclude_pattern field
  • src/pyspector/_rust_core/src/analysis/mod.rs — apply disabled_rule_ids before scan
  • src/pyspector/_rust_core/src/analysis/ast_analysis.rs — use is_file_excluded() and exclude_pattern
  • src/pyspector/_rust_core/src/analysis/config_analysis.rs — same
  • src/pyspector/rules/built-in-rules.toml[defaults] section + per-rule fixes
  • tests/unit/test_false_positive_reductions.py — 26 tests (each fix has a suppression test + true-positive retention test)

Test plan

  • python -m pytest tests/unit/test_false_positive_reductions.py — 26 tests pass
  • Scan a Django project: pyspector scan /path/to/django — confirm re.compile(), yaml.load(SafeLoader), and test-fixture findings are gone
  • Confirm pickle.loads() and exec() findings still present in production code

Addresses a class of false positives identified through real-world scanning
of a large Python codebase (Django 6.1-alpha). Introduces two new mechanisms
for controlling rule behavior without modifying individual rule definitions.

## Changes

### New: `[defaults]` section in rules TOML
A single top-level section that applies to every rule:

```toml
[defaults]
exclude_file_patterns = ["*tests*", "*fixtures*", "*testdata*", "*conftest*"]
disabled_rule_ids = ["ISINSTANCE855", "LEN1101", "SUPER1128", ...]
```

- `exclude_file_patterns` — glob patterns excluded from ALL rules (replaces
  repeating `exclude_file_pattern` on each individual rule)
- `disabled_rule_ids` — completely disable rules that produce noise without
  taint context (Python built-in calls, etc.)

### New: per-rule `exclude_pattern` field
Regex applied to the matched line; if it matches, the finding is suppressed.
Useful for rules that are correct in general but have well-known safe variants:

```toml
[[rule]]
id = "PY107"
ast_match = "Call(func.value.id=yaml, func.attr=load)"
exclude_pattern = "Loader\\s*=\\s*(yaml\\.)?(Safe|Base)Loader"
```

### Rust core changes
- `rules.rs`: adds `Defaults` struct with `exclude_file_patterns`,
  `disabled_rule_ids`; adds `Rule::is_file_excluded()` helper; adds
  `exclude_pattern: Option<Regex>` to `Rule`
- `analysis/mod.rs`: applies `disabled_rule_ids` before scanning
- `analysis/ast_analysis.rs`: uses `Rule::is_file_excluded()` and
  `exclude_pattern`
- `analysis/config_analysis.rs`: same

### Rule fixes applied
| Rule | Fix |
|---|---|
| PY107 / PY302 | `exclude_pattern` suppresses `yaml.load(..., Loader=SafeLoader)` |
| PY515 / SHELL645 / SHELL670 | `exclude_pattern` suppresses `re.compile()` and `.compile()` method calls |
| AUTH711 / ADMIN795 | `exclude_file_pattern` suppresses test fixtures |
| CSRF747 / IMPORT825 / PATH813 / PY001 | `exclude_file_pattern` suppresses test files |
| SESS744 | Pattern narrowed to `session.session_key = request.*` (actual fixation); writing data to a session is not fixation |
| PY511 / JSON612 | Severity downgraded to Low; `json.loads()` alone is not a code execution risk |
| 95 Low/Medium rules | Added to `disabled_rule_ids`; rules flagging every Python built-in call (`isinstance`, `super`, `len`, f-strings, etc.) have no security value without taint analysis |

### Test coverage
`tests/unit/test_false_positive_reductions.py` — 26 tests covering:
- Each suppressed false positive with a concrete code fixture
- True-positive counterpart for each rule (ensure real findings still fire)
- Global `[defaults]` behavior (test-file exclusion via global config, not per-rule)
@ParzivalHack ParzivalHack added the enhancement New feature or request label May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants