Fixed issue #132 and #133 (#151)

Merged: fedelemantuano merged 2 commits into develop from issue_133 on Apr 4, 2026
@fedelemantuano
Contributor

Summary

  • Address headers where the display name is an email address (e.g. From: alice@example.com <bob@example.com>) were silently dropped, returning [["", ""]] instead of the real values
  • Absent address headers (bcc, cc, reply-to, delivered-to) appeared in the output as [["", ""]] instead of being omitted

Root cause

Both bugs trace back to Python's email.utils.getaddresses, which returns [('', '')] (a non-empty list holding one tuple of two empty strings) for input it cannot parse. This covers both absent headers (empty string input) and headers with RFC-non-compliant display names.

The CVE-2023-27043 security hardening (backported to Python 3.9+) made strict=True the default. This correctly rejects an unquoted @ in display names per RFC 5322 §3.4. That is the right call for an MTA, but mail-parser is a security/forensics tool: hiding an address because its display name looks like an email address (a common pattern in phishing and impersonation mail) defeats the purpose of the tool.
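
The root cause can be observed by calling getaddresses directly. The following is a minimal sketch rather than code from this PR: is_unparsed is a hypothetical helper name, and the printed results depend on the Python patch level (interpreters without the CVE-2023-27043 backport parse these inputs differently), so no expected output is shown.

```python
from email.utils import getaddresses


def is_unparsed(pairs):
    """Return True when every (name, address) pair is ('', '').

    On hardened interpreters this is the shape getaddresses reports
    for input it rejects. Hypothetical helper, for illustration only.
    """
    return bool(pairs) and all(pair == ("", "") for pair in pairs)


if __name__ == "__main__":
    samples = [
        "",                                     # absent header
        "alice@example.com <bob@example.com>",  # unquoted @ in display name
        "Alice Smith <alice@example.com>",      # well-formed
    ]
    for raw in samples:
        pairs = getaddresses([raw])
        print(repr(raw), pairs, is_unparsed(pairs))
```

Because an absent header and a rejected header produce the same [('', '')] shape, the caller has to look at the raw header value to tell them apart.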

Changes

Bug 1 — Email address as display name is silently dropped (fixes #132)

New get_addresses() helper in utils.py:

  1. Tries strict=True first — CVE hardening preserved for well-formed input
  2. If every result is ('', '') on a non-empty header, falls back to a regex that extracts angle-bracket addresses and display names directly from the raw header value
  3. The function is fully documented with the RFC/CVE rationale

Before:

From: alice@example.com <bob@example.com>
"from": [["", ""]]

After:

"from": [["alice@example.com", "bob@example.com"]]
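
The strict-then-fallback flow described for get_addresses() can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in utils.py: the regex, the fallback_parse name, and the injectable parse parameter are all hypothetical, and the sketch relies on the interpreter's default getaddresses behaviour instead of passing strict=True explicitly, so that it also runs on interpreters that lack the keyword.

```python
import re
from email.utils import getaddresses

# Assumed pattern: "display name <addr>", where the display name may contain "@".
_ANGLE_ADDR = re.compile(r"([^<>,]*)<([^<>\s]+)>")


def fallback_parse(raw):
    """Regex fallback: pull (name, address) pairs straight from the raw header."""
    pairs = []
    for match in _ANGLE_ADDR.finditer(raw):
        name = match.group(1).strip().strip('"').strip()
        addr = match.group(2)
        # Suppress a display name that merely repeats the address.
        if name == addr:
            name = ""
        pairs.append((name, addr))
    return pairs


def get_addresses(raw, parse=getaddresses):
    """Try the standard parser first; fall back only if it rejects everything."""
    strict = parse([raw])
    rejected = bool(strict) and all(pair == ("", "") for pair in strict)
    # Only a non-empty header that was fully rejected goes to the regex.
    if raw.strip() and rejected:
        recovered = fallback_parse(raw)
        if recovered:
            return recovered
    return strict
```

Well-formed headers never reach the regex, so the hardened parsing still applies to them. The sketch splits on commas naively, so a quoted display name containing a comma would need more careful handling.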

Bug 2 — Absent address headers appear in output as [["", ""]]

One-line guard — if email_addr — added to the list comprehension in __getattr__. Any tuple with an empty address is filtered out, so absent headers produce [], which is falsy and excluded from the output.

Before:

"bcc":      [["", ""]],
"cc":       [["", ""]],
"reply-to": [["", ""]]

After: keys are absent from the output entirely.
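
The effect of the guard can be sketched in isolation. The function names clean_addresses and build_output are hypothetical stand-ins for the list comprehension in __getattr__ and for whatever step assembles the output; only the if email_addr condition comes from this PR.

```python
def clean_addresses(pairs):
    """Drop any (name, address) pair whose address is empty."""
    return [(name, email_addr) for name, email_addr in pairs if email_addr]


def build_output(parsed_headers):
    """Keep only headers that still have addresses after filtering."""
    output = {}
    for header, pairs in parsed_headers.items():
        addresses = clean_addresses(pairs)
        if addresses:  # [] is falsy, so the key is omitted
            output[header] = addresses
    return output


if __name__ == "__main__":
    parsed = {
        "from": [("alice@example.com", "bob@example.com")],
        "bcc": [("", "")],
        "cc": [("", "")],
    }
    print(build_output(parsed))
```

Filtering on the address rather than on the whole tuple also drops a pair that has a display name but no address.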

Test plan

  • tests/mails/mail_test_19 — new synthetic email with email-as-display-name in From, CC, Reply-To, and mixed To
  • TestEmailAsDisplayName — 14 tests covering both bugs and edge cases:
    • alice@example.com <bob@example.com> → [("alice@example.com", "bob@example.com")]
    • bob@example.com <bob@example.com> → [("", "bob@example.com")] (name == addr, suppressed)
    • "alice@example.com" <bob@example.com> → strict path, no change
    • Alice Smith <alice@example.com> → strict path, no change
    • alice@example.com (bare) → no change
    • Header absent → [], key omitted from output
    • Multiple addresses all using email-as-name → all recovered
  • 187 passed | coverage 99% | pre-commit all hooks passed

@coveralls

Coverage Status

coverage: 99.662% (-0.1%) from 99.797% when pulling 6c2a5e0 on issue_133 into f6df398 on develop.

@fedelemantuano fedelemantuano merged commit e18e7c9 into develop Apr 4, 2026
8 checks passed
@fedelemantuano fedelemantuano deleted the issue_133 branch April 4, 2026 14:47
Development

Successfully merging this pull request may close these issues.

An email address will not be parsed if the Real Name is also an email address
