Refactor received header parsing: replace regex list with RFC 5321 to…#150

Merged
fedelemantuano merged 3 commits into develop from claude/improve-header-regex-vCFLu
Mar 29, 2026
@fedelemantuano
Contributor

…kenizer

Replace the 10 separate regex patterns (each duplicating boundary lookaheads) with a keyword-based tokenizer aligned with RFC 5321 §4.4 grammar. Key improvements:

  • Tokenize on clause keywords (from/by/via/with/id/for) in a single pass instead of running 10 independent regex searches
  • Handle IBM "for <addr> from <sender>" by accepting only the first 'from' clause per header
  • Extract envelope-from/sender from parenthesized comments in clause values
  • Validate IPv4 octets (0-255) instead of matching any digits; add IPv6 support via REGXIP6
  • Simplify JUNK_PATTERN to only collapse tabs/newlines, preserving parenthesized comments and bracketed IPs
  • Add 27-test corpus covering Postfix, Exim, Exchange, Gmail, SendGrid, IBM/Domino, AWS SES, and edge cases
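
To illustrate the approach described in the list above, here is a minimal Python sketch of a single-pass keyword tokenizer. It is not the code merged in this PR: the names `parse_received`, `first_ipv4`, `FOLD_RE`, `ENVELOPE_RE`, and `IPV4_RE` are hypothetical, and two behaviours are assumptions made for the example (keywords inside parenthesized comments are ignored, and a repeated keyword such as a second 'from' is dropped together with its value).

```python
import re

# Clause keywords named in the PR description (RFC 5321 section 4.4).
KEYWORDS = ("from", "by", "via", "with", "id", "for")

# Hypothetical patterns for this sketch, not the project's real regexes.
FOLD_RE = re.compile(r"[\t\r\n]+")  # collapse folding whitespace only
ENVELOPE_RE = re.compile(
    r"\(envelope-(?:from|sender)\s+<?([^<>()\s]+)>?\)", re.IGNORECASE
)
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # 0-255 only
IPV4_RE = re.compile(
    r"(?<![\d.])" + OCTET + r"(?:\." + OCTET + r"){3}(?![\d.])"
)


def parse_received(header):
    """Split a Received header into clauses in one pass over its words."""
    text = FOLD_RE.sub(" ", header).strip()
    # The date-time follows the last ';' in the header.
    body, sep, date = text.rpartition(";")
    if not sep:
        body, date = text, ""

    clauses = {}
    current = None  # list collecting the value of the open clause
    depth = 0       # parenthesis nesting, so comments are kept intact
    for word in body.split():
        key = word.lower()
        if depth == 0 and key in KEYWORDS:
            if key in clauses:
                # Assumption: only the first occurrence of a keyword counts;
                # a repeated clause (e.g. IBM's trailing 'from') is skipped.
                current = None
            else:
                current = clauses.setdefault(key, [])
            continue
        if current is not None:
            current.append(word)
        depth = max(0, depth + word.count("(") - word.count(")"))

    result = {key: " ".join(words) for key, words in clauses.items()}
    result["date"] = date.strip()
    match = ENVELOPE_RE.search(body)
    if match:
        result["envelope_from"] = match.group(1)
    return result


def first_ipv4(text):
    """Return the first IPv4 address whose octets are all in 0-255."""
    match = IPV4_RE.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = (
        "from mail.example.com (mail.example.com [192.0.2.10])\r\n"
        "\tby mx.example.net (Postfix) with ESMTP id 4ABC123\r\n"
        "\tfor <bob@example.net>; Mon, 16 Mar 2026 10:00:00 +0000"
    )
    parsed = parse_received(sample)
    print(parsed)
    print(first_ipv4(parsed["from"]))
```

Tracking parenthesis depth is what lets a comment such as "(using TLSv1.3 with cipher ...)" stay inside the 'from' value instead of opening a spurious 'with' clause; whether the merged tokenizer handles comments this way is not stated in the description.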

https://claude.ai/code/session_01CwmwWkvZGLpTBY6ApKFi79

@fedelemantuano fedelemantuano self-assigned this Mar 18, 2026
@coveralls

Coverage Status

coverage: 99.796% (+0.6%) from 99.232%
when pulling 3ecf670 on claude/improve-header-regex-vCFLu
into 1c700e4 on develop.

@fedelemantuano fedelemantuano merged commit 5f96ce0 into develop Mar 29, 2026
8 checks passed
@fedelemantuano fedelemantuano deleted the claude/improve-header-regex-vCFLu branch March 29, 2026 21:03
3 participants