ekscrypto
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 107 additions & 0 deletions
diff --git a/Sources/SwiftEmailValidator/EmailSyntaxValidator.swift b/Sources/SwiftEmailValidator/EmailSyntaxValidator.swift
Lines changed: 40 additions & 2 deletions
diff --git a/Sources/SwiftEmailValidator/IPAddressSyntaxValidator.swift b/Sources/SwiftEmailValidator/IPAddressSyntaxValidator.swift
Lines changed: 7 additions & 4 deletions
@@ -0,0 +1,107 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+
+#### New Unit Tests (48 tests across 3 files)
+
+**EmailSyntaxValidatorTests.swift**
+- `testLocalPartExactly63Characters` - Boundary test for 63-character local part
+- `testLocalPartExactlyOneCharacter` - Minimum valid local part
+- `testLocalPartEmptyString` - Empty local part rejection
+- `testUnicodeLocalPartCharacterVsByteCount` - 30 four-byte Unicode chars (120 bytes, 30 chars)
+- `testUnicodeLocalPartExceeds64Characters` - 65+ Unicode character rejection
+- `testEmojiInLocalPart` - Emoji validation in Unicode mode
+- `testCombiningMarksInLocalPart` - Diacritics and combining characters
+- `testHighUnicodeRanges` - Characters beyond BMP (U+1D400+)
+- `testZeroWidthCharacters` - ZWSP, ZWJ, ZWNJ handling
+- `testBidirectionalOverrideCharacters` - RTL/LTR control character rejection
+- `testC1ControlCharactersRejected` - C1 control character rejection (U+0080-U+009F)
+- `testRFC2047EncodedWithIPv4AddressLiteral` - RFC2047 with IPv4 literal
+- `testRFC2047EncodedWithIPv6AddressLiteral` - RFC2047 with IPv6 literal
+- `testQuotedStringWithMultipleAtSymbols` - Multiple @ in quoted strings
+- `testQuotedStringWithRFC2047Decoding` - RFC2047 decoded quoted strings
+- `testAutoEncodeToRfc2047WithAddressLiteral` - Combined options testing
+- `testCustomDomainValidatorAcceptsAnyDomain` - Permissive validator
+- `testCustomDomainValidatorRejectsAllDomains` - Restrictive validator
+- `testCustomDomainValidatorWithSpecificTLDs` - TLD-specific validation
+- `testCustomDomainValidatorReceivesCorrectDomain` - Domain parameter verification
+- `testCustomDomainValidatorWithUnicodeDomain` - IDN domain handling
+- `testMultipleDotsInVariousPositions` - Valid multi-dot local parts
+- `testSingleCharactersBetweenDots` - Minimal segments between dots
+- `testMaxConsecutiveSpecialCharacters` - Consecutive special characters
+- `testSpecialCharactersAtBoundaries` - Special chars at start/end of segments
+- `testExtremelyLongLocalPart` - 1000 character local part rejection
+- `testExtremelyLongDomain` - 500+ character domain handling
+- `testVeryLongRFC2047EncodedString` - Near 76-char limit RFC2047
+- `testManyUnicodeCharactersInLocalPart` - 64 diverse Unicode characters
+
+**RFC2047CoderTests.swift**
+- `testDecodingUTF16B` - Base64 with UTF-16 charset
+- `testDecodingUTF32B` - Base64 with UTF-32 charset
+- `testDecodingUTF16InvalidData` - Malformed UTF-16 rejection
+- `testDecodingUTF32InvalidData` - Malformed UTF-32 rejection
+- `testEncodeDecodeRoundTripSimpleASCII` - ASCII round-trip
+- `testEncodeDecodeRoundTripUnicode` - Unicode round-trip
+- `testEncodeDecodeRoundTripSpecialCharacters` - Special character round-trip
+- `testDecodingLatin2QPolishCharacters` - Polish special characters
+- `testDecodingLatin2QCzechCharacters` - Czech special characters
+- `testDecodingLatin2InvalidControlCharacter` - Invalid byte handling
+- `testEncodeEmptyString` - Empty string encoding
+- `testDecodeWithMixedCaseCharset` - Case-insensitive charset
+- `testDecodeWithMixedCaseEncoding` - Case-insensitive encoding type
+- `testDecodeWithWhitespaceInEncodedWord` - Whitespace handling
+
+**IPAddressValidatorTests.swift**
+- `testIPv6ZoneIdentifiers` - Zone identifier rejection per RFC 5321
+- `testIPv6LoopbackVariants` - `::1` variations
+- `testIPv4MappedIPv6Extended` - `::ffff:` mapped addresses
+- `testIPv4LeadingZeros` - Leading zeros handling
+- `testEmptyIPAddressStrings` - Empty/whitespace rejection
+
+### Changed
+
+- **EmailSyntaxValidator.swift**: Reordered CharacterSet construction to work around a Foundation bug in which `.subtracting()` corrupts supplementary Unicode plane data. Supplementary planes (U+10000-U+10FFFF) are now added last, after all subtractions.
+
+### Fixed
+
+#### RFC 5321 Compliance
+- **IPAddressSyntaxValidator.swift**: IPv6 zone identifiers (e.g., `fe80::1%eth0`) are now correctly rejected. Per RFC 5321 Section 4.1.3, zone identifiers are not valid in email address literals.
+
+#### RFC 5198 Compliance
+- **EmailSyntaxValidator.swift**: C1 control characters (U+0080-U+009F) are now rejected in Unicode mode. Per RFC 5198 Section 2, these control characters should be avoided in network interchange.
+
+#### RFC 6531 Compliance
+- **EmailSyntaxValidator.swift**: Fixed supplementary Unicode plane support (U+10000-U+10FFFF). Emoji, mathematical symbols, and other characters beyond the Basic Multilingual Plane now correctly validate in Unicode mode.
+
+#### Security Improvements
+- **EmailSyntaxValidator.swift**: Bidirectional formatting characters are now rejected:
+  - Left-to-Right Mark / Right-to-Left Mark (U+200E-U+200F)
+  - Directional embeddings and overrides (U+202A-U+202E)
+  - Directional isolates (U+2066-U+2069)
+  - Deprecated format characters (U+206A-U+206F)
+
+  These characters change the order in which text is displayed, so they can be exploited for visual spoofing of email addresses.
+
+### Technical Notes
+
+#### CharacterSet Bug Workaround
+Foundation's `CharacterSet` has a bug where calling `.subtracting()` on a set that includes supplementary Unicode planes (U+10000+) corrupts the supplementary plane data, even when the subtracted characters don't overlap. The workaround is to add supplementary planes as the final `.union()` call, after all `.subtracting()` operations are complete.
+
+```swift
+// WRONG - supplementary planes get corrupted by subsequent subtractions
+let brokenCharset = baseSet
+    .union(supplementaryPlanes)  // Added here...
+    .subtracting(c1Controls)     // ...corrupted here
+
+// CORRECT - add supplementary planes last
+let charset = baseSet
+    .subtracting(c1Controls)     // All subtractions first
+    .union(supplementaryPlanes)  // Add supplementary planes last
+```
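Reviewer sketch (not part of the patch): the ordering described in the changelog can be checked in isolation with a small script. The names `scalar`, `allowed`, and `isAllowed` below are hypothetical stand-ins, and the sets only approximate the validator's private constants. This is a minimal sketch assuming `CharacterSet` behaves as the changelog describes when the supplementary planes are unioned last.

```swift
import Foundation

// Hypothetical helper: build a Unicode.Scalar from a code point known to be valid.
func scalar(_ value: UInt32) -> Unicode.Scalar {
    return Unicode.Scalar(value)!
}

// Illustrative stand-ins for the validator's private sets.
let ascii = CharacterSet(charactersIn: scalar(0x00)...scalar(0x7F))
let c1Controls = CharacterSet(charactersIn: scalar(0x80)...scalar(0x9F))
let bidiControls = CharacterSet(charactersIn: scalar(0x200E)...scalar(0x200F))
    .union(CharacterSet(charactersIn: scalar(0x202A)...scalar(0x202E)))
    .union(CharacterSet(charactersIn: scalar(0x2066)...scalar(0x2069)))
let supplementaryPlanes = CharacterSet(charactersIn: scalar(0x10000)...scalar(0x10FFFF))

// Same order as the patch: all subtractions first, supplementary planes last.
let allowed = ascii.inverted
    .subtracting(c1Controls)
    .subtracting(bidiControls)
    .union(supplementaryPlanes)

// Hypothetical helper: membership test by raw code point.
func isAllowed(_ codePoint: UInt32) -> Bool {
    guard let candidate = Unicode.Scalar(codePoint) else { return false }
    return allowed.contains(candidate)
}

print(isAllowed(0x1F600)) // emoji in the Supplementary Multilingual Plane
print(isAllowed(0x0085))  // C1 control character
print(isAllowed(0x202E))  // right-to-left override
```

With this ordering, the emoji is expected to be accepted while the C1 control and the override character are rejected, which mirrors what `testEmojiInLocalPart`, `testC1ControlCharactersRejected`, and `testBidirectionalOverrideCharacters` appear to cover.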
@@ -7,10 +7,12 @@
 //
 //  References:
 //  * RFC2047 https://datatracker.ietf.org/doc/html/rfc2047
+//  * RFC5198 https://datatracker.ietf.org/doc/html/rfc5198 (Unicode Format for Network Interchange)
 //  * RFC5321 https://datatracker.ietf.org/doc/html/rfc5321 Section 4.1.2 & Section 4.1.3
 //  * RFC5322 https://datatracker.ietf.org/doc/html/rfc5322 Section 3.2.3 & Section 3.4.1
 //  * RFC5234 https://datatracker.ietf.org/doc/html/rfc5234 Appendix B.1
 //  * RFC6531 https://datatracker.ietf.org/doc/html/rfc6531
+//  * RFC6532 https://datatracker.ietf.org/doc/html/rfc6532
 
 import Foundation
 import SwiftPublicSuffixList
@@ -194,17 +196,53 @@ public final class EmailSyntaxValidator {
         .union(CharacterSet(charactersIn: digitRange))
         .union(CharacterSet(charactersIn: #"!#$%&'*+-/=?^_`{|}~"#)) // Ref RFC5322 section 3.2.3 Atom, definition of atext
     private static let asciiRange: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x00)!...Unicode.Scalar(0x7F)!
+
+    // RFC6531 extends atext to include UTF8-non-ascii (U+0080+)
+    // RFC5198 Section 2: Control characters (U+0000-U+001F, U+007F-U+009F) should be avoided
+    // We also exclude other problematic characters per security best practices:
+    // - Bidirectional formatting characters (U+200E-U+200F, U+202A-U+202E, U+2066-U+2069)
+    // - Deprecated format characters (U+206A-U+206F)
+    private static let c1ControlRange: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x80)!...Unicode.Scalar(0x9F)! // C1 control chars
+    private static let bidiFormattingChars: CharacterSet = CharacterSet(charactersIn: Unicode.Scalar(0x200E)!...Unicode.Scalar(0x200F)!) // LRM, RLM
+        .union(CharacterSet(charactersIn: Unicode.Scalar(0x202A)!...Unicode.Scalar(0x202E)!)) // LRE, RLE, PDF, LRO, RLO
+        .union(CharacterSet(charactersIn: Unicode.Scalar(0x2066)!...Unicode.Scalar(0x2069)!)) // LRI, RLI, FSI, PDI
+    private static let deprecatedFormatChars: CharacterSet = CharacterSet(charactersIn: Unicode.Scalar(0x206A)!...Unicode.Scalar(0x206F)!) // Deprecated formatting
+
+    // Note: CharacterSet.inverted doesn't properly include supplementary planes (U+10000+)
+    // We must explicitly include them. Unicode planes:
+    // - BMP (U+0000-U+FFFF) - included via asciiRange.inverted
+    // - SMP (U+10000-U+1FFFF) - Supplementary Multilingual Plane (emoji, historic scripts)
+    // - SIP (U+20000-U+2FFFF) - Supplementary Ideographic Plane (CJK)
+    // - TIP (U+30000-U+3FFFF) - Tertiary Ideographic Plane
+    // - Planes 4-13 (U+40000-U+DFFFF) - Unassigned
+    // - SSP (U+E0000-U+EFFFF) - Supplementary Special-purpose Plane
+    // - PUA (U+F0000-U+10FFFF) - Private Use Areas
+    private static let supplementaryPlanes: CharacterSet = CharacterSet(charactersIn: Unicode.Scalar(0x10000)!...Unicode.Scalar(0x10FFFF)!)
+
+    // Note: CharacterSet has a bug where .subtracting() corrupts supplementary plane data
+    // We must add supplementaryPlanes LAST, after all subtractions are complete
     private static let atextUnicodeCharacterSet: CharacterSet = atextCharacterSet
-        .union(CharacterSet(charactersIn: asciiRange).inverted)
+        .union(CharacterSet(charactersIn: asciiRange).inverted) // BMP non-ASCII
+        .subtracting(CharacterSet(charactersIn: c1ControlRange)) // Exclude C1 control characters per RFC5198
+        .subtracting(bidiFormattingChars) // Exclude bidirectional formatting (security)
+        .subtracting(deprecatedFormatChars) // Exclude deprecated format characters
+        .union(supplementaryPlanes) // Supplementary planes (emoji, etc.) - MUST BE LAST (after subtractions)
+
     private static let quotedPairSMTP: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x20)!...Unicode.Scalar(0x7E)!
     private static let qtextSMTP1: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x20)!...Unicode.Scalar(0x21)!
     private static let qtextSMTP2: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x23)!...Unicode.Scalar(0x5B)!
     private static let qtextSMTP3: ClosedRange<Unicode.Scalar> = Unicode.Scalar(0x5D)!...Unicode.Scalar(0x7E)!
     private static let qtextSMTPCharacterSet: CharacterSet = CharacterSet(charactersIn: qtextSMTP1)
         .union(CharacterSet(charactersIn: qtextSMTP2))
         .union(CharacterSet(charactersIn: qtextSMTP3))
+    // Note: CharacterSet has a bug where .subtracting() corrupts supplementary plane data
+    // We must add supplementaryPlanes LAST, after all subtractions are complete
     private static let qtextUnicodeSMTPCharacterSet = qtextSMTPCharacterSet
-        .union(CharacterSet(charactersIn: asciiRange).inverted)
+        .union(CharacterSet(charactersIn: asciiRange).inverted) // BMP non-ASCII
+        .subtracting(CharacterSet(charactersIn: c1ControlRange)) // Exclude C1 control characters per RFC5198
+        .subtracting(bidiFormattingChars) // Exclude bidirectional formatting (security)
+        .subtracting(deprecatedFormatChars) // Exclude deprecated format characters
+        .union(supplementaryPlanes) // Supplementary planes (emoji, etc.) - MUST BE LAST (after subtractions)
 
     private static func extractDotAtom(_ candidate: String, compatibility: Compatibility) -> String? {
         guard !candidate.hasPrefix("\""),
 
@@ -25,12 +25,15 @@ final public class IPAddressSyntaxValidator {
         return candidate.range(of: v4regex, options: .regularExpression) != nil
     }
 
-    /// Validates that the candidate string respects the IPv6 syntax
+    /// Validates that the candidate string respects the IPv6 syntax per RFC 5321
     /// - Parameter candidate: String to validate
-    /// - Returns: true if syntax eems valid, false otherwise
+    /// - Returns: true if syntax seems valid, false otherwise
+    /// - Note: Zone identifiers (e.g., %eth0) are NOT allowed per RFC 5321 for email addresses
     static func matchIPv6(_ candidate: String) -> Bool {
-        // Source: https://gist.github.com/syzdek/6086792
-        let v6regex = #"^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$"#
+        // Based on: https://gist.github.com/syzdek/6086792
+        // Modified: Removed zone identifier pattern (fe80:...%...) as zone IDs are not valid
+        // in email address literals per RFC 5321 Section 4.1.3
+        let v6regex = #"^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$"#
         return candidate.range(of: v6regex, options: .regularExpression) != nil
     }
 }
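Reviewer sketch (not part of the patch): the only alternative removed from the IPv6 pattern is the `fe80:...%...` branch. Anchoring that branch on its own shows which inputs it used to admit. The names `removedZoneAlternative` and `matchedByRemovedAlternative` are hypothetical and exist only for this illustration.

```swift
import Foundation

// The alternative dropped from v6regex, anchored so it can be exercised alone.
let removedZoneAlternative = #"^fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}$"#

// Hypothetical helper: true when the candidate would have been accepted
// solely through the removed zone-identifier branch.
func matchedByRemovedAlternative(_ candidate: String) -> Bool {
    return candidate.range(of: removedZoneAlternative, options: .regularExpression) != nil
}

for candidate in ["fe80::1%eth0", "fe80::1", "::1"] {
    print(candidate, matchedByRemovedAlternative(candidate))
}
```

Only the zone-qualified address matches the removed branch. None of the remaining alternatives contains a `%`, so such input should no longer pass `matchIPv6`.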