Optimize PDF signature encoding: use HEX only when needed for Unicode characters#292
Optimize PDF signature encoding: use HEX only when needed for Unicode characters#292
Conversation
- Changed PDFString.of() to PDFHexString.fromText() for reason, contactInfo, name, and location fields - Added comprehensive test case for Japanese character encoding - Updated existing tests to expect PDFHexString instead of PDFString - All tests passing and linting clean Co-authored-by: vbuch <5102057+vbuch@users.noreply.github.com>
|
@copilot could we only HEX when that is needed (utf chars in the value)? Is that fine with the PDF standard? |
Yes, absolutely! I've implemented conditional encoding that only uses HEX when needed. The code now:
A helper function Added comprehensive tests to verify both encoding paths work correctly. Commit: ed57f99 |
There was a problem hiding this comment.
Pull request overview
This PR addresses garbled non-ASCII (e.g., Japanese) characters in PDF signature info fields by changing how signature metadata strings are encoded when creating the signature dictionary.
Changes:
- Switched signature info fields (
Reason,ContactInfo,Name,Location) to usePDFHexString.fromText(...)instead ofPDFString.of(...). - Updated existing tests to expect hex-encoded strings.
- Added a new test covering Japanese characters and verifying round-trip decoding.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/placeholder-pdf-lib/src/pdflibAddPlaceholder.js | Encodes signature info fields as PDFHexString instead of PDFString. |
| packages/placeholder-pdf-lib/src/pdflibAddPlaceholder.test.js | Updates assertions for encoding and adds a Japanese/unicode-focused test. |
| packages/placeholder-pdf-lib/dist/pdflibAddPlaceholder.js | Compiled output reflecting the encoding change in src/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Reason: PDFHexString.fromText(reason), | ||
| M: PDFString.fromDate(signingTime ?? new Date()), | ||
| ContactInfo: PDFString.of(contactInfo), | ||
| Name: PDFString.of(name), | ||
| Location: PDFString.of(location), | ||
| ContactInfo: PDFHexString.fromText(contactInfo), | ||
| Name: PDFHexString.fromText(name), | ||
| Location: PDFHexString.fromText(location), |
| expect(widgetData.get(PDFName.of('Reason'))).toEqual(PDFHexString.fromText(defaults.reason)); | ||
| expect(widgetData.get(PDFName.of('ContactInfo'))).toEqual(PDFHexString.fromText(defaults.contactInfo)); | ||
| expect(widgetData.get(PDFName.of('Location'))).toEqual(PDFHexString.fromText(defaults.location)); | ||
| expect(widgetData.get(PDFName.of('Name'))).toEqual(PDFHexString.fromText(defaults.name)); |
| expect(reason.constructor.name).toBe('PDFHexString'); | ||
| expect(contactInfo.constructor.name).toBe('PDFHexString'); | ||
| expect(name.constructor.name).toBe('PDFHexString'); | ||
| expect(location.constructor.name).toBe('PDFHexString'); |
This PR fixes an issue where Japanese and other multi-byte characters were appearing garbled when viewing PDF signature information in PDF software or browsers, while optimizing the encoding to only use HEX when necessary.
Problem
When using Japanese characters in signature information fields (
reason,contactInfo,name,location), the characters would appear corrupted or garbled when viewing the signature details in PDF viewers. This happened because the code was usingPDFString.of()which doesn't properly handle multi-byte Unicode characters.Solution
Implemented conditional encoding that intelligently chooses the appropriate method based on the content:
PDFString.of()for efficiency and cleaner outputPDFHexString.fromText()to preserve character integrityThe implementation includes:
needsHexEncoding()helper function that detects non-ASCII characters using/[^\x00-\x7F]/createPDFString()helper that selects the appropriate encoding methodReason,ContactInfo,Name, andLocationTesting
PDFString(more efficient)PDFHexString(preserves characters)日本語の理由,連絡先@example.com,田中太郎,東京、日本Example
ASCII text (efficient encoding):
Unicode text (preserves characters):
This approach provides optimal efficiency for common ASCII use cases while ensuring Unicode characters display correctly in PDF viewers. The change is fully backward compatible and compliant with PDF standards.
Fixes #267.
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.