Description
veraPDF incorrectly reports a PDF/A-3u (and PDF/A-2u) violation of clause 6.2.11.8 (.notdef glyph reference) for documents containing symbolic TrueType fonts with no explicit /Encoding entry, where character code 0x00 is legitimately mapped to a real glyph (e.g. space, GID 61) via the font's built-in cmap.
The document is valid: the font's internal encoding maps char code 0 to a real glyph, not .notdef. veraPDF reports a false positive.
Steps to reproduce
- Take a PDF/A-3u document with an embedded symbolic TrueType font (Flags bit 3 set, bit 6 clear) that has no /Encoding entry in the font dictionary.
- The font's cmap maps character code 0x00 to a real glyph (e.g. space, GID 61 - not GID 0).
- The content stream contains a text-showing operator (e.g. TJ) that uses character code 0x00.
- Run veraPDF validation against PDF/A-3u (or PDF/A-2u).
Result: veraPDF reports violation of clause 6.2.11.8:
"The document contains a reference to the .notdef glyph"
Expected: No violation. Char code 0 maps to a real glyph via the font's built-in encoding; no .notdef reference exists.
Test case
Attached file: 2039171-page01-text-only.pdfa-3u.pdf
The document contains three symbolic TrueType fonts, all without an explicit /Encoding entry:
| Font resource | BaseFont | Flags |
|---|---|---|
| /F0 | IAEBHR+TimesNewRomanPS-ItalicMT | 6 (Symbolic) |
| /F1 | WGHOTS+TimesNewRomanPS-BoldItalicMT | 262150 (Symbolic) |
| /F2 | YLQPBY+TimesNewRomanPSMT | 6 (Symbolic) |
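The Flags values in the table can be decoded bit by bit. The following is a minimal sketch, assuming the flag bit positions listed in ISO 32000-1:2008, Table 123; the class and method names are illustrative and are not part of veraPDF.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical, not veraPDF code): decodes the
// /Flags integer of a PDF font descriptor.
final class FontFlagsDemo {
    // 1-based bit positions and their names, per ISO 32000-1:2008, Table 123.
    private static final int[] BITS = {1, 2, 3, 4, 6, 7, 17, 18, 19};
    private static final String[] NAMES = {
        "FixedPitch", "Serif", "Symbolic", "Script", "Nonsymbolic",
        "Italic", "AllCap", "SmallCap", "ForceBold"
    };

    // Returns the names of all flags set in the given value, lowest bit first.
    static List<String> decode(int flags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((flags & (1 << (BITS[i] - 1))) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    // "Symbolic" as used in this report: bit 3 set and bit 6 clear.
    static boolean isSymbolic(int flags) {
        return (flags & (1 << 2)) != 0 && (flags & (1 << 5)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(decode(6));       // [Serif, Symbolic]
        System.out.println(decode(262150));  // [Serif, Symbolic, ForceBold]
    }
}
```

Both values seen in the test file have bit 3 set and bit 6 clear, so all three fonts count as symbolic.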
Output (veraPDF v.1.28.2 and v.1.31.5):
2039171-page01-text-only.pdfa... ... NOT COMPLIANT
- Clause 6.2.11.8 (ISO 19005-3:2012) : Glyph
A PDF/A-3 compliant document shall not contain a reference to the .notdef glyph from any of the text showing operators, regardless of text rendering mode, in any content stream
test: name != ".notdef"
error: The document contains a reference to the .notdef glyph
context: root/document[0]/pages[0](7 0 obj PDPage)/contentStream[0](15 0 obj PDContentStream)/operators[6]/usedGlyphs[55](IAEBHR+TimesNewRomanPS-ItalicMT IAEBHR+TimesNewRomanPS-ItalicMT 0 0 0 false)
context: root/document[0]/pages[0](7 0 obj PDPage)/contentStream[0](15 0 obj PDContentStream)/operators[362]/usedGlyphs[0](WGHOTS+TimesNewRomanPS-BoldItalicMT WGHOTS+TimesNewRomanPS-BoldItalicMT 0 0 0 false)
context: root/document[0]/pages[0](7 0 obj PDPage)/contentStream[0](15 0 obj PDContentStream)/operators[506]/usedGlyphs[0](YLQPBY+TimesNewRomanPSMT YLQPBY+TimesNewRomanPSMT 0 0 0 false)
In every flagged context the character code is 0x00.
Analysis of the font programs (via pikepdf + fontTools) confirms that in each font char code 0x00 maps to the glyph named space (GID 61 in /F0, GID 47 in /F1, GID 67 in /F2), not to GID 0 (.notdef):
/F0
operators[6]:
[ (\033) 10 (\() -87 (\000) 82 (6) 10 (7) 47 (,) 10 (;) 10 (,) 10 (4) 10 (>) 10 (0) 10 (5) 10 (4) 10 (,) -86 (\000) 82 (@) -86 (\000) 82 (0) 10 (4) -87 (\000) 82 (6) 10 (7) 10 (0) 10 (3) 10 (5) -87 (\000) 82 (2) 10 (:) 10 (5) 10 (.) 10 (5) -87 (\000) 82 (*) 10 (:) 10 (2) 10 (9) 10 (:) 10 (7) 10 (\() -87 (\000) 82 (+) 10 (,) 10 (2) 10 (2) 10 (\() -87 (\000) 82 (8) 10 (0) 10 (*) 10 (:) 10 (7) 47 (,) 10 (>) 10 (>) 10 (\() -87 (\000) ] TJ
operators[6] charcodes stats:
+------+-----+-----+------------+-----+---------+
| code | dec | hex | glyph name | GID | Unicode |
+------+-----+-----+------------+-----+---------+
| \000 | 00 | 00 | space | 61 | | <- NOT .notdef
| \033 | 27 | 1b | L | 42 | L |
| \( | 40 | 28 | a | 57 | a |
...
+------+-----+-----+------------+-----+---------+
/F1
operators[362]:
(\000) Tj
operators[362] charcodes stats:
+------+-----+-----+------------+-----+---------+
| code | dec | hex | glyph name | GID | Unicode |
+------+-----+-----+------------+-----+---------+
| \000 | 00 | 00 | space | 47 | | <- NOT .notdef
+------+-----+-----+------------+-----+---------+
/F2
operators[506]:
(\000) Tj
operators[506] charcodes stats:
+------+-----+-----+------------+-----+---------+
| code | dec | hex | glyph name | GID | Unicode |
+------+-----+-----+------------+-----+---------+
| \000 | 00 | 00 | space | 67 | | <- NOT .notdef
+------+-----+-----+------------+-----+---------+
Root cause
File: validation-model/src/main/java/org/verapdf/gf/model/impl/operator/textshow/GFGlyph.java
Lines: 91–95
if (font instanceof PDSimpleFont) {
    Encoding encoding = font.getEncodingMapping();
    this.name = encoding == null ? null : encoding.getName(glyphCode);
    if (this.name == null && glyphCode == 0 && font instanceof PDTrueTypeFont) {
        this.name = ".notdef"; // ¯\_(ツ)_/¯
    }
}
Chain of events for a symbolic TrueType font with no /Encoding
Step 1. font.getEncodingMapping() calls PDFont.getEncodingMappingFromCOSObject().
Since there is no /Encoding key in the font dictionary, cosEncoding.getDirectBase() is null, so it returns Encoding.empty().
Step 2. Encoding.empty().getName(0) is called.
Encoding.empty() has predefinedEncoding = new String[0] and differences = null.
Inside getName():
// Encoding.java:105
return (predefinedEncoding.length != 0) ? NOTDEF : null;
// predefinedEncoding.length == 0 -> returns null
The comment on this very line reads:
"if no predefined encoding, the null result for using font encoding"
-> null is the intended signal to fall back to the font program's own encoding.
Step 3. Back in GFGlyph: this.name == null + glyphCode == 0 + font instanceof PDTrueTypeFont
-> hardcodes .notdef, completely ignoring what the font program would say.
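The three steps can be replayed outside veraPDF. The following is a simplified sketch: `encodingGetName` is a hypothetical stand-in that models only the behaviour of `Encoding.getName()` described above (no differences array), and `resolve` mirrors the quoted GFGlyph block. Neither is the actual veraPDF code.

```java
// Hypothetical stand-in for the name resolution described in steps 1-3.
final class CurrentNameResolution {
    static final String NOTDEF = ".notdef";

    // Models Encoding.getName() for an encoding without differences:
    // an empty predefined table yields null ("use the font's own encoding").
    static String encodingGetName(String[] predefinedEncoding, int code) {
        if (code >= 0 && code < predefinedEncoding.length) {
            return predefinedEncoding[code];
        }
        return predefinedEncoding.length != 0 ? NOTDEF : null;
    }

    // Mirrors the quoted GFGlyph logic. Note that the font's cmap is not
    // an input at all, so it cannot influence the result.
    static String resolve(String[] predefinedEncoding, int glyphCode, boolean isTrueType) {
        String name = encodingGetName(predefinedEncoding, glyphCode);
        if (name == null && glyphCode == 0 && isTrueType) {
            name = NOTDEF;
        }
        return name;
    }

    public static void main(String[] args) {
        String[] empty = new String[0]; // what Encoding.empty() holds
        System.out.println(resolve(empty, 0, true));    // .notdef
        System.out.println(resolve(empty, 0x1b, true)); // null
    }
}
```

Code 0 is the only code that ends up with a name here, and that name is always .notdef, whatever glyph the font program maps it to.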
Why the hardcode is wrong for symbolic TrueType fonts
ISO 32000-1:2008 9.6.6.4:
"When the font has no Encoding entry, or the font descriptor's Symbolic flag is set (in which case the Encoding entry is ignored), this shall occur:
- If the font contains a (3, 0) subtable, the range of character codes shall be one of these: 0x0000 – 0x00FF, 0xF000 – 0xF0FF, 0xF100 – 0xF1FF, or 0xF200 – 0xF2FF. Depending on the range of codes, each byte from the string shall be prepended with the high byte of the range, to form a two-byte character, which shall be used to select the associated glyph description from the subtable.
- Otherwise, if the font contains a (1, 0) subtable, single bytes from the string shall be used to look up the associated glyph descriptions from the subtable."
Per the spec, char code 0 is a valid single byte that shall be looked up in the font's cmap. If the cmap maps it to a real glyph (as it does here - space, GID 61), there is no .notdef reference. Assigning .notdef to char code 0 without consulting the font program contradicts the spec.
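The lookup rule quoted above can be sketched as follows. This is an illustration under stated assumptions: the cmap subtables are modelled as plain maps from character code to GID, the (3, 0) branch simply tries each of the four permitted high bytes in turn instead of first detecting the subtable's range, and the class name is hypothetical. Which subtable the attached fonts actually carry is not stated in this report, so the (1, 0) map in the example is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the ISO 32000-1:2008, 9.6.6.4 lookup for
// symbolic TrueType fonts (hypothetical class, not veraPDF code).
final class SymbolicCmapLookup {
    // High bytes of the four permitted (3, 0) code ranges.
    private static final int[] HIGH_BYTES = {0x00, 0xF0, 0xF1, 0xF2};

    // Returns the GID for a single-byte code, or 0 (.notdef) if unmapped.
    static int glyphId(int code, Map<Integer, Integer> cmap30, Map<Integer, Integer> cmap10) {
        int b = code & 0xFF;
        if (cmap30 != null) {
            // Simplification: try every permitted range rather than
            // detecting which one the subtable uses.
            for (int high : HIGH_BYTES) {
                Integer gid = cmap30.get((high << 8) | b);
                if (gid != null) {
                    return gid;
                }
            }
            return 0;
        }
        if (cmap10 != null) {
            Integer gid = cmap10.get(b);
            return gid != null ? gid : 0;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Code-to-GID pairs taken from the /F0 table in this report;
        // placing them in a (1, 0) subtable is an assumption.
        Map<Integer, Integer> cmap10 = new HashMap<>();
        cmap10.put(0x00, 61); // space
        cmap10.put(0x1b, 42); // L
        cmap10.put(0x28, 57); // a
        System.out.println(glyphId(0x00, null, cmap10)); // 61
        System.out.println(glyphId(0x7f, null, cmap10)); // 0
    }
}
```

Under this rule a code only counts as a .notdef reference when the lookup yields GID 0, which is not the case for code 0x00 in the attached file.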
Note also the phrase "in which case the Encoding entry is ignored": for a symbolic font, the /Encoding entry is irrelevant regardless of whether it is present or absent. veraPDF does the opposite - it derives the glyph name from Encoding.getName() and, when that returns null for code 0, falls back to hardcoding .notdef instead of consulting the font's cmap.
Note that initForNotType3() already has an analogous workaround for the glyphPresent field:
// GFGlyph.java:181-183
// every font contains notdef glyph. But if we call method
// of font program we can't distinguish case of code 0
// and glyph that is not present indeed.
glyphPresent = glyphCode == 0 || font.glyphIsPresent(glyphCode);
This workaround correctly prevents a false "glyph not present" error for code 0. However, it does not affect the name field, which is what clause 6.2.11.8 actually checks.
Suggested fix
File: validation-model/src/main/java/org/verapdf/gf/model/impl/operator/textshow/GFGlyph.java
Replace the entire if (font instanceof PDSimpleFont) block with:
if (font instanceof PDSimpleFont) {
    Encoding encoding = (font instanceof PDTrueTypeFont && ((PDTrueTypeFont) font).isSymbolic())
            ? Encoding.empty() // ISO 32000, 9.6.6.4: Symbolic flag -> Encoding entry is ignored
            : font.getEncodingMapping();
    this.name = encoding == null ? null : encoding.getName(glyphCode);
    if (this.name == null && font instanceof PDTrueTypeFont) {
        // ISO 32000, 9.6.6.4: no Encoding or Symbolic -> consult font program (cmap)
        FontProgram fp = font.getFontProgram();
        if (fp != null) {
            String programName = fp.getGlyphName(glyphCode);
            this.name = (programName != null) ? programName
                    : (fp.containsCode(glyphCode) ? null : ".notdef");
        } else if (glyphCode == 0) {
            // conservative fallback: font program unavailable, assume .notdef for code 0
            this.name = ".notdef";
        }
    }
}
Key changes:
- For symbolic TrueType fonts, force Encoding.empty() so the /Encoding entry is ignored per spec. Encoding.empty().getName() returns null for all codes, which is then resolved by the font program.
- When getName() returns null for a TrueType font, consult the font program: getGlyphName() returns " " (non-.notdef sentinel) for symbolic fonts, or the actual glyph name for non-symbolic fonts. If getGlyphName() returns null, fall back to containsCode() (cmap lookup). This is intentionally broader than just code 0: it correctly handles any code for which the PDF-level encoding returns null but the font's cmap resolves to a real glyph.
- Conservative fallback: if the font program is unavailable entirely (fp == null), assume .notdef for code 0.
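The intended outcomes of the proposed branch can be checked in isolation. The following is a sketch only: `FontProgramStub` is a hypothetical interface that borrows the two method names used in the snippet above, backed by a plain map. It says nothing about how the real veraPDF font program classes behave.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness for the proposed TrueType branch (not veraPDF code).
final class ProposedNameResolution {
    // Stand-in exposing only the two methods the proposed fix calls.
    interface FontProgramStub {
        String getGlyphName(int code);
        boolean containsCode(int code);
    }

    // Builds a stub font program from a code -> glyph name map.
    static FontProgramStub fromGlyphNames(final Map<Integer, String> names) {
        return new FontProgramStub() {
            public String getGlyphName(int code) { return names.get(code); }
            public boolean containsCode(int code) { return names.containsKey(code); }
        };
    }

    // encodingName is what Encoding.getName() returned (null = "ask the font").
    static String resolve(String encodingName, int glyphCode, FontProgramStub fp) {
        if (encodingName != null) {
            return encodingName;
        }
        if (fp != null) {
            String programName = fp.getGlyphName(glyphCode);
            if (programName != null) {
                return programName;
            }
            return fp.containsCode(glyphCode) ? null : ".notdef";
        }
        // Font program unavailable: keep the old assumption for code 0 only.
        return glyphCode == 0 ? ".notdef" : null;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = new HashMap<>();
        names.put(0, "space"); // as reported for the attached fonts
        FontProgramStub fp = fromGlyphNames(names);
        System.out.println(resolve(null, 0, fp));   // space
        System.out.println(resolve(null, 5, fp));   // .notdef
        System.out.println(resolve(null, 0, null)); // .notdef
    }
}
```

With a font program that maps code 0 to space, the name is no longer .notdef, so the 6.2.11.8 test `name != ".notdef"` would pass for the flagged contexts.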
Note on the suggested fix
I am not a veraPDF developer and my reading of the internals may be incomplete.
If the root cause analysis above is wrong, I hope the attached test case is sufficient to reproduce the issue and help you locate the real problem.
Either way, happy to provide any additional information.
Thank you for building and maintaining veraPDF.