| 1 | +# JAXB vs Jackson Analysis for DKPro Core Migration |
| 2 | + |
| 3 | +## Current Situation |
| 4 | + |
| 5 | +We have migrated both XCES and BioC modules from `javax.xml.bind` (JAXB) to Jackson XML. The user is now questioning whether we should instead upgrade to **Jakarta JAXB** rather than cross-grading to Jackson. |
| 6 | + |
| 7 | +## Original JAXB Approach |
| 8 | + |
| 9 | +### XCES Module (Writing) |
| 10 | +**Pattern**: Hybrid XMLEventWriter + JAXB Marshaller |
| 11 | +```java |
| 12 | +JAXBContext context = JAXBContext.newInstance(XcesBody.class); |
| 13 | +Marshaller marshaller = context.createMarshaller(); |
| 14 | +marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE); |
| 15 | + |
| 16 | +// Manually write structure with XMLEventWriter |
| 17 | +xmlEventWriter.add(xmlef.createStartElement("", "", "cesDoc")); |
| 18 | +xmlEventWriter.add(xmlef.createStartElement("", "", "cesHeader")); |
| 19 | +xmlEventWriter.add(xmlef.createEndElement("", "", "cesHeader")); |
| 20 | +xmlEventWriter.add(xmlef.createStartElement("", "", "text")); |
| 21 | + |
| 22 | +// Marshal body content directly to the event writer |
| 23 | +marshaller.marshal(new JAXBElement<XcesBody>(new QName("body"), XcesBody.class, xb), |
| 24 | + xmlEventWriter); |
| 25 | +``` |
| 26 | + |
| 27 | +**Key Advantage**: JAXB Marshaller can write **directly to XMLEventWriter** - NO string round-trip needed! |
| 28 | + |
| 29 | +### BioC Module (Reading) |
| 30 | +**Pattern**: XMLEventReader + JAXB Unmarshaller |
| 31 | +```java |
| 32 | +JAXBContext context = JAXBContext.newInstance(BioCDocument.class); |
| 33 | +Unmarshaller unmarshaller = context.createUnmarshaller(); |
| 34 | + |
| 35 | +// Unmarshal directly from XMLEventReader |
| 36 | +var document = unmarshaller.unmarshal(getXmlEventReader(), BioCDocument.class).getValue(); |
| 37 | +``` |
| 38 | + |
| 39 | +**Key Advantage**: JAXB Unmarshaller works **directly with XMLEventReader** - clean streaming! |
| 40 | + |
| 41 | +## Current Jackson Approach |
| 42 | + |
| 43 | +### XCES Module (Writing) |
| 44 | +**Pattern**: XMLEventWriter + Jackson → String → XMLEventReader → XMLEventWriter |
| 45 | +```java |
| 46 | +XmlMapper xmlMapper = new XmlMapper(); |
| 47 | + |
| 48 | +// XmlMapper has no writeValue overload for XMLEventWriter, so serialize to a String first |
| 49 | +String bodyXml = xmlMapper.writer().withRootName("body").writeValueAsString(xb); |
| 50 | + |
| 51 | +// Must parse string back to events and inject into stream |
| 52 | +XMLEventReader bodyReader = xif.createXMLEventReader(new StringReader(bodyXml)); |
| 53 | +while (bodyReader.hasNext()) { |
| 54 | + xmlEventWriter.add(bodyReader.nextEvent()); |
| 55 | +} |
| 56 | +``` |
| 57 | + |
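| 58 | +**Problem**: Requires a string round-trip because Jackson's `XmlMapper` cannot write to an XMLEventWriter. It does provide `writeValue(XMLStreamWriter, Object)`, but using that would mean moving the XCES writer from the event API to the cursor API. |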
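One detail the round-trip loop has to handle: a reader created over a standalone string also emits `START_DOCUMENT` and `END_DOCUMENT` events, and forwarding those into an already-open `XMLEventWriter` may inject an XML declaration mid-stream or close the enclosing elements early. The following is a minimal JDK-only sketch of the splice with those events filtered out. The class name `FragmentSplice`, its helper names, and the sample `<body>` string (standing in for Jackson's output) are assumptions for illustration, not code from the XCES module.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of DKPro Core.
final class FragmentSplice {

    /** Copies the events of a serialized fragment into an open writer, skipping document-level events. */
    static void splice(String fragmentXml, XMLEventWriter out) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(fragmentXml));
        try {
            while (in.hasNext()) {
                XMLEvent event = in.nextEvent();
                // The fragment is embedded in a larger document, so its own
                // start/end-of-document events must not reach the outer writer.
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue;
                }
                out.add(event);
            }
        } finally {
            in.close();
        }
    }

    /** Writes the cesDoc/text wrapper by hand and splices the given body fragment into it. */
    static String writeDocument(String bodyXml) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(buffer);
        out.add(events.createStartElement("", "", "cesDoc"));
        out.add(events.createStartElement("", "", "text"));
        splice(bodyXml, out);
        out.add(events.createEndElement("", "", "text"));
        out.add(events.createEndElement("", "", "cesDoc"));
        out.flush();
        out.close();
        return buffer.toString();
    }
}
```

Calling `FragmentSplice.writeDocument("<body><p>hi</p></body>")` should return the body nested inside the hand-written `cesDoc` and `text` elements, with no XML declaration in the middle of the output.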
| 59 | + |
| 60 | +### BioC Module (Reading) |
| 61 | +**Pattern**: XMLStreamReader + Jackson XmlMapper |
| 62 | +```java |
| 63 | +XmlMapper mapper = new XmlMapper(); |
| 64 | + |
| 65 | +// Jackson supports XMLStreamReader (not XMLEventReader) |
| 66 | +var document = mapper.readValue(getXmlStreamReader(), BioCDocument.class); |
| 67 | +``` |
| 68 | + |
| 69 | +**Advantage**: Switched from XMLEventReader to XMLStreamReader - eliminated string buffering from an earlier bad approach |
| 70 | +**Limitation**: Required changing the streaming API from events to cursor-based |
| 71 | + |
| 72 | +## Jakarta JAXB Option |
| 73 | + |
| 74 | +### Migration Path |
| 75 | +- Dependency coordinates: `javax.xml.bind` → `jakarta.xml.bind` |
| 76 | +- Imports: `javax.xml.bind.*` → `jakarta.xml.bind.*` |
| 77 | +- Same API surface; only the namespace changes |
| 78 | + |
| 79 | +### Dependencies |
| 80 | +```xml |
| 81 | +<dependency> |
| 82 | + <groupId>jakarta.xml.bind</groupId> |
| 83 | + <artifactId>jakarta.xml.bind-api</artifactId> |
| 84 | + <version>4.0.2</version> |
| 85 | +</dependency> |
| 86 | +<dependency> |
| 87 | + <groupId>org.glassfish.jaxb</groupId> |
| 88 | + <artifactId>jaxb-runtime</artifactId> |
| 89 | + <version>4.0.5</version> |
| 90 | + <scope>runtime</scope> |
| 91 | +</dependency> |
| 92 | +``` |
| 93 | + |
| 94 | +## Comparison Matrix |
| 95 | + |
| 96 | +| Aspect | Jakarta JAXB | Jackson XML | Winner | |
| 97 | +|--------|-------------|-------------|---------| |
| 98 | +| **XCES Hybrid Writing** | ✅ Marshal directly to XMLEventWriter | ❌ Must round-trip through String | **JAXB** | |
| 99 | +| **BioC Streaming Reading** | ✅ Unmarshal from XMLEventReader | ✅ ReadValue from XMLStreamReader | **Tie** | |
| 100 | +| **Code Simplicity (XCES)** | Simple: 1 line marshal | More involved: serialize to String, re-parse, copy events | **JAXB** | |
| 101 | +| **Code Simplicity (BioC)** | Simple: 1 line unmarshal | Simple: 1 line readValue | **Tie** | |
| 102 | +| **Memory Efficiency (XCES)** | ✅ No intermediate buffer | ❌ Body serialized to a String first | **JAXB** | |
| 103 | +| **Memory Efficiency (BioC)** | ✅ No intermediate buffer | ✅ No intermediate buffer | **Tie** | |
| 104 | +| **Standardization** | Jakarta EE standard | De-facto JSON/XML library | **JAXB** | |
| 105 | +| **Maintenance** | Stable, mature | Active development | **Tie** | |
| 106 | +| **Learning Curve** | Known technology | New annotations | **JAXB** | |
| 107 | +| **Performance** | Mature, optimized | Mature, optimized | **Tie** | |
| 108 | +| **Annotations** | `@XmlElement`, `@XmlAttribute` | `@JsonProperty`, custom converters | **JAXB** | |
| 109 | +| **Migration Effort** | Revert to the original code plus package rename | Complete rewrite (done) | **JAXB** | |
| 110 | + |
| 111 | +## Technical Deep Dive |
| 112 | + |
| 113 | +### Why JAXB Works Better for XCES |
| 114 | + |
| 115 | +The XCES format requires this structure: |
| 116 | +```xml |
| 117 | +<cesDoc> |
| 118 | + <cesHeader/> |
| 119 | + <text> |
| 120 | + <body> |
| 121 | + <!-- Content here via object mapping --> |
| 122 | + </body> |
| 123 | + </text> |
| 124 | +</cesDoc> |
| 125 | +``` |
| 126 | + |
| 127 | +**With JAXB**: The Marshaller can write directly to the middle of an XMLEventWriter stream: |
| 128 | +```java |
| 129 | +// Write structure manually |
| 130 | +xmlEventWriter.add(startElement("cesDoc")); |
| 131 | +xmlEventWriter.add(startElement("text")); |
| 132 | + |
| 133 | +// JAXB writes <body> directly to the same stream |
| 134 | +marshaller.marshal(bodyObject, xmlEventWriter); |
| 135 | + |
| 136 | +// Continue writing structure |
| 137 | +xmlEventWriter.add(endElement("text")); |
| 138 | +xmlEventWriter.add(endElement("cesDoc")); |
| 139 | +``` |
| 140 | + |
| 141 | +**With Jackson**: Cannot write to XMLEventWriter, must round-trip: |
| 142 | +```java |
| 143 | +// Write structure manually |
| 144 | +xmlEventWriter.add(startElement("cesDoc")); |
| 145 | +xmlEventWriter.add(startElement("text")); |
| 146 | + |
| 147 | +// Jackson → String → Events → XMLEventWriter |
| 148 | +String xml = xmlMapper.writer().withRootName("body").writeValueAsString(bodyObject); |
| 149 | +XMLEventReader events = createReader(new StringReader(xml)); |
| 150 | +while (events.hasNext()) { |
| 151 | + xmlEventWriter.add(events.nextEvent()); // Copy all events |
| 152 | +} |
| 153 | + |
| 154 | +// Continue writing structure |
| 155 | +xmlEventWriter.add(endElement("text")); |
| 156 | +xmlEventWriter.add(endElement("cesDoc")); |
| 157 | +``` |
| 158 | + |
| 159 | +### Why Both Work Similarly for BioC |
| 160 | + |
| 161 | +BioC reads multi-document collections and extracts individual documents: |
| 162 | + |
| 163 | +**With JAXB**: |
| 164 | +```java |
| 165 | +Unmarshaller unmarshaller = context.createUnmarshaller(); |
| 166 | +BioCDocument doc = unmarshaller.unmarshal(xmlEventReader, BioCDocument.class).getValue(); |
| 167 | +``` |
| 168 | + |
| 169 | +**With Jackson**: |
| 170 | +```java |
| 171 | +XmlMapper mapper = new XmlMapper(); |
| 172 | +BioCDocument doc = mapper.readValue(xmlStreamReader, BioCDocument.class); |
| 173 | +``` |
| 174 | + |
| 175 | +Both are one-line calls that read straight from the StAX stream. The main difference is XMLEventReader versus XMLStreamReader, so the surrounding reader code has to use whichever API the binder accepts. |
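To make the cursor-based variant concrete, here is a minimal JDK-only sketch of walking a collection with an `XMLStreamReader` and handing each `<document>` element to a binder. The `bindDocument` method is a hypothetical stand-in for a call such as `mapper.readValue(xmlStreamReader, BioCDocument.class)`: it only extracts the `<id>` text so the example runs without Jackson or JAXB on the classpath. The class name `DocumentCursor` and the simplified sample XML in the note below are likewise assumptions, not code or data from the BioC module.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of DKPro Core.
final class DocumentCursor {

    /**
     * Stand-in for the binder. Expects the cursor on a <document> start tag,
     * consumes the whole subtree, and returns the text of its direct <id> child.
     */
    static String bindDocument(XMLStreamReader reader) throws XMLStreamException {
        String id = null;
        int depth = 1; // the <document> element itself
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 2 && "id".equals(reader.getLocalName())) {
                    id = reader.getElementText(); // leaves the cursor on </id>
                    depth--;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return id;
    }

    /** Advances the cursor to each <document> element and binds it in turn. */
    static List<String> readIds(String collectionXml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(collectionXml));
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "document".equals(reader.getLocalName())) {
                    ids.add(bindDocument(reader));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }
}
```

For a collection such as `<collection><document><id>1</id></document><document><id>2</id></document></collection>`, `DocumentCursor.readIds` visits each document once, and only one document's subtree is processed at a time.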
| 176 | + |
| 177 | +## Recommendation |
| 178 | + |
| 179 | +### ✅ **Switch to Jakarta JAXB** |
| 180 | + |
| 181 | +**Reasons**: |
| 182 | + |
| 183 | +1. **XCES hybrid approach is MUCH cleaner** - eliminates the string round-trip entirely |
| 184 | +2. **Same or better performance** - no string buffering in XCES writers |
| 185 | +3. **Simpler code** - JAXB Marshaller/Unmarshaller integrate seamlessly with StAX streaming |
| 186 | +4. **Less migration effort** - Jakarta JAXB is just a package rename from javax JAXB |
| 187 | +5. **Standard approach** - Jakarta EE is the successor to Java EE, so this is the "official" migration path |
| 188 | +6. **Familiar API** - The model classes carried JAXB annotations before the Jackson migration, so they can be restored |
| 189 | +7. **No architectural compromises** - JAXB was designed specifically for XML with StAX integration |
| 190 | + |
| 191 | +**Migration Effort**: |
| 192 | +- Change dependencies: `javax.xml.bind` → `jakarta.xml.bind` |
| 193 | +- Update imports: `javax.xml.bind.*` → `jakarta.xml.bind.*` |
| 194 | +- Revert code to original JAXB approach (simpler than current Jackson code!) |
| 195 | +- Tests should pass with minimal changes |
| 196 | + |
| 197 | +**Conclusion**: Jakarta JAXB is the better choice. It solves the original problem (Java module system compatibility) without introducing architectural compromises. Jackson XML is excellent for many use cases, but for these hybrid StAX scenarios, JAXB's native integration is superior. |