Changelog

All notable changes to this project will be documented in this file.

[Unreleased] - .NET 8 / Open XML SDK 3.x Migration

Fixed (npm)

  • TypeScript subpath exports not resolving under moduleResolution: "node" (Issue #113) - Added typesVersions fallback to npm package.json so docxodus/react and docxodus/worker subpath imports resolve types correctly under all TypeScript module resolution modes. Also reordered export conditions to put types before import per TypeScript requirements.

Added

  • Incremental annotation overlay API (Issue #106) - Decouple HTML conversion from annotation projection to avoid full WASM re-conversion
    • ProjectAnnotationsOntoHtml() - Project a full annotation set onto already-converted HTML
    • AddAnnotationToHtml() - Add a single annotation to existing HTML without re-converting the document
    • RemoveAnnotationFromHtml() - Remove a single annotation by ID, unwrapping spans back to plain text
    • GenerateVisibilityCss() - Generate CSS to hide/show annotations by label ID for instant toggling
    • GenerateAnnotationCssString() - Generate annotation CSS separately for independent management
    • All methods available in .NET, WASM (JSExport), and npm TypeScript wrapper
    • CSS-based label filtering enables responsive toggle without any re-rendering

Fixed

  • Paginated rendering: text clipped at page bottom + inconsistent paragraph spacing (Issue #114)
    • Fixed lineRule default handling: when w:lineRule is absent but w:line is present, treat as "auto" per OOXML spec (ISO/IEC 29500). Previously the line value was ignored, causing accumulated line-height mismatches that clipped the last line on pages.
    • Fixed contextualSpacing handling: now suppresses both spacingAfter (margin-bottom) AND spacingBefore (margin-top) for consecutive same-style paragraphs. Previously only spacingAfter was suppressed, leaving inconsistent inter-paragraph gaps.
    • Fixed pagination engine bottom margin over-reservation: the last block's bottom margin is no longer counted against page space since it's invisible (clipped by overflow: hidden). This prevents premature page breaks where content would have been visible.
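
The lineRule default described above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the function name `resolveLineHeight` is hypothetical, and mapping "exact" and "atLeast" to the same fixed point value is a simplifying assumption. The unit conventions (240ths of a line for "auto", twentieths of a point otherwise) follow the OOXML spec.

```typescript
type LineRule = "auto" | "exact" | "atLeast";

// Hypothetical helper: turn w:spacing/@w:line and @w:lineRule into a CSS line-height value.
function resolveLineHeight(line: number | undefined, lineRule?: LineRule): string | undefined {
  // No w:line at all: emit nothing and let the stylesheet decide.
  if (line === undefined) return undefined;

  // An absent w:lineRule is treated as "auto" instead of discarding the line value.
  const rule: LineRule = lineRule ?? "auto";

  if (rule === "auto") {
    // In "auto" mode w:line is expressed in 240ths of a single line: 240 -> 1, 360 -> 1.5.
    return String(line / 240);
  }

  // "exact" and "atLeast" carry twentieths of a point (simplified here to a fixed value).
  return `${line / 20}pt`;
}
```

With this sketch, `resolveLineHeight(360)` yields the unitless multiplier `"1.5"`, while `resolveLineHeight(240, "exact")` yields `"12pt"`.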
  • Annotation projection fails on sanitized HTML (Issue #110) - ProjectAnnotationsOntoHtml, AddAnnotationToHtml, and RemoveAnnotationFromHtml now handle HTML fragments with multiple root elements (e.g., DOMPurify-sanitized output) and HTML named entities (&nbsp;, &ndash;, etc.)
    • Root cause: XElement.Parse() requires valid XML with a single root element; sanitized HTML strips <html>/<body> wrappers leaving multiple roots
    • Fix: Auto-wraps multi-root HTML in a synthetic container for parsing, unwraps on serialization; replaces common HTML entities with numeric XML equivalents
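
The wrap-and-unwrap approach can be illustrated with plain string handling. This is a sketch under stated assumptions, not the library's implementation: the function names, the wrapper element, and the short entity table are all hypothetical, and the sketch always wraps the fragment, whereas the fix described above wraps only multi-root input.

```typescript
// Hypothetical subset of named entities mapped to numeric XML character references.
const NAMED_ENTITIES = new Map<string, string>([
  ["nbsp", "&#160;"],
  ["ndash", "&#8211;"],
  ["mdash", "&#8212;"],
  ["hellip", "&#8230;"],
]);

// Hypothetical synthetic container used only while the fragment is parsed as XML.
const WRAPPER_OPEN = '<div data-synthetic-root="true">';
const WRAPPER_CLOSE = "</div>";

// Replace known named entities and give the fragment a single root element.
function prepareForXmlParsing(html: string): string {
  const numeric = html.replace(/&([a-zA-Z]+);/g, (match: string, name: string) => {
    // Unknown names (including XML's own &amp;, &lt;, &gt;) are left untouched.
    return NAMED_ENTITIES.get(name) ?? match;
  });
  return WRAPPER_OPEN + numeric + WRAPPER_CLOSE;
}

// Strip the synthetic container again after serialization.
function unwrapAfterSerialization(xml: string): string {
  if (xml.startsWith(WRAPPER_OPEN) && xml.endsWith(WRAPPER_CLOSE)) {
    return xml.slice(WRAPPER_OPEN.length, xml.length - WRAPPER_CLOSE.length);
  }
  return xml;
}
```

A fragment such as `<p>a&nbsp;b</p><p>c</p>` becomes a single-root string that an XML parser accepts, and unwrapping returns the two sibling paragraphs with the entity in numeric form.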
  • Table container missing top margin (Issue #108) - Tables preceded by paragraphs with no after-spacing now get a default margin-top: 7.5pt for visual separation
    • Also handles floating table spacing from w:tblpPr (topFromText/bottomFromText attributes)
    • Tables preceded by paragraphs with explicit after-spacing correctly skip the default margin
  • Move markup Word compatibility (Issue #96) - Documents with move operations no longer cause Word "unreadable content" warnings
    • Root cause: FixUpRevMarkIds() was overwriting IDs of w:del/w:ins after FixUpRevisionIds() had already assigned unique IDs, causing collisions with move element IDs
    • Fix: Removed redundant FixUpRevMarkIds() call - FixUpRevisionIds() already handles all revision element IDs correctly
    • Added SimplifyMoveMarkup setting to optionally convert move markup to simple w:del/w:ins if desired
    • Added comprehensive ID uniqueness tests to prevent regression
    • DetectMoves now defaults to true (move detection is safe to use)
  • Footnote/endnote numbering - Fixed footnotes and endnotes displaying raw XML IDs instead of sequential display numbers
    • Per ECMA-376, w:id is a reference identifier, not the display number
    • Added FootnoteNumberingTracker class to scan document and build XML ID → display number mapping
    • Footnotes/endnotes now render with sequential numbers (1, 2, 3...) based on document order
    • Also fixed footnote ordering in the footnotes section to match document order
    • Updated both regular and paginated rendering modes
    • See docs/ooxml_corner_cases.md for detailed documentation
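
The ID-to-display-number mapping can be sketched in a few lines. The function below is a hypothetical stand-in for what FootnoteNumberingTracker is described as doing, assuming the footnote reference IDs have already been collected in document order.

```typescript
// Hypothetical sketch: map each w:id to a sequential display number (1, 2, 3...)
// based on the order in which references first appear in the document.
function buildDisplayNumbers(referenceIdsInDocumentOrder: string[]): Map<string, number> {
  const displayNumbers = new Map<string, number>();
  for (const id of referenceIdsInDocumentOrder) {
    // A repeated reference keeps the number it was given on first sight.
    if (!displayNumbers.has(id)) {
      displayNumbers.set(id, displayNumbers.size + 1);
    }
  }
  return displayNumbers;
}
```

For references that appear with raw IDs `7`, `3`, `12`, the sketch assigns display numbers 1, 2, and 3, which is the sequence a reader expects regardless of the underlying XML IDs.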
  • Legal numbering continuation pattern - Fixed incorrect multi-level list numbering when items continue a flat sequence at different indentation levels
    • Documents with items like 1., 2., 3. at level 0 followed by an item at level 1 (with start=4) now render as "4." instead of "3.4"
    • Added "continuation pattern" detection in ListItemRetriever.cs that recognizes when a deeper-level item continues a flat list
    • When detected, uses level 0's format string, run properties, and paragraph properties with the current counter value
    • Fixes underline appearing on continuation items when level 1's rPr has underline but level 0's doesn't
    • Fixes tab/indentation spacing to use level 0's tab stops and indentation for consistency
    • Updated FormattingAssembler.cs to use GetEffectiveLevel() in paragraph property stack and annotation functions
    • See docs/ooxml_corner_cases.md for detailed documentation of this edge case
  • Tab width calculation re-enabled in WmlToHtmlConverter for proper tab stop positioning
    • Previously disabled due to Azure font measurement failures; now uses estimation fallback
    • MetricsGetter._getTextWidth() returns character-based estimation when SkiaSharp measurement fails
    • Estimation formula: charWidth = fontSize * 0.6 / 2 per character (same as WASM builds)
    • Tab positioning now properly accounts for preceding text width
    • Works in Azure, WASM, and environments without fonts installed
    • Added Playwright visual tests for tab rendering verification
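
The estimation fallback amounts to a fixed per-character width. The sketch below applies the formula quoted above; the function name is hypothetical, and the unit of `fontSize` is whatever the converter passes in, which this changelog does not state.

```typescript
// Hypothetical sketch of the character-based fallback used when real font measurement fails.
function estimateTextWidth(text: string, fontSize: number): number {
  // Formula from the entry above: charWidth = fontSize * 0.6 / 2 per character.
  const charWidth = (fontSize * 0.6) / 2;
  return text.length * charWidth;
}
```

Because every character gets the same width, the estimate is coarse for proportional fonts, but it is deterministic and needs no installed fonts.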
  • Thread-safety issues in WmlToHtmlConverter and FontFamilyHelper that could cause corruption during concurrent document conversions
    • ShadeCache in WmlToHtmlConverter now uses ConcurrentDictionary for thread-safe shade color caching
    • FontFamilyHelper._unknownFonts now uses ConcurrentDictionary for thread-safe font tracking
    • FontFamilyHelper.KnownFamilies now uses Lazy<T> for thread-safe lazy initialization
    • Added WmlToHtmlConverter.ClearShadeCache() and FontFamilyHelper.ClearUnknownFontsCache() methods for memory management in long-running processes

Breaking Changes

  • Target Framework: Changed from net45/net46/netstandard2.0 to .NET 8.0
  • Open XML SDK: Upgraded from 2.8.1 to 3.2.0
  • Graphics Library: Replaced System.Drawing with SkiaSharp 2.88.9

Added

  • Table Width DXA Support - Tables with DXA (twips) widths now render correctly
    • Previously, only percentage widths were handled; DXA widths were ignored
    • Tables with w:tblW[@w:type="dxa"] now render with proper width: XXpt CSS
    • Conversion uses standard formula: dxa / 20 = points
    • Addresses converter gap #1 (Table Width Calculation)
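
As a quick illustration of the DXA conversion, the hypothetical helper below turns a `w:tblW` value of type "dxa" into the CSS width declaration described above.

```typescript
// Hypothetical helper: DXA values are twentieths of a point, so dxa / 20 = points.
function dxaToCssWidth(dxa: number): string {
  return `width: ${dxa / 20}pt`;
}
```

A table width of 9360 DXA (6.5 inches) becomes `width: 468pt`.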
  • Borderless Table Detection - Tables without borders now get semantic markup
    • Tables with w:tblBorders set to nil/none or missing get data-borderless="true" attribute
    • Useful for identifying layout tables vs data tables
    • Enables CSS-based styling for signature blocks and multi-column layouts
    • Addresses converter gap #3 (Borderless Table Detection)
  • Document Language Attribute - HTML output now includes lang attribute for improved accessibility
    • New DocumentLanguage setting to manually override the language (default: auto-detect)
    • <html> element now includes lang attribute (e.g., <html lang="en-US">)
    • Language is auto-detected from:
      1. w:themeFontLang in document settings
      2. Default paragraph style's w:rPr/w:lang
      3. Falls back to "en-US"
    • Foreign text spans get lang attribute when different from document default
    • Improves screen reader pronunciation and browser font selection
    • Addresses converter gaps #10 (Document Language Attribute) and #11 (Foreign Text Spans)
  • Improved Font Fallback - Unknown fonts now get appropriate generic fallback, and CJK text gets language-specific font chains
    • Unknown fonts are classified by name patterns and get proper fallback:
      • Fonts with "sans" pattern → font-family: 'FontName', sans-serif
      • Fonts with "mono", "code", "courier" patterns → font-family: 'FontName', monospace
      • Other fonts default to serif fallback
    • Fixed Courier New and Lucida Console to include monospace fallback (was missing)
    • CJK (Chinese, Japanese, Korean) text gets language-specific font fallback chains:
      • Japanese (ja-JP): 'Noto Serif CJK JP', 'Yu Mincho', 'MS Mincho', ...
      • Simplified Chinese (zh-hans): 'Noto Serif CJK SC', 'Microsoft YaHei', 'SimSun', ...
      • Traditional Chinese (zh-hant): 'Noto Serif CJK TC', 'Microsoft JhengHei', 'PMingLiU', ...
      • Korean (ko): 'Noto Serif CJK KR', 'Malgun Gothic', 'Batang', ...
    • Addresses converter gaps #13 (Limited Font Fallback) and #14 (No CJK Font-Family Fallback Chain)
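
The name-pattern classification for unknown fonts can be sketched like this. The function names are hypothetical, and checking the monospace patterns before "sans" is an assumption made here so that names such as "DejaVu Sans Mono" resolve to monospace; the changelog does not specify the order.

```typescript
type GenericFamily = "serif" | "sans-serif" | "monospace";

// Hypothetical classifier based on the name patterns listed above.
function classifyFont(fontName: string): GenericFamily {
  const name = fontName.toLowerCase();
  // Assumption: monospace patterns win over "sans" when both appear in the name.
  if (["mono", "code", "courier"].some((pattern) => name.includes(pattern))) {
    return "monospace";
  }
  if (name.includes("sans")) return "sans-serif";
  // Everything else defaults to the serif fallback.
  return "serif";
}

// Build the font-family declaration with the chosen generic fallback.
function fontFamilyCss(fontName: string): string {
  return `font-family: '${fontName}', ${classifyFont(fontName)}`;
}
```

For example, `fontFamilyCss("Open Sans")` gives `font-family: 'Open Sans', sans-serif`, and an unrecognized name such as "Garamond" falls back to serif.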
  • Theme Color Resolution - Document theme colors are now resolved to actual RGB values
    • New ResolveThemeColors setting (default: true) enables theme color resolution
    • Reads color scheme from theme1.xml (a:clrScheme element)
    • Supports all 12 theme colors: dk1, lt1, dk2, lt2, accent1-6, hlink, folHlink
    • Applies w:themeTint (lighten toward white) and w:themeShade (darken toward black) modifiers
    • Resolves w:themeColor in run colors, paragraph shading, cell shading, and fills
    • Falls back to explicit color value if theme color not found
    • Addresses converter gap #6 (Theme Colors Not Resolved)
  • @page CSS Rule - Optional CSS @page rule generation for print stylesheets
    • New GeneratePageCss setting (default: false) enables @page rule generation
    • Reads page dimensions from w:sectPr/w:pgSz and margins from w:sectPr/w:pgMar
    • Generates CSS @page { size: Xin Yin; margin: ... } rules
    • Supports US Letter, A4, and custom page sizes with proper inch conversions
    • Useful for print stylesheets and PDF generation
    • Addresses converter gap #1 (No Page/Document Setup CSS)
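
The inch conversion behind the @page rule can be sketched as below. The `PageSetup` shape and function names are hypothetical; the only facts relied on are that `w:pgSz` and `w:pgMar` values are in twips and that there are 1440 twips per inch. Rounding to two decimals is a choice made for this sketch.

```typescript
// Hypothetical container for values read from w:sectPr/w:pgSz and w:sectPr/w:pgMar (all in twips).
interface PageSetup {
  widthTwips: number;
  heightTwips: number;
  marginTopTwips: number;
  marginRightTwips: number;
  marginBottomTwips: number;
  marginLeftTwips: number;
}

const TWIPS_PER_INCH = 1440;

// Convert twips to a CSS inch length, rounded to two decimals for readability.
function twipsToInches(twips: number): string {
  return `${Number((twips / TWIPS_PER_INCH).toFixed(2))}in`;
}

// Assemble an @page rule in the "size: Xin Yin; margin: ..." form described above.
function buildPageRule(page: PageSetup): string {
  const margin = [page.marginTopTwips, page.marginRightTwips, page.marginBottomTwips, page.marginLeftTwips]
    .map((value) => twipsToInches(value))
    .join(" ");
  return `@page { size: ${twipsToInches(page.widthTwips)} ${twipsToInches(page.heightTwips)}; margin: ${margin}; }`;
}
```

A US Letter page (12240 x 15840 twips) with one-inch margins produces `@page { size: 8.5in 11in; margin: 1in 1in 1in 1in; }`.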
  • Unsupported Content Placeholders - Visual indicators for content that cannot be fully converted to HTML
    • New RenderUnsupportedContentPlaceholders setting (default: false for backward compatibility)
    • Supports these unsupported content types:
      • WMF/EMF images: Legacy Windows Metafile formats display [WMF IMAGE] / [EMF IMAGE]
      • SVG images: Scalable Vector Graphics display [SVG IMAGE]
      • Math equations (OMML): Office Math Markup displays [MATH]
      • Form fields: Checkboxes, text inputs, dropdowns display [CHECKBOX], [TEXT INPUT], [DROPDOWN]
      • Ruby annotations: East Asian text annotations display base text with [RUBY] marker
    • Placeholders are styled with CSS (color-coded by type) and include:
      • data-content-type attribute for the content type
      • data-element-name attribute for the XML element name
      • title attribute with descriptive tooltip
    • New TypeScript enum UnsupportedContentType for type-safe placeholder identification
    • See docs/architecture/unsupported_content_placeholders.md for full documentation
  • External Annotation System (Issue #57) - Store annotations externally without modifying the DOCX file
    • New ExternalAnnotationSet type extends OpenContractDocExport with document binding:
      • documentId: Unique identifier for the source document
      • documentHash: SHA256 hash for integrity validation
      • createdAt, updatedAt: ISO 8601 timestamps
      • textLabels, docLabelDefinitions: Label definitions keyed by ID
    • ExternalAnnotationManager static class provides core functionality:
      • ComputeDocumentHash(): SHA256 hash of document bytes
      • CreateAnnotationSet(): Create annotation set from document (wraps OpenContractExporter)
      • CreateAnnotation(): Create annotation from character offsets
      • CreateAnnotationFromSearch(): Create annotation by text search with occurrence index
      • FindTextOccurrences(): Find all occurrences of text in document
      • Validate(): Validate annotations against document (hash check + text verification)
      • SerializeToJson() / DeserializeFromJson(): JSON serialization
    • ExternalAnnotationProjector for HTML projection:
      • ProjectAnnotations(): Post-process HTML to wrap annotated text with styled spans
      • ConvertWithAnnotations(): Combined conversion + projection
      • Supports annotation labels (Above, Inline, Tooltip, None modes)
      • CSS generation with customizable class prefix
    • TypeScript/npm wrapper functions:
      • computeDocumentHash(): Get document hash for validation
      • createExternalAnnotationSet(): Create annotation set from DOCX
      • validateExternalAnnotations(): Validate annotations against document
      • convertDocxToHtmlWithExternalAnnotations(): Convert with annotations projected
      • searchTextOffsets(): Search for text occurrences in document
      • createAnnotation(), createAnnotationFromSearch(), findTextOccurrences(): Client-side helpers
    • Full type definitions: AnnotationLabel, ExternalAnnotationSet, ExternalAnnotationValidationResult, etc.
    • 21 unit tests covering hash computation, annotation creation, validation, serialization, and projection
  • OpenContracts Export Format (Issue #56) - Export documents to OpenContracts format for interoperability
    • New OpenContractExporter.Export() method for complete document export:
      • title: Document title from core properties
      • content: Complete document text (paragraphs, tables, headers, footers, footnotes, endnotes)
      • description: Optional document description
      • pageCount: Estimated page count
      • pawlsFileContent: PAWLS-format page layout with token positions
      • docLabels: Document-level labels
      • labelledText: Annotations including structural elements (sections, paragraphs, tables)
      • relationships: Parent-child relationships between annotations
    • Full text extraction ensures 100% text coverage:
      • Main body paragraphs and tables
      • Nested tables
      • Headers and footers
      • Footnotes and endnotes
      • Content controls (structured document tags)
    • PAWLS (Page-Aware Layout Segmentation) format for layout data:
      • Page boundary information (width, height, index)
      • Token positions (x, y, width, height, text)
      • Supports annotation targeting by character offset
    • Structural annotations automatically generated:
      • Section annotations with page dimensions
      • Paragraph annotations with text spans
      • Table annotations with content ranges
      • Parent-child relationships (section contains paragraphs)
    • TypeScript API: exportToOpenContract() function with full type definitions
    • WASM export: DocumentConverter.ExportToOpenContract()
    • Compatible with OpenContracts ecosystem for document analysis
    • New CLI tool: docx2oc - Command-line tool for OpenContracts export
      • Usage: docx2oc <input.docx> [output.json]
      • Default output: same filename with .oc extension
      • Installable as .NET tool: dotnet tool install --global Docx2OC
  • ReadyToRun and AOT Compilation - Performance optimizations to reduce cold-start times
    • .NET library: Added PublishReadyToRun for pre-compiled native code during publish
    • WASM: Added RunAOTCompilation for Release builds to pre-compile IL to WebAssembly
    • Eliminates JIT warmup overhead (~180ms savings on first conversion in .NET)
    • Provides consistent performance with no JIT variance in WASM
  • Lightweight WASM Image Handling - Images are now embedded as base64 data URIs without SkiaSharp native library
    • Removed SkiaSharp native WASM dependency (~15MB+ savings in bundle size when the native library is excluded)
    • Images are passed through directly from DOCX using ImageBytes property
    • Dimensions come from document markup (EMUs), not image decoding
    • Browser natively decodes image formats (PNG, JPEG, GIF, etc.)
    • Fallback handling: If SkiaSharp decode fails, images still work via raw bytes
    • Added image handling tests for documents with embedded and hyperlinked images
  • Frame Yielding for UI Responsiveness (Issue #44 Phase 1) - WASM operations now yield to the browser before heavy work begins
    • All async functions in the npm wrapper (convertDocxToHtml, compareDocuments, compareDocumentsToHtml, getRevisions, addAnnotation, addAnnotationWithTarget, getDocumentStructure) automatically yield using a double-requestAnimationFrame pattern
    • This allows React state updates (loading spinners, progress indicators) to paint before blocking WASM execution
    • Transparent to consumers - no API changes required
    • Gracefully skipped in non-browser environments (Node.js, SSR)
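
The double-requestAnimationFrame pattern can be sketched as follows. This is an illustration of the general technique rather than the wrapper's code: `yieldToBrowser` and the injectable `schedule` parameter are hypothetical names, the latter added here only so the sketch can be exercised outside a browser.

```typescript
type FrameScheduler = (callback: () => void) => void;

// Hypothetical sketch: resolve after two animation frames so pending UI updates can paint
// before heavy synchronous work starts. Resolves immediately when no scheduler exists.
function yieldToBrowser(schedule?: FrameScheduler): Promise<void> {
  const globalRaf = (globalThis as unknown as { requestAnimationFrame?: FrameScheduler })
    .requestAnimationFrame;
  const raf: FrameScheduler | undefined =
    schedule ?? (typeof globalRaf === "function" ? globalRaf.bind(globalThis) : undefined);

  // Non-browser environments (Node.js, SSR): nothing to wait for.
  if (raf === undefined) return Promise.resolve();

  // First frame: the browser commits pending state; second frame: that state has painted.
  return new Promise<void>((resolve) => raf(() => raf(() => resolve())));
}
```

A caller would `await yieldToBrowser()` after setting a loading flag and before invoking the blocking conversion, so that the spinner is visible while the main thread is busy.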
  • Web Worker Support for Non-blocking Operations (Issue #44 Phase 2) - Fully non-blocking WASM execution via Web Workers
    • New docxodus/worker export provides worker-based API: import { createWorkerDocxodus } from 'docxodus/worker'
    • Worker API mirrors main API: convertDocxToHtml, compareDocuments, compareDocumentsToHtml, getRevisions, getVersion
    • Main thread remains fully responsive during WASM execution - animations continue, user interactions work
    • Zero-copy transfer of document bytes via Transferable for optimal performance
    • Worker can be terminated when no longer needed
  • Document Metadata API for Lazy Loading (Issue #44 Phase 3) - Fast metadata extraction without full HTML rendering
    • New getDocumentMetadata() function returns document structure information:
      • sections: Array of section metadata with page dimensions and content ranges
      • totalParagraphs, totalTables: Document-wide content counts
      • hasFootnotes, hasEndnotes, hasComments, hasTrackedChanges: Feature detection
      • estimatedPageCount: Heuristic-based page count estimation
    • Section metadata includes:
      • Page dimensions: pageWidthPt, pageHeightPt, marginTopPt, etc. (all values in points, 1pt = 1/72 inch)
      • Content area: contentWidthPt, contentHeightPt
      • Header/footer heights: headerPt, footerPt
      • Content tracking: paragraphCount, tableCount, startParagraphIndex, endParagraphIndex
      • Header/footer presence: hasHeader, hasFooter, hasFirstPageHeader, hasEvenPageHeader, etc.
    • Available in main API, worker API, and raw WASM: DocumentConverter.GetDocumentMetadata()
    • Enables efficient lazy loading for paginated document viewing
    • Security: Maximum document size limit of 100MB to prevent memory exhaustion
    • Graceful handling of malformed documents and invalid header/footer references
    • Known limitation: Section breaks inside tables or text boxes are not detected (see #51)
  • Page Range Rendering for Virtual Scrolling (Issue #31 Phase 4) - Render specific page ranges for lazy loading
    • New RenderPageRange() method in WmlToHtmlConverter renders only specified pages
    • Page-to-block mapping uses heuristic-based estimation (paragraphs and tables per page)
    • HTML output includes pagination metadata via data attributes:
      • data-start-page, data-end-page: Requested page range
      • data-total-pages: Total estimated pages in document
      • data-start-block, data-end-block: Block index range for rendered content
      • data-block-index: Per-element block indices for tracking
    • WASM exports: DocumentConverter.RenderPageRange(), DocumentConverter.RenderPageRangeFull()
    • TypeScript wrapper: renderPageRange() with full options support
    • Worker proxy support: WorkerDocxodus.renderPageRange() for non-blocking execution
    • React components for virtual scrolling:
      • useVirtualPagination hook: Manages viewport-aware page loading with IntersectionObserver
      • VirtualPaginatedDocument component: Auto-renders visible pages plus configurable buffer
    • All existing converter options supported (tracked changes, comments, headers/footers, etc.)
    • Graceful handling of out-of-bounds page requests (internally clamped to valid range)
  • Custom Annotations - Full support for adding, removing, and rendering custom annotations on DOCX documents
    • AnnotationManager class for programmatic annotation CRUD operations:
      • AddAnnotation(): Add annotation by text search or paragraph range
      • RemoveAnnotation(): Remove annotation by ID
      • GetAnnotations(): Retrieve all annotations from a document
      • GetAnnotation(): Get a specific annotation by ID
      • HasAnnotations(): Check if document has any annotations
    • DocumentAnnotation class with properties:
      • Id: Unique annotation identifier
      • LabelId: Category/type identifier for grouping
      • Label: Human-readable label text
      • Color: Highlight color in hex format (e.g., "#FFEB3B")
      • Author: Optional author name
      • Created: Optional creation timestamp
      • Metadata: Custom key-value pairs
    • AnnotationRange class for specifying annotation targets:
      • FromSearch(text, occurrence): Find text by search
      • FromParagraphs(start, end): Span paragraph indices
    • Document Structure API for element-based annotation targeting:
      • DocumentStructureAnalyzer.Analyze(): Returns navigable tree of document elements
      • DocumentElement class with path-based IDs (e.g., doc/p-0, doc/tbl-0/tr-1/tc-2)
      • Supported element types: Document, Paragraph, Run, Table, TableRow, TableCell, TableColumn, Hyperlink, Image
      • TableColumnInfo for virtual column elements (columns aren't real OOXML elements)
    • AnnotationTarget class with flexible targeting modes:
      • Element(elementId): Target by element ID from structure analysis
      • Paragraph(index), ParagraphRange(start, end): Target by paragraph index
      • Run(paragraphIndex, runIndex): Target specific run
      • Table(index), TableRow(tableIndex, rowIndex): Target tables/rows
      • TableCell(tableIndex, rowIndex, cellIndex): Target specific cell
      • TableColumn(tableIndex, columnIndex): Metadata-only column annotation
      • TextSearch(text, occurrence): Search text globally
      • SearchInElement(elementId, text, occurrence): Search within specific element
    • WASM methods: GetDocumentStructure(), AddAnnotationWithTarget()
    • TypeScript helper functions: findElementById(), findElementsByType(), getParagraphs(), getTables(), getTableColumns()
    • TypeScript targeting factories: targetElement(), targetParagraph(), targetTableCell(), etc.
    • React useDocumentStructure hook with structure navigation helpers
    • Annotations stored as Custom XML Part in DOCX (non-destructive)
    • Bookmark-based text range marking for precise positioning
    • HTML rendering with configurable label modes:
      • AnnotationLabelMode.Above: Floating label above highlight
      • AnnotationLabelMode.Inline: Label at start of highlight
      • AnnotationLabelMode.Tooltip: Label shown on hover
      • AnnotationLabelMode.None: Highlight only, no label
    • New settings in WmlToHtmlConverterSettings:
      • RenderAnnotations: Enable/disable annotation rendering
      • AnnotationLabelMode: Select label display mode
      • AnnotationCssClassPrefix: Customize CSS class names (default: "annot-")
      • IncludeAnnotationMetadata: Include metadata in HTML data attributes
    • WASM/npm support:
      • getAnnotations(), addAnnotation(), removeAnnotation(), hasAnnotations() functions
      • Annotation, AddAnnotationRequest, AddAnnotationResponse, RemoveAnnotationResponse types
      • AnnotationLabelMode enum
      • ConversionOptions extended with annotation rendering options
    • React support:
      • useAnnotations hook for annotation state management
      • AnnotatedDocument component with click/hover event handling
      • useDocxodus hook extended with annotation methods
    • 20 .NET unit tests and 21 Playwright browser tests for full coverage (including 11 for element-based targeting)
  • Comment Rendering in HTML Converter - Full support for rendering Word document comments in HTML output
    • CommentRenderMode enum with three rendering modes:
      • EndnoteStyle (default): Comments rendered at end of document with bidirectional anchor links
      • Inline: Comments rendered as tooltips with title and data-comment attributes
      • Margin: Comments positioned in a flexbox-based margin column alongside content, with author/date headers and back-reference links
    • New settings in WmlToHtmlConverterSettings:
      • RenderComments: Enable/disable comment rendering
      • CommentRenderMode: Select rendering mode
      • CommentCssClassPrefix: Customize CSS class names (default: "comment-")
      • IncludeCommentMetadata: Include author/date in HTML output
    • Comment highlighting with configurable CSS classes
    • Full comment metadata support (author, date, initials)
    • Margin mode includes print-friendly CSS media queries
    • WASM/npm support via commentRenderMode parameter and TypeScript CommentRenderMode enum
  • WebAssembly NPM Package (docxodus) - Browser-based document comparison and HTML conversion
    • wasm/DocxodusWasm/ - .NET 8 WASM project with JSExport methods
    • npm/ - TypeScript wrapper with React hooks
    • Full document comparison (redlining) support in the browser
    • DOCX to HTML conversion
    • React hooks: useDocxodus, useConversion, useComparison
    • Build script: scripts/build-wasm.sh
  • Native Move Markup in WmlComparer - Produces Word-native move tracking markup (w:moveFrom/w:moveTo)
    • Compared documents now contain proper OpenXML move elements, not just w:del/w:ins
    • Move pairs linked via w:name attribute for Word compatibility
    • Range markers (w:moveFromRangeStart/w:moveFromRangeEnd, w:moveToRangeStart/w:moveToRangeEnd) properly paired
    • Microsoft Word shows moves in "Track Changes" panel as relocated content
    • New Moved value in WmlComparerRevisionType enum
    • New properties on WmlComparerRevision: MoveGroupId (links source/destination), IsMoveSource (true=from, false=to)
    • New settings in WmlComparerSettings:
      • DetectMoves: Enable/disable move detection (default: true)
      • MoveSimilarityThreshold: Jaccard similarity threshold 0.0-1.0 (default: 0.8)
      • MoveMinimumWordCount: Minimum words to consider for move (default: 3)
    • Uses word-level Jaccard similarity for accurate matching
    • Respects CaseInsensitive setting for similarity comparison
    • Full WASM/npm support with new TypeScript helpers:
      • RevisionType.Moved enum value
      • isMove(), isMoveSource(), isMoveDestination() type guards
      • findMovePair() function to find linked move revisions
      • moveGroupId and isMoveSource properties on Revision interface
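
Word-level Jaccard similarity, combined with the threshold and minimum word count settings above, can be sketched like this. The function names are hypothetical, the similarity is computed over sets of distinct whitespace-separated words, and applying the minimum word count to the deleted block only is an assumption of this sketch.

```typescript
// Split text into a set of distinct words, optionally ignoring case.
function wordSet(text: string, caseInsensitive: boolean): Set<string> {
  const normalized = caseInsensitive ? text.toLowerCase() : text;
  return new Set(normalized.split(/\s+/).filter((word) => word.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets.
function jaccardSimilarity(a: string, b: string, caseInsensitive = false): number {
  const setA = wordSet(a, caseInsensitive);
  const setB = wordSet(b, caseInsensitive);
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  setA.forEach((word) => {
    if (setB.has(word)) intersection++;
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Hypothetical move check using the documented defaults (threshold 0.8, minimum 3 words).
function isMoveCandidate(
  deleted: string,
  inserted: string,
  threshold = 0.8,
  minimumWordCount = 3,
  caseInsensitive = false,
): boolean {
  const wordCount = deleted.split(/\s+/).filter((word) => word.length > 0).length;
  if (wordCount < minimumWordCount) return false;
  return jaccardSimilarity(deleted, inserted, caseInsensitive) >= threshold;
}
```

Two five-word blocks that differ in one word share 4 of 6 distinct words (about 0.67), so they fall below the default 0.8 threshold and would be reported as a deletion plus an insertion rather than a move.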
  • Format Change Detection in WmlComparer - Detects and tracks formatting-only changes (w:rPrChange)
    • When text content is identical but formatting changes (bold, italic, font size, etc.), produces native Word format change markup
    • Compared documents now contain w:rPrChange elements that Microsoft Word recognizes in Track Changes
    • New FormatChanged value in WmlComparerRevisionType enum
    • New FormatChange property on WmlComparerRevision with:
      • OldProperties: Dictionary of original formatting properties
      • NewProperties: Dictionary of new formatting properties
      • ChangedPropertyNames: List of what changed (e.g., "bold", "italic", "fontSize")
    • New setting in WmlComparerSettings:
      • DetectFormatChanges: Enable/disable format change detection (default: true)
    • Full WASM/npm support with new TypeScript helpers:
      • RevisionType.FormatChanged enum value
      • isFormatChange() type guard
      • FormatChangeDetails interface with oldProperties, newProperties, changedPropertyNames
      • formatChange property on Revision interface
  • Improved Revision API - Better TypeScript support for the getRevisions() API
    • RevisionType enum with Inserted, Deleted, and Moved values for type-safe comparisons
    • isInsertion(), isDeletion(), isMove(), isMoveSource(), isMoveDestination() helper functions
    • findMovePair() function to find the matching revision for a move
    • Comprehensive JSDoc documentation on the Revision interface
    • All types are properly exported from the package
  • Paginated Headers and Footers - Headers/footers now render correctly with pagination enabled
    • When both RenderHeadersAndFooters and RenderPagination=Paginated are enabled, headers and footers appear on each page
    • Per-section header/footer support with section index tracking
    • First page headers/footers supported (when w:titlePg is set in document)
    • Even page headers/footers supported for different odd/even page layouts
    • Headers/footers rendered into hidden registry for client-side cloning per-page
    • New data attributes: data-header-height, data-footer-height on section elements
    • TypeScript PageDimensions interface extended with headerHeight and footerHeight
    • CSS classes .page-header and .page-footer for positioning within page boxes
    • Automatic hiding of system page number when document has footer content
    • See docs/architecture/paginated_headers_footers.md for full architecture details
  • Per-page Footnote Rendering - Footnotes now appear at the bottom of each page where they are referenced
    • When RenderFootnotesAndEndnotes=true with RenderPagination=Paginated, footnotes are distributed per-page
    • Footnote registry stores footnotes in a hidden container for client-side distribution
    • data-footnote-id attributes added to footnote references for tracking
    • Single-pass, forward-only pagination algorithm (lazy-loading compatible)
    • Pagination engine measures footnote space and includes it in page layout calculations
    • Footnotes render with separator line (<hr>) above them
    • Footnote continuation: Long footnotes that don't fit on a page are split at paragraph boundaries and continue on subsequent pages (matching Word/Office behavior)
    • Dynamic footnote area expansion: Footnote area can expand upward into body content space (up to 60% of page height) to fit more footnote content before splitting, reducing wasted space
    • Endnotes remain at document end (not per-page) - traditional behavior preserved
    • New TypeScript methods: parseFootnoteRegistry(), extractFootnoteRefs(), measureFootnotesHeight(), addPageFootnotes(), splitFootnoteToFit(), measureContinuationHeight()
    • New TypeScript interfaces: FootnoteContinuation, PartialFootnote
    • New TypeScript constants: MAX_FOOTNOTE_AREA_RATIO (0.6), MIN_BODY_CONTENT_HEIGHT (72pt)
    • New CSS classes: .page-footnotes, .footnote-item, .footnote-number, .footnote-content, .footnote-continuation
  • SkiaSharpHelpers.cs - Color utilities for SkiaSharp compatibility
  • GetPackage() extension method in PtOpenXmlUtil.cs for SDK 3.x Package access
  • SkiaSharp.NativeAssets.Linux.NoDependencies package for Linux runtime support

Fixed

  • React hooks loading state not rendering before WASM blocks (Issue #45) - Fixed isConverting/isComparing/isLoading states in React hooks not painting before WASM execution blocks the main thread. Added requestAnimationFrame yielding after state updates in:

    • useConversion: convert() function
    • useComparison: compare() and compareToHtml() functions
    • useAnnotations: reload(), add(), and remove() functions
    • useDocumentStructure: reload() function
  • Header/footer positioning in paginated mode - Fixed headers and footers overlapping with body content. Headers now properly constrain to the top margin area (height: marginTop) and footers constrain to the bottom margin area (height: marginBottom). Uses flexbox layout for proper content alignment within constrained areas.

  • DocumentBuilder relationship copying - Fixed bug where relationship IDs from source documents could incorrectly match existing IDs in target header/footer parts when using InsertId functionality. This caused validation errors like "The relationship 'rIdX' referenced by attribute 'r:embed' does not exist."

    • Removed flawed early-return optimization in CopyRelatedImage() that skipped processing when target part had matching relationship ID
    • Fixed diagram relationship handling (R.dm, R.lo, R.qs, R.cs attributes) to properly copy parts from source documents
    • Fixed chart and user shape relationship handling
    • Fixed OLE object relationship handling
    • Fixed external relationship attribute update to use correct attribute name parameter
  • SpreadsheetWriter date handling - Fixed date cells being written as ISO 8601 strings, which is invalid in this context. Dates are now converted to Excel serial date numbers (days since December 30, 1899), as required by the transitional OOXML format.

  • WmlComparer null Unid handling - Fixed null reference exceptions when comparing documents with elements lacking Unid attributes.

  • WmlComparer footnote/endnote comparison (6 tests: WC-1660, WC-1670, WC-1710, WC-1720, WC-1750, WC-1760) - Fixed AssignUnidToAllElements to assign Unid to footnote/endnote elements themselves, enabling proper reconstruction of multi-paragraph footnotes/endnotes by CoalesceRecurse.

  • WmlComparer table row comparison (1 test: WC-1500) - Added LCS-based row matching (ApplyLcsToTableRows) for large tables (7+ rows) when content differs significantly, preventing cascading false differences from insertions/deletions in the middle of tables.

  • WASM CDN loading CORS issue - Fixed cross-origin loading failures when WASM files are served from CDNs (jsDelivr, unpkg). The .NET WASM runtime uses credentials:"same-origin" for fetch requests, which conflicts with the CDNs' Access-Control-Allow-Origin: * wildcard header. The build script now patches dotnet.js to use credentials:"omit" for CDN compatibility.

  • Vite bundler compatibility - Added @vite-ignore comment to dynamic import in npm/src/index.ts to prevent Vite from trying to analyze/resolve the WASM loader path during development builds.

  • Pagination content overflow - Fixed content overflowing page boundaries in the paginated view. The issue was caused by applying CSS transform scale to the content area while using inconsistent coordinate systems for positioning. The fix applies the scale transform to the entire page box instead, ensuring proper clipping and consistent scaling of all page elements.

  • WmlComparer legal numbering preservation (Issue #1634) - Fixed comparison losing legal numbering (w:isLgl) when comparing documents with different numbering styles. The comparer now properly merges numbering definitions from the revised document into the result:

    • Copies abstractNum and num elements from revised document when missing in original
    • Reuses existing definitions when content matches (regardless of ID)
    • Remaps IDs when conflicts occur to avoid duplicates
  • WmlToHtmlConverter null rPr crash - Fixed InvalidOperationException crash in DefineRunStyle and GetLangAttribute when converting runs without w:rPr elements. Changed .First() to .FirstOrDefault() with null checks to handle runs that have no explicit run properties gracefully.
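
The serial date conversion mentioned in the SpreadsheetWriter fix above can be sketched as follows. This is an illustration in TypeScript, not the library's C# code; the function name `toExcelSerial` is hypothetical, and the sketch assumes dates are interpreted in UTC.

```typescript
// Epoch stated in the changelog entry: December 30, 1899 (taken here as UTC midnight).
const EXCEL_EPOCH_UTC_MS = Date.UTC(1899, 11, 30);
const MS_PER_DAY = 86_400_000;

// Hypothetical helper: days (with fractional time of day) since the epoch.
// Note: serials for dates before March 1, 1900 differ from what Excel displays,
// because Excel treats 1900 as a leap year.
function toExcelSerial(date: Date): number {
  return (date.getTime() - EXCEL_EPOCH_UTC_MS) / MS_PER_DAY;
}

const newYear2024 = toExcelSerial(new Date(Date.UTC(2024, 0, 1))); // 45292
const noonSameDay = toExcelSerial(new Date(Date.UTC(2024, 0, 1, 12))); // 45292.5
console.log(newYear2024, noonSameDay);
```

The whole part of the serial is the day count and the fraction is the time of day, so noon adds 0.5.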

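The loading-state fix for the React hooks (Issue #45) relies on yielding to the browser before blocking work starts. The sketch below shows one common way to do that. The names `yieldToPaint` and `runWithLoadingState` are hypothetical, and following the animation frame with a `setTimeout` is an assumption of this sketch (it lets the frame paint before the continuation runs); the hooks' actual code may differ.

```typescript
type FrameScheduler = (callback: () => void) => unknown;

// Uses requestAnimationFrame where available, otherwise falls back to setTimeout
// so the sketch also runs outside a browser.
const defaultRequestFrame: FrameScheduler = (callback) => {
  const g = globalThis as any;
  return typeof g.requestAnimationFrame === "function"
    ? g.requestAnimationFrame(() => callback())
    : setTimeout(callback, 0);
};

// Resolves after the next frame has had a chance to paint.
function yieldToPaint(requestFrame: FrameScheduler = defaultRequestFrame): Promise<void> {
  return new Promise<void>((resolve) => {
    requestFrame(() => {
      setTimeout(() => resolve(), 0);
    });
  });
}

// Sets the loading flag, yields so the flag can render, then runs blocking work.
async function runWithLoadingState<T>(
  setLoading: (value: boolean) => void,
  work: () => T,
  requestFrame: FrameScheduler = defaultRequestFrame,
): Promise<T> {
  setLoading(true);
  await yieldToPaint(requestFrame);
  try {
    return work();
  } finally {
    setLoading(false);
  }
}
```

In a hook, `setLoading` would be the state setter and `work` the synchronous WASM call; the scheduler parameter exists only to make the helper easy to exercise in tests.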
Changed

  • Replaced FontPartType/ImagePartType with PartTypeInfo pattern for SDK 3.x compatibility
  • Replaced .Close() calls with Dispose() pattern
  • Migrated all color handling from System.Drawing.Color to SKColor
  • Migrated font handling from FontFamily/FontStyle to SKFontManager/SKTypeface
  • Migrated image handling from Bitmap/ImageFormat to SKBitmap/SKEncodedImageFormat

Documentation

  • Updated docs/architecture/wml_to_html_converter_gaps.md with comprehensive gap analysis including pagination mode limitations, DrawingML text handling, and prioritized fix recommendations

Test Status

  • 1051 passed, 0 failed, 1 skipped out of 1052 tests (~99.9% pass rate)
  • Header/footer and footnote pagination changes tested via manual integration testing