Add support for UYAP UDF files #1620

Open
efekurucay wants to merge 1 commit into microsoft:main from efekurucay:main
@efekurucay

Summary

This PR adds native support for UYAP UDF files in MarkItDown.

It introduces a built-in UdfConverter that:

  • accepts .udf files directly
  • detects UDF archives from zip-like streams by checking for content.xml and a <template> root
  • parses rune-based text offsets from UDF XML
  • converts supported content directly to Markdown without adding new runtime dependencies
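The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `looks_like_udf` is hypothetical, and it assumes `content.xml` sits at the archive root and that the `<template>` root element carries no XML namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO


def looks_like_udf(stream: BinaryIO) -> bool:
    """Hypothetical check: zip archive whose content.xml has a <template> root."""
    start = stream.tell()
    try:
        with zipfile.ZipFile(stream) as archive:
            if "content.xml" not in archive.namelist():
                return False
            with archive.open("content.xml") as xml_file:
                # The first "start" event is the root element, so the rest
                # of the document never has to be parsed.
                for _event, element in ET.iterparse(xml_file, events=("start",)):
                    return element.tag == "template"
        return False
    except (zipfile.BadZipFile, ET.ParseError):
        # Not a zip archive, or content.xml is not well-formed XML.
        return False
    finally:
        # Leave the stream where the caller had it.
        stream.seek(start)


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", "<template><content>hi</content></template>")
    buffer.seek(0)
    print(looks_like_udf(buffer))  # True
```

Restoring the stream position matters when several converters probe the same stream in turn, which is why the sketch seeks back in `finally`.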

Supported output in this initial version

  • paragraphs
  • bold / italic / underline
  • numbered and bulleted lists
  • basic tables

For now, embedded images are represented as [embedded image omitted] rather than causing conversion to fail.
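One way to picture the inline styles and the image placeholder is the sketch below. It is only an illustration under stated assumptions: `render_run` and `IMAGE_PLACEHOLDER` are hypothetical names, and rendering underline as `<u>…</u>` is an assumption, since Markdown has no native underline syntax and the PR description does not say how underline is emitted.

```python
IMAGE_PLACEHOLDER = "[embedded image omitted]"  # text quoted from the PR description


def render_run(text: str, bold: bool = False, italic: bool = False,
               underline: bool = False) -> str:
    """Wrap one text run in Markdown emphasis markers (hypothetical helper)."""
    core = text.strip()
    if not core:
        # Whitespace-only runs are passed through without markers.
        return text
    # Keep surrounding whitespace outside the markers so that
    # "** bold **"-style output, which Markdown will not parse, is avoided.
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if underline:
        core = f"<u>{core}</u>"  # assumption: HTML fallback for underline
    if italic:
        core = f"*{core}*"
    if bold:
        core = f"**{core}**"
    return lead + core + trail


if __name__ == "__main__":
    line = render_run("Karar ", bold=True) + IMAGE_PLACEHOLDER
    print(line)  # **Karar** [embedded image omitted]
```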

Tests

This PR adds:

  • focused unit tests for UDF detection, rune-based offsets, list behavior, empty paragraphs, nested table flattening, and image placeholder spacing
  • a .udf fixture integrated into the existing vector-based test suite
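The rune-based offsets mentioned above are worth a small illustration. Assuming "rune" means a Unicode code point (the usual sense of the term), Python string slicing already indexes by code point, whereas an offset counted in UTF-16 code units would drift after any character outside the Basic Multilingual Plane. The helper names below are hypothetical and are not taken from the PR.

```python
def slice_by_runes(text: str, start: int, length: int) -> str:
    """Return `length` code points of `text` starting at code point `start`."""
    # Python str indices are code points, so a plain slice is enough.
    return text[start:start + length]


def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, for comparison with the rune count."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # "Sayın", a space, an emoji outside the BMP, a space, then "hâkim".
    text = "Say\u0131n \U0001F600 h\u00e2kim"
    print(len(text))                    # 13 code points
    print(utf16_length(text))           # 14 UTF-16 code units
    print(slice_by_runes(text, 8, 5))   # hâkim
```

A test for rune-based offsets would typically include at least one such non-BMP character before the styled span, since that is the case where code-point and UTF-16 offsets disagree.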

Notes

The initial implementation is intentionally conservative: it does not yet attempt higher-fidelity formatting such as color, font size, or alignment semantics.

@efekurucay
Author

@microsoft-github-policy-service agree
