PyMuPDF/docs/about.rst at d4edefd63528a8f328e91dcf311c23671a0b7c7e · pymupdf/PyMuPDF

Features Comparison

Feature Matrix

The following table illustrates how |PyMuPDF| compares with other typical solutions.

Note

Office document types (DOCX, XLSX, PPTX) and Hangul documents (HWPX) can be loaded into |PyMuPDF|, and you will receive a :ref:`Document <Document>` object.

There are some caveats:

The input is converted to HTML in order to lay out the content.
Because of this, the original page separation is lost.

When saving the result, a faithful representation of the original layout cannot be expected.

Therefore, such input files are mostly useful for text extraction.
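As a minimal sketch of the text-extraction use case described above, the following opens a file and collects the plain text of each page. It assumes a recent release in which the package is importable as `pymupdf`. The helper names `page_texts` and `extract_text`, and the file name in the usage line, are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, List


def page_texts(doc: Iterable) -> List[str]:
    """Return the plain text of every page in an opened document.

    `doc` is expected to behave like a PyMuPDF Document: iterating it
    yields pages, and each page offers get_text().
    """
    return [page.get_text() for page in doc]


def extract_text(path: str) -> str:
    """Open `path` with PyMuPDF and join the text of all pages."""
    import pymupdf  # imported lazily; assumes PyMuPDF is installed

    with pymupdf.open(path) as doc:
        # Pages of reflowed input come from an HTML layout, so do not
        # rely on them matching the page breaks of the original file.
        return "\n".join(page_texts(doc))
```

A call such as `extract_text("report.docx")` (placeholder file name) would then return the text as one string. Saving the result is deliberately left out, since the original layout is not preserved.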

PyMuPDF Product Suite

|PyMuPDF| is the standard version of the library; however, there is a family of additional products, each with different features and functionality.

Additional products in the |PyMuPDF| product suite are:

|PyMuPDF Pro| adds support for Office document formats.
|PyMuPDF4LLM| is optimized for large language model (LLM) applications, providing enhanced text extraction and processing capabilities.

It focuses on layout analysis and semantic understanding, which makes it well suited to document conversion and formatting tasks.
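For illustration, a conversion to Markdown with |PyMuPDF4LLM| might be wrapped as shown below. `pymupdf4llm.to_markdown` is the package's conversion function. The wrapper `save_markdown`, its `convert` parameter and any file names are assumptions made for this sketch, not part of the library.

```python
from pathlib import Path
from typing import Callable, Optional


def save_markdown(src: str, dst: str,
                  convert: Optional[Callable[[str], str]] = None) -> int:
    """Convert `src` to Markdown, write it to `dst`, return the text length."""
    if convert is None:
        import pymupdf4llm  # lazy import; assumes pymupdf4llm is installed
        convert = pymupdf4llm.to_markdown
    markdown = convert(src)
    # Write as UTF-8 so non-ASCII text in the document survives.
    Path(dst).write_text(markdown, encoding="utf-8")
    return len(markdown)
```

Passing the converter as a parameter keeps the file handling separate from the conversion step, so a different converter can be swapped in without changing the wrapper.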

Note

All of the products above depend on the same core product, |PyMuPDF|, and therefore have full access to all of its features. These additional products can be seen as optional extras that enhance the core |PyMuPDF| library.

PyMuPDF Products Comparison

The following table illustrates what features the products offer:

PyMuPDF Products Comparison

	PyMuPDF	PyMuPDF Pro	PyMuPDF4LLM
Input Documents	PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT, Images (standard document types)	as PyMuPDF and: DOC/DOCX, XLS/XLSX, PPT/PPTX, HWP/HWPX	as PyMuPDF
Output Documents	Can convert any input document to PDF, SVG or Image	as PyMuPDF	as PyMuPDF and: Markdown (MD), JSON or TXT
Page Analysis	Basic page analysis to return document structure	as PyMuPDF	Advanced Page Analysis with trained data for enhanced results
Data extraction	Basic data extraction with structured layout information and bounding box data	as PyMuPDF	Advanced data extraction including layout analysis with semantic understanding and enhanced bounding box data
Table extraction	Basic table extraction as part of text extraction	as PyMuPDF	Advanced table extraction with cell structure, including support for merged cells and complex layouts
Image extraction	Basic image extraction	as PyMuPDF	Advanced detection and rendering of image areas on the page, saving them to disk or embedding them in MD output
Vector extraction	Vector extraction and clustering	as PyMuPDF	Superior detection of "picture" areas
Popular RAG Integrations	Langchain, LlamaIndex	as PyMuPDF	as PyMuPDF and with some additional help methods for RAG workflows
OCR	On-demand invocation of built-in Tesseract for text detection on pages or images	as PyMuPDF	Automatic OCR based on page content analysis. OCR adapters for popular OCR engines available

Performance

To benchmark |PyMuPDF| performance across a range of tasks, a test suite with a fixed set of :ref:`8 PDFs with a total of 7,031 pages<Appendix4_Files_Used>`, containing text and images, is used to obtain performance timings.

Here are current results, grouped by task:

Note

For more detail regarding the methodology for these performance timings see: :ref:`Performance Comparison Methodology<Appendix4>`.
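The kind of per-task timing described in this section can be approximated with a small harness like the one below. This is only a sketch under stated assumptions, not the methodology used for the published figures: the function `best_time`, its `repeats` parameter and the choice to report the fastest run are all illustrative.

```python
import time
from typing import Callable


def best_time(task: Callable[[], object], repeats: int = 3) -> float:
    """Run `task` `repeats` times and return the fastest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        task()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

A task here might be a function that opens one of the test PDFs and extracts text from every page. Taking the minimum over several runs reduces noise from other processes on the machine.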

License and Copyright

|PyMuPDF| and |MuPDF| are now available under both open-source |AGPL| and commercial license agreements. Please read the full text of the |AGPL| license agreement, available in the distribution material (file COPYING) and on the GNU license page, to ensure that your use case complies with the guidelines of the license. If you determine that you cannot meet the requirements of the |AGPL|, please contact Artifex for more information regarding a commercial license.

Find out more about Licensing

Artifex is the exclusive commercial licensing agent for MuPDF.

Artifex, the Artifex logo, MuPDF, and the MuPDF logo are registered trademarks of Artifex Software Inc.
