Using the pydocx command,
you can specify the output format
with the input and output files:
$ pydocx --html input.docx output.html
If you don't want to mess around
having to create exporters,
you can use the
PyDocX.to_html
helper method:
from pydocx import PyDocX
# Pass in a path
html = PyDocX.to_html('file.docx')
# Pass in a file object
html = PyDocX.to_html(open('file.docx', 'rb'))
# Pass in a file-like object
from io import BytesIO
with open('file.docx', 'rb') as f:
    # Read in binary mode: a .docx file is a zip archive
    buf = BytesIO(f.read())
html = PyDocX.to_html(buf)
Of course, you can do the same using the exporter class:
from pydocx.export import PyDocXHTMLExporter
# Pass in a path
exporter = PyDocXHTMLExporter('file.docx')
html = exporter.export()
# Pass in a file object
exporter = PyDocXHTMLExporter(open('file.docx', 'rb'))
html = exporter.export()
# Pass in a file-like object
from io import BytesIO
with open('file.docx', 'rb') as f:
    # Read in binary mode: a .docx file is a zip archive
    buf = BytesIO(f.read())
exporter = PyDocXHTMLExporter(buf)
html = exporter.export()
The following features are currently supported:
- tables
  - nested tables
  - rowspans
  - colspans
  - lists in tables
- lists
  - list styles
  - nested lists
  - list of tables
  - list of paragraphs
- justification
- images
- styles
  - bold
  - italics
  - underline
  - hyperlinks
- headings
The export class
pydocx.export.PyDocXHTMLExporter
relies on certain
CSS classes being defined
for certain behavior to occur.
Currently these include:
- class pydocx-insert -> Turns the text green.
- class pydocx-delete -> Turns the text red and draws a line through the text.
- class pydocx-center -> Aligns the text to the center.
- class pydocx-right -> Aligns the text to the right.
- class pydocx-left -> Aligns the text to the left.
- class pydocx-comment -> Turns the text blue.
- class pydocx-underline -> Underlines the text.
- class pydocx-caps -> Makes all text uppercase.
- class pydocx-small-caps -> Makes all text uppercase; letters that were originally lowercase are rendered smaller than their uppercase counterparts.
- class pydocx-strike -> Strikes a line through the text.
- class pydocx-hidden -> Hides the text.
- class pydocx-tab -> Represents a tab within the document.
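As a rough illustration of what "defining" these classes could look like, the sketch below builds a small stylesheet from the descriptions in the list above. The CSS declarations, the `PYDOCX_CLASS_RULES` name, and the `build_stylesheet` helper are assumptions made for this example; they are not taken from PyDocX itself, and values such as the tab width are arbitrary.

```python
# Hypothetical declarations inferred from the class descriptions;
# adjust them to match the styling you actually want.
PYDOCX_CLASS_RULES = {
    "pydocx-insert": "color: green;",
    "pydocx-delete": "color: red; text-decoration: line-through;",
    "pydocx-center": "text-align: center;",
    "pydocx-right": "text-align: right;",
    "pydocx-left": "text-align: left;",
    "pydocx-comment": "color: blue;",
    "pydocx-underline": "text-decoration: underline;",
    "pydocx-caps": "text-transform: uppercase;",
    "pydocx-small-caps": "font-variant: small-caps;",
    "pydocx-strike": "text-decoration: line-through;",
    "pydocx-hidden": "visibility: hidden;",
    # The width of a tab is an arbitrary choice for this sketch.
    "pydocx-tab": "display: inline-block; width: 4em;",
}


def build_stylesheet(rules):
    """Render a {class name: declarations} mapping as CSS text."""
    lines = []
    for class_name in sorted(rules):
        lines.append(".%s { %s }" % (class_name, rules[class_name]))
    return "\n".join(lines)


stylesheet = build_stylesheet(PYDOCX_CLASS_RULES)
```

The resulting `stylesheet` string could be placed in a `<style>` element of the page that displays the exported HTML.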
Additionally, several list styles are defined based off the attribute values listed at: http://officeopenxml.com/WPnumbering-numFmt.php
- class pydocx-list-style-type-cardinalText -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-decimal -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-decimalEnclosedCircle -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-decimalEnclosedFullstop -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-decimalEnclosedParen -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-decimalZero -> (01, 02, 03, etc.)
- class pydocx-list-style-type-lowerLetter -> (a, b, c, etc.)
- class pydocx-list-style-type-lowerRoman -> (i, ii, iii, etc.)
- class pydocx-list-style-type-none -> List style is removed
- class pydocx-list-style-type-ordinalText -> (1, 2, 3, 4, etc.)
- class pydocx-list-style-type-upperLetter -> (A, B, C, etc.)
- class pydocx-list-style-type-upperRoman -> (I, II, III, etc.)
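One possible way to back these list classes with CSS is sketched below. The mapping from each format name to a CSS `list-style-type` value is an assumption based on the examples in parentheses above (formats shown as "1, 2, 3, 4" fall back to `decimal`), and `NUM_FMT_TO_CSS` and `list_style_rules` are hypothetical names used only for this example.

```python
# Assumed mapping from the class suffix to a CSS list-style-type value.
NUM_FMT_TO_CSS = {
    "cardinalText": "decimal",
    "decimal": "decimal",
    "decimalEnclosedCircle": "decimal",
    "decimalEnclosedFullstop": "decimal",
    "decimalEnclosedParen": "decimal",
    "decimalZero": "decimal-leading-zero",
    "lowerLetter": "lower-alpha",
    "lowerRoman": "lower-roman",
    "none": "none",
    "ordinalText": "decimal",
    "upperLetter": "upper-alpha",
    "upperRoman": "upper-roman",
}


def list_style_rules(mapping):
    """Return one CSS rule per list style class, sorted by class name."""
    rules = []
    for num_fmt, css_value in sorted(mapping.items()):
        rules.append(
            ".pydocx-list-style-type-%s { list-style-type: %s; }"
            % (num_fmt, css_value)
        )
    return rules


list_rules = list_style_rules(NUM_FMT_TO_CSS)
list_stylesheet = "\n".join(list_rules)
```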
There is only one custom exception (MalformedDocxException).
It is raised if either the xml or zipfile libraries raise an exception.
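The behavior described above can be pictured with a standard-library sketch. This is not PyDocX's own code: the `MalformedDocxException` class below is a local stand-in defined for the example (this section does not show where the real one is imported from), and `read_document_xml` is a hypothetical helper. It only illustrates the pattern of turning zipfile and XML parsing errors into a single exception type.

```python
import io
import zipfile
from xml.etree import ElementTree


class MalformedDocxException(Exception):
    """Local stand-in for the PyDocX exception of the same name."""


def read_document_xml(file_obj):
    """Parse word/document.xml from a .docx file-like object.

    Errors from the zipfile or XML libraries are re-raised as
    MalformedDocxException so callers handle one exception type.
    """
    try:
        with zipfile.ZipFile(file_obj) as archive:
            xml_bytes = archive.read("word/document.xml")
        return ElementTree.fromstring(xml_bytes)
    except (zipfile.BadZipFile, ElementTree.ParseError) as exc:
        raise MalformedDocxException(str(exc))


# Bytes that are not a zip archive trigger the wrapped exception.
try:
    read_document_xml(io.BytesIO(b"this is not a zip archive"))
    error_message = None
except MalformedDocxException as exc:
    error_message = str(exc)
```

Calling code that uses PyDocX can follow the same shape: wrap the conversion call in a `try` block and catch `MalformedDocxException` to report an unreadable document.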