This page has moved to mike.zwobble.org/projects/mammoth/.
Mammoth converts .docx documents, such as those created by Microsoft Word, to HTML.
Mammoth aims to produce simple and clean HTML by using semantic information in the document,
and ignoring other details.
For instance, Mammoth converts any paragraph with the style Heading1 to an h1 element,
rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading.
If you've defined your own styles in your document,
then Mammoth allows you to map those styles to appropriate HTML.
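To make the idea concrete, here is a toy sketch in Python of what converting by style name means. It is not Mammoth's implementation: the function paragraphs_to_html, the DEFAULT_STYLE_MAP table and the custom style name SectionTitle are all hypothetical, and a paragraph is reduced to a (style name, text) pair.

```python
from html import escape

# Hypothetical default mapping from Word style names to HTML tags.
DEFAULT_STYLE_MAP = {"Heading1": "h1", "Heading2": "h2"}


def paragraphs_to_html(paragraphs, custom_styles=None):
    """Render (style_name, text) pairs as HTML using only the style name."""
    # Custom mappings extend (and may override) the defaults.
    style_map = {**DEFAULT_STYLE_MAP, **(custom_styles or {})}
    parts = []
    for style_name, text in paragraphs:
        # Styles without a mapping fall back to a plain paragraph.
        tag = style_map.get(style_name, "p")
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(parts)


paragraphs = [
    ("Heading1", "Introduction"),
    ("Normal", "Fonts & colours are ignored."),
    ("SectionTitle", "Background"),  # a user-defined style
]
html = paragraphs_to_html(paragraphs, custom_styles={"SectionTitle": "h2"})
print(html)
```

Font, size and colour never enter the picture: only the style name decides which element is produced, which is why documents that rely on styles for structure tend to convert most cleanly.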
There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
Development of Mammoth is ongoing, so support for features is added as necessary. Contributions are welcome!
There's both a JavaScript implementation (browser and Node.js) and a Python implementation. The README of each project contains details on installation and usage. You can also install Mammoth as a WordPress plugin.
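As a rough idea of what usage looks like with the Python implementation, the sketch below wraps the library's convert_to_html call and passes a custom style map. The wrapper docx_to_html, the file name document.docx and the style names Section Title and Subsection Title are assumptions made for illustration; the project README remains the authoritative reference.

```python
# Custom mappings, one per line: Word paragraph style => HTML element.
# The style names here are hypothetical examples.
CUSTOM_STYLE_MAP = """
p[style-name='Section Title'] => h1:fresh
p[style-name='Subsection Title'] => h2:fresh
"""


def docx_to_html(path, style_map):
    """Convert the .docx file at `path` to HTML, returning (html, messages)."""
    # Imported here so the sketch loads even where the third-party
    # package is missing; install it with `pip install mammoth`.
    import mammoth

    with open(path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=style_map)
    # result.value holds the generated HTML; result.messages holds any
    # warnings raised during conversion, such as unrecognised styles.
    return result.value, result.messages
```

Calling docx_to_html("document.docx", CUSTOM_STYLE_MAP) would return the HTML together with any conversion messages, which are worth checking when a document uses styles that have no mapping.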
Try it out by uploading a .docx file.
This will run Mammoth using the default style mappings,
so it will only expect standard Word styles such as Heading1.
Don't have a document handy? Try this example from Microsoft. Headings and images work well, but the table of contents shows that Mammoth still has a way to go!