Description

The script rtf2xml converts Microsoft's proprietary Rich Text Format (RTF) to XML. It preserves as much information from the RTF file as possible, giving an XML author the choice of which elements to use for further transformations.

The raw XML file that results from an rtf2xml transformation seeks to be identical to the original file, except that it is structured as XML. Ideally, one could use rtf2xml and XSLT to convert a file from RTF and back again with no data loss.

Use this script to transform RTF into a more open and usable format, such as XHTML or FO. Some stylesheets for these purposes are available on the download page, though they are far from perfect.

Caveats

  • Won't properly convert legacy RTF with multi-byte representations.

    The script rtf2xml will convert older RTF that uses 8-bit character representations, which covers most (perhaps all) of the languages of Europe. However, rtf2xml cannot convert older Japanese or Chinese files. It can convert newer files in these languages, but older RTF contains no Unicode markings, making conversion impossible.

  • Will often misrepresent RTF produced with Open Office.

    Open Office writes some characters in its RTF as double question marks (??). Other RTF readers can filter out these characters, but rtf2xml cannot; in my opinion, these double question marks do not follow Microsoft's guidelines.

  • Won't convert pictures.

    See the Use section for what rtf2xml does with pictures.
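To make the first two caveats more concrete, here is a minimal sketch of how RTF text escapes can be decoded. It is not rtf2xml's code; the function name decode_rtf_text and its codepage parameter are hypothetical, and it assumes a simplified input containing only plain characters, \'hh hex escapes, \uN Unicode escapes and \ucN fallback counts (no groups or other control words). Each \'hh escape is decoded as a single byte in an 8-bit code page, which works for European text. A per-byte decoder like this one has no way of pairing two \'hh escapes into one double-byte character, which is the kind of limitation the first caveat describes for older Japanese or Chinese files. After each \uN, a reader skips the number of fallback characters declared by \ucN (1 by default). The sketch also shows that any extra question mark beyond that count survives as literal text; whether this is exactly what happens with Open Office output is not confirmed here.

```python
import re

# Simplified tokenizer (assumption: no groups, destinations or other control words).
# Alternatives: \'hh hex byte | \ucN fallback count | \uN Unicode char | any other char.
TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\uc(\d+) ?|\\u(-?\d+) ?|(.)", re.DOTALL)


def decode_rtf_text(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical helper: decode \\'hh escapes one byte at a time with an
    8-bit code page, and \\uN escapes as Unicode, skipping the fallback
    characters that \\ucN says follow each \\uN."""
    out = []
    fallback_count = 1  # RTF default is \uc1
    pending_skip = 0
    for m in TOKEN.finditer(text):
        hexbyte, uc, uni, char = m.groups()
        # A fallback character may be a plain character or a \'hh escape.
        if pending_skip > 0 and (hexbyte is not None or char is not None):
            pending_skip -= 1
            continue
        if hexbyte is not None:
            # One byte, one character: fine for 8-bit code pages only.
            out.append(bytes([int(hexbyte, 16)]).decode(codepage, errors="replace"))
        elif uc is not None:
            fallback_count = int(uc)
        elif uni is not None:
            n = int(uni)
            if n < 0:  # RTF writes values above 32767 as signed 16-bit numbers
                n += 65536
            out.append(chr(n))
            pending_skip = fallback_count
        else:
            out.append(char)
    return "".join(out)


print(decode_rtf_text(r"caf\'e9"))        # 8-bit escape in cp1252
print(decode_rtf_text(r"\u233?"))         # Unicode escape with one fallback char
print(decode_rtf_text(r"\u233??"))        # one more "?" than \uc1 declares
```

With the default \uc1, the last call prints "é?": the first question mark is treated as the fallback and dropped, and the second is kept as ordinary text.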