pdftohtml, convert pdf to html and xml, even excel

pdftohtml is a utility that converts PDF files into HTML and XML formats. It is based on Xpdf, is open source, and is written in C++.

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don’t print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
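
As a rough illustration of how these options combine, here is a minimal Python sketch that assembles a pdftohtml command line and runs it. The helper names `build_command` and `convert` and the file names are hypothetical; only flags from the list above are used. It assumes that pdftohtml is on the PATH and that -f and -l each take a page number as the next argument.

```python
import subprocess


def build_command(pdf_path, out_base, first=None, last=None,
                  complex_layout=False, xml=False, no_frames=False):
    """Assemble a pdftohtml argument list (hypothetical helper)."""
    cmd = ["pdftohtml"]
    if first is not None:
        cmd += ["-f", str(first)]   # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]    # last page to convert
    if complex_layout:
        cmd.append("-c")            # generate complex document
    if xml:
        cmd.append("-xml")          # output for XML post-processing
    if no_frames:
        cmd.append("-noframes")     # generate no frames
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path, out_base, **options):
    """Run pdftohtml; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(pdf_path, out_base, **options), check=True)
```

For example, `convert("report.pdf", "report", first=1, last=3, complex_layout=True)` would convert the first three pages of a (hypothetical) report.pdf in complex mode.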

I have even used it to generate Excel from PDF, converting a 927 PDF file to an Excel document.
By the way, it runs on Windows, Linux, Mac OS X, and other platforms.
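
The post does not describe how the Excel file was produced. One possible route, sketched here as an assumption and not as the method actually used, is to run pdftohtml with -xml and post-process the result into CSV, which Excel can open. The sketch assumes the XML contains `page` elements holding `text` elements with numeric `top` and `left` attributes, which is how I understand the -xml output to look. The function names and the `tolerance` parameter are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, tolerance=2):
    """Group <text> elements into rows by their 'top' coordinate.

    Assumption: every <text> element has numeric 'top' and 'left' attributes.
    Elements whose 'top' values differ by at most `tolerance` share a row.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        cells = []
        for node in page.iter("text"):
            # itertext() also collects text nested in <b>, <i> or <a>.
            content = "".join(node.itertext()).strip()
            if content:
                cells.append((float(node.get("top")), float(node.get("left")), content))
        cells.sort()  # top to bottom, then left to right
        current, current_top = [], None
        for top, left, content in cells:
            if current and abs(top - current_top) > tolerance:
                rows.append([text for _, text in sorted(current)])
                current = []
            if not current:
                current_top = top
            current.append((left, content))
        if current:
            rows.append([text for _, text in sorted(current)])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Grouping by vertical position is a heuristic: it recovers simple tables but will not align columns when cells are missing or text wraps inside a cell.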

3 thoughts on “pdftohtml, convert pdf to html and xml, even excel”

Appan Ponnappan says:

February 3, 2007 at 11:37 am

pdftohtml DOES NOT do a good job of converting the tables in the source PDF document. I think that if one wants to judge how good a PDF converter is, it depends on how many different PDF elements it can process. I guess most of them don’t do that & extract just the text (without the styles, relative positions, etc.), which is easy to do!

rubypdf says:

February 3, 2007 at 12:10 pm

Sorry, I do not know what you mean. I am just saying that pdftohtml helps me with some jobs, such as converting PDF to HTML, XML, and even Excel.
Of course, even Acrobat Professional 8 cannot convert PDF to HTML or DOC very well.

Appan Ponnappan says:

February 3, 2007 at 12:25 pm

I didn’t use the -c option, without which it was converting only to text. With this option, it converts to elements, so you get at least the look & feel of a table. I was to some extent correct in saying that it WAS NOT converting to HTML tables as such. Figures with boxes & arrows convert only to text in boxes or arrow labels, but that’s the most we can do in HTML for figures, I guess!
