
pdftohtml, convert PDF to HTML and XML, even Excel

pdftohtml is a utility that converts PDF files into HTML and XML formats. It is based on Xpdf, is open source, and is written in C++.

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -q             : don’t print any messages or errors
  -h             : print usage information
  -help          : print usage information
  -p             : exchange .pdf links by .html
  -c             : generate complex document
  -i             : ignore images
  -noframes      : generate no frames
  -stdout        : use standard output
  -zoom <fp>     : zoom the pdf document (default 1.5)
  -xml           : output for XML post-processing
  -hidden        : output hidden text
  -nomerge       : do not merge paragraphs
  -enc <string>  : output text encoding name
  -dev <string>  : output device name for Ghostscript (png16m, jpeg etc)
  -v             : print copyright and version info
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
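
To make the option list more concrete, here is a minimal Python sketch that assembles a pdftohtml command line from a few of the flags listed above. The helper names (build_pdftohtml_command, run_pdftohtml) and the file names are hypothetical, and the sketch assumes that the input PDF and an optional output name follow the options. Actually running the command requires pdftohtml to be installed and on the PATH.

```python
import subprocess


def build_pdftohtml_command(pdf_path, out_base=None, first=None, last=None,
                            xml=False, complex_layout=False,
                            ignore_images=False, noframes=False, zoom=None):
    """Build an argv list using only flags that appear in the usage text.

    Hypothetical helper: the positional order (PDF file, then an optional
    output name) is an assumption about the command line.
    """
    cmd = ["pdftohtml"]
    if first is not None:
        cmd += ["-f", str(first)]      # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]       # last page to convert
    if complex_layout:
        cmd.append("-c")               # generate complex document
    if ignore_images:
        cmd.append("-i")               # ignore images
    if noframes:
        cmd.append("-noframes")        # generate no frames
    if zoom is not None:
        cmd += ["-zoom", str(zoom)]    # zoom factor (the tool's default is 1.5)
    if xml:
        cmd.append("-xml")             # output for XML post-processing
    cmd.append(pdf_path)
    if out_base is not None:
        cmd.append(out_base)
    return cmd


def run_pdftohtml(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)


# "report.pdf" and "report" are placeholder names for illustration only.
example = build_pdftohtml_command("report.pdf", "report", first=1, last=3, xml=True)
print(" ".join(example))  # prints: pdftohtml -f 1 -l 3 -xml report.pdf report
```

Building the argument list separately from running it keeps the flag handling easy to check without having the tool installed.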

I have even used it to generate Excel from PDF, converting a 927 PDF file to an Excel document.
By the way, it runs on Windows, Linux, Mac OS X, and so on.
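
The post does not say how the Excel file was produced, so the following is only one possible route, sketched under assumptions: take the output of the -xml option, group the text pieces into rows by their vertical position, and write a CSV file that Excel can open. The element and attribute names used here (page, text, top, left) are assumptions about the XML that pdftohtml emits, and the function names, the tolerance value, and the sample data are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample imitating the assumed shape of the -xml output.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="100" left="200" width="40" height="12" font="0">Qty</text>
<text top="120" left="200" width="10" height="12" font="0">3</text>
<text top="121" left="50" width="40" height="12" font="0"><b>Widget</b></text>
</page>
</pdf2xml>"""


def pdf2xml_to_rows(xml_text, tolerance=2):
    """Group <text> elements into rows by 'top', ordering cells by 'left'."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        cells = []
        for node in page.iter("text"):
            # itertext() also collects text nested in tags such as <b>.
            content = "".join(node.itertext()).strip()
            if content:
                cells.append((float(node.get("top")), float(node.get("left")), content))
        cells.sort()
        current, current_top = [], None
        for top, left, content in cells:
            # A vertical jump larger than the tolerance starts a new row.
            if current_top is not None and abs(top - current_top) > tolerance:
                rows.append([text for _, text in sorted(current)])
                current = []
            if not current:
                current_top = top
            current.append((left, content))
        if current:
            rows.append([text for _, text in sorted(current)])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(pdf2xml_to_rows(SAMPLE)))
```

Grouping by position is a heuristic: it works for simple grids, but merged cells or wrapped text in the PDF would need extra handling.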

3 thoughts on “pdftohtml, convert PDF to HTML and XML, even Excel”

  1. pdftohtml DOES NOT do a good job of converting the tables in the source PDF document. I think that if one wants to judge how good a PDF converter is, it depends on how many different PDF elements it can process. I guess most of them don’t do that and extract just the text (without the styles, relative positions, etc.), which is easy to do!

  2. Sorry, I do not know what you mean. I am just saying that pdftohtml can help me do some jobs, such as converting PDF to HTML, XML, and even Excel.
    Of course, even Acrobat Professional 8 cannot convert PDF to HTML or DOC very well.

  3. I didn’t use the -c option, without which it was converting only to text. But with this option, it converts to elements, so you get at least the look & feel of a table. I was to some extent correct in saying that it WAS NOT converting to HTML tables as such. Figures with boxes & arrows convert only to text in boxes or arrow labels, but that’s the max we can do in HTML for figures, I guess!
