pdftohtml, convert PDF to HTML and XML, even Excel

pdftohtml is a utility that converts PDF files into HTML and XML formats. It is based on Xpdf, is open source, and is written in C++.

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don’t print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
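
To make the options above a bit more concrete, here is a minimal sketch of calling pdftohtml from a Python script. The helper names (build_pdftohtml_command, convert) and the file names are my own and only for illustration. The flags are the ones from the usage text, and the sketch assumes pdftohtml is installed and on your PATH.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, output_base=None, first_page=None,
                            last_page=None, complex_layout=False,
                            ignore_images=False, no_frames=False, xml=False):
    """Build the argument list for a pdftohtml call (hypothetical helper)."""
    cmd = ["pdftohtml"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if complex_layout:
        cmd.append("-c")                 # generate complex document
    if ignore_images:
        cmd.append("-i")                 # ignore images
    if no_frames:
        cmd.append("-noframes")          # generate no frames
    if xml:
        cmd.append("-xml")               # output for XML post-processing
    cmd.append(pdf_path)
    if output_base is not None:
        cmd.append(output_base)          # optional output file name
    return cmd


def convert(pdf_path, **options):
    """Run pdftohtml; raises if the binary is missing or the call fails."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml was not found on PATH")
    return subprocess.run(build_pdftohtml_command(pdf_path, **options),
                          check=True)
```

For example, convert("report.pdf", output_base="report", complex_layout=True, no_frames=True) would run pdftohtml -c -noframes report.pdf report. Keeping the command builder separate from the call that runs it makes it easy to print the command and check it before converting anything.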

I have even used it to generate Excel from PDF, converting a 927 PDF file to an Excel document.
By the way, it supports Windows, Linux, Mac OS X, and so on.
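
One possible route from PDF to a spreadsheet is to post-process the -xml output. The sketch below is only an illustration under assumptions, not a description of the exact steps used for the conversion mentioned above. It assumes the XML contains page elements holding text elements with top and left attributes (my understanding of the -xml output; check it against the file your version produces). The function names are hypothetical. Text fragments sharing the same top value are treated as one row, and the result is written as CSV, which Excel can open.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_to_rows(xml_text):
    """Group text fragments into rows by their vertical position."""
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        lines = defaultdict(list)
        for node in page.iter("text"):
            top = int(node.get("top", "0"))
            left = int(node.get("left", "0"))
            # itertext() also collects text nested in <b> or <i> children
            content = "".join(node.itertext()).strip()
            if content:
                lines[top].append((left, content))
        # rows from top to bottom, cells from left to right
        for top in sorted(lines):
            rows.append([text for _, text in sorted(lines[top])])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Grouping on an exact top value is the simplest thing that could work. Real documents often place cells of the same row a pixel or two apart, so a tolerance on top would probably be needed in practice.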

3 Responses to “pdftohtml, convert PDF to HTML and XML, even Excel”

  • 1
    Appan Ponnappan
    February 3rd, 2007 11:37

    pdftohtml DOES NOT do a good job of converting the tables in the source PDF document. I think if one wants to judge how good any PDF converter is, it depends on how many different PDF elements it can process. I guess most of them don’t do that & extract just the text (without the styles or relative positions, etc.), which is easy to do!

  • 2
    rubypdf
    February 3rd, 2007 12:10

    Sorry, I do not know what you mean. I am just saying that pdftohtml helps me do some jobs, such as converting PDF to HTML, XML, and even Excel.
    Of course, even Acrobat Professional 8 cannot convert PDF to HTML or DOC very well.

  • 3
    Appan Ponnappan
    February 3rd, 2007 12:25

    I didn’t use the -c option without which it was converting only to text. But with this option, it converts to elements, so you get at least the look & feel of a table. I was to some extent correct in saying that it WAS NOT converting to HTML tables as such. Figures with boxes & arrows convert only to texts in boxes or arrow labels, but that’s the max we can do in HTML for figures, I guess!
