pdf2xml, another tool for converting PDF to XML
pdf2xml is a converter based on the Xpdf library (http://www.foolabs.com/xpdf/home.html). It converts the information contained in a PDF file into XML. Before using it, you need to install xpdf and libxml2 (see the documentation).
It supports converting both the text and the images of a PDF to XML.
pdftoxml version 1.0
(Based on Xpdf version 3.01, Copyright 1996-2005 Glyph & Cog, LLC)
Copyright 2004-2006 XEROX XRCE
Usage: pdftoxml [options] <PDF-file> [<xml-file>]
  -f <int>         : first page to convert
  -l <int>         : last page to convert
  -verbose         : display pdf attributes
  -noText          : do not extract textual objects
  -noImage         : do not extract Images (Bitmap and Vectorial)
  -noImageInline   : do not include images inline in the stream
  -outline         : create an outline file xml
  -annots          : create an annotations file xml
  -cutPages        : cut all pages in separately files
  -blocks          : add blocks informations whithin the structure
  -fullFontName    : fonts names are not normalized
  -nsURI <string>  : add the specified namespace URI
  -opw <string>    : owner password (for encrypted files)
  -upw <string>    : user password (for encrypted files)
  -q               : don't print any messages or errors
  -v               : print copyright and version info
  -h               : print usage information
  -help            : print usage information
  --help           : print usage information
  -?               : print usage information
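The options above can also be driven from a script. What follows is only a minimal sketch in Python, not part of pdf2xml itself: the helper names `build_pdftoxml_command` and `convert` are hypothetical, and it assumes the binary is installed and reachable on the PATH as `pdftoxml`. It uses only flags that appear in the usage text (`-f`, `-l`, `-noImage`, `-blocks`, `-q`).

```python
import subprocess
from typing import List, Optional


def build_pdftoxml_command(
    pdf_path: str,
    xml_path: Optional[str] = None,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    no_image: bool = False,
    blocks: bool = False,
    quiet: bool = False,
    executable: str = "pdftoxml",  # assumption: the binary is on PATH under this name
) -> List[str]:
    """Build the argument list following 'pdftoxml [options] <PDF-file> [<xml-file>]'."""
    cmd = [executable]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    if no_image:
        cmd.append("-noImage")          # do not extract images
    if blocks:
        cmd.append("-blocks")           # add block information to the structure
    if quiet:
        cmd.append("-q")                # suppress messages and errors
    cmd.append(pdf_path)
    if xml_path is not None:
        cmd.append(xml_path)            # the output file is optional in the usage text
    return cmd


def convert(pdf_path: str, xml_path: Optional[str] = None, **options) -> int:
    """Run the converter and return its exit code (hypothetical wrapper)."""
    cmd = build_pdftoxml_command(pdf_path, xml_path, **options)
    return subprocess.run(cmd).returncode
```

For example, `convert("report.pdf", "report.xml", first_page=1, last_page=3, no_image=True)` would run `pdftoxml -f 1 -l 3 -noImage report.pdf report.xml`, extracting only the text of the first three pages. The file names here are placeholders. Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.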