pdf2xml is a converter based on the Xpdf library (http://www.foolabs.com/xpdf/home.html). It converts the information contained in a PDF file into XML. Before using it, you need to install xpdf and libxml2 (see documentation).
It converts PDF to XML, extracting both text and images.
pdftoxml version 1.0 (Based on Xpdf version 3.01, Copyright 1996-2005 Glyph & Cog, LLC)
Copyright 2004-2006 XEROX XRCE

Usage: pdftoxml [options][ ]

  -f             : first page to convert
  -l             : last page to convert
  -verbose       : display pdf attributes
  -noText        : do not extract textual objects
  -noImage       : do not extract Images (Bitmap and Vectorial)
  -noImageInline : do not include images inline in the stream
  -outline       : create an outline file xml
  -annots        : create an annotations file xml
  -cutPages      : cut all pages in separately files
  -blocks        : add blocks informations whithin the structure
  -fullFontName  : fonts names are not normalized
  -nsURI         : add the specified namespace URI
  -opw           : owner password (for encrypted files)
  -upw           : user password (for encrypted files)
  -q             : don't print any messages or errors
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information
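
The options listed above can be combined on one command line. The following Python sketch shows one way to assemble and run such a command from a script. It rests on several assumptions that the usage text does not confirm: that the executable is on the PATH under the name pdftoxml, that -f and -l each take a page number as the next argument, and that the input PDF path followed by an optional output XML path come after the options. The helper names build_pdftoxml_command and run_pdftoxml are hypothetical and are not part of the tool.

```python
import subprocess
from typing import List, Optional


def build_pdftoxml_command(
    pdf_path: str,
    xml_path: Optional[str] = None,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    no_image: bool = False,
    blocks: bool = False,
    quiet: bool = False,
    executable: str = "pdftoxml",  # assumption: binary name as found on PATH
) -> List[str]:
    """Build an argument list for pdftoxml from a few of its documented flags."""
    cmd = [executable]
    # Assumption: -f and -l are each followed by an integer page number.
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    if no_image:
        cmd.append("-noImage")   # skip bitmap and vector images
    if blocks:
        cmd.append("-blocks")    # add block information to the structure
    if quiet:
        cmd.append("-q")         # suppress messages and errors
    # Assumption: positional arguments are the input PDF, then an optional
    # output XML file.
    cmd.append(pdf_path)
    if xml_path is not None:
        cmd.append(xml_path)
    return cmd


def run_pdftoxml(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, run_pdftoxml(build_pdftoxml_command("report.pdf", "report.xml", first_page=1, last_page=3, no_image=True)) would attempt to convert the first three pages of a hypothetical report.pdf without extracting images. Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.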