Archive for October, 2009

Two ways to view PDF on PSP

First, Convert every PDF page to image(for example jpeg), and view Jpeg on PSP,
PDF2PSP is a tool for converting PDF documents and print outs into JPEG images suitable for displaying on the Sony PSP handheld game console.
Second, directly View the really PDF on PSP, and you need Bookr,A document reader for the Sony PSP with native PDF rendering.
To use Bookr, you need a Sony PSP with firmware version 1.5, other firmware versions are not supported.
To install Bookr, simply copy the __SCE__bookr and %__SCE__bookr folders to your Memory Stick \PSP\GAME folder.

Share and Enjoy:
  • Digg
  • del.icio.us
  • Netvouz
  • DZone
  • ThisNext
  • MisterWong
  • Wists
  • BlinkList
  • blogmarks
  • blogtercimlap
  • connotea
  • DotNetKicks
  • Fark
  • Fleck
  • Gwar
  • Haohao
  • IndianPad
  • Internetmedia
  • LinkaGoGo
  • MyShare
  • Netscape
  • NewsVine
  • Rec6
  • Reddit
  • Scoopeo
  • Slashdot
  • StumbleUpon
  • Technorati
  • Webride

Three Open source PDF Parser developed in Python

PDFMiner is a suite of programs that help extracting and analyzing text data of PDF documents. Unlike other PDF-related tools, it allows to obtain the exact location of texts in a page, as well as other extra information such as font information or ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes instead of text analysis.
for details, please visit Extract and Analyze Text Data of PDF Documents with PDFMiner

pdf-parser.py
This tool will parse a PDF document to identify the fundamental elements used in the analyzed file. It will not render a PDF document. The code of the parser is quick-and-dirty, I’m not recommending this as text book case for PDF parsers, but it gets the job done.
for details, please visit, PDF Tools

PyPDF

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, …),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

By being Pure-Python, it should run on any Python platform without
any dependencies on external libraries. It can also work entirely on
StringIO objects rather than file streams, allowing for PDF
manipulation in memory. It is therefore a useful tool for websites
that manage or manipulate PDFs.

for details, please visit http://pybrary.net/pyPdf/

Share and Enjoy:
  • Digg
  • del.icio.us
  • Netvouz
  • DZone
  • ThisNext
  • MisterWong
  • Wists
  • BlinkList
  • blogmarks
  • blogtercimlap
  • connotea
  • DotNetKicks
  • Fark
  • Fleck
  • Gwar
  • Haohao
  • IndianPad
  • Internetmedia
  • LinkaGoGo
  • MyShare
  • Netscape
  • NewsVine
  • Rec6
  • Reddit
  • Scoopeo
  • Slashdot
  • StumbleUpon
  • Technorati
  • Webride

Extract and Analyze Text Data of PDF Documents with PDFMiner

PDFMiner is a suite of programs that help extracting and analyzing text data of PDF documents. Unlike other PDF-related tools, it allows to obtain the exact location of texts in a page, as well as other extra information such as font information or ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes instead of text analysis.
The features of PDFMiner are as follow,

  • Written entirely in Python. (for version 2.4 or newer)
  • PDF-1.7 specification support. (well, almost)
  • Non-ASCII languages and vertical writing scripts support.
  • Various font types (Type1, TrueType, Type3, and CID) support.
  • Basic encryption (RC4) support.
  • PDF to HTML conversion (with a sample converter web app).
  • Outline (TOC) extraction.
  • Tagged contents extraction.
  • Infer text running by using clustering technique.

PDFMiner comes with two handy tools: pdf2txt.py and dumppdf.py,
pdf2txt.py extracts text contents from a PDF file. It extracts all the texts that are to be rendered programmatically, It cannot recognize texts drawn as images that would require optical character recognition. It also extracts the corresponding locations, font names, font sizes, writing direction (horizontal or vertical) for each text portion. You need to provide a password for protected PDF documents when its access is restricted. You cannot extract any text from a PDF document which does not have extraction permission.

For non-ASCII languages, you can specify the output encoding (such as UTF-8).
dumppdf.py dumps the internal contents of a PDF file in pseudo-XML format. This program is primarily for debugging purpose, but it’s also possible to extract some meaningful contents (such as images).
To download the last version PDFminer or for details, please visit http://www.unixuser.org/~euske/python/pdfminer/index.html

Share and Enjoy:
  • Digg
  • del.icio.us
  • Netvouz
  • DZone
  • ThisNext
  • MisterWong
  • Wists
  • BlinkList
  • blogmarks
  • blogtercimlap
  • connotea
  • DotNetKicks
  • Fark
  • Fleck
  • Gwar
  • Haohao
  • IndianPad
  • Internetmedia
  • LinkaGoGo
  • MyShare
  • Netscape
  • NewsVine
  • Rec6
  • Reddit
  • Scoopeo
  • Slashdot
  • StumbleUpon
  • Technorati
  • Webride