Jbig2.exe: bug fix and new release

After the first release, I found a bug in jbig2.exe. When I ran
jbig2.exe -p 1.tiff>1.jbig2
the JBIG2 file created under Windows was bigger than the one created under Linux, and on inspection it was not a valid JBIG2 file.
After some effort (apologies for my poor C/C++ and Python), I fixed the bug.
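
The size difference fits what text-mode output on Windows does to binary data: every newline byte (0x0A) gets a carriage return (0x0D) placed in front of it. The small sketch below only imitates that translation on made-up bytes so the effect is visible on any platform; the function name to_windows_text_mode and the sample payload are my own, not part of jbig2enc.

```python
def to_windows_text_mode(data: bytes) -> bytes:
    """Imitate the LF -> CRLF translation of a text-mode stream on Windows."""
    return data.replace(b"\n", b"\r\n")


# Arbitrary binary payload (hypothetical) that happens to contain two 0x0A bytes.
payload = b"\x00\x0a\x10\x0a\xff"
mangled = to_windows_text_mode(payload)

# The output grows by one byte for every 0x0A in the input.
print(len(payload), len(mangled))  # prints: 5 7
```

A compressed image stream contains 0x0A bytes more or less at random, so the translated file is both larger and corrupt.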

P.S.
Change log for this release

*pdf.py now writes a correct PDF to stdout under Windows.
For details, see How to Let Python Send Binary data to stdout under Windows.
*Fixed a bug where jbig2.exe -p did not write the correct content; this was also a stdout issue.

How to Let Python Send Binary data to stdout under Windows

Recently I successfully compiled jbig2enc under Windows, but when I tested pdf.py I got an incorrect PDF document. I asked the author about it:

Because of a Python bug under Windows, file(p).read() cannot read the whole content of output.0000, so the PDF created is not correct. I changed it to file(p, 'rb').read(), but the PDF created is still not correct and is somewhat bigger than the one created under Linux. I have sent you the two PDF files as an attachment.

I got the following answer:

My only guess would be that Windows is performing newline conversion. Keep your current change to use file(p, 'rb').read() and alter line 137 to read:
file('output.pdf', 'wb').write(str(doc))
and see if output.pdf is any more valid.

This way I got a correct PDF, but it does not go through stdout. After some searching, I found the following recipe, and now it works as I want.

You want to send binary data, such as for an image, to stdout under Windows.

import sys

if sys.platform == "win32":
    # Switch stdout's file descriptor from text mode to binary mode (Windows only).
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

Note:
If you are reading or writing binary data under Windows, such as for an image, then the file must specifically be opened in binary mode (Unix doesn't make a distinction between text and binary modes). But this is a problem for a program that wants to write binary data to standard output (as a web CGI program would be expected to do), since the 'sys' module opens the 'stdout' file object on your behalf and normally does so in text mode. You could have 'sys' open 'stdout' in binary mode instead by supplying the '-u' command-line option to the Python interpreter. But if you want to control this mode from within a program, then (as shown in the code sample) you can use the 'setmode' function provided by the Windows-specific 'msvcrt' module to change the mode of stdout's underlying file descriptor.
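
The recipe above targets Python 2. In Python 3, as far as I know, the usual way to reach the same goal is to write bytes to sys.stdout.buffer, the binary stream underneath sys.stdout, so no newline translation is applied. The following is only a minimal sketch under that assumption; the helper name write_binary and its optional stream parameter are my own and exist just to make the example easy to try without touching the real stdout.

```python
import io
import sys


def write_binary(data: bytes, stream=None) -> int:
    """Write raw bytes to a binary stream, defaulting to standard output."""
    if stream is None:
        # sys.stdout.buffer is the byte-oriented stream behind sys.stdout (Python 3).
        stream = sys.stdout.buffer
    written = stream.write(data)
    stream.flush()
    return written


# Try it against an in-memory stream: the bytes come back unchanged.
buf = io.BytesIO()
write_binary(b"line1\nline2\n", buf)
assert buf.getvalue() == b"line1\nline2\n"
```

Calling write_binary(data) with no second argument sends the bytes to standard output, which is the case pdf.py needs.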

References:
http://code.activestate.com/recipes/65443/
issue: pdf.py can not create correct pdf under windows
Convert JBIG2 to PDF with free and open source software agl’s jbig2enc
Windows Version Jbig2enc(Jbig2 Encoder)

Three Open Source PDF Parsers Developed in Python

PDFMiner is a suite of programs that helps extract and analyze text data from PDF documents. Unlike other PDF-related tools, it lets you obtain the exact location of text on a page, as well as extra information such as fonts or ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for purposes other than text analysis.
For details, see Extract and Analyze Text Data of PDF Documents with PDFMiner.

pdf-parser.py
This tool parses a PDF document to identify the fundamental elements used in the analyzed file. It does not render the PDF document. The tool's author describes the parser code as quick-and-dirty and does not recommend it as a textbook case for PDF parsers, but it gets the job done.
For details, see PDF Tools.
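
To give a rough idea of what "identifying the fundamental elements" can mean, here is a tiny sketch that lists the indirect object headers ("N G obj") found in raw PDF bytes. This is not how pdf-parser.py is implemented; the function name, the regular expression, and the hand-written sample document are my own assumptions, and a plain byte scan like this can report false matches inside binary streams.

```python
import re

# An indirect object starts with "<object number> <generation> obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def list_indirect_objects(pdf_bytes: bytes):
    """Return (object number, generation) pairs for each object header found."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in OBJ_HEADER.finditer(pdf_bytes)]


# Hand-written fragment for illustration only (not a complete, valid PDF).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R >>\n"
)
print(list_indirect_objects(sample))  # prints: [(1, 0), (2, 0)]
```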

pyPdf

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, …),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

By being Pure-Python, it should run on any Python platform without
any dependencies on external libraries. It can also work entirely on
StringIO objects rather than file streams, allowing for PDF
manipulation in memory. It is therefore a useful tool for websites
that manage or manipulate PDFs.

For details, see http://pybrary.net/pyPdf/
