pdfsizeopt: a Free and Open Source PDF Manipulation Tool to Reduce PDF File Size

pdfsizeopt is an open source project hosted on Google Code; its main feature is optimizing the size of PDF files.

About

pdfsizeopt is a collection of best practices and scripts for Unix to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.

The author describes it as a Linux solution. I have tested it on my DreamHost account, and it works: a PDF that was originally 5.6M came out at 4.4M after optimization (about 21% smaller). Great!

Another piece of good news: I am working on porting it to Windows. All the required tools are ready (some downloaded from their websites, some compiled by myself, for example jbig2), and I have successfully modified pdfsizeopt.py so that it runs under Windows. It still has many bugs (I have submitted them to the author), and I will release it later.

Installation instructions

Please note that not all the software mentioned in the instructions below is free software (if we consider freedom). Details:

  • pdfsizeopt: free
  • Python: free
  • Ghostscript: free version available
  • Java: free version available (OpenJDK)
  • sam2p: free
  • jbig2: free (http://github.com/agl/jbig2enc/tree/master)
  • png22pnm: free
  • pngtopnm: free
  • Multivalent.jar: not free software, but you don’t have to pay for using it, and you can download it from the official web site without having to pay
  • PNGOUT: not free software, but you don’t have to pay for using it, and you can download it from the official web site without having to pay

Necessary:

  1. A Unix system is needed, Linux is recommended. The following instructions have been tested on Debian Etch and Ubuntu Hardy.
  2. Install Python 2.4, Python 2.5 or Python 2.6 from package. Earlier or later versions won’t work.
  3. Install Ghostscript 8.61 or later. (You may try pdfsizeopt with Ghostscript 8.54 as well, but 8.54 has some known font conversion problems, so it will produce an error for some PDF files.) Earlier versions won’t work. Make sure the command gs is on your $PATH.
  4. Create a directory named pdfsizeopt.
  5. Check out the source code at http://code.google.com/p/pdfsizeopt/source/checkout , or just download http://pdfsizeopt.googlecode.com/svn/trunk/pdfsizeopt.py as pdfsizeopt/pdfsizeopt.py.
  6. Install a recent sam2p and copy the binary to pdfsizeopt/sam2p. For Linux, the recommended binary is http://pdfsizeopt.googlecode.com/files/sam2p . Please note that the sam2p in Ubuntu Intrepid and Debian Etch is too old. Either compile it yourself, or use the recommended download above.
  7. Install pngtopnm from package, or download the Linux binary from http://pdfsizeopt.googlecode.com/files/png22pnm to pdfsizeopt/png22pnm.
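The version requirements in steps 2 and 3 can be checked with a few lines of code. The following is only a sketch: the helper names (parse_version, python_ok, ghostscript_ok) are hypothetical and not part of pdfsizeopt, and the script is assumed to run on a current Python 3, separately from the Python 2.4 to 2.6 interpreter that pdfsizeopt.py itself needs. The version strings would come from commands such as python -V and gs --version.

```python
import re

# Versions taken from the instructions above; adjust if they change.
SUPPORTED_PYTHONS = {(2, 4), (2, 5), (2, 6)}
MIN_GHOSTSCRIPT = (8, 61)  # 8.54 may also work, with known font problems


def parse_version(text):
    """Extract (major, minor) from strings such as 'Python 2.5.2' or '8.61'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return int(match.group(1)), int(match.group(2))


def python_ok(version_text):
    """True if the reported Python version is one the instructions list."""
    return parse_version(version_text) in SUPPORTED_PYTHONS


def ghostscript_ok(version_text):
    """True if the reported Ghostscript version is 8.61 or later."""
    return parse_version(version_text) >= MIN_GHOSTSCRIPT
```

For example, python_ok("Python 2.5.2") is accepted, while ghostscript_ok("8.54") is rejected because the sketch uses the stricter 8.61 threshold.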

Optional, but strongly recommended:

  1. Install Java 1.5 or newer from package. javac is not necessary. Sun’s Java and OpenJDK are OK, gcj and gij won’t work. Make sure that java -version works and prints something at least 1.5.
  2. Download Multivalent*.jar from http://sourceforge.net/project/showfiles.php?group_id=44509&package_id=37068 (example: Multivalent20060102.jar), and copy it to pdfsizeopt/Multivalent.jar.
  3. Compile jbig2 for yourself, or download the Linux binary from http://pdfsizeopt.googlecode.com/files/jbig2 to pdfsizeopt/jbig2.

Optional, but recommended:

  1. Download the PNGOUT binary for your system. Recommended for Linux: the http://static.jonof.id.au/dl/kenutils/pngout-20070430-linux-static.tar.gz archive on http://www.jonof.id.au/kenutils . For other PNGOUT downloads, visit http://advsys.net/ken/utils.htm . Copy the file pngout-*-linux-static to pdfsizeopt/pngout.

Try it:

  1. Create a file test.pdf, and run pdfsizeopt.py --use-pngout=true --use-jbig2=true --use-multivalent=true test.pdf. The output file will be test.pso.pdf.
  2. If you haven’t installed some of the tools above, try changing =true to =false in the command line.
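Switching the flags between =true and =false by hand can be automated. The sketch below assumes the optional tools were copied into the pdfsizeopt directory under the names used in the installation steps (pngout, jbig2, Multivalent.jar); build_command and OPTIONAL_TOOLS are hypothetical names for illustration, and the check only looks for the files, not for a working Java installation.

```python
import os

# Optional tool file name -> pdfsizeopt.py flag it enables (flags from the text).
OPTIONAL_TOOLS = [
    ("pngout", "--use-pngout"),
    ("jbig2", "--use-jbig2"),
    ("Multivalent.jar", "--use-multivalent"),
]


def build_command(pdf_path, tool_dir, exists=os.path.exists):
    """Build the pdfsizeopt.py argument list, enabling only tools that are present."""
    command = [os.path.join(tool_dir, "pdfsizeopt.py")]
    for filename, flag in OPTIONAL_TOOLS:
        present = exists(os.path.join(tool_dir, filename))
        command.append("%s=%s" % (flag, "true" if present else "false"))
    command.append(pdf_path)
    return command
```

The returned list can be passed to subprocess.run, or joined with spaces and pasted into a shell. The exists parameter is there only so the function can be tried without touching the file system.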

References:
pdfsizeopt home page
Convert JBIG2 to PDF with free and open source software agl’s jbig2enc
Windows version JBIG2 Encoder-Jbig2.exe

PDFedit: a Free and Open Source PDF Editor with Multi-Platform Support

PDFedit is a free and open source library for manipulating PDF documents, released under the terms of the GNU GPL version 2. It includes a PDF manipulation library based on xpdf, a GUI, and a set of command line tools.
It is a free editor for PDF documents, and complete editing of PDF documents is possible with it. You can change raw PDF objects (for advanced users) or use the many GUI functions. Functionality can easily be extended using a scripting language (ECMAScript).
The library is multi-platform, working on Unix systems, Windows 32/64, Windows CE and others. You can use it to read, change and extract information from a PDF file. It is based on the xpdf library.
For details, please visit http://pdfedit.petricek.net/en/index.html or http://sourceforge.net/projects/pdfedit/
By the way, PDFedit is also a PDF reader.

Convert JBIG2 to PDF with free and open source software agl’s jbig2enc

agl’s jbig2enc is an encoder for JBIG2:
www.jpeg.org/public/fcd14492.pdf

JBIG2 encodes bi-level (1 bpp) images using a number of clever tricks to get
better compression than G4. This encoder can:
* Generate JBIG2 files, or fragments for embedding in PDFs
* Perform generic region encoding
* Perform symbol extraction, classification and text region coding
* Perform refinement coding
* Compress multipage documents

It uses the (Apache-ish licensed) Leptonica library:
http://www.leptonica.com/

jbig2enc can convert images in other formats to JBIG2 files, or to fragments for embedding in PDFs, and pdf.py can convert the fragments to a PDF document.
For example, to convert a.bmp to PDF, first run:
jbig2 -s -p a.bmp
This produces the files output.0000 and output.sym. Then run:
python pdf.py output > jbig2.pdf
By the way, iText and iTextSharp now also support converting JBIG2 to PDF.
After some effort, I have successfully compiled jbig2 under Windows XP with GCC 4.2 (MinGW32 + MSYS). For details and downloads, please visit Windows version of agl’s jbig2enc; from that page you can also download the compiled Linux version.
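The two conversion commands above can also be driven from a script. This is a minimal sketch, assuming jbig2 is on the $PATH, pdf.py is in the current directory, and the interpreter that runs pdf.py is called python; the function names jbig2_commands and convert are hypothetical. The default base name output matches the output.0000 and output.sym files mentioned above.

```python
import subprocess


def jbig2_commands(image_path, basename="output", pdf_py="pdf.py"):
    """Return the two argument lists: encode the image, then assemble the PDF."""
    encode = ["jbig2", "-s", "-p", image_path]
    assemble = ["python", pdf_py, basename]
    return encode, assemble


def convert(image_path, pdf_path, run=subprocess.run):
    """Run both steps, redirecting pdf.py's standard output into pdf_path."""
    encode, assemble = jbig2_commands(image_path)
    run(encode, check=True)  # writes output.0000 and output.sym
    with open(pdf_path, "wb") as pdf_file:
        run(assemble, check=True, stdout=pdf_file)
```

Calling convert("a.bmp", "jbig2.pdf") would then mirror the shell commands shown earlier; check=True makes a failing step raise an exception instead of leaving a broken PDF behind unnoticed.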

References:
Windows version of agl’s jbig2enc
agl’s jbig2enc home page
