Archive for October 12th, 2006

Generate PDF documents from an HTML page using ASP.NET (Author: Albert Pascual)

Introduction

This project uses HTMLDOC, an HTML-to-PDF executable from ESP. Please read the GNU license agreement for more information. HTMLDOC is a desktop application that creates PDF documents from an HTML page. I wrote some code to use it from a web application. A typical use is a web report with a “Print to PDF” button that calls the C# class.
Using the code

    public string Run(string sRawUrl)
    {
        string sFileName = GetNewName();
        string sPage = Server.MapPath("" + sFileName + ".html");
        string sUrlVirtual = sRawUrl;
        StringWriter sw = new StringWriter();

        // Render the requested page into memory.
        Server.Execute(sUrlVirtual, sw);

        // Save the rendered HTML to disk so HTMLDOC can read it.
        using (StreamWriter sWriter = File.CreateText(sPage))
        {
            sWriter.WriteLine(sw.ToString());
        }

        // Launch HTMLDOC. The .html and .pdf names below are relative,
        // so they are resolved against WorkingDirectory.
        System.Diagnostics.Process pProcess
                             = new System.Diagnostics.Process();
        pProcess.StartInfo.FileName = m_sDrive + ":" + m_Directory +
                                            "\\ghtmldoc.exe";
        pProcess.StartInfo.Arguments = "--webpage --quiet " + sFontSize +
                  m_sWaterMark + " --bodyfont Arial " + sLandScape +
                  " -t pdf14 -f " + sFileName + ".pdf " + sFileName + ".html";
        pProcess.StartInfo.WorkingDirectory = m_sDrive + ":" + m_Directory;

        // Start() returns immediately; the PDF may not exist yet when Run returns.
        pProcess.Start();

        return(sFileName + ".pdf");
    }

The class PDFGenerator contains a public method called Run that starts the process ghtmldoc.exe with the arguments you choose. The most important part is to set a working directory where the web application has permission to read, write and execute; otherwise the program won’t work, and pProcess.Start will raise a Win32Exception (“access denied”).
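As a minimal sketch of how the “access denied” failure could be handled, the helper below starts an executable in a given working directory, catches Win32Exception, and waits for the process to exit. The class name ProcessRunner, the method RunAndWait and its parameters are hypothetical and are not part of the original PDFGenerator class.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper, not part of the original project.
public static class ProcessRunner
{
    // Starts exePath in workingDirectory and waits up to timeoutMs for it to exit.
    // Returns the exit code, or -1 if the process could not start or timed out.
    public static int RunAndWait(string exePath, string arguments,
                                 string workingDirectory, int timeoutMs)
    {
        ProcessStartInfo info = new ProcessStartInfo();
        info.FileName = exePath;
        info.Arguments = arguments;
        info.WorkingDirectory = workingDirectory;
        info.UseShellExecute = false;   // start the executable directly
        info.CreateNoWindow = true;

        try
        {
            using (Process process = Process.Start(info))
            {
                if (process == null)
                    return -1;

                if (!process.WaitForExit(timeoutMs))
                {
                    process.Kill();     // give up on a conversion that hangs
                    return -1;
                }
                return process.ExitCode;
            }
        }
        catch (Win32Exception ex)
        {
            // Raised, for example, when the file is missing or access is denied.
            Console.Error.WriteLine("Could not start " + exePath + ": " + ex.Message);
            return -1;
        }
    }
}
```

Waiting for the process is a design choice that differs from the article's Run method, which returns immediately and leaves the waiting to the display page.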

StreamWriter saves the page as an HTML file on the hard disk.

The files DisplayPDF.aspx and DisplayPDF.aspx.cs display the generated PDF file once it is ready.


        private void Page_Load(object sender, System.EventArgs e)
        {
            if ( Request.Params["File"] != null )
            {
                // Poll once per second, at most 10 times, until the PDF exists.
                bool bRet = false;
                int iTimeout = 0;
                while ( bRet == false && iTimeout < 10 )
                {
                    bRet = CheckIfFileExist(Request.Params["File"].ToString());
                    if ( bRet == false )
                        Thread.Sleep(1000);
                    iTimeout++;
                }

                if ( bRet == true )
                {
                    // Send the PDF to the browser.
                    Response.ClearContent();
                    Response.ClearHeaders();
                    Response.ContentType = "application/pdf";
                    try
                    {
                        Response.WriteFile( MapPath( "" +
                                      Request.Params["File"].ToString() ) );
                        Response.Flush();
                        Response.Close();
                    }
                    catch
                    {
                        Response.ClearContent();
                    }
                }
                else
                {
                    // The PDF never appeared: show the optional message instead.
                    if ( Request.Params["Msg"] != null )
                    {
                        LabelMsg.Text = Request.Params["Msg"].ToString();
                    }
                }
            }
        }

The page accepts a parameter, File, which names the PDF generated from the HTML file that StreamWriter saved to disk. The response sets the content type to application/pdf, so the browser knows what kind of file it is downloading and asks you to save or open it. If you have the Adobe plug-in installed in your browser, you’ll be able to view the PDF in the browser.
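Because the File parameter comes straight from the request and is passed to MapPath, it may be worth checking it before use. The following is only a sketch under that assumption; PdfRequestHelper and SanitizePdfName are hypothetical names that do not appear in the original project.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original DisplayPDF page.
public static class PdfRequestHelper
{
    // Returns the value unchanged if it is a bare ".pdf" file name,
    // or null if it is empty, contains path parts, or has another extension.
    public static string SanitizePdfName(string requested)
    {
        if (string.IsNullOrEmpty(requested))
            return null;

        // Reject directory separators and drive letters.
        if (requested.IndexOfAny(new char[] { '/', '\\', ':' }) >= 0)
            return null;

        // Reject characters that are not valid in a file name.
        if (requested.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            return null;

        if (!requested.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            return null;

        return requested;
    }
}
```

In Page_Load, the result of SanitizePdfName(Request.Params["File"]) could be used in place of the raw parameter, and a null result treated the same way as a missing file.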
Points of Interest

It is important to create a directory in which to save the HTML file and generate the PDF, and to give that directory all the permissions needed to run the EXE.
History

Remember that HTMLDOC is copyright ESP. Please go to this link and download the latest version. The GNU license agreement is included in the project.

from: http://www.codeproject.com/aspnet/HTML2PDF.asp

source code of HtmlToPDF


Pdfizer, a dumb HTML to PDF converter, in C#

Introduction

This article presents a basic HTML to PDF converter: with this library, you can transform simple HTML pages to nice and printable PDF files.

The HTML cleaning is done with NTidy (see [1]), a .NET wrapper for the HTML Tidy library (see [2]). The PDF generation is done with iTextSharp, a PDF generation library (see [3]).
Transformation Pipe

Transforming HTML documents to PDF is a fairly complex task. Fortunately, there are powerful tools on the web that helped me accomplish it.
Parsing HTML

The first problem to handle was that HTML is usually “dirty”: the structure is usually not XML conformant, and trying to parse HTML pages with XmlDocument will usually fail.
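A small sketch of that failure: the hypothetical helper below (XmlParseDemo is not part of Pdfizer) tries to load markup with XmlDocument and reports whether it parsed. An unclosed tag such as a bare br, which browsers accept in HTML, makes LoadXml throw an XmlException.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of Pdfizer.
public static class XmlParseDemo
{
    // Returns true if XmlDocument can load the markup,
    // false if the markup is not well-formed XML.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For instance, IsWellFormedXml("<p>Hello<br>world</p>") returns false, while the same markup with a self-closed br element parses.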

To overcome this problem, I had to write a .NET wrapper around HTML Tidy (see [2]). HTML Tidy is a very useful application that takes “dirty” HTML and returns it cleaned as much as possible. The .NET wrapper exposes a DOM-like class structure so that you can use it much like XmlDocument.

Hence, with NTidy, we can safely parse HTML documents.
Creating PDF

The PDF creation is done by iTextSharp (see [3]), a .NET library hosted on SourceForge that gives you the tools to create PDF files easily. Hence, the PDF creation problem is solved.
Reading, Traversing

With NTidy and iTextSharp in my toolset, I could start to create the generator. The generator works like this: it first reads the input with NTidy, then traverses the DOM tree and generates the PDF fragments with iTextSharp.
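The read-then-traverse idea can be sketched without either library. The example below is an assumption-laden stand-in: it uses XmlDocument in place of the NTidy DOM (so the input must already be well-formed), and it collects text fragments into a list where the real generator would emit iTextSharp elements. DomTraversalSketch and CollectFragments are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical stand-in for the generator's traversal step.
public static class DomTraversalSketch
{
    // Parses already-cleaned markup and returns its text fragments in document order.
    public static List<string> CollectFragments(string cleanedMarkup)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(cleanedMarkup);

        List<string> fragments = new List<string>();
        Visit(doc.DocumentElement, fragments);
        return fragments;
    }

    // Depth-first walk: text nodes become fragments, other nodes are descended into.
    private static void Visit(XmlNode node, List<string> fragments)
    {
        if (node.NodeType == XmlNodeType.Text)
        {
            string text = node.Value.Trim();
            if (text.Length > 0)
                fragments.Add(text);
            return;
        }

        foreach (XmlNode child in node.ChildNodes)
            Visit(child, fragments);
    }
}
```

A real converter would also switch on the element name during the walk (heading, paragraph, bold and so on) to decide which kind of PDF fragment to create; that mapping is left out here.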
Quick Example

The library usage is done through the HtmlToPdfConverter class. Creating a PDF file is done through the following steps, as illustrated in the example:

1. Create a converter.
2. Open a new PDF file using the Open method.
3. Add a chapter.
4. Feed HTML to the converter.
5. If you want another chapter, go back to step 3.
6. When finished, close the PDF file by calling Close.

// create converter
HtmlToPdfConverter html2pdf = new HtmlToPdfConverter();

// open new pdf file
html2pdf.Open(@"test");
// start a chapter
html2pdf.AddChapter(@"Dummy Chapter");
string html = ...;
// convert string
html2pdf.Run(html);
// add a new chapter
html2pdf.AddChapter(@"Boost page");
// read web page
html2pdf.Run(new Uri(@"http://www.boost.org/libs/libraries.htm"));
// close and finish pdf file.
html2pdf.Close();

What to expect and not expect

Don’t expect too much from this tool: it will not work with complex HTML pages, but it gives fairly good results with simple ones. In particular, tables are not yet supported.
Reference

1. NTidy, a .NET wrapper around Tidy.
2. HTML Tidy home page.
3. iTextSharp, PDF generation tool.

From: http://www.codeproject.com/csharp/pdfizer.asp

Please download the new code from http://sourceforge.net/projects/pdfizer/

Source code of PDFizer
