17 Jan 2004
This library converts simple HTML documents to PDF.
Introduction
This article presents a basic HTML to PDF converter: with this library, you can transform simple HTML pages to nice and printable PDF files.
The HTML cleaning is done with NTidy (see [1]), a .NET wrapper for the HTML Tidy library (see [2]). The PDF generation is done with iTextSharp, a PDF generation library (see [3]).
Transformation Pipe
Transforming HTML documents to PDF is a fairly complex task. Fortunately, powerful tools exist on the web that helped me accomplish it.
Parsing HTML
The first problem to handle was that HTML is usually 'dirty': the structure is often not XML-conformant, and trying to parse HTML pages with XmlDocument will usually fail.
To overcome this problem, I had to write a .NET wrapper around HTML Tidy (see [2]). HTML Tidy is a very useful application that takes 'dirty' HTML and returns it cleaned up as much as possible. The .NET wrapper exposes a DOM-like class structure, so you can use it much like XmlDocument.
Hence, with NTidy, we can safely parse HTML documents.
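To make the 'dirty HTML' problem concrete, here is a minimal sketch that feeds two versions of the same fragment to XmlDocument. It uses only the .NET base class library; the class name DirtyHtmlDemo and the helper TryLoadAsXml are made up for this illustration and are not part of NTidy or of the converter. The second string is hand-written to resemble what a tidy step might produce; it is not actual HTML Tidy output.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of NTidy or the converter library.
public static class DirtyHtmlDemo
{
    // Returns true if the markup loads as well-formed XML, false otherwise.
    public static bool TryLoadAsXml(string markup)
    {
        try
        {
            var doc = new XmlDocument();
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            // Unquoted attributes, unclosed tags, etc. end up here.
            return false;
        }
    }

    public static void Main()
    {
        // Browser-tolerated HTML: unquoted attribute, unclosed <p> and <br>.
        string dirty = "<html><body><p align=center>Hello<br>world</body></html>";

        // Hand-cleaned equivalent (an assumption about what a tidy step yields).
        string clean = "<html><body><p align=\"center\">Hello<br />world</p></body></html>";

        Console.WriteLine(TryLoadAsXml(dirty));
        Console.WriteLine(TryLoadAsXml(clean));
    }
}
```

The first call fails on the unquoted attribute before the parser even reaches the unclosed tags, which is why a cleaning pass has to run before any DOM-style processing.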
Creating PDF
PDF creation is handled by iTextSharp (see [3]), a .NET library hosted on SourceForge that gives you the tools to create PDF files easily. Hence, the PDF creation problem is solved.
Reading, Traversing
With NTidy and iTextSharp in my toolset, I could start writing the generator. The generator works like this: it first reads the input with NTidy, then traverses the DOM tree and generates the PDF fragments with iTextSharp.
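The read-then-traverse idea can be sketched without either library. In the sketch below, XmlDocument stands in for the NTidy DOM (the input is assumed to be already cleaned), and plain strings stand in for the iTextSharp objects the real generator would build. The class TraversalSketch, the method CollectFragments, and the choice of handled tags are all assumptions made for illustration, not the library's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sketch: XmlDocument replaces NTidy, strings replace PDF fragments.
public static class TraversalSketch
{
    // Walks the tree depth-first and records one fragment per recognized block element.
    public static List<string> CollectFragments(XmlNode node, List<string> output)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
                continue; // skip text, comments, etc. at this level

            switch (child.Name.ToLowerInvariant())
            {
                case "h1":
                case "h2":
                    output.Add("heading: " + child.InnerText.Trim());
                    break;
                case "p":
                    output.Add("paragraph: " + child.InnerText.Trim());
                    break;
                default:
                    // Containers such as html, body, div: descend into children.
                    CollectFragments(child, output);
                    break;
            }
        }
        return output;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<html><body><h1>Title</h1><div><p>First paragraph.</p></div></body></html>");

        foreach (string fragment in CollectFragments(doc.DocumentElement, new List<string>()))
            Console.WriteLine(fragment);
    }
}
```

In a real generator, each branch of the switch would presumably create the matching PDF element instead of a string; the recursive shape of the walk stays the same.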
Quick Example
The library is used through the HtmlToPdfConverter class. Creating a PDF file takes the following steps:
1. Create a converter.
2. Open a new PDF file using the Open method.
3. Add a chapter.
4. Feed HTML to the converter.
5. If you want another chapter, go to step 3.
6. When finished, close the PDF file by calling Close.
What to expect and not expect
Don't expect too much from this tool: it gives fairly good results with simple HTML pages but will not work with complex ones. In particular, tables are not yet supported.
References
[1] NTidy, a .NET wrapper around Tidy.
[2] HTML Tidy home page.
[3] iTextSharp, PDF generation tool.