I’m trying to create a PDF out of an HTML page. The CMS I’m using is EPiServer.
This is my code so far:
    protected void Button1_Click(object sender, EventArgs e)
    {
        naaflib.pdfDocument(CurrentPage);
    }

    public static void pdfDocument(PageData pd)
    {
        // Extract data from the page (pd).
        string intro = pd["MainIntro"].ToString(); // Attribute
        string mainBody = pd["MainBody"].ToString(); // Attribute

        // Prepare the HTTP response for PDF output.
        HttpContext.Current.Response.Clear();
        HttpContext.Current.Response.ContentType = "application/pdf";

        // Create the PDF document (A4; margins: left 80, right 50, top 30, bottom 65).
        Document pdfDocument = new Document(PageSize.A4, 80, 50, 30, 65);
        //PdfWriter pw = PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);
        PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);

        // Write the page name, intro and main body as plain paragraphs.
        pdfDocument.Open();
        pdfDocument.Add(new Paragraph(pd.PageName));
        pdfDocument.Add(new Paragraph(intro));
        pdfDocument.Add(new Paragraph(mainBody));
        pdfDocument.Close();

        HttpContext.Current.Response.End();
    }
This outputs the content of the article name, intro-text and main body.
But it does not parse the HTML in the article text, and there is no layout.
I’ve tried having a look at http://itextsharp.sourceforge.net/tutorial/index.html without becoming any wiser.
Any pointers in the right direction are greatly appreciated 🙂
For later versions of iTextSharp:
Using iTextSharp you can use the iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList() method to create a PDF from HTML.
ParseToList() takes a TextReader (an abstract class) for its HTML source, which means you can use a StringReader or StreamReader (both of which use TextReader as a base type). I used a StringReader and was able to generate PDFs from simple markup.
I tried to use the HTML returned from a webpage and got errors on all but the simplest pages. Even the simplest webpage I retrieved (http://black.ea.com/) was rendering the content of the page’s ‘head’ tag onto the PDF, so I think the HTMLWorker.ParseToList() method is picky about the formatting of the HTML it parses.
Anyway, if you want to try, here’s the test code I used:
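A minimal sketch of this approach (not the original test code), assuming an iTextSharp version that ships HTMLWorker. The class name HtmlToPdfSketch and the method WriteHtmlAsPdf are hypothetical, and the page size and margins are simply copied from the question:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

public static class HtmlToPdfSketch // hypothetical helper, for illustration only
{
    // Parses an HTML fragment and writes the resulting elements to a PDF on 'output'.
    public static void WriteHtmlAsPdf(string html, Stream output)
    {
        Document document = new Document(PageSize.A4, 80, 50, 30, 65);
        PdfWriter.GetInstance(document, output);
        document.Open();

        using (StringReader reader = new StringReader(html))
        {
            // The second argument is an optional StyleSheet; null means default styling.
            // ParseToList returns the parsed elements, which are added one by one.
            foreach (IElement element in HTMLWorker.ParseToList(reader, null))
            {
                document.Add(element);
            }
        }

        document.Close();
    }
}
```

In the question's code this could stand in for the plain `new Paragraph(mainBody)` call, for example `HtmlToPdfSketch.WriteHtmlAsPdf(mainBody, HttpContext.Current.Response.OutputStream)`, provided the markup stored in MainBody is simple enough for the parser.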
I couldn’t find any documentation on which HTML constructs HTMLWorker.ParseToList() supports; if you do, please post it here. I’m sure a lot of people would be interested.
For older versions of iTextSharp:
You can use the iTextSharp.text.html.HtmlParser.Parse method to create a PDF based on HTML.
Here’s a snippet demonstrating this:
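A rough sketch of how that might look (not the original snippet), assuming an older iTextSharp release that still contains HtmlParser and that it offers a Parse overload taking a document and a file path. The class name, method name and both paths are hypothetical:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;

public static class XhtmlToPdfSketch // hypothetical helper, for illustration only
{
    // Converts an XHTML file on disk into a PDF file.
    public static void ConvertXhtmlFile(string xhtmlPath, string pdfPath)
    {
        Document document = new Document(PageSize.A4, 80, 50, 30, 65);
        PdfWriter.GetInstance(document, new FileStream(pdfPath, FileMode.Create));
        document.Open();

        // Assumption: Parse(document, path) reads the file and adds its content
        // to the document. The input must be well-formed XHTML.
        HtmlParser.Parse(document, xhtmlPath);

        // Close the document so the PDF is flushed to disk.
        document.Close();
    }
}
```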
The one (major for me) problem is the HTML must be strictly XHTML compliant.
Good luck!