I’m writing a web scraping program in C#. So far, I have been able to log in to the website, save the cookie, and retrieve the source of another page. From this source, I get a link that leads to a PDF, but the URL doesn’t end with a .pdf extension. In the browser, this page displays the PDF along with viewer controls, including a save button.
I believe the pdf page was created with ColdFusion as it has .cfm, CFID and CFTOKEN in the URL.
How do I save this pdf file programmatically?
Two answers have suggested I save the binary stream to pdf. How do I get the binary data in the first place? I have tried the following:
    byte[] result;
    byte[] buffer = new byte[4096];
    WebRequest wr = WebRequest.Create(billURL);
    using (WebResponse response = wr.GetResponse())
    {
        using (Stream responseStream = response.GetResponseStream())
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                int count = 0;
                do
                {
                    count = responseStream.Read(buffer, 0, buffer.Length);
                    memoryStream.Write(buffer, 0, count);
                } while (count != 0);
                result = memoryStream.ToArray();
            }
        }
    }
Do I then want to save result as a pdf, or am I doing something wrong there?
The common way to stream a PDF to the browser in CF is this:
Use a C# WebRequest to request the PDF’s URL. Then check the response headers for a Content-Type of ‘application/pdf’. If it is present, save the binary stream to a PDF file on disk.
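The steps in that answer could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name PdfDownloader and the methods IsPdfContentType and TrySavePdf are made up for this example, and the CookieContainer is assumed to be the same one that was used for the login request (the code in the question does not attach any cookies, so the server may not see the request as logged in). Whether this particular ColdFusion page actually reports application/pdf is not something the question confirms.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not part of any library.
static class PdfDownloader
{
    // True when a Content-Type header value names a PDF,
    // ignoring case and any parameters after ';'.
    public static bool IsPdfContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return false;
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("application/pdf", StringComparison.OrdinalIgnoreCase);
    }

    // Requests billUrl with the login cookies and writes the response body
    // to outputPath only if the server reports a PDF. Returns false otherwise.
    public static bool TrySavePdf(string billUrl, CookieContainer cookies, string outputPath)
    {
        var request = (HttpWebRequest)WebRequest.Create(billUrl);
        // Assumption: this is the container that captured the login cookies.
        request.CookieContainer = cookies;

        using (WebResponse response = request.GetResponse())
        {
            if (!IsPdfContentType(response.ContentType)) return false;

            using (Stream responseStream = response.GetResponseStream())
            using (FileStream file = File.Create(outputPath))
            {
                // Copy the raw bytes straight to disk; no text decoding involved.
                responseStream.CopyTo(file);
            }
            return true;
        }
    }
}
```

A call might look like PdfDownloader.TrySavePdf(billURL, cookies, "bill.pdf"), where cookies is the assumed login container. If the bytes are already in memory, as with the result array in the question, File.WriteAllBytes("bill.pdf", result) writes them to disk in one step.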