I’m trying to go through a web page's source code and add each <img src="http://www.dot.com/image.jpg"> element to an HtmlElementCollection. Then I’m attempting to cycle through each element in the collection with a foreach loop and download the images through their URLs.
Here’s what I have so far. My problem right now is that nothing is downloading, and I don’t think my elements are being added properly by tag name. If they are, I can’t seem to reference them for the download.
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    public void button1_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        string sourceCode = WorkerClass.ScreenScrape(url);
        StreamWriter sw = new StreamWriter("sourceScraped.html");
        sw.Write(sourceCode);
    }

    private void button2_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        WebBrowser browser = new WebBrowser();
        browser.Navigate(url);
        HtmlElementCollection collection;
        List<HtmlElement> imgListString = new List<HtmlElement>();
        if (browser != null)
        {
            if (browser.Document != null)
            {
                collection = browser.Document.GetElementsByTagName("img");
                if (collection != null)
                {
                    foreach (HtmlElement element in collection)
                    {
                        WebClient wClient = new WebClient();
                        string urlDownload = element.FirstChild.GetAttribute("src");
                        wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/')));
                    }
                }
            }
        }
    }
}
}
Once you call Navigate, you assume the document is ready to traverse and check for images, but in practice it takes some time to load. You need to wait until the document has finished loading.
Add a handler for the DocumentCompleted event to your browser object and traverse the document for images from inside that handler.
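The original handler code is not shown here, so the following is only a minimal sketch of what waiting for DocumentCompleted could look like, not the answerer's exact implementation. The form name ImageDownloadForm, the control layout, and the handler name Browser_DocumentCompleted are assumptions made for illustration. The sketch also assumes the intent was to read src from the img element itself (an img element has no child elements, so FirstChild would be null) and to save each file under the last segment of its URL path in the working directory.

```csharp
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

// Hypothetical stand-alone form; in the question this logic would live in Form1.
public class ImageDownloadForm : Form
{
    private readonly TextBox urlTextBox = new TextBox { Dock = DockStyle.Top };
    private readonly Button downloadButton =
        new Button { Dock = DockStyle.Top, Text = "Download images" };

    // Kept as a field so the browser stays alive while the page is loading.
    private readonly WebBrowser browser =
        new WebBrowser { ScriptErrorsSuppressed = true };

    public ImageDownloadForm()
    {
        Controls.Add(downloadButton);
        Controls.Add(urlTextBox);

        // Subscribe once; the handler runs after Navigate has finished loading.
        browser.DocumentCompleted += Browser_DocumentCompleted;
        downloadButton.Click += (sender, e) => browser.Navigate(urlTextBox.Text);
    }

    private void Browser_DocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire more than once (for example, per frame),
        // so only continue once the whole document reports Complete.
        if (browser.ReadyState != WebBrowserReadyState.Complete ||
            browser.Document == null)
        {
            return;
        }

        using (WebClient client = new WebClient())
        {
            foreach (HtmlElement img in browser.Document.GetElementsByTagName("img"))
            {
                // Read src from the <img> element itself, not from FirstChild.
                string src = img.GetAttribute("src");
                if (string.IsNullOrEmpty(src))
                {
                    continue;
                }

                // Resolve relative URLs against the page address.
                Uri imageUri;
                if (!Uri.TryCreate(browser.Url, src, out imageUri))
                {
                    continue;
                }

                // Skip schemes WebClient cannot download in this sketch (e.g. data:).
                if (imageUri.Scheme != Uri.UriSchemeHttp &&
                    imageUri.Scheme != Uri.UriSchemeHttps)
                {
                    continue;
                }

                // Use the last path segment as the local file name (no leading '/').
                string fileName = Path.GetFileName(imageUri.LocalPath);
                if (string.IsNullOrEmpty(fileName))
                {
                    continue;
                }

                client.DownloadFile(imageUri, fileName);
            }
        }
    }

    [STAThread]
    private static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ImageDownloadForm());
    }
}
```

DownloadFile blocks the UI thread while each image is fetched, and images that share a file name overwrite one another; both are simplifications kept to make the event wiring easy to follow.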