I’m retrieving the HTML of many webpages (saved earlier) from SQL Server. My goal is to modify an img tag’s src attribute. There is only one img tag in the HTML, and its source looks like this:
    ...
    <td colspan="3" align="center">
    <img src="/crossword/13cnum1.gif" height="360" width="360" border="1"><br></td>
    ...
I need to change /crossword/13cnum1.gif to http://www.nostrotech.com/crossword/13cnum1.gif
Code:
    private void ReplaceTest() {
        String currentCode = string.Empty;
        Cursor saveCursor = Cursor.Current;
        try {
            Cursor.Current = Cursors.WaitCursor;
            foreach (WebData oneWebData in DataContext.DbContext.WebDatas.OrderBy(order => order.PuzzleDate)) {
                if (oneWebData.Status == "Done" ) {
                    currentCode = oneWebData.Code;
                    #region Setup Agility
                    HtmlAgilityPack.HtmlDocument AgilityHtmlDocument = new HtmlAgilityPack.HtmlDocument {
                        OptionFixNestedTags = true
                    };
                    AgilityHtmlDocument.LoadHtml(oneWebData.PageData);
                    #endregion
                    #region Image and URL
                    var imageOnPage = from imgTags in AgilityHtmlDocument.DocumentNode.Descendants()
                                      where imgTags.Name == "img" &&
                                            imgTags.Attributes["height"] != null &&
                                            imgTags.Attributes["width"] != null
                                      select new {
                                          Url = imgTags.Attributes["src"].Value,
                                          tag = imgTags.Attributes["src"],
                                          Text = imgTags.InnerText
                                      };
                    if (imageOnPage == null) {
                        continue;
                    }
                    imageOnPage.FirstOrDefault().tag.Value = "http://www.nostrotech.com" + imageOnPage.FirstOrDefault().Url;
                    #endregion
                }
            }
        }
        catch (Exception ex) {
            XtraMessageBox.Show(String.Format("Exception: " + currentCode + "!{0}Message: {1}{0}{0}Details:{0}{2}", Environment.NewLine, ex.Message, ex.StackTrace), Text, MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
        finally {
            Cursor.Current = saveCursor;
        }
    }
I need help: the markup is NOT updated this way, and I need to store the modified markup back in the database. Thanks.
XPath is much more concise than all this LINQ jargon, IMHO.
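Here is how to do it: select the img tags that have src, height and width attributes with a single XPath expression, then replace each one's src attribute value. Editing the attribute only changes the in-memory HtmlDocument, so you still have to serialize the document back to a string (DocumentNode.OuterHtml) and assign that to PageData before saving.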
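A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package. The class and method names (ImageSrcRewriter, RewriteImageSrc) are hypothetical helpers for illustration. Only root-relative sources (starting with a single "/") are prefixed, so running it twice on the same page does not double the host.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper: prefixes the src of every <img> that has src, height
// and width attributes with the given host and returns the updated markup.
public static class ImageSrcRewriter
{
    public static string RewriteImageSrc(string html, string host)
    {
        var doc = new HtmlDocument { OptionFixNestedTags = true };
        doc.LoadHtml(html);

        // XPath: any img element carrying all three attributes.
        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src and @height and @width]");
        if (images == null)
        {
            return html; // nothing to change
        }

        foreach (HtmlNode img in images)
        {
            string src = img.GetAttributeValue("src", string.Empty);

            // Only rewrite root-relative paths such as "/crossword/13cnum1.gif";
            // absolute and protocol-relative ("//...") URLs are left alone.
            if (src.StartsWith("/", StringComparison.Ordinal) &&
                !src.StartsWith("//", StringComparison.Ordinal))
            {
                img.SetAttributeValue("src", host + src);
            }
        }

        // The edits exist only in the in-memory DOM; OuterHtml serializes them back.
        return doc.DocumentNode.OuterHtml;
    }
}
```

In the question's loop, assign the result back with oneWebData.PageData = ImageSrcRewriter.RewriteImageSrc(oneWebData.PageData, "http://www.nostrotech.com"); and, after the loop, persist the entities. That means SaveChanges() if DataContext.DbContext is an Entity Framework DbContext, or SubmitChanges() if it is a LINQ to SQL DataContext. The question does not say which one it is, so treat that part as an assumption.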