I am having an issue with navigation. I get a list of rows from an HTML table, iterate over the rows, and scrape information from them. Each row also contains a link that I click to reach more information related to that row, which I also want to scrape. Then I navigate back to the page with the original table. This works for the first row, but for the subsequent rows it throws an exception.
When I inspect my row collection after the first link has been clicked, none of the rows have the values they had before I clicked the link. I believe something happens when I navigate to a different URL that I don’t understand.
My code is below. How do I get this working so I can iterate over the parent table, click the links in each row, navigate to the child table, but still continue iterating over the rows in the parent table?
private List<Document> getResults()
{
    var documents = new List<Document>();
    //Results
    IWebElement docsTable = this.webDriver.FindElements(By.TagName("table"))
        .Where(table => table.Text.Contains("Document List"))
        .FirstOrDefault();
    var validDocRowRegex = new Regex(@"^(\d{3}\s+)");
    var docRows = docsTable.FindElements(By.TagName("tr"))
        .Where(row =>
            //It throws an exception with .FindElement() when there isn't one.
            row.FindElements(By.TagName("td")).FirstOrDefault() != null &&
            //Yeah, I don't get this one either. I negate the match and so it works??
            !validDocRowRegex.IsMatch(
                row.FindElement(By.TagName("td")).Text))
        .ToList();
    foreach (var docRow in docRows)
    {
        //Todo: find out why this is crashing on some documents.
        var cells = docRow.FindElements(By.TagName("td"));
        var document = new Document
        {
            DocID = Convert.ToInt32(cells.First().Text),
            PNum = Convert.ToInt32(cells[1].Text),
            AuthNum = Convert.ToInt32(cells[2].Text)
        };
        //Go to history for the current document.
        cells.Where(cell =>
                cell.FindElements(By.TagName("a")).FirstOrDefault() != null)
            .FirstOrDefault().Click();
        //Todo: scrape child table.
        this.webDriver.Navigate().Back();
    }
    return documents;
}
UPDATE: (In response to Jim Evans’ answer)
This looks like it’s working correctly.
private List<Document> getResults()
{
    var documents = new List<Document>();
    IWebElement docRow = null;
    int rowIndex = 0;
    //Re-find the row by index on every pass instead of caching the rows.
    while((docRow = this.getDocumentRow(rowIndex)) != null)
    {
        var cells = docRow.FindElements(By.TagName("td"));
        var document = new Document
        {
            DocID = Convert.ToInt32(cells.First().Text),
            PNum = Convert.ToInt32(cells[1].Text),
            AuthNum = Convert.ToInt32(cells[2].Text)
        };
        //Go to history for the current document.
        cells.Where(cell =>
                cell.FindElements(By.TagName("a")).FirstOrDefault() != null)
            .FirstOrDefault().Click();
        //Todo: scrape child table.
        this.webDriver.Navigate().Back();
        documents.Add(document);
        rowIndex++;
    }
    return documents;
}
private IWebElement getDocumentRow(int rowIndex)
{
    try
    {
        IWebElement docsTable = this.webDriver.FindElements(By.TagName("table"))
            .Where(table => table.Text.Contains("Document List"))
            .FirstOrDefault();
        var validDocRowRegex = new Regex(@"^(\d{3}\s+)");
        var docRow = docsTable.FindElements(By.TagName("tr"))
            .Where(row =>
                //It throws an exception with .FindElement() when there isn't one.
                row.FindElements(By.TagName("td")).FirstOrDefault() != null &&
                //Yeah, I don't get this one either. I negate the match and so it works??
                !validDocRowRegex.IsMatch(
                    row.FindElement(By.TagName("td")).Text))
            .ElementAt(rowIndex);
        return docRow;
    }
    catch
    {
        //ElementAt() throws once rowIndex is past the last row, which ends the loop.
        //Note that this also hides any other failure, e.g. the table not being found.
        return null;
    }
}
Your problem is that once you navigate to a new page (via .Click() in your case), your cached elements are no longer valid. The DOM is reconstructed on every page load, including when you navigate back in the browser history. So even though you’re loading a page you’ve already navigated to, you’re getting a newly constructed DOM, so all references to the previously-constructed DOM are invalid. The solution is to re-find the elements you’re looking for after you navigate back to the previous page.
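As a rough sketch of that re-find pattern (not your exact code): the class name, the method names, and the `table tr` / `td a` selectors below are placeholders made up for illustration, and the sketch assumes that going back lands you on the same table with its rows in the same order. The point is only that nothing found before a navigation is touched after it. Reusing an old reference is what typically surfaces as a StaleElementReferenceException.

```csharp
using System.Collections.Generic;
using System.Linq;
using OpenQA.Selenium;

// Hypothetical wrapper class; only the IWebDriver/IWebElement calls are Selenium's.
public class DocumentHistoryScraper
{
    private readonly IWebDriver webDriver;

    public DocumentHistoryScraper(IWebDriver webDriver)
    {
        this.webDriver = webDriver;
    }

    // Queries the current DOM on every call, so the returned elements
    // never come from an earlier page load.
    // The selectors are assumptions; substitute your own row filter.
    private List<IWebElement> FindRowsWithLinks()
    {
        return this.webDriver
            .FindElements(By.CssSelector("table tr"))
            .Where(row => row.FindElements(By.CssSelector("td a")).Count > 0)
            .ToList();
    }

    public List<string> ScrapeFirstCells()
    {
        var firstCells = new List<string>();
        int rowCount = FindRowsWithLinks().Count;

        for (int i = 0; i < rowCount; i++)
        {
            // Re-find the rows on every pass; the ones from the previous
            // pass were invalidated by Click() and Back().
            List<IWebElement> rows = FindRowsWithLinks();
            if (i >= rows.Count)
            {
                break; // the table changed between page loads
            }
            IWebElement row = rows[i];

            // Read everything you need from the row BEFORE leaving the page.
            firstCells.Add(row.FindElement(By.TagName("td")).Text);

            row.FindElement(By.CssSelector("td a")).Click();
            // ... scrape the child page here ...
            this.webDriver.Navigate().Back();
        }

        return firstCells;
    }
}
```

Looking rows up by index is a deliberate simplification: it only works if the table renders in the same order on every load. If that is not guaranteed for your page, re-finding each row by something stable in its content (such as the document ID text) would be the safer variant.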