I have a question about parsing an online HTML page.
When I view the page source in a web browser, I can see the data there.
But when I read the same page from Java, I cannot reach the data.
If I save the HTML file and read it as a local file,
then I am able to read the data.
I will use eBay.com.au as an example.
//——–Example———
target web page
URL:http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40
Here is my Java code
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.HtmlCleaner;
import java.net.URL;

public class HtmlCleanerTest
{
    public static void main(String[] args) throws Exception
    {
        CleanerProperties props = new CleanerProperties();
        URL myURL = new URL("http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40");
        TagNode tagNode = new HtmlCleaner(props).clean(myURL);
        Object[] myNodes = tagNode.getElementsByAttValue("class", "s1", true, true);
        for (Object oNote : myNodes)
        {
            TagNode n = (TagNode) oNote;
            System.out.println(n.getText());
        }
    }
}
This code gets me each product's price, but I also expected to get each seller's location info with it. How do I do that?
//—RE-edited ——————————-
I have found a way to solve my problem.
I am posting it here for anyone who has the same problem.
I am not saying it is the best solution, but I hope it gives you some ideas.
Here it is:
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Test
{
    public static void main(String[] args)
    {
        // A real browser loads the page, so its JavaScript is executed.
        WebDriver driver = new FirefoxDriver();
        driver.get("http://www.ebay.com.au/sch/i.html?scp=ce0&_sacat=0&_from=R40&_nkw=imac+27&_pppn=r1&_rdc=1");
        // These locators are specific to this eBay page.
        driver.findElement(By.id("e1-14")).click();
        driver.findElement(By.name("Stores")).click();
        driver.findElement(By.id("e1-3")).click();
        // Close the browser when done.
        driver.quit();
    }
}
/————–
——END——-
————–/
I came here with one question: if an HTML page comes with JavaScript, how do we grab its data after the JavaScript has fully executed? I guess I am not very good at asking questions.
Probably the page has some JavaScript code that is executed by the browser and loads more data to the page, after the HTML has been loaded. Reading only the HTML with Java does not execute the JavaScript, hence additional data is not visible in the stream.
Edit:
A library like HtmlUnit may help, to a certain degree, with the common problem of loading Ajax-driven HTML pages: http://htmlunit.sourceforge.net/
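Before reaching for a browser-driven tool, it can help to confirm that the missing data really is produced by JavaScript. The following is only a rough sketch using the standard library: the class name StaticHtmlCheck, its methods, and the sample markup are all made up for illustration and are not taken from the eBay page. It strips script blocks and tags from an HTML string with simple regular expressions (not a real HTML parser) and checks whether a piece of text is present in what remains.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks whether text is present in the static HTML,
// i.e. without any JavaScript having been executed.
class StaticHtmlCheck {
    // Matches whole <script>...</script> blocks, case-insensitive, across lines.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    // Returns the text left after removing scripts and tags, with whitespace collapsed.
    static String visibleText(String html) {
        String noScripts = SCRIPT.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    static boolean visibleTextContains(String html, String needle) {
        return visibleText(html).contains(needle);
    }

    public static void main(String[] args) {
        // Made-up sample: the price is in the markup, the location is filled in by a script.
        String html = "<html><body>"
                + "<span class=\"price\">AU $1,200.00</span>"
                + "<span id=\"loc\"></span>"
                + "<script>document.getElementById('loc').textContent = 'From ' + 'Australia';</script>"
                + "</body></html>";

        System.out.println(visibleTextContains(html, "AU $1,200.00"));   // in the static markup
        System.out.println(visibleTextContains(html, "From Australia")); // only exists after the script runs
    }
}
```

If the text you want fails this kind of check on the downloaded HTML, a plain HTML parser will not find it either, and something that executes JavaScript (a real browser, or possibly HtmlUnit) is needed.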