Possible Duplicate:
Why html source is different when I opened it from web browser and read it in Java?
I have a question about parsing an online HTML page. When I open the HTML source in a web browser, I can see the data there.
But when I read the same page from Java, I cannot find the data.
If I save the HTML file and read it as a local file, I am able to read the data from it.
I will take eBay.com.au as an example.
Example

Target web page URL: http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40

Here is my Java code:
    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.HtmlCleaner;
    import org.htmlcleaner.TagNode;

    import java.net.URL;

    public class HtmlCleanerTest
    {
        public static void main(String[] args) throws Exception
        {
            CleanerProperties props = new CleanerProperties();
            URL myURL = new URL("http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40");

            // Download the page and parse it into a tag tree.
            TagNode tagNode = new HtmlCleaner(props).clean(myURL);

            // Find every element whose class attribute is "s1" (recursive, case-insensitive).
            Object[] myNodes = tagNode.getElementsByAttValue("class", "s1", true, true);
            for (Object oNode : myNodes)
            {
                TagNode n = (TagNode) oNode;
                System.out.println(n.getText());
            }
        }
    }
I can get each product's price with this code, but I expected to get the seller's location info this way as well. How do I do that?
If the data shown on the website is generated by JavaScript, you have no way to get it from the raw HTML unless you implement that JavaScript functionality in your Java code.
A second possibility is that the web server decides which data to send based on the User-Agent or the capabilities of the browser/fetcher.
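
To tell the two cases apart, one option is to request the page with a browser-like User-Agent and check whether a piece of text you can see in the browser shows up in the raw response. The following is only a minimal sketch using the standard library: the class name BrowserLikeFetch, the helper names, the User-Agent string and the default marker text are all assumptions made for illustration, and the response is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class BrowserLikeFetch {

    // Hypothetical helper: prepares a connection that sends the given User-Agent.
    // Nothing is sent over the network until the response stream is opened.
    public static URLConnection open(String address, String userAgent) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    // Reads the whole response body as text (UTF-8 is an assumption here).
    public static String readBody(URLConnection connection) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    // Hypothetical helper: true if text seen in the browser is present in the raw HTML.
    public static boolean containsMarker(String html, String marker) {
        return html.contains(marker);
    }

    public static void main(String[] args) throws IOException {
        String address = "http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40";
        // Example value only; copy the string your own browser sends if you prefer.
        String userAgent = "Mozilla/5.0 (X11; Linux x86_64)";
        // Pass a location string you can see in the browser as the first argument.
        String marker = args.length > 0 ? args[0] : "Australia";

        String html = readBody(open(address, userAgent));
        if (containsMarker(html, marker)) {
            System.out.println("Marker found: the data is in the served HTML.");
        } else {
            System.out.println("Marker missing: the data may be added by JavaScript.");
        }
    }
}
```

If the marker is found, the downloaded string can be handed to HtmlCleaner through its clean(String) overload instead of clean(URL), so the parser sees the same HTML that was served for that User-Agent. If the marker is still missing, the first explanation (JavaScript-generated data) becomes the more likely one.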