I am a beginner in web crawling. I am trying to crawl a page, for example, this page:
http://shopping.yahoo.com/search;_ylt=AkzLiLhD9_ulIJy.SYsw9T0bFt0A?p=video&did=0
I need to extract the search results, such as Amazon.com or antonline.com. Can anybody help me by naming some techniques, tools, or software that can help me achieve this?
EDIT: I have to work with Java.
Basically, the idea is to inspect the page in your browser's developer tools (Chrome DevTools or Firebug) and look for distinctive ids or classes. On your page this is a <ul class='hproducts'> element that contains a list of <li class='hproduct'> items. Use that! Then you make a request, get the response, and parse it (Google for DOM, SAX, XPath…). How you do this differs a lot between languages and libraries. For example, in Java there is the JSoup library, which can fetch HTML (HTML is a little different from XML in this case, huh) and parse it in a convenient way.
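To make the "find the class, then parse" idea concrete, here is a minimal sketch using only the JDK's DOM parser and XPath. It assumes a hypothetical, simplified, well-formed snippet in place of the real page (the live markup may differ and is usually not valid XML, which is exactly why JSoup is the more practical choice for real pages). The class name HProductExtractor and the method extractProductNames are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HProductExtractor {

    // Hypothetical, simplified stand-in for the result list on the page;
    // real pages usually are not well-formed XML, so this is illustration only.
    static final String SAMPLE =
        "<html><body>"
        + "<ul class='hproducts'>"
        + "<li class='hproduct'>Amazon.com</li>"
        + "<li class='hproduct'>antonline.com</li>"
        + "</ul>"
        + "</body></html>";

    // Returns the trimmed text of every <li class='hproduct'> inside <ul class='hproducts'>.
    static List<String> extractProductNames(String xhtml) throws Exception {
        // Build a DOM tree; this only works for well-formed (X)HTML input.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Match elements by class, tolerating extra classes in the attribute.
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' hproducts ')]"
            + "/li[contains(concat(' ', normalize-space(@class), ' '), ' hproduct ')]";
        NodeList items = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

        // Collect the visible text of each matched list item.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            names.add(items.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : extractProductNames(SAMPLE)) {
            System.out.println(name);
        }
    }
}
```

For real, messy HTML the JSoup equivalent is shorter and more forgiving: Jsoup.connect(url).get() fetches and parses the page, and doc.select("ul.hproducts li.hproduct") returns the matching elements, whose text() you can read. Whether the live page still uses those classes is something to check in devtools first.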
Or, better yet, google for their API 😉