I have the most basic Java code to do an HTTP request and it works fine. I request data and a ton of HTML comes back. I want to retrieve all the URLs from that page and list them. For a simple first test I made it look like this:
int b = line.indexOf("http://",lastE);
int e = line.indexOf("\"", b);
This works, but as you can imagine it’s horrible and only works in about 80% of cases. The only alternative I could come up with myself sounded slow and stupid. So my question is pretty much: how do I go from
String html
to
List<Url>
?
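import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match "http://" followed by everything up to the next quote, whitespace, or angle bracket.
Pattern p = Pattern.compile("http://[^\"\\s<>]++");
Matcher m = p.matcher(yourFetchedHtmlString);
while (m.find()) {
    String nextUrl = m.group();
    // Do whatever you want with it
}
You may also have to tweak the regexp, as I wrote it without testing. Note that the backslash has to be doubled inside a Java string literal (\\s), and that the negated character class [^\"\\s<>] is what stops the match at the closing quote of the attribute. The pattern is scanned over the string in a single pass, so this should be a fast way to fetch URLs.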
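To get from a String of HTML to a list, the matching loop can be wrapped in a small method. The following is only a sketch under some assumptions: the class name UrlExtractor and the method name extractUrls are made up for illustration, the pattern also accepts https:// (which the pattern above did not), and the result is a List<String> rather than a list of URL objects. It only finds absolute URLs that appear literally in the text, so relative links will be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
final class UrlExtractor {

    // "http://" or "https://", then everything up to the next
    // whitespace, quote, or angle bracket. Compiled once and reused.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[^\\s\"'<>]+");

    // Returns every absolute http(s) URL found in the given text, in order of appearance.
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(html);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/page\">x</a>";
        System.out.println(extractUrls(html)); // prints [http://example.com/page]
    }
}
```

If real URL objects are needed, each string could be passed to java.net.URI.create afterwards, keeping in mind that it throws IllegalArgumentException for strings that are not valid URIs.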