I want to write a regular expression to parse this webpage (view-source:http://www.imdb.com/search/title?title=spiderman&title_type=feature). Basically, I want to extract every section between <tr class=".+"> and </tr>. The page is a list of movies from IMDb (http://www.imdb.com/search/title?title=spiderman&title_type=feature), and each such section corresponds to one movie. I tried the regular expression
<tr class=".+">(.+\n)+</tr>
However, it doesn’t work. Also, I’m not allowed to use a DOM parser. Does anyone have any suggestions? Thanks!
I strongly suggest you use a proper parser. But here is the regex for your case.
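A minimal sketch of one pattern that fits the question, assuming Python's `re` module (the question does not name a language) and assuming the rows are not nested inside each other. The attempt in the question likely fails because `(.+\n)+` only matches whole non-empty lines, so the closing `</tr>` would have to start at the very beginning of a line with no indentation and no blank line before it. A non-greedy `.*?` with `re.DOTALL` avoids that and stops at the nearest `</tr>`. The names `ROW_PATTERN` and `extract_rows` and the sample markup are made up for illustration; they are not taken from the real IMDb page.

```python
import re

# Non-greedy match from an opening <tr class="..."> to the nearest </tr>.
# re.DOTALL lets "." match newlines, so one row can span several lines.
ROW_PATTERN = re.compile(r'<tr class="[^"]+">(.*?)</tr>', re.DOTALL)


def extract_rows(html):
    """Return the inner HTML of every <tr class="..."> ... </tr> section."""
    return [inner.strip() for inner in ROW_PATTERN.findall(html)]


# Made-up sample markup, only loosely shaped like a search-result table.
sample = """<table>
<tr class="even detailed">
  <td class="title"><a href="/title/example-1/">Spider-Man</a></td>
</tr>

<tr class="odd detailed">
  <td class="title"><a href="/title/example-2/">Spider-Man 2</a></td>
</tr>
</table>"""

for row in extract_rows(sample):
    print(row)
    print("---")
```

If a row contains a nested table with its own `</tr>`, the non-greedy match will stop too early, which is one more reason a proper parser is the safer choice.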