I extract some code from a web page (http://www.opensolaris.org/os/community/on/flag-days/all/) like follows,
<tr class='build'> <th colspan='0'>Build 110</th> </tr> <tr class='arccase project flagday'> <td>Feb-25</td> <td></td> <td></td> <td></td> <td> <a href='../pages/2009022501/'>Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br /> cpupm keyword mode extensions - <a href='/os/community/arc/caselog/2008/777/'>PSARC/2008/777</a><br /> CPU Deep Idle Keyword - <a href='/os/community/arc/caselog/2008/663/'>PSARC/2008/663</a><br /> </td> </tr>
and there are some relative url path in it, now I want to search it with regular expression and replace them with absolute url path. Since I know urljoin can do the replace work like that,
>>> urljoin('http://www.opensolaris.org/os/community/on/flag-days/all/', ... '/os/community/arc/caselog/2008/777/') 'http://www.opensolaris.org/os/community/arc/caselog/2008/777/'
Now I want to know that how to search them using regular expressions, and finally tanslate the code to,
<tr class='build'> <th colspan='0'>Build 110</th> </tr> <tr class='arccase project flagday'> <td>Feb-25</td> <td></td> <td></td> <td></td> <td> <a href='http://www.opensolaris.org/os/community/on/flag-days/all//pages/2009022501/'>Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br /> cpupm keyword mode extensions - <a href='http://www.opensolaris.org/os/community/arc/caselog/2008/777/'>PSARC/2008/777</a><br /> CPU Deep Idle Keyword - <a href='http://www.opensolaris.org/os/community/arc/caselog/2008/663/'>PSARC/2008/663</a><br /> </td> </tr>
My knowledge of regular expressions is so poor that I want to know how to do that. Thanks
I have finished the work using Beautiful Soup, haha~ Thx for everybody!
First, I’d recommend using a HTML parser, such as BeautifulSoup. HTML is not a regular language, and thus can’t be parsed fully by regular expressions alone. Parts of HTML can be parsed though.
If you don’t want to use a full HTML parser, you could use something like this to approximate the work:
Example: