A google search gives me the following first result on HTML:
<h3 class="r"><a href="https://rads.stackoverflow.com/amzn/click/com/0470284889" rel="nofollow noreferrer" class="l vst" onmousedown="return rwt(this,'','','','1','AFQjCNEv1W9YC2jcSKYdEo2kNqBMJ-Utmg','k89K9hF4cVNpxQYHtEKiUQ','0CCoQFjAA',null,event)"><em>Quantitative Trading</em>: <em>How to Build Your Own Algorithmic</em> <b>...</b> - Amazon</a></h3>
I would like to extract the link http://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889 from this, but when I use beautiful soup to extract the information, I obtain
soup.find("h3").find("a").get("href")
I obtain the following string instead:
I know that the link is in there and I could parse it by deleting the /url?q= and everything after the & symbol, but I was wondering if there was a cleaner solution.
Thanks!
You can use a combination of urlparse.urlparse and urlparse.parse_qs, e.g