I’m trying to parse the links from google search results and end up with

Asked: June 17, 2026 (2026-06-17T20:38:04+00:00)

I’m trying to parse the links from Google search results and end up with weird output.

import mechanize, re, lxml.html
from lxml.html import parse
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1)     Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] 
br.set_handle_robots(False)
url = 'https://www.google.com/search?q=test&gl=US'

response = br.open(url)
html = response.read().lower()

doc = lxml.html.document_fromstring(html)

for t in doc.xpath("//h3[@class='r']/a"):
    print t.get('href')

which results in the following output:

Any help would be great,
Thanks

1 Answer

Editorial Team · Answer 1 · 2026-06-17T20:38:05+00:00

It’s not entirely clear what you’re trying to achieve, because your code returns exactly what it asks for. A result in the page source looks like this:

<h3 class="r">
  <a href="/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cad=rja&amp;ved=0CDUQFjAA&amp;url=http%3A%2F%2Fwww.test.com%2F&amp;ei=bdMEUYXiBefS2AXL5oGoBQ&amp;usg=AFQjCNH21KLjC0CBkjon2DwD_CZ0HApLMw&amp;sig2=KeRdw0_WAGc2Zrz1jI49wQ&amp;bvm=bv.41524429,d.b2I" 
  class="l" 
  onmousedown="return rwt(this,'','','','1','AFQjCNH21KLjC0CBkjon2DwD_CZ0HApLMw','KeRdw0_WAGc2Zrz1jI49wQ','0CDUQFjAA','','',event)">
    <em>Test</em>.com
  </a>
</h3>

You’re getting the href attribute of the inner a tag, which in the page source is:

"/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cad=rja&amp;ved=0CDUQFjAA&amp;url=http%3A%2F%2Fwww.test.com%2F&amp;ei=bdMEUYXiBefS2AXL5oGoBQ&amp;usg=AFQjCNH21KLjC0CBkjon2DwD_CZ0HApLMw&amp;sig2=KeRdw0_WAGc2Zrz1jI49wQ&amp;bvm=bv.41524429,d.b2I"

More likely, though, you want the link text and the link target. The URL you’ll actually be sent to, without Google’s redirect parameters, is in the cite element, and the link text is in the a element you’ve already found.
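
As an alternative to reading the cite element, the sample href above also carries the destination in its url query parameter, so it can be pulled out of the string you already have. The following is a minimal sketch using only the Python 3 standard library. The helper name extract_target is hypothetical, the sample href is a shortened copy of the one shown above, and the presence of a url parameter is an assumption based on that sample rather than a guarantee about Google's markup.

```python
from urllib.parse import urlsplit, parse_qs


def extract_target(href):
    """Return the destination carried in a redirect-style href, or None.

    Assumes the destination is stored in a 'url' query parameter,
    as in the sample result markup; this is not a documented contract.
    """
    query = urlsplit(href).query   # everything after the '?'
    params = parse_qs(query)       # also undoes the percent-encoding
    values = params.get("url")
    return values[0] if values else None


# Shortened version of the href from the sample result, with entities decoded.
sample_href = (
    "/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"
    "&url=http%3A%2F%2Fwww.test.com%2F&ei=bdMEUYXiBefS2AXL5oGoBQ"
)

print(extract_target(sample_href))
```

In the original loop this would be applied to each t.get('href') value. Note that lowercasing the whole page before parsing also lowercases every URL in it, which can break case-sensitive paths.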
