hi guys i’ve try to urls from google but it’s return 0 urls !
this is my code what the wrong with it ?
import string, sys, time, urllib2, cookielib, re, random, threading, socket, os, time
def Search(go_inurl,maxc):
header = ['Mozilla/4.0 (compatible; MSIE 5.0; SunOS 5.10 sun4u; X11)',
'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2pre) Gecko/20100207 Ubuntu/9.04 (jaunty) Namoroka/3.6.2pre',
'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Avant Browser;',
'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)',
'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6)',
'Microsoft Internet Explorer/4.0b1 (Windows 95)',
'Opera/8.00 (Windows NT 5.1; U; en)',
'amaya/9.51 libwww/5.4.0',
'Mozilla/4.0 (compatible; MSIE 5.0; AOL 4.0; Windows 95; c_athome)',
'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; ZoomSpider.net bot; .NET CLR 1.1.4322)',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0 qihoobot@qihoo.net)',
'Mozilla/4.0 (compatible; MSIE 5.0; Windows ME) Opera 5.11 [en]']
gnum=100
uRLS = []
counter = 0
while counter < int(maxc):
jar = cookielib.FileCookieJar("cookies")
query = 'q='+go_inurl
results_web = 'http://www.google.com/cse?'+'cx=011507635586417398641%3Aighy9va8vxw&ie=UTF-8&'+'&'+query+'&num='+str(gnum)+'&hl=en&lr=&ie=UTF-8&start=' + repr(counter) + '&sa=N'
request_web = urllib2.Request(results_web)
agent = random.choice(header)
request_web.add_header('User-Agent', agent)
opener_web = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
text = opener_web.open(request_web).read()
strreg = re.compile('(?<=href=")(.*?)(?=")')
names = strreg.findall(text)
counter += 100
for name in names:
if name not in uRLS:
if re.search(r'\(', name) or re.search("<", name) or re.search("\A/", name) or re.search("\A(http://)\d", name):
pass
elif re.search("google", name) or re.search("youtube", name) or re.search(".gov", name) or re.search("%", name):
pass
else:
uRLS.append(name)
tmpList = []; finalList = []
for entry in uRLS:
try:
t2host = entry.split("/",3)
domain = t2host[2]
if domain not in tmpList and "=" in entry:
finalList.append(entry)
tmpList.append(domain)
except:
pass
print "[+] URLS (sorted) :", len(finalList)
return finalList
also i’ve done a lot of editing and still nothing happen ! please show me what is my mistake .. Thanks guys 🙂
Two issues I see with this. Firstly, you are using a custom Google search that (apparently) seems to return only results from google.com. This combined with a regex that looks for the occurrence of “google” in the url (
re.search("google", name)), and when found does not add it to the list of urls will cause the list of urls to always remain empty for this custom search.Additionally and more importantly, your logic is incorrect. With fixed formatting, you currently do this:
(Note that the
elifandelsemight be indented once to much, but still, the problem will persist.)Because you check if
nameis not inuRLS,namewill never get added to that list because the adding logic is in yourelsepath.To fix it, remove the
else, decrease the indentation of theappendstatement, and replace thepassstatements withcontinue.