I thought I would write some quick code to download the number of “fans” a Facebook page has.
For some reason, despite a fair number of iterations I’ve tried, I can’t get the following code to pick out the number of fans in the HTML. None of the other solutions I found on the web correctly match the regex in this case either. Surely it is possible to have some wildcard between the two matching bits?
The text I’d like to match against is “6 of X fans”, where X is an arbitrary number of fans a page has – I would like to get this number.
I was thinking of polling this data intermittently and writing to a file but I haven’t gotten around to that yet. I’m also wondering if this is headed in the right direction, as the code seems pretty clunky. 🙂
import urllib
import re

fbhandle = urllib.urlopen('http://www.facebook.com/Microsoft')
pattern = "6 of(.*)fans" #this wild card doesnt appear to work?
compiled = re.compile(pattern)
for lines in fbhandle.readlines():
    ms = compiled.match(lines)
    print ms #debugging
    if ms: break
#ms.group()
print ms
fbhandle.close()
You needed to use re.search() instead. re.match() only tries to match the pattern at the very beginning of the string it is given, but really you’re just trying to match a piece somewhere inside it. With that change, the result is: 79,110. Of course, this will probably be a different number by the time it gets run by someone else.
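To see the difference between the two calls without hitting the network, here is a minimal sketch in Python 3. The sample_line string is a made-up stand-in for one line of the page (the real Facebook markup is not shown in the question), and fan_count is a hypothetical helper name. I have also tightened the wildcard from (.*) to ([\d,]+) so that only digits and commas are captured; that is my own substitution, not something the original code did.

```python
import re

# Assumption: an invented HTML fragment, not the real Facebook markup.
sample_line = '<span class="count">6 of 79,110 fans</span>'

# Capture only digits and commas between "6 of" and "fans".
pattern = re.compile(r"6 of\s*([\d,]+)\s*fans")


def fan_count(text):
    """Return the fan count as an int, or None if the text is not found."""
    found = pattern.search(text)  # search() scans the whole string
    if found is None:
        return None
    return int(found.group(1).replace(",", ""))


# match() is anchored at the start of the string, which here is "<span",
# so it finds nothing even though the text is present further along.
print(pattern.match(sample_line))

# search() finds the same text wherever it sits in the line.
print(fan_count(sample_line))
```

If the page layout changes, or the "6 of" prefix is different, the pattern will simply return None, so it is worth checking for that before using the number.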