I need to extract the meta keywords from a web page using Python. I was thinking this could be done with urllib or urllib2, but I'm not sure. Does anyone have any ideas?
I am using Python 2.6 on Windows XP.
lxml is faster than BeautifulSoup (I think) and has much better functionality, while remaining relatively easy to use. Example:
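A minimal sketch of the lxml approach, assuming lxml is installed and written for Python 3. `meta_keywords` is a hypothetical helper name, and splitting on commas is an assumption about how the keywords are formatted:

```python
import lxml.html


def meta_keywords(html_text):
    """Return keywords from <meta name="keywords" content="..."> tags.

    Hypothetical helper; assumes keywords are comma-separated.
    """
    doc = lxml.html.fromstring(html_text)
    # The XPath returns the content attribute of every matching <meta> tag
    contents = doc.xpath('//meta[@name="keywords"]/@content')
    keywords = []
    for content in contents:
        # Split on commas and drop empty entries and surrounding whitespace
        keywords.extend(k.strip() for k in content.split(',') if k.strip())
    return keywords


page = """<html><head>
<title>Example</title>
<meta name="keywords" content="python, lxml, meta tags">
</head><body></body></html>"""

print(meta_keywords(page))  # ['python', 'lxml', 'meta tags']
```

To run it on a live page, fetch the HTML first, e.g. with `urllib.request.urlopen(url).read()` on Python 3 (`urllib2.urlopen(url).read()` on Python 2), and pass the result in; `lxml.html.fromstring` accepts bytes as well as text.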
Edit: another example.
BTW: XPath is worth knowing.
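One small illustration of why XPath pays off here: attribute values are compared case-sensitively, so `@name="keywords"` will not match `name="Keywords"`. XPath 1.0, which is what lxml implements, has no `lower-case()`, but `translate()` can fold the value to lower case. A sketch, assuming lxml; `KEYWORDS_XPATH` and `meta_keywords_any_case` are hypothetical names:

```python
import lxml.html

# XPath 1.0 (what lxml supports) has no lower-case(), so translate()
# maps each upper-case ASCII letter in @name to its lower-case form
KEYWORDS_XPATH = (
    "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',"
    " 'abcdefghijklmnopqrstuvwxyz') = 'keywords']/@content"
)


def meta_keywords_any_case(html_text):
    """Hypothetical helper: like a plain @name match, but case-insensitive."""
    doc = lxml.html.fromstring(html_text)
    return [
        k.strip()
        for content in doc.xpath(KEYWORDS_XPATH)
        for k in content.split(',')
        if k.strip()
    ]


page = '<html><head><meta name="Keywords" content="XPath, lxml"></head></html>'
print(meta_keywords_any_case(page))  # ['XPath', 'lxml']
```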
Another edit: alternatively, you can just use a regular expression, but I find that less readable and more error-prone (though it involves only the standard library and still fits on one line).
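A standard-library-only sketch of the regex alternative. The pattern below is my own assumption, not a reconstruction of the original one-liner: it expects `name` before `content` and quoted values, and it misses or mismatches other layouts, which is exactly the error-proneness described above. `META_KEYWORDS_RE` and `meta_keywords_re` are hypothetical names:

```python
import re

# Hypothetical pattern: expects name="keywords" *before* content="...",
# with matching single or double quotes. Other attribute orders, unquoted
# values, or commented-out tags are not handled correctly.
META_KEYWORDS_RE = re.compile(
    r'<meta\s[^>]*?\bname\s*=\s*(?P<q1>["\'])keywords(?P=q1)'
    r'[^>]*?\bcontent\s*=\s*(?P<q2>["\'])(?P<content>.*?)(?P=q2)',
    re.IGNORECASE | re.DOTALL,
)


def meta_keywords_re(html_text):
    """Hypothetical helper: return the first keywords tag's values, or []."""
    match = META_KEYWORDS_RE.search(html_text)
    if match is None:
        return []
    return [k.strip() for k in match.group('content').split(',') if k.strip()]


print(meta_keywords_re('<META NAME="Keywords" CONTENT="python, regex , meta tags">'))
# ['python', 'regex', 'meta tags']
```

Because this is a `str` pattern, the bytes returned by `urllib.request.urlopen(...).read()` on Python 3 have to be decoded before being passed to `meta_keywords_re`.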