I’ve always used Beautiful Soup and lxml.html to parse HTML in Python, but now I am faced with writing a script that must run on the standard library alone. What’s the next best module/technique?
I’m prepared to accept that it will be comparatively poor, and even that I’ll lose the ability to use CSS selectors (weep!). The problem is that I need it to run on any old webhost, and they only ever have the standard library.
Alternatively, could I install the lxml and lxml.html modules by hand somehow, i.e. copy the /usr/share/pyshared/lxml folder to my server and use sys.path.insert to make my script see it? That’s ugly, but not as much work as rewriting my code to parse HTML without the two de facto standard libs!
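A minimal sketch of that idea, assuming the copied packages live in a hypothetical `vendor` directory next to the script (the directory name and the `add_vendor_dir` helper are made up for illustration). Bear in mind that lxml includes compiled C extensions, so a copied folder is only likely to import if the server's OS, architecture, and Python version match the machine it was copied from:

```python
import os
import sys


def add_vendor_dir(base_dir, name="vendor"):
    """Put base_dir/name at the front of sys.path.

    Returns the absolute path that was added, or None if the
    directory does not exist. 'vendor' is a hypothetical name.
    """
    vendor = os.path.join(os.path.abspath(base_dir), name)
    if not os.path.isdir(vendor):
        return None
    if vendor not in sys.path:
        # Index 0 so the bundled copy wins over any system-wide one.
        sys.path.insert(0, vendor)
    return vendor


# Typical use at the top of the script, before the third-party imports:
#   add_vendor_dir(os.path.dirname(os.path.abspath(__file__)))
#   import lxml.html
```

Pure-Python packages are much more forgiving to bundle this way than ones with compiled parts.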
I haven’t tried yet, but I doubt the shells you get on a shared hosting server will permit me to install a Python module in the more conventional way, with “python setup.py install” or pip. If you know otherwise, please let me know.
Cheers,
Roger – London
Try virtualenv: it lets you install packages into a directory you own, so you don’t need root access.
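If it really has to be the standard library alone, the built-in event-driven parser is probably the closest fit: `html.parser.HTMLParser` in Python 3 (the module was called `HTMLParser` in Python 2). There are no CSS selectors; you subclass it and react to tags as they stream past. A rough sketch, assuming Python 3, that collects links (the `LinkCollector` class and `extract_links` helper are illustrative names, not part of the library):

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/a">One</a>')` returns `[('/a', 'One')]`. It is more verbose than a selector, and you have to track nesting state yourself, but it has no dependencies.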