By URL remapping, I mean changing all the “href”s and “src”s and “action”s and … inside an actual HTML doc.
Are there any python libraries to do this type of URL remapping?
In a Python web server app (based on Tornado), I want to be able to modify the HTML that I serve based on some conditions.
Imagine I read these HTML files off disk, but I need to replace all the links and … so that they point to one subdomain/domain and path or another.
Let's say I don't want to use templates, since that would mean rewriting all the HTML I have on disk to put tags inside and replacing those tags at run time. Also, for the sake of simplicity, imagine I have no external links (I never link to google.com, which would require conditional remapping).
As far as I know, there is no such library, but you can use an HTML parsing library such as lxml or BeautifulSoup together with the standard Python urlparse module (urllib.parse in Python 3). I prefer using lxml and XPath.
For example, say we have saved a StackOverflow page as doc.html, and we want to do something with the nodes that contain href, src, or action attributes.
In this example, I'm only converting relative hrefs that start with '/' to absolute ones using urlparse.urljoin, and not every element in the XPath result is used. But you can customize it for your needs.
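A minimal sketch of the approach described above, written for Python 3 (where urlparse.urljoin is available as urllib.parse.urljoin). The function name remap_urls, the BASE_URL value, and the choice to take the HTML as a string instead of parsing doc.html from disk are assumptions made for illustration, not the original code:

```python
from urllib.parse import urljoin  # Python 3 location of urlparse.urljoin

import lxml.html

# Hypothetical target; a real server would pick this per request.
BASE_URL = "http://sub.example.com/"
URL_ATTRS = ("href", "src", "action")


def remap_urls(html_text, base_url=BASE_URL):
    """Return html_text with root-relative href/src/action values made absolute."""
    doc = lxml.html.fromstring(html_text)
    # Select every element that carries at least one URL attribute.
    for node in doc.xpath("//*[@href or @src or @action]"):
        for attr in URL_ATTRS:
            value = node.get(attr)
            # Only rewrite root-relative URLs such as "/questions";
            # leave "page.html", "//host/x" and absolute URLs untouched.
            if value and value.startswith("/") and not value.startswith("//"):
                node.set(attr, urljoin(base_url, value))
    return lxml.html.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<html><body><a href="/questions">Questions</a>'
        '<img src="/img/logo.png"><form action="/search"></form>'
        "</body></html>"
    )
    print(remap_urls(sample))
```

Note that urljoin replaces the whole path of the base URL when the link starts with '/', so a path prefix would have to be added by other means. It may also be worth checking whether lxml.html's own rewrite_links and make_links_absolute helpers already cover your case before writing the loop by hand.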