I’m building a web app that processes the text of a web page, adds links to certain entities, and then redisplays the page. My server-side code is in Perl and Python, and I’m currently using HTML::Parser to extract the text from a page. I can clean the markup and extract and process the text without issue, but I want to display the original page exactly as it was, only with some links added to previously unlinked text.
I’m hoping to find out the best way to redisplay the exact same page with links added to certain words or phrases in the text. All of the original markup should be preserved exactly as it was before the text was extracted.
I’ve searched thoroughly, but I cannot find a precise solution to this issue. Any help would be greatly appreciated.
I do know that Python has a standard-library module for fetching web pages, called urllib (in Python 3, the part you want is urllib.request).
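The original snippet for this step is not shown, so here is a minimal sketch of what fetching a page with urllib.request might look like in Python 3. The function name fetch_html and the UTF-8 fallback are my own assumptions, not part of any library.

```python
from urllib.request import urlopen

def fetch_html(url):
    """Download `url` and return its body decoded to a string (hypothetical helper)."""
    with urlopen(url) as resp:
        raw = resp.read()
        # Use the charset from the Content-Type header if the server sent one;
        # falling back to UTF-8 is an assumption and may be wrong for some pages.
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

You would call it as fetch_html("https://example.com/") and get the page source back as one string.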
You could also save the result as a new HTML file from Python using the built-in open().
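Again the original snippet is missing, so this is only a sketch of the saving step. The name save_html is hypothetical, and writing UTF-8 is an assumption; if the page declares a different charset in its markup, you would want to write it out in that encoding instead.

```python
def save_html(html, path):
    """Write the HTML string `html` to `path` (hypothetical helper)."""
    # Assumes UTF-8 output; adjust if the page's <meta charset> says otherwise.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```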
Between fetching and saving, you can modify the HTML source. Keep in mind that the saved pages will render incorrectly unless you also save the files they depend on (stylesheets, images, scripts) or make their relative URLs point back to the original site. Hope this helps.
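For the modification step itself, one possible approach is to stream the page through Python's html.parser (roughly analogous to Perl's HTML::Parser) and write every token straight back out, touching only the text nodes. This is a sketch under assumptions: the ENTITIES table, the SKIP set, and the names Linker and add_links are all made up for illustration. It also only approximates "exactly as it was": start tags, text, and comments are re-emitted verbatim, but end tags are rebuilt in lowercase, and character references are rebuilt with a trailing semicolon.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical entity-to-URL table; a real app would supply its own.
ENTITIES = {"Perl": "https://www.perl.org/", "Python": "https://www.python.org/"}
# Never add links inside these elements (an assumption about what is safe).
SKIP = {"a", "script", "style", "textarea", "title"}

class Linker(HTMLParser):
    def __init__(self, entities):
        # convert_charrefs=False keeps &amp; and friends as separate tokens,
        # so text reaches handle_data() undecoded and can be echoed as-is.
        super().__init__(convert_charrefs=False)
        self.entities = entities
        self.out = []
        self.skip_depth = 0
        names = sorted(entities, key=len, reverse=True)  # prefer longer phrases
        self.pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)  # rebuilt, not verbatim

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)
        else:
            self.out.append(self.pattern.sub(self._link, data))

    def _link(self, match):
        url = escape(self.entities[match.group(1)], quote=True)
        return '<a href="%s">%s</a>' % (url, match.group(1))

    # Everything below is passed through untouched (apart from the rebuilt delimiters).
    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

    def unknown_decl(self, data):
        self.out.append("<![%s]>" % data)

def add_links(html, entities=ENTITIES):
    parser = Linker(entities)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because matching happens one text chunk at a time, a phrase that is split by a tag or a character reference will not be linked. The same idea should carry over to HTML::Parser in Perl, whose handlers can be given the original text of each token.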