I am looking to write something that seems like it should be easy enough, but for whatever reason I’m having a tough time getting my head around it.
I am looking to write a python function that, when passed a string, will pass that string back with HTML encoding around URLs.
unencoded_string = "This is a link - http://google.com"
def encode_string_with_links(unencoded_string):
# some sort of regex magic occurs
return encoded_string
print encoded_string
'This is a link - <a href="http://google.com">http://google.com</a>'
Thank you!
The “regex magic” you need is just
sub(which does a substitution):URL_REGEXcould be something like:This is a pretty loose regex for URLs: it allows mailto, http and ftp schemes, and after that pretty much just keeps going until it runs into an “unsafe” character (except percent, which you want to allow for escapes). You could make it more strict if you need to. For example, you could require that percents are followed by a valid hex escape, or only allow one pound sign (for the fragment) or enforce the order between query parameters and fragments. This should be enough to get you started, though.