I’m trying to take a long string and extract all the URLs it contains.
page.findall(r"http://.+")
is what I have, but that doesn’t give me what I want. The URLs are all wrapped in double quotes, so how can I tell the regular expression to stop matching when it reaches a "?
There are very complex url-parsing regexes out there, but if you want to stop at a ", just use
[^\"]+
for the url part. Or switch to a single-quoted string and remove the \.
Also, if you have https mixed in, it will break, so you might want to just go with a pattern that accepts both schemes. But now we’re getting into url-parsing regexes.
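To make the difference concrete, here is a minimal sketch of the greedy pattern next to the negated character class. It assumes page is a plain str and that the re module is used directly; the sample markup and the variable names are made up for illustration, and the https?:// variant is just one plausible way to accept both schemes, not necessarily the exact pattern meant above.

```python
import re

# Hypothetical sample input; the real page contents are not shown in the question.
page = '<a href="http://example.com/a">A</a> <a href="https://example.org/b?x=1">B</a>'

# Greedy ".+" keeps going to the end of the line, past the closing quote.
greedy = re.findall(r"http://.+", page)

# A negated character class stops right before the next double quote.
# Single-quoted raw string, so the quote needs no backslash.
http_only = re.findall(r'http://[^"]+', page)

# "s?" makes the "s" optional, so both http and https URLs are matched.
both = re.findall(r'https?://[^"]+', page)

print(greedy)
print(http_only)
print(both)
```

With this sample, the greedy version returns one long match that swallows the rest of the line, the http-only version misses the https link, and the last version picks up both URLs. None of these validate that the match is a well-formed URL; they only rely on the surrounding double quotes.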