Sorry, I know this is probably a duplicate but having searched for ‘python regular expression match between’ I haven’t found anything that answers my question!
The document I’m searching (which, to be clear, is a long HTML page) has a whole bunch of strings in it (inside a JavaScript function) that look like this:
link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};
I want to extract the links (i.e. everything between quotes within these strings) – e.g. /Hidden/SidebySideYellow/dei1=1204970159862
To get the links, I know I need to start with:
re.findall(regexp, doc_string)
But what should regexp be?
The answer to your question depends on what the rest of the string looks like. If all the lines have the form
link: '<URL>'};
then simple string manipulation is enough: on each line, take everything between the first and the last single quote. (If you have one string containing multiple such lines, just split it into lines first.)
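As a minimal sketch of the string-manipulation approach, assuming every line contains exactly one quoted URL: the sample text is copied from the question, and the helper name extract_links_by_slicing is made up for illustration.

```python
# Sample input copied from the question; in practice this would be the page text.
doc_string = """link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""


def extract_links_by_slicing(text):
    """Return the text between the first and last single quote of each line.

    Hypothetical helper: it assumes one quoted URL per line and skips
    lines that do not contain a pair of single quotes.
    """
    links = []
    for line in text.splitlines():
        start = line.find("'")
        end = line.rfind("'")
        if start != -1 and end > start:
            links.append(line[start + 1:end])
    return links


links = extract_links_by_slicing(doc_string)
print(links)
```

This only works while the lines really are that regular; a second quoted string on the same line would end up inside the result.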
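If it is a bit more complex, though, regular expressions are fine. One example that just looks for the URL inside the quotes would be the pattern '([^']*)' passed to re.findall, which returns the text captured between each pair of single quotes.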
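A regular expression that only looks inside the quotes could be sketched like this. The pattern is one possible choice rather than the only one, and it assumes the URLs never contain a single quote themselves.

```python
import re

# Sample input copied from the question.
doc_string = """link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

# A single quote, then a captured run of non-quote characters, then a quote.
quoted = re.compile(r"'([^']*)'")

# With exactly one capture group, findall returns a list of the captured strings.
links = quoted.findall(doc_string)
print(links)
```

Note that this matches every single-quoted string in the text, so on a full HTML page it will probably pick up more than just the links.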
Depending on how the whole string looks, you might have to include the link: prefix in the pattern as well, so that other quoted strings in the page are not matched.
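Anchoring the pattern on the link: prefix might look like the sketch below. The first line of the sample text is an invented example of an unrelated quoted string, added only to show that it is skipped.

```python
import re

# The "var other" line is hypothetical; the two link lines are from the question.
doc_string = """var other = 'not a link';
link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

# Require the literal "link:" and optional whitespace before the quoted URL.
link_pattern = re.compile(r"link:\s*'([^']*)'")

links = link_pattern.findall(doc_string)
print(links)
```

Only the quoted strings that directly follow link: are returned, which is usually what you want when the page contains other JavaScript string literals.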