I’m having problems using regular expressions to match http links. I have a pattern that I would like to extract from a website’s source code. The source code has 200+ lines with lots of HTML gibberish like </html><body... useless links, useless images.
The http links that I need fall under this pattern:
<a href"http:www.google.com/....1,1">
<a href"http:www.google.com/....2,2">
<a href"http:www.google.com/....3,3">
I just want to get the http links, and the only thing unique to them is the ending. Please help, I’ve been stuck for hours experimenting with gsub, regexpr and grep.
Matching a generic URL with a regular expression is difficult (URL Matching); however, if you are always looking to match that exact pattern, you can try this:
This will search for http:www.google.com followed by anything and ending with the same number on each side of the comma, which appears to be what you want from the pattern you showed.
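A pattern of the kind described above can be sketched as follows. This is only an illustration in Python, not the exact code the answer refers to. It assumes the links in the real page source are well formed (`href="http://www.google.com/..."`), and the sample HTML, the paths in it, and the names `LINK_RE` and `extract_links` are all made up for the example.

```python
import re

# Hypothetical sample of page source; the real page and its paths are not
# shown in the question, so these URLs are invented for illustration.
html = '''<html><body>
<a href="http://www.google.com/search?q=a1,1">one</a>
<a href="http://www.example.com/other">useless link</a>
<img src="http://www.google.com/logo.png">
<a href="http://www.google.com/search?q=b2,2">two</a>
<a href="http://www.google.com/search?q=c3,4">mismatched ending</a>
</body></html>'''

# Assumed pattern: the fixed prefix, then any characters up to the closing
# quote, ending in N,N.
#   (?<!\d)   the number must not be the tail of a longer number
#   (\d+),\1  backreference: the same number on both sides of the comma
#   (?=")     the match must end right before the closing quote
LINK_RE = re.compile(r'http://www\.google\.com/[^"]*?(?<!\d)(\d+),\1(?=")')

def extract_links(source):
    """Return every full URL in source that ends with a repeated number pair."""
    # finditer + group(0) yields the whole URL rather than just the captured number
    return [m.group(0) for m in LINK_RE.finditer(source)]

for link in extract_links(html):
    print(link)
```

The backreference `\1` is what enforces "the same number on each side of the comma", so an ending such as `3,4` is skipped. The same pattern should carry over to R through `gregexpr(..., perl = TRUE)` together with `regmatches`, with the backslashes doubled inside the R string, though that variant is untested here.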