I’m trying to search a field in a database to extract URLs. Sometimes there will be more than one URL in a field, and I would like to extract those into separate variables (or an array).
I know my regex isn’t going to cover all possibilities. As long as I flag on anything that starts with http and ends with a space I’m ok.
The problem I’m having is that my attempts either get only one URL per record or get only the last letter of each URL. I’ve tried a couple of different techniques based on solutions others have posted, but I haven’t found one that works for me.
Sample input line:
Testing http://marko.co http://tester.net Just about anything else you’d like.
Output goal:
$var[0] = http://marko.co
$var[1] = http://tester.net
First try:
if ( $status =~ m/http:(\S)+/g ) {
print "$&\n";
}
Output:
http://marko.co
Second try:
@statusurls = ($status =~ m/http:(\S)+/g);
print "@statusurls\n";
Output:
o t
I’m new to regex, but since I’m using the same regex for each attempt, I don’t understand why it’s returning such different results.
Thanks for any help you can offer.
I’ve looked at these posts and either didn’t find what I was looking for or didn’t understand how to implement it:
This one seemed the most promising (and it’s where I got the second attempt from), but it didn’t return the whole URL, just the last letter: How can I store regex captures in an array in Perl?
This has some great stuff in it. I’m curious if I need to look at the URL as a word since it’s bookended by spaces: Regex Group in Perl: how to capture elements into array from regex group that matches unknown number of/multiple/variable occurrences from a string?
This one offers similar suggestions as the first two. How can I store captures from a Perl regular expression into separate variables?
Solution:
@statusurls = ($status =~ m/(http:\S+)/g);
print "@statusurls\n";
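For anyone comparing the attempts, here is a minimal sketch of why the placement of the parentheses matters. The sub names extract_urls and last_chars_only are hypothetical helpers made up for this illustration; the regexes and the sample text are the ones from this post. With the quantifier outside the group, (\S)+ keeps only the last character it matched; with the quantifier inside, (http:\S+) captures the whole URL. In list context, a match with /g returns the captures from every match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every
# whitespace-delimited token that starts with "http:".
sub extract_urls {
    my ($status) = @_;
    # Quantifier inside the parentheses: each capture is a whole URL.
    my @urls = $status =~ m/(http:\S+)/g;
    return @urls;
}

# Hypothetical helper reproducing the second attempt: the quantifier is
# outside the group, so each match keeps only the last character captured.
sub last_chars_only {
    my ($status) = @_;
    my @chars = $status =~ m/http:(\S)+/g;
    return @chars;
}

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

my @statusurls = extract_urls($status);
print "$_\n" for @statusurls;    # one URL per line

my @chars = last_chars_only($status);
print "@chars\n";                # prints "o t", as in the second attempt
```

The first attempt printed only one URL because the match in the if condition is in scalar context, so it runs once per evaluation and $& holds just that one match.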
Thanks!
I think that you need to capture more than just one character. Try this regex instead: