I’ve read a few questions on here re parsing HTML with regex, and I understand that this is, on the whole, a terrible idea.
Having said this, I have a very specific problem that I think regex might be the answer to. I’ve been fumbling around trying to work out the answer, but I’m new (as of today) to regex, and I was hoping some kind-hearted person might be able to help me out.
I have an array of strings that always follow the format
STUFF HERE<a href="somewhere" title="something" target="_blank">name of thing</a>STUFF HERE
What I’m hoping to achieve is to be left with just the ‘somewhere’ and the ‘name of thing’, so that I can output just <a href="somewhere">name of thing</a>.
The array of strings comes from an RSS feed of links on my Facebook profile, if you happen to be interested.
Many, many thanks for any help.
Jack
The parenthesized groups capture portions of the match into the $matches array. If the pattern matches the string at all, then $matches[1] will contain your href and $matches[2] will contain your link text.
Inside the parentheses, I’m defining the meat of the segments you’re interested in with negated character classes. The first one is [^\"]+, which is one or more of any character except a double quote. The second is [^<]+, which is one or more of any character except a less-than sign. This ensures that, if the markup is consistently in the format you provided, you have well-defined boundaries on either side of the portions you’re interested in.
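For anyone who wants to try the idea out, here is a minimal sketch in Python rather than PHP. The pattern is reconstructed from the description above and is an assumption, not necessarily the exact regex being described; `LINK_RE` and `simplify_link` are hypothetical names. In Python, `match.group(1)` and `match.group(2)` play the role of $matches[1] and $matches[2].

```python
import re

# Assumed pattern, rebuilt from the explanation above:
#   ([^"]+)  captures the href: one or more characters that are not a double quote
#   [^>]*    skips any remaining attributes (title, target, ...)
#   ([^<]+)  captures the link text: one or more characters that are not "<"
LINK_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>')


def simplify_link(text):
    """Return a stripped-down anchor tag, or None if no link is found."""
    match = LINK_RE.search(text)
    if match is None:
        return None
    href, name = match.group(1), match.group(2)
    return f'<a href="{href}">{name}</a>'


sample = ('STUFF HERE<a href="somewhere" title="something" '
          'target="_blank">name of thing</a>STUFF HERE')
print(simplify_link(sample))  # prints: <a href="somewhere">name of thing</a>
```

This only holds up while the strings really do follow the format in the question, with href as the first attribute and double-quoted values. Anything looser is better handled by a proper HTML parser.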