I have a bunch of strings that look, for example, like this:
<option value="Spain">Spain</option>
And I want to extract the name of the country from inside.
The easiest way I could think of to do this in Ruby was to use a regular expression of this form:
country = line.match(/>(.+)</)
However, this returns >Spain<. So I did this:
line.match(/>(.+)</).to_s.gsub!(/<|>/,"")
Works well enough, but I’d be surprised if there isn’t a more elegant way to do this. It seems like I should be able to use a regular expression to declare how to find the thing I want, without the enclosing strings that were used to match it being part of the data that gets returned.
Is there a conventional approach to this problem?
The right way to deal with that string is to use an HTML parser, for example:
And if you have several such strings, paste them together and use search.
But if you must use a regex, then:
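A minimal sketch of the regex route, reusing the pattern from the question (the variable names are illustrative): indexing the result of match with 1 returns only the first capture group.

```ruby
line = '<option value="Spain">Spain</option>'

# The parentheses in the pattern form capture group 1. Indexing the
# MatchData with [1] returns just that group, without the > and <.
country = line.match(/>(.+)</)[1]
puts country

# String#[] accepts a regex and a group index too, and returns nil
# instead of raising when the pattern does not match.
shortcut = line[/>(.+)</, 1]
puts shortcut
```

Note that match returns nil when nothing matches, so the first form raises NoMethodError on a line without the pattern; the String#[] form is one way to sidestep that.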
Keep in mind that match actually returns a MatchData object, and MatchData#to_s returns the entire matched string.
But you can access the captured groups using MatchData#[]. And if you don’t like counting, you could use a named capture group as well:
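For illustration, here is a small sketch contrasting MatchData#to_s with a named group lookup; the group name name is an arbitrary choice for this example:

```ruby
line = '<option value="Spain">Spain</option>'

# (?<name>...) defines a capture group that can be looked up by name.
match = line.match(/>(?<name>.+)</)

whole   = match.to_s     # the entire matched text, delimiters included
by_sym  = match[:name]   # named group, looked up by symbol
by_str  = match['name']  # the same group, looked up by string

puts whole
puts by_sym
puts by_str
```

Named groups keep the lookup readable when a pattern grows to several captures, since adding or reordering groups does not shift any indices.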