I’m trying to complete an assignment where I’m supposed to write a Ruby regular expression to capture the items between HTML tags, but I’m really stuck. I’ve searched everywhere, but I can only find advice about using HTML parsers and other tools that I don’t think we are allowed to use, because we have only learned regular expressions so far.
The example text is:
<span id="animal_display">
<a href="/b/bird">Bird</a>
<a href="/c/cat">Cat</a>
<a href="/c/dog">Dog</a>
</span>
I’m trying to capture Bird, Cat, and Dog.
Using this regular expression, I am able to get the first occurrence:
/<span id="animal_display">.*?<[^>]+>(.*?)<\/[^>]+>.*<\/span>/m
I can get all three with this, but I want to be able to use the regular expression on lists that might have more than three items:
/<span id="animal_display">\s*<[^>]+>\s*(.*?)<\/a>.\s*<[^>]+>\s*(.*?)<\/a>.\s*<[^>]+>\s*(.*?)<\/a>.<\/span>/
Is there a more generalized regular expression that could work on an unspecified number of items? Any advice would be greatly appreciated.
This isn’t a complete answer, but sometimes a hairy capturing regex can be simplified by tackling the problem from the other direction, using split:
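Here is a minimal sketch of that idea, not necessarily what the assignment expects. It assumes the markup looks like the example in the question (one span with id "animal_display" that contains only simple links), and the method name animal_names is made up for illustration. It first isolates the text inside the span with one small regex, then splits that text on anything that looks like a tag and keeps whatever non-blank pieces are left over.

```ruby
# Hypothetical helper: returns the link texts inside the animal_display span.
# Assumes the span contains only simple, non-nested tags.
def animal_names(html)
  # Grab everything between the opening span tag and the first closing </span>.
  # The /m flag lets . match newlines; to_s guards against a missing span (nil).
  inner = html[/<span id="animal_display">(.*?)<\/span>/m, 1].to_s

  # Split on every tag. What remains between tags is either whitespace
  # or the link text, so strip each piece and drop the empty ones.
  inner.split(/<[^>]+>/).map(&:strip).reject(&:empty?)
end

html = <<~HTML
  <span id="animal_display">
  <a href="/b/bird">Bird</a>
  <a href="/c/cat">Cat</a>
  <a href="/c/dog">Dog</a>
  </span>
HTML

p animal_names(html)
```

Because nothing in the pattern counts the links, the same call should work for a list of any length. If the assignment requires capture groups rather than split, String#scan with a single-link pattern such as /<a [^>]*>(.*?)<\/a>/ is probably the closest equivalent to try.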