I don’t get it. I created this regular expression:
```
<span class="copy[Green|Red].*>[\s]*(.*)[\s]*<\/span>
```
to match certain parts of HTML code (the text between span tags), for instance:
```
<span class="copyGreen">0.12</span> <span class="copyRed"> 0.12 </span>
```
Now, this works beautifully with RegexBuddy and other tools, but with `boost::regex` it doesn't match.
EDIT: To be more precise, I want to capture the number between the span tags. Before and after the number there can also be whitespace (`\n`, `\r`, etc.).
Here's the code I've written:
```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <boost/regex.hpp>

int main() {
    try {
        const boost::regex e("<span class=\"copy[Green|Red].*>[\\s]*(.*)[\\s]*<\\/span>");
        boost::smatch matches;
        std::string html("<span class=\"copyGreen\"> 0.12 </span>");
        if (boost::regex_match(html, matches, e)) {
            // Works... (not).
        } else {
            throw std::runtime_error("Couldn't match the regex against HTML-source!");
        }
    } catch (const boost::regex_error& e) {
        // Note: this handler does not catch the std::runtime_error thrown above.
        std::cout << e.what() << std::endl;
    }
}
```
What am I doing wrong here? Thanks in advance!
EDIT:
It seems that the correct expression is:
```cpp
boost::regex("<span class=\"copy(?:Green|Red)[^>]*>\\s*(.*?)\\s*<\\/span>"); // Thanks, chaos!
```
This does match with Boost. However, I needed to enable `boost::match_extra` to get all the captures I needed, which I did by defining `BOOST_REGEX_MATCH_EXTRA` in `boost\regex\user.hpp`.
Thank you once again.
For one thing, this:
`[Green|Red]`
doesn't do what you think it does. You want:
`(?:Green|Red)`
`[Green|Red]` is a character class made up of the characters `G`, `r`, `e`, `n`, `|`, `R` and `d`, not a way of alternating between matches. The way you've written it, it will match exactly one of those characters, followed (via the `.*`) by any number of other characters.
This:
`[\s]*`
is redundant and maybe hazardous (depending on interpretation, it could be what's actually making your match fail). It can be just:
`\s*`
In order for your second `\s*` to work, the capturing expression probably needs to be the non-greedy `(.*?)`; a greedy `(.*)` grabs the trailing whitespace before the second `\s*` gets a chance at it.
I also recommend making your first `.*` into `[^>]*`, to avoid the problem you'll get if you ever apply this to actual HTML documents, where it will suck in arbitrary amounts of HTML.