Assume I have a variable std::string sourceCode; into which I have loaded a C++ source file. I want to remove all comments using the regex classes from TR1 (they are fully available, as I use the Microsoft compiler). Single-line comments are easy, but multi-line comments are not. It is not just a matter of replacing a comment with a space; the number of lines has to stay the same. If we remove a comment that is 5 lines long, that space should be filled with 5 newlines so that I can still trace back into the code and compute the correct line numbers.
My code so far:
std::regex singleLinedCommentReg("//.*");
sourceCode = std::regex_replace(sourceCode, singleLinedCommentReg, std::string(""));
std::regex multiLinedCommentReg("(/\\*([^*]|[\r\n]|(\\*+([^*/]|[\r\n])))*\\*+/)");
std::for_each(
    std::sregex_iterator(sourceCode.begin(), sourceCode.end(), multiLinedCommentReg),
    std::sregex_iterator(),
    [&](const std::match_results<std::string::const_iterator>& match) -> bool {
        // TODO: Replace the current match with an appropriate number of newlines.
        return true;
    }
);
Can anyone give me some advice on that?
EDIT #1
I do NOT want to start a discussion about whether it makes sense to use regex for this kind of task! Please just assume the input is clean and as expected.
Your regex approach is way off and too complicated. You are trying to use a regular language (regex) to parse a situation that is at least as complex as a context-free grammar. If you split things up and do part of the processing in C++, you'll get it done, but it'll look messy.
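To make the "split things up" route concrete, here is a minimal sketch in which the regex only finds each block comment and plain C++ does the replacement. It is not the asker's code: the function name blankBlockComments is hypothetical, and the pattern is a simpler non-greedy one swapped in for the original multiLinedCommentReg. It assumes clean input as stated in the question, so comment markers inside string literals are not detected.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replaces every /* ... */ comment with as many
// newlines as it contained, so the line numbers after it stay unchanged.
std::string blankBlockComments(const std::string& source) {
    // Non-greedy match up to the nearest comment terminator;
    // [\s\S] matches any character, including newlines.
    static const std::regex blockComment(R"(/\*[\s\S]*?\*/)");

    std::string result;
    auto copiedUpTo = source.cbegin();
    for (std::sregex_iterator it(source.cbegin(), source.cend(), blockComment), end;
         it != end; ++it) {
        const std::ssub_match comment = (*it)[0];
        // Copy the code that precedes this comment.
        result.append(copiedUpTo, comment.first);
        // Emit one newline for each newline inside the comment.
        const auto newlines = std::count(comment.first, comment.second, '\n');
        result.append(static_cast<std::size_t>(newlines), '\n');
        copiedUpTo = comment.second;
    }
    // Copy whatever follows the last comment.
    result.append(copiedUpTo, source.cend());
    return result;
}
```

Building a new string avoids modifying sourceCode while the iterator is still walking it, which is what makes in-place replacement inside std::for_each awkward.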
If your goal is to write a function that strips out all of the comments without losing the newline characters, I suggest that you generate a parser using one of the many parsing tools available.
This took less than 5 minutes to create and is functionally what you are looking for. You can modify it to your heart's content. It will generate a lexer with flex 2.5.4 or flex 2.5.35.
Addendum:
The above is a fully functional program. You can generate the .c using:
and you can compile it using
Now something like
will generate the new source file.
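For readers without flex at hand, the scanning idea behind such a lexer can be approximated by a small hand-written state machine in C++. This is only a sketch and not the flex program referred to above; the function name stripComments is hypothetical. It assumes ordinary string and character literals and does not handle raw string literals, digit separators such as 1'000, or backslash line continuations inside // comments.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips // and /* */ comments in a single pass,
// keeps every newline, and leaves string and character literals untouched.
std::string stripComments(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else {
                if (c == '"') state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;
        case State::LineComment:
            // Drop everything up to the newline, which is kept.
            if (c == '\n') { out += c; state = State::Code; }
            break;
        case State::BlockComment:
            if (c == '\n') out += c;  // preserve the line count
            else if (c == '*' && next == '/') { state = State::Code; ++i; }
            break;
        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && i + 1 < src.size()) out += src[++i];  // escaped character
            else if (c == (state == State::String ? '"' : '\'')) state = State::Code;
            break;
        }
    }
    return out;
}
```

Because every newline is copied through regardless of state, the output always has the same number of lines as the input.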