I’m parsing a source code file, and I want to remove all line comments (i.e. those starting with “//”) and multi-line comments (i.e. /* ... */). However, if a multi-line comment contains at least one line break (\n), I want the output to have exactly one line break in its place.
For example, the code:
qwe /* 123
456
789 */ asd
should turn exactly into:
qwe
asd
and not “qweasd” or:
qwe

asd
What would be the best way to do so?
Thanks
EDIT:
Example code for testing:
comments_test = "hello // comment\n"+\
"line 2 /* a comment */\n"+\
"line 3 /* a comment*/ /*comment*/\n"+\
"line 4 /* a comment\n"+\
"continuation of a comment*/ line 5\n"+\
"/* comment */line 6\n"+\
"line 7 /*********\n"+\
"********************\n"+\
"**************/\n"+\
"line ?? /*********\n"+\
"********************\n"+\
"********************\n"+\
"********************\n"+\
"********************\n"+\
"**************/\n"+\
"line ??"
Expected results:
hello
line 2
line 3
line 4
line 5
line 6
line 7
line ??
line ??
(^)? will match if the comment starts at the beginning of a line, as long as the MULTILINE flag is used.
[^\S\n] will match any whitespace character except newline. We don’t want to match line breaks if the comment starts on its own line.
/\*(.*?)\*/ will match a multi-line comment and capture its content. The match is lazy, so we don’t swallow two or more comments at once. The DOTALL flag makes . match newlines.
//[^\n]* will match a single-line comment. We can’t use . here because of the DOTALL flag.
($)? will match if the comment stops at the end of a line, as long as the MULTILINE flag is used.
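For reference, here is a minimal sketch of how the pieces described above could be assembled into a single pattern. The names COMMENT_RE, _replace_comment and remove_comments are made up for illustration, and the replacement rules (drop the comment if it touches the start or end of a line, emit one line break if it spans lines, otherwise emit one space) are an assumption about how the captured groups would be used.

```python
import re

# Hypothetical assembly of the fragments explained above:
#   group 1: (^)?   set when the comment starts at the beginning of a line
#   group 2: (.*?)  body of a /* ... */ comment (None for // comments)
#   group 3: ($)?   set when the comment stops at the end of a line
COMMENT_RE = re.compile(
    r'(^)?[^\S\n]*/(?:\*(.*?)\*/[^\S\n]*|/[^\n]*)($)?',
    re.DOTALL | re.MULTILINE,
)

def _replace_comment(match):
    start, body, end = match.group(1, 2, 3)
    if body is None:
        # Single-line // comment: drop it, the line's own newline is kept.
        return ''
    if start is not None or end is not None:
        # Block comment touching the start or end of a line: drop it.
        return ''
    if '\n' in body:
        # Block comment spanning lines between code: exactly one line break.
        return '\n'
    # Block comment inside a single line: keep the tokens apart.
    return ' '

def remove_comments(text):
    """Strip // and /* ... */ comments from text (assumed rules, see above)."""
    return COMMENT_RE.sub(_replace_comment, text)

print(remove_comments("qwe /* 123\n456\n789 */ asd"))
```

Run on the test string from the question, this should produce the expected lines, although "line 3" keeps a trailing space because the first of its two adjacent comments is replaced by a space. The sketch also knows nothing about string literals, so a "//" or "/*" inside a quoted string would be treated as a comment.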