I want to match multiline comments that contain a specific word, let’s say findthis. The first pattern that comes to mind is \/\*.*?findthis.*?\*\/ (using DOTALL). The problem with this pattern, however, is that a string like this:
/* this is a comment */
this is some text
/* this is a findthis comment */
will match the whole text. Basically, on a bigger file, the first match would contain everything from the first comment to the first comment containing findthis. How can I prevent this?
Well, you could change the regex to something like
\/\*([^*]|\*+[^/*])*findthis([^*]|\*+[^/*])*\*+\/
but… To get this exactly right, you would have to fully tokenize the source code. Otherwise your regex will be fooled by comment-like content inside strings (among other bizarre corner cases).
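(Explanation of crazy regex:
([^*]|\*+[^/*]) matches a little bit of the inside of a comment: either one character that is not an asterisk, or a run of asterisks followed by a character that is neither / nor *. Because of that, it never matches all or part of */.)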
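To see the difference on the sample text from the question, here is a minimal sketch in Python (my choice of language, since the question doesn't name one). The names SAMPLE, NAIVE, BODY, STRICT and find_comments are made up for the illustration. The groups are written as non-capturing, and / is left unescaped because Python doesn't need it escaped.

```python
import re

# Sample text from the question.
SAMPLE = (
    "/* this is a comment */\n"
    "this is some text\n"
    "/* this is a findthis comment */"
)

# Pattern from the question: the lazy dots can still run across a closing */.
NAIVE = re.compile(r"/\*.*?findthis.*?\*/", re.DOTALL)

# Pattern from the answer: the comment body can never consume a closing */.
# (Non-capturing groups are an assumption of this sketch, not part of the answer.)
BODY = r"(?:[^*]|\*+[^/*])*"
STRICT = re.compile(r"/\*" + BODY + "findthis" + BODY + r"\*+/")


def find_comments(pattern, text):
    """Return every substring of text matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_comments(NAIVE, SAMPLE))   # one match spanning all three lines
    print(find_comments(STRICT, SAMPLE))  # only the comment containing findthis
```

Two things worth trying with it. A string literal such as x = "/* findthis */"; is still reported as a comment, which is the tokenizing problem mentioned above. And when an asterisk sits directly before the word, as in /**findthis*/, the stricter pattern appears to find nothing, because \*+[^/*] has to swallow the f along with the asterisk.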