I’m looking for Python code that removes C and C++ comments from a string. (Assume the string contains an entire C source file.)
I realize that I could .match() substrings with a Regex, but that doesn’t solve nesting /*, or having a // inside a /* */.
Ideally, I would prefer a non-naive implementation that properly handles awkward cases.
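To make the awkward cases concrete, here is a small sketch of how a naive regex goes wrong. The pattern, the `strip_naive` helper, and the sample line are illustrative assumptions, not code from any library:

```python
import re

# Naive pattern: a block comment or a line comment, with no awareness
# of string literals (hypothetical, for illustration only).
NAIVE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_naive(source):
    """Remove anything that merely looks like a C/C++ comment."""
    return NAIVE.sub("", source)


# The "//" inside the string literal is not a comment, but the naive
# pattern treats it as one and deletes the rest of the line.
sample = 'char *url = "http://example.com"; /* a // b */ int x; // tail\n'
print(repr(strip_naive(sample)))
```

The string literal is cut off at `http:` and the `int x;` declaration disappears with it, which is exactly the kind of case a non-naive implementation has to handle.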
I don’t know if you’re familiar with sed, the UNIX-based (but Windows-available) text parsing program, but I’ve found a sed script here which will remove C/C++ comments from a file. It’s very smart; for example, it will ignore ‘//’ and ‘/*’ if found in a string declaration, etc. From within Python, it can be used with the following code:

In this program, `source_code` is the variable holding the C/C++ source code, and eventually `stripped_code` will hold C/C++ code with the comments removed. Of course, if you have the file on disk, you could have the `input` and `output` variables be file handles pointing to those files (`input` in read-mode, `output` in write-mode). `remccoms3.sed` is the file from the above link, and it should be saved in a readable location on disk. `sed` is also available on Windows, and comes installed by default on most GNU/Linux distros and Mac OS X.

This will probably be better than a pure Python solution; no need to reinvent the wheel.
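A minimal sketch of such a call, assuming Python 3.7+, `sed` on the PATH, and `remccoms3.sed` saved in the working directory. The function name and the `extra_args` parameter are hypothetical; if the script's header line lists extra flags (such as `-n`), pass them through `extra_args`:

```python
import subprocess


def strip_comments_with_sed(source_code, script_path="remccoms3.sed", extra_args=()):
    """Pipe source_code through `sed -f script_path` and return sed's output.

    script_path and extra_args are assumptions for this sketch; adjust
    them to wherever the sed script lives and whatever flags it expects.
    """
    result = subprocess.run(
        ["sed", *extra_args, "-f", script_path],
        input=source_code,    # sent to sed on stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError if sed fails
    )
    return result.stdout


# Usage, with source_code holding the contents of a C/C++ file:
#     stripped_code = strip_comments_with_sed(source_code)
```

For files on disk, the same call can take open file handles via `stdin=` and `stdout=` instead of `input=` and `capture_output=`.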