I am trying to create a file with all function/enum/struct/etc names from a source file. For that, I am at the moment trying to use sed to accomplish something like this:
(original file)
function add1 (int i) {
return i+1;
}
(output of sed)
function add1 (int i) {
}
In other words, I want to remove the actual contents of the function’s body. So far I have not been able to get it to work. Any suggestions?
EDIT: I tried something like this, with no success (for now I am trying to only make the lines on the function’s body blank):
sed '/{/,/}/ s/.*//'
Instead of sed, you could always use awk in per-character field mode (FS=""):
The above will skip the contents of any paired curly braces, i.e. function and structure bodies, array initializations, and so on, and output the result to standard output. You can specify one or more files. (If you don’t specify any files, it’ll expect input from standard input.)
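A minimal sketch of this per-character approach, assuming an awk that splits each record into single characters when FS is empty (gawk and mawk do; POSIX leaves it unspecified). The wrapper name strip_bodies is hypothetical and only there so the script can be reused:

```shell
# strip_bodies: hypothetical helper name, used here only for illustration.
# Prints its input (files or standard input) with everything inside paired
# curly braces removed, keeping only the outermost braces themselves.
strip_bodies() {
    awk 'BEGIN { RS = "\n"; FS = ""; d = 0 }   # d is the current brace depth
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "{") {
                if (d == 0) printf "{\n";      # outermost opening brace
                d++;
            } else if ($i == "}") {
                if (d > 0) {                   # ignore unbalanced closers
                    d--;
                    if (d == 0) printf "}";    # outermost closing brace
                }
            } else if (d == 0) {
                printf "%s", $i;               # text outside any braces
            }
        }
        if (d == 0) printf "\n";               # end the line only at depth 0
    }' "$@"
}
```

With the example from the question saved as source.c (a made-up filename), running strip_bodies source.c prints the function header line followed by a line holding only the closing brace.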
As it is now, it will get confused about braces within quotes or comments. That could be fixed in the same way, but it does get quite complicated fast. This is just a hack to get you most of the way.
I added the semicolons (;) so you can just stuff everything in the above snippet on one long command line.
The logic of the script is very simple. It uses the empty field separator (FS), so that every character in the input will be its own field. The BEGIN rule is run once before any input is processed, and sets this up. For developer information, I also initialize d = 0, although it is not necessary for awk since it assumes uninitialized variables to be empty or zero as appropriate. It will track the current brace depth for each input character.
The second braced expression will be executed once per record. Since I set RS = "\n", each line is a separate record. Thus, it will be executed once per input line. Due to FS = "", each character on that line will be a separate field. There are NF fields in the record: $1, $2, .., $(NF-1), and $NF. The three-part if clause simply outputs outermost braces, and everything not within braces (i.e. when d == 0).
It is possible to extend this awk scriptlet to encompass comments, strings, character constants (use \047 to refer to a single quote, unless you put the script into a separate file with #!/usr/bin/awk -f), and to process or ignore preprocessor macros.
It does get a bit complicated, and you’ll end up with a couple of hundred lines of awk script, but it should be quite reliable and reasonably fast. It is possible because the tokenization rules in C are easy to follow in this particular case; I personally would use a full-blown C lexer (lexical analyzer or scanner) in all other use cases. And probably for this, too.
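As a rough illustration of such an extension, the sketch below adds a small state machine so that braces inside string literals, character constants, and comments are not counted. It is only a sketch under simplifying assumptions, far from the couple of hundred lines a robust version would need: it ignores preprocessor directives, trigraphs, and backslash-continued // comments, and it again relies on an awk that treats an empty FS as per-character splitting. The names strip_bodies_lex, emit, state, and SQ are hypothetical:

```shell
# strip_bodies_lex: hypothetical helper name, used here only for illustration.
# Like the plain per-character version, but braces inside strings, character
# constants, and comments do not change the brace depth.
strip_bodies_lex() {
    awk '
    # Print s only when we are outside all braces.
    function emit(s) { if (d == 0) printf "%s", s }

    BEGIN { RS = "\n"; FS = ""; d = 0; state = "code"; SQ = "\047" }
    {
        for (i = 1; i <= NF; i++) {
            c = $i
            two = (i < NF) ? (c $(i + 1)) : c    # current plus next character
            if (state == "code") {
                if (two == "/*")      { emit(two); state = "block"; i++ }
                else if (two == "//") { emit(two); state = "line"; i++ }
                else if (c == "\"")   { emit(c); state = "str" }
                else if (c == SQ)     { emit(c); state = "chr" }
                else if (c == "{")    { if (d == 0) printf "{\n"; d++ }
                else if (c == "}")    { if (d > 0) { d--; if (d == 0) printf "}" } }
                else emit(c)
            } else if (state == "block") {
                if (two == "*/") { emit(two); state = "code"; i++ }
                else emit(c)
            } else if (state == "line") {
                emit(c)
            } else {                             # inside "..." or a char constant
                if (c == "\\") { emit(two); i++ }    # skip the escaped character
                else {
                    emit(c)
                    if (state == "str" && c == "\"") state = "code"
                    else if (state == "chr" && c == SQ) state = "code"
                }
            }
        }
        if (state == "line") state = "code"      # a // comment ends with its line
        if (d == 0) printf "\n"
    }' "$@"
}
```

Top-level strings and comments are passed through unchanged, while anything inside a body, including braces hidden in strings or comments, is dropped without disturbing the depth counter.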
If you want to use a full-blown C lexer, there are a number of them available freely on the net, but you’ll have to use a higher level language like C or C++. If you wish to handle all the corner cases, it’ll need to incorporate a C/C++ preprocessor, too, but those rules are easy (even with awk).