Is there a way to programmatically check if a single C source file is potentially harmful?
I know that no check will be 100% accurate, but I would at least like to run some basic checks that raise a red flag if certain expressions or keywords are found. Any ideas of what to look for?
Note: the files I will be inspecting are relatively small (a few hundred lines at most) and implement numerical analysis functions that all operate in memory. No external libraries (except math.h) should be used in the code. Also, no I/O should be used (the functions will be run on in-memory arrays).
Given the above, are there some programmatic checks I could do to at least try to detect harmful code?
Note: since I don’t expect any I/O, any code that does I/O is considered harmful.
If you want to make sure it’s not calling anything not allowed, then compile the piece of code and examine what it’s linking to (say via `nm`). Since you’re hung up on doing this by a “programmatic” method, just use python/perl/bash to compile and then scan the name list of the object file.

There’s not a lot you can do about buffer overwrites for statically defined buffers, but you could link against an electric-fence type memory allocator to catch overruns of dynamically allocated buffers.
You could also compile and link the C file in question against a driver which would feed it typical data while running under valgrind, which could help detect poorly or maliciously written code.
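A rough sketch of such a harness in Python, assuming a hypothetical driver binary `./driver` that you have already built from the file under test plus your own test data. `--error-exitcode` is a real valgrind option that turns reported memory errors into a nonzero exit status. The exit code 99 and the 10-second limit are arbitrary choices for illustration, and the time limit is only there so that a routine that never returns cannot hang the check.

```python
import subprocess
import sys

def valgrind_cmd(driver, error_code=99):
    """Build a valgrind command line; reported memory errors yield exit status error_code."""
    return ["valgrind", "--quiet", f"--error-exitcode={error_code}", driver]

def run_with_limit(cmd, seconds):
    """Run cmd and return 'ok', 'failed' (nonzero exit), 'timeout' or 'missing'."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    except FileNotFoundError:
        # The program named in cmd[0] (e.g. valgrind) is not installed.
        return "missing"
    return "ok" if proc.returncode == 0 else "failed"

if __name__ == "__main__":
    # "./driver" is a hypothetical name, not something from the original post.
    driver = sys.argv[1] if len(sys.argv) > 1 else "./driver"
    print(run_with_limit(valgrind_cmd(driver), seconds=10))
```

Anything other than `ok` would be a reason to look at the file by hand. A clean run only says that these particular inputs triggered nothing valgrind could see.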
In the end, however, you’re always going to run up against the “does this routine terminate” question, which is famous for being undecidable. A practical way around this would be to compile your program and run it from a driver which would `alarm` out after a reasonable, fixed period of time.

EDIT: Example showing use of `nm`:

Create a C snippet defining a function `foo` which calls `fopen`, compile it with `-c`, and then look at the resulting object file with `nm`.

You’ll see that there are two symbols in the `foo.o` object file. One is defined, `foo`, the name of the subroutine we wrote. And one is undefined, `fopen`, which will be linked to its definition when the object file is linked together with the other C files and necessary libraries. Using this method, you can see immediately if the compiled object is referencing anything outside of its own definition, which by your rules can be considered “bad”.
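The `nm` check could be automated along these lines. This is a minimal sketch, not a complete scanner: the allowlist of math functions, the `checked.o` output name and the `cc` compiler command are all assumptions, and `SAMPLE` is an illustrative listing in the shape GNU `nm` typically prints for the `foo`/`fopen` case, not captured output.

```python
import subprocess

# Symbols the file may reference without being flagged. This allowlist is an
# assumption for illustration; list whichever math.h functions you permit.
ALLOWED = {"sin", "cos", "tan", "exp", "log", "pow", "sqrt", "fabs", "floor", "ceil"}

# Illustrative nm listing: "T" marks a symbol defined in the object's text
# section, "U" marks an undefined symbol that must be resolved at link time.
SAMPLE = (
    "0000000000000000 T foo\n"
    "                 U fopen\n"
    "                 U sqrt\n"
)

def undefined_symbols(nm_output):
    """Return the names that nm marks as 'U' (undefined)."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have no address column, so the line is "U name".
        if len(fields) == 2 and fields[0] == "U":
            names.append(fields[1])
    return names

def red_flags(nm_output, allowed=ALLOWED):
    """Undefined symbols that are not on the allowlist, sorted by name."""
    return sorted(set(undefined_symbols(nm_output)) - set(allowed))

def check_file(c_path, obj_path="checked.o"):
    """Compile c_path with -c (no linking) and scan the object file with nm."""
    subprocess.run(["cc", "-c", c_path, "-o", obj_path], check=True)
    nm = subprocess.run(["nm", obj_path], capture_output=True, text=True, check=True)
    return red_flags(nm.stdout)

print(red_flags(SAMPLE))  # ['fopen']
```

Calling `check_file` on a real source file needs a C compiler and `nm` on the PATH. Expect some tuning: on some platforms symbol names carry a leading underscore, and the compiler may add references of its own (stack-protector helpers, for example) that you would need to allow explicitly.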