I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux; how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file, so that the compiled output somehow goes to memory and is then reloaded? I want to reload the compiled code quickly.
What you want to do is reasonable, and I am doing exactly that in MELT (a high-level domain-specific language to extend GCC; MELT is compiled to C through a translator itself written in MELT).
First, when generating C code (or code in many other source languages), good advice is to keep some sort of abstract syntax tree (AST) in memory. So first build the entire AST of the generated C code, then emit it as C syntax. Don't design your code generation framework without an explicit AST (in other words, generating C code with a bunch of `printf` calls is a maintenance nightmare; you want some intermediate representation).
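To illustrate the AST-first advice, here is a minimal sketch assuming a toy language of integer expressions: the tree is built in memory first, and a single emitter turns it into C text. The node kinds and the `ast_*`, `out_printf` and `emit_*` functions are hypothetical names chosen for this example, not MELT's actual representation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy AST for integer expressions (hypothetical node kinds). A real
   generator would also model statements, declarations, types, etc. */
typedef enum { AST_NUM, AST_VAR, AST_ADD, AST_MUL } AstKind;

typedef struct Ast {
    AstKind kind;
    long num;              /* valid for AST_NUM */
    const char *name;      /* valid for AST_VAR */
    struct Ast *lhs, *rhs; /* valid for AST_ADD and AST_MUL */
} Ast;

static Ast *ast_new(AstKind kind) {
    Ast *a = calloc(1, sizeof *a);
    if (!a) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    a->kind = kind;
    return a;
}

Ast *ast_num(long n) { Ast *a = ast_new(AST_NUM); a->num = n; return a; }
Ast *ast_var(const char *name) { Ast *a = ast_new(AST_VAR); a->name = name; return a; }
Ast *ast_bin(AstKind k, Ast *l, Ast *r) {
    Ast *a = ast_new(k);
    a->lhs = l;
    a->rhs = r;
    return a;
}

void ast_free(Ast *a) {
    if (!a) return;
    ast_free(a->lhs);
    ast_free(a->rhs);
    free(a);
}

/* Fixed-capacity output buffer: every byte of C text goes through out_printf. */
typedef struct { char *buf; size_t cap, len; } Out;

static void out_printf(Out *o, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= o->cap - o->len) {
        fprintf(stderr, "out_printf: buffer too small\n"); /* sketch: no growth */
        abort();
    }
    o->len += (size_t)n;
}

/* Emit an expression; binary nodes are always parenthesized, so the
   emitter never has to reason about C operator precedence. */
void emit_expr(Out *o, const Ast *a) {
    switch (a->kind) {
    case AST_NUM: out_printf(o, "%ld", a->num); break;
    case AST_VAR: out_printf(o, "%s", a->name); break;
    case AST_ADD:
    case AST_MUL:
        out_printf(o, "(");
        emit_expr(o, a->lhs);
        out_printf(o, " %c ", a->kind == AST_ADD ? '+' : '*');
        emit_expr(o, a->rhs);
        out_printf(o, ")");
        break;
    default:
        assert(!"unknown AST kind");
    }
}

/* Emit a whole function definition: long FNAME(long PARAM) { return BODY; } */
void emit_function(Out *o, const char *fname, const char *param, const Ast *body) {
    out_printf(o, "long %s(long %s) {\n  return ", fname, param);
    emit_expr(o, body);
    out_printf(o, ";\n}\n");
}
```

For example, building `ast_bin(AST_ADD, ast_bin(AST_MUL, ast_var("x"), ast_num(3)), ast_num(1))` and passing it to `emit_function` with the name `gen_fun` produces the text `long gen_fun(long x) { return ((x * 3) + 1); }` (with line breaks). Because all output flows through one emitter, changing how C is printed (indentation, `#line` directives, extra parentheses) touches one place instead of hundreds of scattered `printf` calls.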
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about the performance of the generated code (and TCC compiles C very quickly into very naive and slow machine code), you could use some other approaches, e.g. JIT libraries like GNU lightning (very quick generation of slow machine code), GNU libjit or AsmJit (the generated machine code is a bit better), or LLVM or GCCJIT (good machine code, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you would probably fork a `gcc -O -fPIC -shared` command to make some shared object `foo.so` out of your generated `foo.c`). From experience, generating C code takes much less time than compiling it (with `gcc -O`). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.

Once you have emitted your C code and forked its compilation into a `.so` shared object, you can `dlopen` it. Don't be shy: my manydl.c example demonstrates that on Linux you can `dlopen` a great many shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to `dlclose` on Linux (unless you are coding a server program that needs to run for months); an unused shared module can stay `dlopen`-ed, and you are mostly leaking process address space (which is a cheap resource), since most of that unused `.so` would be swapped out. `dlopen` itself is fast; what takes time is compiling the C source, because you really want the optimization to be done by the C compiler.

You could use many other approaches, e.g. have a bytecode interpreter and generate code for that bytecode, or use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJIT, Java, MetaOCaml, etc.
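The generate, compile, `dlopen` cycle described above can be sketched as follows. This is a minimal sketch under stated assumptions: `cc` on the `PATH` is GCC or a compatible compiler, the system is a glibc-based Linux (on glibc older than 2.34, link the program with `-ldl`), and `write_source`, `compile_shared` and `compile_and_load` are hypothetical helper names, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write the generated C source to PATH. */
static int write_source(const char *path, const char *src) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int ok = fputs(src, f) != EOF;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Fork "cc -O -fPIC -shared CPATH -o SOPATH" and wait for it to finish.
   Assumption: cc is GCC (or a compatible compiler) found on PATH. */
static int compile_shared(const char *cpath, const char *sopath) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execlp("cc", "cc", "-O", "-fPIC", "-shared", cpath, "-o", sopath, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Write DIR/NAME.c, compile it to DIR/NAME.so, dlopen it and return the
   address of SYM (NULL on failure). DIR must be absolute, so dlopen gets
   a full path. The handle is deliberately never dlclose-d. */
void *compile_and_load(const char *dir, const char *name,
                       const char *src, const char *sym) {
    assert(dir != NULL && dir[0] == '/');
    char cpath[4096], sopath[4096];
    int n1 = snprintf(cpath, sizeof cpath, "%s/%s.c", dir, name);
    int n2 = snprintf(sopath, sizeof sopath, "%s/%s.so", dir, name);
    if (n1 < 0 || n1 >= (int)sizeof cpath || n2 < 0 || n2 >= (int)sizeof sopath)
        return NULL;
    if (write_source(cpath, src) != 0 || compile_shared(cpath, sopath) != 0)
        return NULL;
    void *handle = dlopen(sopath, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    dlerror(); /* clear any stale error before dlsym */
    void *addr = dlsym(handle, sym);
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        return NULL;
    }
    return addr;
}
```

A typical use creates a scratch directory with `mkdtemp` (for instance under `/tmp`, or under a tmpfs mount), passes it a generated source such as `long gen_fun(long x) { return x * 3 + 1; }` with the hypothetical symbol name `gen_fun`, and casts the returned pointer to `long (*)(long)`. Forking `cc` directly with `execlp` avoids going through a shell, so file names never need quoting.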
As others suggested, you shouldn't care much about the time to write a C file, and in practice it will stay in the filesystem cache (see also this). Writing it is also much faster than compiling it, so keeping it in memory is not worth the trouble. Use some tmpfs if you are concerned about I/O times.
addenda
You asked: "Can a library .so file on Linux be re-compiled and reloaded at runtime?"
Of course yes: you should fork a command to build the library from the generated C code (e.g. `gcc -O -fPIC -shared generated.c -o generated.so`, but you could do it indirectly, e.g. by running `make -j`, especially if `generated.so` is big enough to make it worthwhile to split `generated.c` into several generated C files!), then dynamically load your library with `dlopen` (giving it a full path like `/some/file/path/to/generated.so`, and probably the `RTLD_NOW` flag), and use `dlsym` to find the relevant symbols inside it. Don't think of re-loading the same `generated.so` a second time; it is better to emit a unique `generated1.c` C file (then `generated2.c`, etc.), compile it to a unique `generated1.so` (the second time to `generated2.so`, etc.), and then `dlopen` it (and this can be done many hundreds of thousands of times). You may want the emitted `generated*.c` files to contain some constructor functions, which would be executed when the `generated*.so` is `dlopen`-ed.

Your base application program should define a convention about the set of `dlsym`-ed names (usually functions) and how they are called. It should call functions in your `generated*.so` only through `dlsym`-ed function pointers. In practice you would decide, for example, that each `generated*.c` defines the functions `void dynfoo(int)` and `int dynbar(int,int)`, use `dlsym` with `"dynfoo"` and `"dynbar"`, and call them through the function pointers returned by `dlsym`. You should also define conventions about how and when these `dynfoo` and `dynbar` functions are called. You had better link your base application with `-rdynamic` so that your `generated*.c` files can call your application functions.

You don't want your `generated*.so` to redefine existing names. For instance, you don't want to redefine `malloc` in your `generated*.c` and expect all heap allocation functions to magically use your new variant (that probably won't work, and even if it did, it would be dangerous).

You probably won't bother to `dlclose` a dynamically loaded shared object, except at application clean-up and exit time (and I don't bother to `dlclose` at all). If you do `dlclose` some dynamically loaded `generated*.so` file, be sure that nothing in it is still in use: no pointers into it, not even return addresses in call frames, may remain.

P.S. The MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
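To make the `dynfoo`/`dynbar` convention from the addenda concrete, here is a minimal sketch under stated assumptions: the struct `dynmod` and the functions `module_paths`, `emit_module` and `dynmod_bind` are hypothetical names, the generated constructor uses the GCC/Clang `__attribute__((constructor))` extension, and building the `.so` itself (forking `gcc -O -fPIC -shared` as described above) is not repeated here. The generated modules never call back into the application, so this sketch does not depend on `-rdynamic`; generated code that did call application functions would need the base program linked with it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical convention: every generated*.so defines exactly these two
   functions, and the base application reaches them only through dlsym. */
typedef void (*dynfoo_fn)(int);
typedef int (*dynbar_fn)(int, int);

struct dynmod {
    void *handle;
    dynfoo_fn dynfoo;
    dynbar_fn dynbar;
};

/* Unique file names per generation (DIR/generatedN.c, DIR/generatedN.so),
   so a fresh .so is dlopen-ed each time instead of re-loading the same file. */
int module_paths(const char *dir, unsigned serial,
                 char *cpath, size_t ccap, char *sopath, size_t socap) {
    int n1 = snprintf(cpath, ccap, "%s/generated%u.c", dir, serial);
    int n2 = snprintf(sopath, socap, "%s/generated%u.so", dir, serial);
    return (n1 < 0 || (size_t)n1 >= ccap || n2 < 0 || (size_t)n2 >= socap) ? -1 : 0;
}

/* Emit the C text of module SERIAL. Its constructor runs at dlopen time,
   before the application calls dynfoo or dynbar. */
int emit_module(char *buf, size_t cap, unsigned serial) {
    int n = snprintf(buf, cap,
        "#include <stdio.h>\n"
        "static int base;\n"
        "__attribute__((constructor)) static void generated%u_init(void) {\n"
        "  base = %u * 100;\n"
        "}\n"
        "void dynfoo(int x) { fprintf(stderr, \"generated%u: dynfoo(%%d)\\n\", x); }\n"
        "int dynbar(int a, int b) { return base + a + b; }\n",
        serial, serial, serial);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* dlopen SOPATH (a full path) with RTLD_NOW, then bind the two conventional
   names. RTLD_LOCAL keeps each module's symbols out of the global namespace,
   so a later generatedN.so does not resolve dynbar to an earlier module's. */
int dynmod_bind(struct dynmod *m, const char *sopath) {
    assert(m != NULL && sopath != NULL);
    m->handle = dlopen(sopath, RTLD_NOW | RTLD_LOCAL);
    if (!m->handle) {
        fprintf(stderr, "dlopen %s: %s\n", sopath, dlerror());
        return -1;
    }
    /* POSIX requires this void * to function pointer conversion to work. */
    m->dynfoo = (dynfoo_fn)dlsym(m->handle, "dynfoo");
    m->dynbar = (dynbar_fn)dlsym(m->handle, "dynbar");
    if (!m->dynfoo || !m->dynbar) {
        fprintf(stderr, "%s does not follow the dynfoo/dynbar convention\n", sopath);
        return -1; /* the handle is left open, like every other one */
    }
    return 0;
}
```

With serial 2, the constructor sets `base` to 200 when `generated2.so` is `dlopen`-ed, so a bound `dynbar(1, 2)` returns 203; that value shows the constructor really ran before the first `dlsym`-ed call. Keeping the application's knowledge of generated code down to two typed function pointers is what lets any number of `generated*.so` files be loaded side by side.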