I would like to generate various data types in C++ with unique deterministic names. For example:
struct struct_int_double { int mem0; double mem1; };
At present my compiler synthesises names using a counter, which means the names don’t agree when compiling the same data type in distinct translation units.
Here’s what won’t work:
- Using the ABI mangled_name function, because it already depends on structs having unique names. It might work in a C++11-compliant ABI by pretending the struct is anonymous?
- Templates, e.g. struct2, because templates don’t work with recursive types.
- A complete mangling, because it gives names which are far too long (hundreds of characters!).
Apart from a global registry (YUK!) the only thing I can think of is to first create a unique long mangled name, and then use a digest or hash function to shorten it (and hope there are no clashes).
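The mangle-then-digest idea can be sketched as follows. This is only an illustration under stated assumptions: long_name and short_name are hypothetical helpers, the "struct_" joining scheme simply mirrors the struct_int_double example, and 64-bit FNV-1a stands in for whatever digest is eventually chosen. Clashes are unlikely with 64 bits but still possible, and nothing here detects them.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: build the long structural name of a struct from the
// (already mangled) names of its members, e.g. {"int","double"} gives
// "struct_int_double".
std::string long_name(const std::vector<std::string>& members) {
    std::string name = "struct";
    for (const std::string& m : members) {
        name += '_';
        name += m;
    }
    return name;
}

// 64-bit FNV-1a. The result depends only on the input bytes, so every
// translation unit computes the same value for the same long name.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}

// Hypothetical helper: turn the digest into a valid C identifier of fixed
// length ("t_" followed by 16 hex digits).
std::string short_name(const std::string& long_form) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(fnv1a64(long_form)));
    return "t_" + std::string(buf);
}
```

Calling short_name(long_name({"int", "double"})) in any translation unit yields the same 18-character identifier, with no counter or shared state involved. The long form could still be emitted in a comment next to the definition to keep the generated code debuggable.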
Actual problem: to generate libraries which can be called where the types are anonymous, eg tuples, sum types, function types.
Any other ideas?
EDIT: Additional description of the recursive type problem. Consider defining a linked list like this:
template<class T>
typedef pair<list<T>*, T> list;
This is actually what is required. It doesn’t work for two reasons: first, you can’t template a typedef. [NO, you can NOT use a template class with a typedef in it, it doesn’t work] Second, you can’t pass in list* as an argument because it isn’t defined yet. In C without polymorphism you can do it:
struct list_int { struct list_int *next; int value; };
There are several workarounds. For this particular problem you can use a variant of the Barton-Nackman trick, but it doesn’t generalise.
There is a general workaround, first shown to me by Gabriel Dos Reis, using a template with open recursion, and then a partial specialisation to close it. But this is extremely difficult to generate and would probably be unreadable even if I could figure out how to do it.
There’s another problem doing variants properly too, but that’s not directly related (it’s just worse because of the stupid restriction against declaring unions with constructable types).
Therefore, my compiler simply uses ordinary C types. It has to handle polymorphism anyhow: one of the reasons for writing it was to bypass the problems of the C++ type system, including templates. This then leads to the naming problem.
Do you actually need the names to agree? Just define the structs separately, with different names, in the different translation units and reinterpret_cast<> where necessary to keep the C++ compiler happy. Of course that would be horrific in hand-written code, but this is code generated by your compiler, so you can (and I assume do) perform the necessary static type checks before the C++ code is generated.
If I’ve missed something and you really do need the type names to agree, then I think you already answered your own question: unless the compiler can share information between the translation of multiple translation units (through some global registry), I can’t see any way of generating unique, deterministic names from the type’s structural form except the obvious one of name-mangling.
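To make the reinterpret_cast suggestion concrete, here is a minimal sketch. The names anon_17 and anon_42 are hypothetical counter-generated names, as two translation units might emit them for the same structural type, and both are shown in one file only for illustration. One caveat: the C++ standard does not formally sanction reading one struct type through a pointer to another, even when the layouts match, so generated code that does this relies on how the compiler behaves in practice (some projects build with GCC's or Clang's -fno-strict-aliasing for that reason). The static_asserts only check that the layouts agree.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical output of translation unit A for the type (int, double).
struct anon_17 { int mem0; double mem1; };

// Hypothetical output of translation unit B for the same structural type.
struct anon_42 { int mem0; double mem1; };

// Compile-time checks that the two definitions really share a layout.
static_assert(std::is_standard_layout<anon_17>::value &&
              std::is_standard_layout<anon_42>::value,
              "both structs must be standard-layout");
static_assert(sizeof(anon_17) == sizeof(anon_42), "sizes must agree");
static_assert(offsetof(anon_17, mem1) == offsetof(anon_42, mem1),
              "member offsets must agree");

// A library function compiled in unit B, written against B's name.
double second_member(const anon_42* p) { return p->mem1; }

// A caller in unit A, which only knows its own name for the type.
// ASSUMPTION: relies on the compiler tolerating this cast (see the caveat
// in the text above); it is not guaranteed by the language.
double call_across_units(const anon_17* p) {
    return second_member(reinterpret_cast<const anon_42*>(p));
}
```

Because the generating compiler has already type-checked the program, the cast carries no extra information. It exists only to satisfy the C++ front end.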
As for the length of names, I’m not sure why it matters? If you’re considering using a hash function to shorten the names then clearly you don’t need them to be human-readable, so why do they need to be short?
Personally I’d probably generate semi-human-readable names, in a similar style to existing name-mangling schemes, and not bother with the hash function. So, instead of generating struct_int_double you might generate sid (struct, int, double) or si32f64 (struct, 32-bit integer, 64-bit float) or whatever. Names like that have the advantage that they can still be parsed directly (which seems like it would be pretty much essential for debugging).
Edit
Some more thoughts:
Edit 2: Sorry, brain failure — sha-1 digests are 160 bits, not 128.
PS. Not sure why this question was down-voted — it seems reasonable to me, although some more context about this compiler you’re working on might help.
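The semi-human-readable naming style suggested in this answer (si32f64 and friends) can be sketched as a small encoder with a matching decoder. Everything here is an assumption made for illustration: the Type description, the codes i32, f64 and p, and in particular the closing e after a struct's members, which is added so that nested structs stay unambiguous to parse. With that terminator the example type comes out as si32f64e rather than si32f64.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, minimal description of a structural type.
struct Type {
    enum Kind { Int32, Float64, Pointer, Struct } kind;
    std::vector<Type> parts;  // pointee for Pointer, members for Struct
};

// Encode a type as a short prefix-style name.
// Assumed codes: i32, f64, p<pointee>, s<members>e.
std::string encode(const Type& t) {
    switch (t.kind) {
    case Type::Int32:   return "i32";
    case Type::Float64: return "f64";
    case Type::Pointer: return "p" + encode(t.parts.at(0));
    case Type::Struct: {
        std::string out = "s";
        for (const Type& m : t.parts) out += encode(m);
        return out + "e";
    }
    }
    return "";  // not reached for valid kinds
}

// Decode a name produced by encode(), advancing pos past what was read.
// Returns false on malformed input.
bool decode(const std::string& s, std::size_t& pos, Type& out) {
    if (s.compare(pos, 3, "i32") == 0) {
        out = Type{Type::Int32, {}};
        pos += 3;
        return true;
    }
    if (s.compare(pos, 3, "f64") == 0) {
        out = Type{Type::Float64, {}};
        pos += 3;
        return true;
    }
    if (pos < s.size() && s[pos] == 'p') {
        ++pos;
        Type inner{Type::Int32, {}};
        if (!decode(s, pos, inner)) return false;
        out = Type{Type::Pointer, {inner}};
        return true;
    }
    if (pos < s.size() && s[pos] == 's') {
        ++pos;
        std::vector<Type> members;
        while (pos < s.size() && s[pos] != 'e') {
            Type m{Type::Int32, {}};
            if (!decode(s, pos, m)) return false;
            members.push_back(m);
        }
        if (pos >= s.size()) return false;  // missing closing 'e'
        ++pos;                              // consume 'e'
        out = Type{Type::Struct, members};
        return true;
    }
    return false;
}
```

Encoding a struct of a 32-bit integer and a 64-bit float gives si32f64e, and decoding that string rebuilds the same description, which is the property that makes such names usable when debugging generated code. A recursive type such as the linked list would additionally need some back-reference code, which this sketch does not attempt.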