I’m writing a template for expressions parametrised by an arbitrary number of char labels.
Given an argument list, a factory function returns an expression of different types depending on whether there are two arguments of the same types or whether they are unique.
A concrete example: suppose that A is a “labelable” object with its operator() overloaded to produce an ?Expression<...>. Let a, b, ... be declared as labels LabelName<'a'>, LabelName<'b'>, .... Then A(a,b,c,d) would produce a UniqueExpression<'a','b','c','d'>, whereas A(a,c,b,c) would produce a RepeatedExpression<'a','c','b','c'> instead.
To achieve this, I had to define the ?Expression‘s factory function with auto and decltype. Moreover, the decltype has to cascade to another decltype until the metaprogram finishes recursing through the arguments and the return type is finally decided. As an illustration, I have isolated a fairly minimal code for the factory method.
template <typename... T> struct TypeList { };
template <char C> struct LabelName { };
template <typename... T> class UniqueExpression
{
// Contains implementation details in actual code
};
template <typename... T> class RepeatedExpression
{
// Contains implementation details in actual code
};
class ExpressionFactory {
private:
template <char _C, typename... T, typename... _T>
static UniqueExpression<T...>
_do_build(TypeList<T...>,
TypeList<LabelName<_C>>,
TypeList<>,
TypeList<_T...>)
{
return UniqueExpression<T...> ();
}
template <char _C, typename... T, typename... _T1, typename... _T2, typename... _T3>
static RepeatedExpression<T...>
_do_build(TypeList<T...>,
TypeList<LabelName<_C>, _T1...>,
TypeList<LabelName<_C>, _T2...>,
TypeList<_T3...>)
{
return RepeatedExpression<T...> ();
}
template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2, typename... _T3>
static auto
_do_build(TypeList<T...>,
TypeList<LabelName<_C1>, _T1...>,
TypeList<LabelName<_C2>, _T2...>,
TypeList<_T3...>)
-> decltype(_do_build(TypeList<T...>(),
TypeList<LabelName<_C1>, _T1...>(),
TypeList<_T2...>(),
TypeList<_T3..., LabelName<_C2>>()))
{
return _do_build(TypeList<T...>(),
TypeList<LabelName<_C1>, _T1...>(),
TypeList<_T2...>(),
TypeList<_T3..., LabelName<_C2>>());
}
template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2>
static auto
_do_build(TypeList<T...>,
TypeList<LabelName<_C1>, LabelName<_C2>, _T1...>,
TypeList<>,
TypeList<LabelName<_C2>, _T2...>)
-> decltype(_do_build(TypeList<T...>(),
TypeList<LabelName<_C2>, _T1...>(),
TypeList<_T2...>(),
TypeList<>()))
{
return _do_build(TypeList<T...>(),
TypeList<LabelName<_C2>, _T1...>(),
TypeList<_T2...>(),
TypeList<>());
}
public:
template <char C, typename... T>
static auto
build_expression(LabelName<C>, T...)
-> decltype(_do_build(TypeList<LabelName<C>, T...>(),
TypeList<LabelName<C>, T...>(),
TypeList<T...>(),
TypeList<>()))
{
return _do_build(TypeList<LabelName<C>, T...>(),
TypeList<LabelName<C>, T...>(),
TypeList<T...>(),
TypeList<>());
}
};
The factory could be called in the program like so: (in the actual program there is another class with an overloaded operator() which calls the factory)
int main()
{
LabelName<'a'> a;
LabelName<'b'> b;
...
LabelName<'j'> j;
auto expr = ExpressionFactory::build_expression(a,b,c,d,e,f,g,h,i,j);
// Perhaps do some cool stuff with expr
return 0;
}
The above code works as intended, and is correctly compiled by both GCC and the Intel compiler. Now, I understand that the compiler would take more time to perform recursive template deduction as I crank up the number of labels I use.
On my computer, if build_expression is called with one argument, then GCC 4.7.1 takes around 0.26 second to compile on average. The compile time scales up to around 0.29 second for five arguments, and to 0.62 second for ten arguments. This is all perfectly reasonable.
The story is quite different with the Intel compiler. ICPC 13.0.1 compiles the one-argument code in 0.35 second, and the compile time stays pretty much constant for up to four arguments. At five arguments the compile time goes up to 12 seconds, and at six arguments it shoots up above 9600 seconds (that is, over 2 hours and 40 minutes). Needless to say, I haven’t waited long enough to find out how long it takes to compile the seven-argument version.
Two questions immediately come to mind:
-
Is the Intel compiler particularly known to be slow to compile recursive
decltype? -
Is there any way to rewrite this code to achieve the same effect in a way that is perhaps friendlier to the compiler?
Here is a stab at it. Instead of doing pairwise comparisons of each of the elements, I sort the list of types, then use a brain-dead unique algorithm to see if there are any duplicates.
I implemented merge sort on types, because it was fun. Probably a naive bubble sort would work better on reasonable number of arguments. Note that a bit of work would allow us to do a merge sort on long lists, and specialize for bubble sorts (or even insertion sorts) on short lists. I’m not up for writing a template quicksort.
This gives me a compile time boolean that says if there are duplicates in the list. I can then use enable_if to pick which overload I’m going to use.
Note that your solution involved n^2 layers of template recursion, at each stage of which the return type requires evaluating the type of a 1 step simpler class, and then the type returned also requires the same! If the Intel compiler memoization fails, you are talking exponential amounts of work.
I augmented a few of your classes with some helpers. I made your
LabelNames inherit fromstd::integral_constant, so I have easy compile time access to their value. This makes the sorting code a tad easier. I also exposed anenumfrom the repeated and unique return values so I can do simpleprintfdebugging on the result.Most of this work is writing the merge sort — is there a standard compile time type sort we could use?
In addition I’d like to add a critique of your code: Template programming is programming — your typenames are the equivalent of using “i1” through “i9” for integer variables in a function. Give your typenames meaningful names when doing something non-trivial.
How does this compile on Intel?