If I want to call more than one C function, each one depending on the result of the previous one, is it better to create a wrapper C function that handles the three calls? Will it cost the same as using Haskell FFI without converting types?
Suppose I have the following Haskell code:
foo :: CInt -> IO CInt
foo x = do
    a <- cfA x
    b <- cfB a
    c <- cfC b
    return c
Each function cf* is a C call.
Is it better, in terms of performance, to create a single C function like cfABC and make only one foreign call in Haskell?
int cfABC(int x) {
    int a, b, c;
    a = cfA(x);
    b = cfB(a);
    c = cfC(b);
    return c;
}
Haskell code:
foo :: CInt -> IO CInt
foo x = do
    c <- cfABC x
    return c
How can I measure the performance cost of a C call from Haskell? Not the cost of the C function itself, but the cost of the “context-switching” from Haskell to C and back.
The answer depends mostly on whether the foreign call is a safe or an unsafe call.
An unsafe C call is basically just a function call. If there is no (nontrivial) type conversion, three foreign calls cost three function calls, while a C wrapper costs between one and four, depending on how many of the component functions can be inlined when the C code is compiled (GHC cannot inline a foreign call into C). Such a function call is generally very cheap (it’s just a copy of the arguments and a jump to the code), so the difference is small either way: the wrapper should be slightly slower when no C function can be inlined into it, and slightly faster when all of them can. That was indeed the case in my benchmark, where every C function just returned its argument: the three foreign calls took about 12.7ns, and the wrapper was 1.5ns slower without inlining and 3.5ns faster with it. If the functions do something nontrivial, the difference is negligible (and if they don’t, you’d probably do better to write them in Haskell and let GHC inline the code).
A safe C call involves saving a nontrivial amount of state, locking, and possibly spawning a new OS thread, so it takes much longer. The small overhead of perhaps one extra function call in C is then negligible compared to the cost of the foreign calls (unless passing the arguments requires an unusual amount of copying, for example many huge structs). In my do-nothing benchmark, where all the C functions just return their argument, the mean for the single wrapped call is a bit above 100ns (105-112ns), and for the three separate calls around 300ns (290-315ns).
So a safe C call takes roughly 100ns, and it is usually faster to wrap the calls up into a single one. Still, if the called functions do something sufficiently nontrivial, the difference won’t matter.
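To get a rough feel for the per-call overhead yourself, something like the following sketch may help. It is not the benchmark quoted above: it assumes libc's abs as a stand-in for cfA/cfB/cfC (so it links without any extra C file), and the names absUnsafe, absSafe, chain3 and timeNs are made up for the illustration. It imports the same C function once as unsafe and once as safe, chains three dependent calls as in foo, and reports the mean wall-clock time per iteration. The loop overhead is included in the figures, so treat them as indicative only; a proper benchmarking library would give more trustworthy numbers.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CInt)
import GHC.Clock (getMonotonicTimeNSec)

-- Assumption: libc's abs stands in for cfA/cfB/cfC so that no extra
-- C source file is needed. Both imports refer to the same C function;
-- only the calling convention annotation (unsafe vs. safe) differs.
foreign import ccall unsafe "stdlib.h abs" absUnsafe :: CInt -> IO CInt
foreign import ccall safe   "stdlib.h abs" absSafe   :: CInt -> IO CInt

-- Three dependent foreign calls, mirroring foo from the question.
chain3 :: (CInt -> IO CInt) -> CInt -> IO CInt
chain3 f x = f x >>= f >>= f

-- Run the action n times, feeding each result into the next iteration,
-- and return the mean time per iteration in nanoseconds.
timeNs :: Int -> (CInt -> IO CInt) -> IO Double
timeNs n act = do
    start <- getMonotonicTimeNSec
    r     <- loop n 1
    end   <- r `seq` getMonotonicTimeNSec
    return (fromIntegral (end - start) / fromIntegral n)
  where
    loop :: Int -> CInt -> IO CInt
    loop 0 acc = return acc
    loop k acc = act acc >>= \acc' -> acc' `seq` loop (k - 1) acc'

main :: IO ()
main = do
    let n = 1000000 :: Int
    unsafeNs <- timeNs n (chain3 absUnsafe)
    safeNs   <- timeNs n (chain3 absSafe)
    putStrLn ("unsafe, 3 calls: " ++ show unsafeNs ++ " ns per iteration")
    putStrLn ("safe,   3 calls: " ++ show safeNs ++ " ns per iteration")
```

Compile with optimisation (for example ghc -O2) before reading anything into the numbers. To compare against the wrapper, you would import cfABC the same way and time a single call to it instead of chain3.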