Well, it turns out that I have this function defined in my program:
st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a
st_zipOp f xs ys = St.foldr (\x r -> st_map (f x) r) xs ys
It does what it appears to do: it zips two values of type Stream a (a list-like type) using a binary operator on a, applying the operator several times along the way. The definition is pretty straightforward.
Once I had defined the function this way, I tried this other version:
st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a
st_zipOp = St.foldr . (st_map .)
As far as I know, this is exactly the same definition as above. It is just a point-free version of the previous definition.
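To make the claimed equivalence concrete, here is a minimal sketch of the eta-reduction steps that lead from the first definition to the point-free one. It assumes plain lists in place of Stream (so Prelude's foldr and map stand in for St.foldr and st_map), and the names zipOp0 to zipOp3 are hypothetical, introduced only for this illustration.

```haskell
module Main where

-- Hypothetical list-based stand-ins for st_zipOp; [] replaces Stream.

-- Step 0: the fully applied (pointful) version.
zipOp0 :: (a -> a -> a) -> [a] -> [a] -> [a]
zipOp0 f xs ys = foldr (\x r -> map (f x) r) xs ys

-- Step 1: eta-reduce the lambda.
-- \x r -> map (f x) r  ==  \x -> map (f x)  ==  map . f
zipOp1 :: (a -> a -> a) -> [a] -> [a] -> [a]
zipOp1 f xs ys = foldr (map . f) xs ys

-- Step 2: eta-reduce the trailing arguments xs and ys.
zipOp2 :: (a -> a -> a) -> [a] -> [a] -> [a]
zipOp2 f = foldr (map . f)

-- Step 3: map . f  ==  (map .) f, so
-- \f -> foldr ((map .) f)  ==  foldr . (map .)
zipOp3 :: (a -> a -> a) -> [a] -> [a] -> [a]
zipOp3 = foldr . (map .)

main :: IO ()
main = do
  let xs = [1, 2, 3] :: [Integer]
      ys = [10, 20] :: [Integer]
      expected = zipOp0 (+) xs ys
  print expected
  -- All intermediate steps should agree with the pointful version.
  print (all (== expected) [zipOp1 (+) xs ys, zipOp2 (+) xs ys, zipOp3 (+) xs ys])
```

This only checks that the four definitions compute the same values on a sample input; it says nothing about how the compiler treats each form.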
However, I wanted to check whether there was any change in performance, and I found that the point-free version did indeed make the program perform slightly worse (in both memory and time).
Why is this happening? If it is necessary, I can write a test program that reproduces this behavior.
I am compiling with -O2 if that makes a difference.
Simple test case
I wrote the following code to try to reproduce the behavior described above. I used lists this time, and the change in performance was less noticeable, but still present. This is the code:
opEvery :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery f xs ys = foldr (\x r -> map (f x) r) xs ys
opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)
main :: IO ()
main = print $ sum $ opEvery (+) [1..n] [1..n]
  where
    n :: Integer
    n = 5000
The profiling results using opEvery (explicit arguments version):
total time = 2.91 secs (2906 ticks @ 1000 us, 1 processor)
total alloc = 1,300,813,124 bytes (excludes profiling overheads)
The profiling results using opEvery' (point free version):
total time = 3.24 secs (3242 ticks @ 1000 us, 1 processor)
total alloc = 1,300,933,160 bytes (excludes profiling overheads)
However, I expected both versions to be equivalent (in all senses).
For the simple test case, both versions yield the same core when compiled with optimisations, but without profiling.
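One way to check this for yourself is to ask GHC to dump the optimised core and compare the output of the two builds. The sketch below is only a suggestion: the file name Main.hs is hypothetical, it assumes ghc is on your PATH, and both functions are exported so that they stay visible as top-level bindings in the dump.

```haskell
-- Hypothetical file name: Main.hs
--
-- Optimised core without profiling:
--   ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all -ddump-to-file Main.hs
-- Optimised core with profiling (overwrites the previous dump file):
--   ghc -O2 -prof -fprof-auto -fforce-recomp -ddump-simpl -dsuppress-all -ddump-to-file Main.hs
module Main (main, opEvery, opEvery') where

-- Pointful version from the question.
opEvery :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery f xs ys = foldr (\x r -> map (f x) r) xs ys

-- Point-free version from the question.
opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)

main :: IO ()
main = do
  let xs = [1 .. 5] :: [Integer]
  -- Sanity check that both versions agree on a small input.
  print (opEvery (+) xs xs == opEvery' (+) xs xs)
```

Copy the dump file aside between the two builds if you want to diff them.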
When compiling with profiling enabled (-prof -fprof-auto), the pointful version gets inlined into the main part of the program (you get something better without profiling).
When compiling the pointfree version with profiling enabled, opEvery' is not inlined.
When you add an {-# INLINABLE opEvery' #-} pragma, it can be inlined even when compiling for profiling, and the result is even a bit faster than the pragma-less pointful version, since it doesn't need to tick the counters.
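For reference, this is roughly what the test program looks like with the pragma added. The pragma and the definitions come from the discussion above; placing the pragma directly after the definition is just one valid position, chosen here for illustration.

```haskell
module Main where

-- Point-free version from the question.
opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)
-- Keep the unfolding available so GHC may inline opEvery' at call sites,
-- including in profiling builds.
{-# INLINABLE opEvery' #-}

main :: IO ()
main = print $ sum $ opEvery' (+) [1..n] [1..n]
  where
    n :: Integer
    n = 5000
```

The pragma only permits inlining; whether GHC actually inlines at a given call site is still up to the optimiser.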
It is likely that a similar effect occurred for the Stream case.
The takeaway: