I have written two implementations of the bubble sort algorithm, one in C and one in Haskell.
Haskell implementation:
module Main where

main = do
    contents <- readFile "./data"
    print "Data loaded. Sorting.."
    let newcontents = bubblesort contents
    writeFile "./data_new_ghc" newcontents
    print "Sorting done"

bubblesort list = sort list [] False

rev  = reverse -- separated. To see
rev2 = reverse -- who calls the routine

sort (x1:x2:xs) acc _
    | x1 > x2 = sort (x1:xs) (x2:acc) True
sort (x1:xs) acc flag = sort xs (x1:acc) flag
sort [] acc True = sort (rev acc) [] False
sort _ acc _ = rev2 acc
I compared the two implementations by running both on a 20 KiB file.
The C implementation took about a second; the Haskell one took about 1 min 10 sec.
I have also profiled the Haskell application:
Compile for profiling:
C:\Temp> ghc -prof -auto-all -O --make Main
Profile:
C:\Temp> Main.exe +RTS -p
and got these results.
This is the pseudocode of the algorithm:
procedure bubbleSort( A : list of sortable items ) defined as:
    do
        swapped := false
        for each i in 0 to length(A) - 2 inclusive do:
            if A[i] > A[i+1] then
                swap( A[i], A[i+1] )
                swapped := true
            end if
        end for
    while swapped
end procedure
I wonder whether it’s possible to make the Haskell implementation faster without changing the algorithm (there are a few tricks that make the algorithm itself faster, but neither implementation uses them).
It’s probably because bubble sort is an algorithm for arrays, but you’re using a linked list. Swapping two items in an array (which is what C uses) takes O(1) time and no extra space, whereas swapping two items in an immutable linked list (which is what Haskell uses) takes O(n) time and O(n) space in the worst case, because the cells in front of the swapped pair have to be rebuilt (and this is heap space, not stack space).
That said, I’m having a little trouble following your code (are you absolutely sure it’s the same algorithm?), and it’s possible that your accumulator takes care of the swap’s time complexity. Even if it does, you’re allocating a new accumulator list on every pass; this will definitely still allocate extra space, and I think it may very well be one of the reasons for Haskell’s worse performance.
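To make the array point concrete, here is a minimal sketch of the same pseudocode on a mutable unboxed array, where each swap is two O(1) writes. The name bubbleSortArr is made up for this example, it assumes the input is a String as in the question, and it relies on the array package that ships with GHC. I haven’t benchmarked it against your data, so treat it as an illustration rather than a measured fix.

```haskell
module Main where

import Control.Monad (forM_, when)
import Data.Array.ST (newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Hypothetical helper (not from the question): bubble sort over a
-- mutable unboxed array in the ST monad, following the pseudocode.
bubbleSortArr :: String -> String
bubbleSortArr xs = elems sorted
  where
    n = length xs

    sorted :: UArray Int Char
    sorted = runSTUArray $ do
        -- Copy the input list into a mutable array once.
        arr <- newListArray (0, n - 1) xs
        let pass = do
                swapped <- newSTRef False
                -- for each i in 0 to length(A) - 2 inclusive
                forM_ [0 .. n - 2] $ \i -> do
                    a <- readArray arr i
                    b <- readArray arr (i + 1)
                    when (a > b) $ do
                        -- In-place swap: two constant-time writes.
                        writeArray arr i b
                        writeArray arr (i + 1) a
                        writeSTRef swapped True
                -- Repeat the pass while something was swapped.
                again <- readSTRef swapped
                when again pass
        pass
        return arr

main :: IO ()
main = putStrLn (bubbleSortArr "haskell")
```

In your main, this would be used as let newcontents = bubbleSortArr contents. The only list traffic left is copying the input in once and reading the result out once.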
Also, why do you have rev and rev2, rather than just using reverse in both places? If it’s because you wanted to profile them separately, then you should instead use the SCC (“Set Cost Center”) pragma, described in chapter 5 of the GHC User’s Guide: sort ({-# SCC "rev" #-} reverse acc) [] False and {-# SCC "rev2" #-} reverse acc. Each cost center is profiled separately; -auto-all just implicitly adds cost centers around each function. If you have these functions for some other reason, you still probably shouldn’t (why do you?), but then my explanation is irrelevant 🙂
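Put together, the SCC version of your sort could look roughly like this. It is your code with rev and rev2 replaced by annotated calls to reverse; the type signatures and the tiny main are my additions so the sketch compiles on its own. The pragmas are ignored unless you compile with -prof.

```haskell
module Main where

bubblesort :: Ord a => [a] -> [a]
bubblesort list = sort list [] False

-- Same clauses as in the question; only the two reverse calls changed.
sort :: Ord a => [a] -> [a] -> Bool -> [a]
sort (x1:x2:xs) acc _
    | x1 > x2 = sort (x1:xs) (x2:acc) True
sort (x1:xs) acc flag = sort xs (x1:acc) flag
-- End of a pass with swaps: restore the order and start another pass.
sort [] acc True = sort ({-# SCC "rev" #-} reverse acc) [] False
-- End of a pass without swaps: the list is sorted.
sort _ acc _ = {-# SCC "rev2" #-} reverse acc

main :: IO ()
main = putStrLn (bubblesort "haskell")
```

The profile report should then list rev and rev2 as separate cost centers, without needing two wrapper functions.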