I am implementing a filter in Haskell, i.e. a program that I can call from the command-line as follows:
$ cat inputfile.txt | myFilter > outputfile.txt
When running the program on a file of about 80 MB, I get a stack overflow
(Stack space overflow: current size 8388608 bytes.). I am using GHC Version 6.12.3 under cygwin.
I think the problem comes from the sort function that I am using in the program, but after I have been looking for the problem for three days I have no clue how to solve this so I would like if someone could give me a hint.
Here are the essential details about my program.
My filter program reads standard input into a string, splits it into lines and parses each line into a record of some type Event
data Event = ...
which is an instance of Ord
instance Ord Event where
x < y = ...
so that I can sort events using the built-in sort function.
Splitting into lines and parsing the events (one event per line) is performed by a function
p :: String -> [Event]
which internally uses the standard function lines.
I also have a function g that groups events:
g :: [Event] -> [[Event]]
g uses some criteria that are not relevant here; each group can contain at most 4 events.
I sort each group of events (represented as a list) using sort (i.e., all events inside each group get sorted), and finally format the all event groups as a string using a function
f :: [[Event]] -> String
The main function looks as follows:
main = interact (f . (map sort) . g . p)
As said, running this program on a file of about 80 MB gives a stack overflow.
If I replace the sort function with the following function (a naive quick sort implementation):
mySort :: [Event] -> [Event]
mySort [] = []
mySort (e:es) = let h = [j | j <- es, j < e]
t = [j | j <- es, e < j]
in
(mySort h) ++ [e] ++ (mySort t)
main = interact (f . (map mySort) . g . p)
I have no stack overflow!
If in the function mySort I replace the definition of t with the following:
t = [j | j <- es, e <= j]
i.e. I replace < with <=, the stack overflow is there again!
So I have no clue of what is going here.
I cannot see that I have introduced any infinite recursion. My other hypothesis is that lazy evaluation can play a role here (does <= produce a bigger thunk than <?).
I have some experience with Haskell but I am no real expert so I would be glad to get some useful hint because I have been struggling to understand this for the past three days.
The culprit is
which is the wrong way to define an
Ordinstance. The minimal complete definition of anOrdinstance defines one ofcompareor(<=). There are default definitions ofcomparein terms of(<=), and of allOrdmember functions in terms ofcompare. So if you define(<), that’s the onlyOrdmember you can use, all others will loop infinitely when called, since they callcompare, which calls(<=), which callscompare…The
Data.List.sortfunction usescompareto determine the order, so it loops at the first comparison. Your custom quicksort only uses(<), so that works.