I have a csv file with the following structure : The first line is

Question

0

Editorial Team

Asked: May 31, 20262026-05-31T05:30:36+00:00 2026-05-31T05:30:36+00:00

I have a csv file with the following structure : The first line is

0

I have a csv file with the following structure :

The first line is a header row
The remaining lines are data lines,
each with the same number of commas, so we can think of the data in
terms of columns

I have written a little script to go through each line of the file and return a sequence of tuples containing the column header and the length of the largest string of data in that column :

let getColumnInfo (fileName:string) =
    let delimiter = ','

    let readLinesIntoColumns (sr:StreamReader) = seq { 
        while not sr.EndOfStream do     
            yield sr.ReadLine().Split(delimiter) |> Seq.map (fun c -> c.Length )
    }

    use sr = new StreamReader(fileName)     
    let headers = sr.ReadLine().Split(delimiter) 
    let columnSizes =
        let initial = Seq.map ( fun h -> 0 ) headers
        let toMaxColLengths (accumulator:seq<int>) (line:seq<int>)  = 
             let chooseBigger a b = if a > b then a else b
             Seq.map2 chooseBigger accumulator line
        readLinesIntoColumns sr |> Seq.fold toMaxColLengths initial
    Seq.zip headers columnSizes;

This works fine on a small file. However when it trys to process a large file (> 75 Mb) it blows fsi with a StackOverflow exception. If I remove the line

Seq.map2 chooseBigger accumulator line

the program completes.

Now, my question is this : why is F# using up the stack? My understanding of sequences in F# is that the entire sequence is not held in memory, only the elements that are being processed. Therefore I expected that the lines that had already been processed would not remain on the stack. Where is my misunderstanding?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-31T05:30:37+00:00

I think this is a good question. Here’s a simpler repro:

let test n =
    [for i in 1 .. n -> Seq.empty]
    |> List.fold (Seq.map2 max) Seq.empty
    |> Seq.iter ignore

test creats a sequence of empty sequences, calculates the max by rows, and then iterates over the resulting (empty) sequence. You’ll find that with a high value of n this will cause a stack overflow, even though there aren’t any values to iterate over at all!

It’s a bit tricky to explain why, but here’s a stab at it. The problem is that as you fold over the sequences, Seq.map2 is returning a new sequence which defers its work until it’s enumerated. Thus, when you try to iterate through the resulting sequence, you end up calling back into a chain of computations n layers deep.

As Daniel explains, you can escape this by evaluating the resulting sequence eagerly (e.g. by converting it to a list).

EDIT

Here’s an attempt to further explain what’s going wrong. When you call Seq.map2 max s1 s2, neither s1 nor s2 is actually enumerated; you get a new sequence which, when enumerated, will enumerate both of them and compare the yielded values. Thus, if we do something like the following:

let s0 = Seq.empty
let s1 = Seq.map2 max Seq.emtpy s0
let s2 = Seq.map2 max Seq.emtpy s1
let s3 = Seq.map2 max Seq.emtpy s2
let s4 = Seq.map2 max Seq.emtpy s3
let s5 = Seq.map2 max Seq.emtpy s4
...

Then the call to Seq.map2 always returns immediately and uses constant stack space. However, enumerating s5 requires enumerating s4, which requires enumerating s3, etc. This means that enumerating s99999 will build up a huge call stack that looks sort of like:

...
(s99996's enumerator).MoveNext()
(s99997's enumerator).MoveNext()
(s99998's enumerator).MoveNext()
(s99999's enumerator).MoveNext()

and we’ll get a stack overflow.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a csv file with the following structure : The first line is

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply