Excel data can be merged but it's somewhat hard and…

Question

0

Asked: May 12, 20262026-05-12T00:20:18+00:00 2026-05-12T00:20:18+00:00

I’ve recently written a piece of code to read some data from a file,

0

I’ve recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I’ve noticed that using Seq.sortBy (and Array.sortBy) is extremely slower than using IEnumerable.OrderBy.
Below are two snippets of code which should show the behaviour I’m talking about:


(filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) 
                                |> Array.map(double) 
                                |> Array.sort in arr.[0], arr.[1])
).OrderBy(new Func(fun (a,b) -> a))

and


filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) |> Array.map(double) |> Array.sort in arr.[0], arr.[1])
|> Seq.sortBy(fun (a,_) -> a)

On a file containing 100000 lines made of two doubles, on my computer the latter version takes over twice as long as the first one (no improvements are obtained if using Array.sortBy).
Ideas?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-12T00:20:19+00:00

the f# implementation uses a structural comparison of the resulting key.

let sortBy keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y)) 
        |> to_array) :> seq<_>)

(also sort)

let sort seq =
    mkDelayedSeq (fun () -> 
        (seq 
        |> to_list 
        |> List.sortWith Operators.compare 
        |> to_array) :> seq<_>)

both Operators.compare and the ComparisonIdentity.Structural.Compare become (eventually)

let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int = 
    GenericComparisonIntrinsic x y
        // lots of other types elided
        when 'T : float = if (# "clt" x y : bool #) 
                          then (-1) 
                          else (# "cgt" x y : int #)

but the route to this for the Operator is entirely inline, thus the JIT compiler will end up inserting a direct double comparison instruction with no additional method invocation overhead except for the (required in both cases anyway) delegate invocation.

The sortBy uses a comparer so will go through an additional virtual method call but is basically about the same.

In comparison the OrderBy function also must go through virtual method calls for the equality (Using EqualityComparer<T>.Default) but the significant difference is that it sorts in place and uses the buffer created for this as the result. In comparison if you take a look at the sortBy you will see that it sorts the list (not in place, it uses the StableSortImplementation which appears to be merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principle cause of the slow down though the differing sort implementations may also have an effect.

That said this is all guessing. If this area is a concern for you in performance terms then you should simply profile to find out what is taking the time.

If you wish to see what effect the sorting/copying change would have try this alternate:

// these are taken from the f# source so as to be consistent
// beware doing this, the compiler may know about such methods
open System.Collections.Generic
let mkSeq f = 
    { new IEnumerable<'b> with 
        member x.GetEnumerator() = f()
      interface System.Collections.IEnumerable with 
        member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }

let mkDelayedSeq (f: unit -> IEnumerable<'T>) = 
    mkSeq (fun () -> f().GetEnumerator())

// the function
let sortByFaster keyf seq =
    let comparer = ComparisonIdentity.Structural
    mkDelayedSeq (fun () -> 
        let buffer = Seq.to_array seq
        Array.sortInPlaceBy (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
        buffer :> seq<_>)

I get some reasonable percentage speedups within the repl with very large (> million) input sequences but nothing like an order of magnitude. Your mileage, as always, may vary.

How to approach applying for a job at a company ...

How to handle personal stress caused by utterly incompetent and ...

What is a programmer’s life like?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions