Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8340611
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 9, 20262026-06-09T05:13:38+00:00 2026-06-09T05:13:38+00:00

I know Haskell a little bit, and I wonder if it’s possible to write

  • 0

I know Haskell a little bit, and I wonder if it’s possible to write something like a matrix-matrix product in Haskell that is all of the following:

  1. Pure-functional: no IO or State monads in its type signature (I don’t care what happens in the function body. That is, I don’t care if the function body uses monads, as long as the whole function is pure). I may want to use this matrix-matrix product in a pure function.
  2. Memory-safe: no malloc or pointers. I know that it’s possible to “write C” in Haskell,
    but you lose memory safety. Actually writing this code in C and interfacing it with Haskell also loses memory safety.
  3. As efficient as, say, Java. For concreteness, let’s assume I’m talking about a simple triple loop, single precision, contiguous column-major layout (float[], not float[][]) and matrices of size 1000×1000, and a single-core CPU. (If you are getting 0.5-2 floating point operations per cycle, you are probably in the ballpark.)

(I don’t want this to sound like a challenge, but note that Java can satisfy all of the above easily.)

I already know that

  1. The triple loop implementation is not the most efficient one. It’s quite cache-oblivious. It’s better to use a well-written BLAS implementation in this particular case. However, one can not always count on a C library being available for what one is trying to do. I wonder if reasonably efficient code can be written in normal Haskell.
  2. Some people wrote whole research papers that demonstrate #3. However, I’m not a computer science researcher. I wonder if it’s possible to keep simple things simple in Haskell.
  3. The Gentle Introduction to Haskell has a matrix product implementation. It wouldn’t satisfy the above requirements though.

Addressing comments:

I have three reasons: first, the “no malloc or pointers” requirement
is as yet ill-defined (I challenge you to write any piece of Haskell
code which uses no pointers);

I saw plenty of Haskell programs not using Ptr. Perhaps it refers to the fact that at the machine instruction level, pointers will be used? That’s not what I meant. I was referring to the abstraction level of the Haskell source code.

second, the attack on CS research is out of place (and furthermore I
can’t imagine anything simpler than using code somebody else has
already written for you); third, there are many matrix packages on
Hackage (and the prep work for asking this question should include
reviewing and rejecting each).

It seems that your #2 and #3 are the same (“use existing libraries”). I’m interested in the matrix product as a simple test of what Haskell can do on its own, and whether it allows you to “keep simple things simple”. I could have easily come up with a numerical problem that doesn’t have any ready libraries, but then I’d have to explain the problem, whereas everyone already knows what a matrix product is.

How can Java possibly satisfy 1.? Any Java method is essentially
:: IORef Arg -> ... -> IORef This -> IO Ret

This goes to the root of my question, actually (+1). While Java does not claim to track purity, Haskell does. In Java, whether the function is pure or not is indicated in the comments. I can claim that the matrix product is pure, even though I do mutation in the function body. The question is whether Haskell’s approach (purity encoded in the type system) is compatible with efficiency, memory-safety and simplicity.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-09T05:13:41+00:00Added an answer on June 9, 2026 at 5:13 am

    As efficient as, say, Java. For concreteness, let’s assume I’m talking about a simple triple loop, single precision, contiguous column-major layout (float[], not float[][]) and matrices of size 1000×1000, and a single-core CPU. (If you are getting 0.5-2 floating point operations per cycle, you are probably in the ballpark)

    So something like

    public class MatrixProd {
        static float[] matProd(float[] a, int ra, int ca, float[] b, int rb, int cb) {
            if (ca != rb) {
                throw new IllegalArgumentException("Matrices not fitting");
            }
            float[] c = new float[ra*cb];
            for(int i = 0; i < ra; ++i) {
                for(int j = 0; j < cb; ++j) {
                    float sum = 0;
                    for(int k = 0; k < ca; ++k) {
                        sum += a[i*ca+k]*b[k*cb+j];
                    }
                    c[i*cb+j] = sum;
                }
            }
            return c;
        }
    
        static float[] mkMat(int rs, int cs, float x, float d) {
            float[] arr = new float[rs*cs];
            for(int i = 0; i < rs; ++i) {
                for(int j = 0; j < cs; ++j) {
                    arr[i*cs+j] = x;
                    x += d;
                }
            }
            return arr;
        }
    
        public static void main(String[] args) {
            int sz = 100;
            float strt = -32, del = 0.0625f;
            if (args.length > 0) {
                sz = Integer.parseInt(args[0]);
            }
            if (args.length > 1) {
                strt = Float.parseFloat(args[1]);
            }
            if (args.length > 2) {
                del = Float.parseFloat(args[2]);
            }
            float[] a = mkMat(sz,sz,strt,del);
            float[] b = mkMat(sz,sz,strt-16,del);
            System.out.println(a[sz*sz-1]);
            System.out.println(b[sz*sz-1]);
            long t0 = System.currentTimeMillis();
            float[] c = matProd(a,sz,sz,b,sz,sz);
            System.out.println(c[sz*sz-1]);
            long t1 = System.currentTimeMillis();
            double dur = (t1-t0)*1e-3;
            System.out.println(dur);
        }
    }
    

    I suppose? (I hadn’t read the specs properly before coding, so the layout is row-major, but since the access pattern is the same, that doesn’t make a difference as mixing layouts would, so I’ll assume that’s okay.)

    I haven’t spent any time on thinking about a clever algorithm or low-level optimisation tricks (I wouldn’t achieve much in Java with those anyway). I just wrote the simple loop, because

    I don’t want this to sound like a challenge, but note that Java can satisfy all of the above easily

    And that’s what Java gives easily, so I’ll take that.

    (If you are getting 0.5-2 floating point operations per cycle, you are probably in the ballpark)

    Nowhere near, I’m afraid, neither in Java nor in Haskell. Too many cache misses to reach that throughput with the simple triple loop.

    Doing the same in Haskell, again no thinking about being clever, a plain straightforward triple loop:

    {-# LANGUAGE BangPatterns #-}
    module MatProd where
    
    import Data.Array.ST
    import Data.Array.Unboxed
    
    matProd :: UArray Int Float -> Int -> Int -> UArray Int Float -> Int -> Int -> UArray Int Float
    matProd a ra ca b rb cb =
        let (al,ah)     = bounds a
            (bl,bh)     = bounds b
            {-# INLINE getA #-}
            getA i j    = a!(i*ca + j)
            {-# INLINE getB #-}
            getB i j    = b!(i*cb + j)
            {-# INLINE idx #-}
            idx i j     = i*cb + j
        in if al /= 0 || ah+1 /= ra*ca || bl /= 0 || bh+1 /= rb*cb || ca /= rb
             then error $ "Matrices not fitting: " ++ show (ra,ca,al,ah,rb,cb,bl,bh)
             else runSTUArray $ do
                arr <- newArray (0,ra*cb-1) 0
                let outer i j
                        | ra <= i   = return arr
                        | cb <= j   = outer (i+1) 0
                        | otherwise = do
                            !x <- inner i j 0 0
                            writeArray arr (idx i j) x
                            outer i (j+1)
                    inner i j k !y
                        | ca <= k   = return y
                        | otherwise = inner i j (k+1) (y + getA i k * getB k j)
                outer 0 0
    
    mkMat :: Int -> Int -> Float -> Float -> UArray Int Float
    mkMat rs cs x d = runSTUArray $ do
        let !r = rs - 1
            !c = cs - 1
            {-# INLINE idx #-}
            idx i j = cs*i + j
        arr <- newArray (0,rs*cs-1) 0
        let outer i j y
                | r < i     = return arr
                | c < j     = outer (i+1) 0 y
                | otherwise = do
                    writeArray arr (idx i j) y
                    outer i (j+1) (y + d)
        outer 0 0 x
    

    and the calling module

    module Main (main) where
    
    import System.Environment (getArgs)
    import Data.Array.Unboxed
    
    import System.CPUTime
    import Text.Printf
    
    import MatProd
    
    main :: IO ()
    main = do
        args <- getArgs
        let (sz, strt, del) = case args of
                                (a:b:c:_) -> (read a, read b, read c)
                                (a:b:_)   -> (read a, read b, 0.0625)
                                (a:_)     -> (read a, -32, 0.0625)
                                _         -> (100, -32, 0.0625)
            a = mkMat sz sz strt del
            b = mkMat sz sz (strt - 16) del
        print (a!(sz*sz-1))
        print (b!(sz*sz-1))
        t0 <- getCPUTime
        let c = matProd a sz sz b sz sz
        print $ c!(sz*sz-1)
        t1 <- getCPUTime
        printf "%.6f\n" (fromInteger (t1-t0)*1e-12 :: Double)
    

    So we’re doing almost exactly the same things in both languages. Compile the Haskell with -O2, the Java with javac

    $ java MatrixProd 1000 "-13.7" 0.013
    12915.623
    12899.999
    8.3592897E10
    8.193
    $ ./vmmult 1000 "-13.7" 0.013
    12915.623
    12899.999
    8.35929e10
    8.558699
    

    And the resulting times are quite close.

    And if we compile the Java code to native, with gcj -O3 -Wall -Wextra --main=MatrixProd -fno-bounds-check -fno-store-check -o jmatProd MatrixProd.java,

    $ ./jmatProd 1000 "-13.7" 0.013
    12915.623
    12899.999
    8.3592896512E10
    8.215
    

    there’s still no big difference.

    As a special bonus, the same algorithm in C (gcc -O3):

    $ ./cmatProd 1000 "-13.7" 0.013
    12915.623047
    12899.999023
    8.35929e+10
    8.079759
    

    So this reveals no fundamental difference between straightforward Java and straightforward Haskell when it comes to computationally intensive tasks using floating point numbers (when dealing with integer arithmetic on medium to large numbers, the use of GMP by GHC makes Haskell outperform Java’s BigInteger by a huge margin for many tasks, but that is of course a library issue, not a language one), and both are close to C with this algorithm.

    In all fairness, though, that is because the access pattern causes a cache-miss every other nanosecond, so in all three languages this computation is memory-bound.

    If we improve the access pattern by multiplying a row-major matrix with a column-major matrix, all become faster, the gcc-compiled C finishes it 1.18s, java takes 1.23s and the ghc-compiled Haskell takes around 5.8s, which can be reduced to 3 seconds by using the llvm backend.

    Here, the range-check by the array library really hurts. Using the unchecked array access (as one should, after checking for bugs, since the checks are already done in the code controlling the loops), GHC’s native backend finishes in 2.4s, going via the llvm backend lets the computation finish in 1.55s, which is decent, although significantly slower than both C and Java. Using the primitives from GHC.Prim instead of the array library, the llvm backend produces code that runs in 1.16s (again, without bounds-checking on each access, but that only valid indices are produced during the computation can in this case easily be proved before, so here, no memory-safety is sacrificed¹; checking each access brings the time up to 1.96s, still significantly better than the bounds checking of the array library).

    Bottom line: GHC needs (much) faster branching for the bounds-checking, and there’s room for improvement in the optimiser, but in principle, “Haskell’s approach (purity encoded in the type system) is compatible with efficiency, memory-safety and simplicity”, we’re just not yet there. For the time being, one has to decide how much of which point one is willing to sacrifice.


    ¹ Yes, that’s a special case, in general omitting the bounds-check does sacrifice memory-safety, or it is at least harder to prove that it doesn’t.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I know that infinite sequences are possible in Haskell - however, I'm not entirely
I know that in C# it isn't, and it is in languages like Haskell
I would like to know all possible means of side effects in functional languages,
Does anyone know if there's a function in Haskell which does something like this:
I'd like to know the Haskell way to catch and handle exceptions. As shown
I don't know haskell syntax, but I know some FP concepts (like algebraic data
I am learning Haskell and would like to know whether the constructs known in
I'm having a little difficulty navigating all the haskell docs.. I want to call
I know that if I were to compute a list of squares in Haskell,
I know that in some languages (Haskell?) the striving is to achieve point-free style,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.