Last week user Masse asked a question about recursively listing files in a directory

Question

Asked: May 17, 2026

Last week user Masse asked a question about recursively listing files in a directory in Haskell. My first thought was to try using monadic lists from the List package to avoid building the entire list in memory before the printing can start. I implemented this as follows:

```haskell
module Main where

import Prelude hiding (filter)
import Control.Applicative ((<$>))
import Control.Monad (join)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.ListT (ListT)
import Data.List.Class (cons, execute, filter, fromList, mapL)
import System.Environment (getArgs)  -- getArgs lives in System.Environment (was: import System)
import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

main :: IO ()
main = execute . mapL putStrLn . listFiles =<< head <$> getArgs

-- Lazily yield path and, if it is a directory, everything below it.
listFiles :: FilePath -> ListT IO FilePath
listFiles path = liftIO (doesDirectoryExist path) >>= listIfDir
  where
    valid "."  = False
    valid ".." = False
    valid _ = True
    listIfDir False = return path
    listIfDir True
      =  cons path
      $  join
      $  listFiles
     <$> (path </>)
     <$> (filter valid =<< fromList <$> liftIO (getDirectoryContents path))
```

This works beautifully in that it starts printing immediately and uses very little memory. Unfortunately it’s also dozens of times slower than a comparable FilePath -> IO [FilePath] version.

What am I doing wrong? I’ve never used the List package’s ListT outside of toy examples like this, so I don’t know what kind of performance to expect, but 30 seconds (vs. a fraction of a second) to process a directory with ~40,000 files seems much too slow.
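For comparison, a minimal sketch of what a comparable FilePath -> IO [FilePath] version might look like. The question does not show that version, so this is an assumption, not the asker's code; the names listFilesStrict and the usage message are hypothetical. It uses only the same directory functions as the ListT version (getDirectoryContents, doesDirectoryExist, (</>)).

```haskell
module Main where

import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Same test as `valid` in the ListT version: skip "." and "..".
valid :: FilePath -> Bool
valid name = name `notElem` [".", ".."]

-- Hypothetical eager version: the whole result list for a directory is
-- built in IO before the caller receives any of it.
listFilesStrict :: FilePath -> IO [FilePath]
listFilesStrict path = do
  isDir <- doesDirectoryExist path
  if not isDir
    then return [path]
    else do
      names <- getDirectoryContents path
      -- Recurse into every child, then splice the results together.
      -- concat appends right-nested (d1 ++ (d2 ++ ...)), so each
      -- sublist is walked by (++) only once.
      subLists <- mapM (listFilesStrict . (path </>)) (filter valid names)
      return (path : concat subLists)

main :: IO ()
main = do
  args <- getArgs
  case args of
    (root : _) -> listFilesStrict root >>= mapM_ putStrLn
    []         -> putStrLn "usage: listfiles DIR"
```

Because mapM in IO runs every recursive call before concat sees the results, nothing is printed until the whole tree has been read; that is the memory trade-off the ListT version was meant to avoid.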

1 Answer

Editorial Team · Answer 1 · 2026-05-17T19:35:30+00:00

Profiling shows that join (together with doesDirectoryExist) accounts for most of the time in your code. Let's see how its definition unfolds:

```
  join x
=> (definition of join in Control.Monad)
  x >>= id
=> (definition of >>= in Control.Monad.ListT)
  foldrL' mappend mempty (fmap id x)
=> (fmap id = id)
  foldrL' mappend mempty x
```
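
The first step of this unfolding can be checked on ordinary lists, where join is also x >>= id and reduces to concat. A small sketch; the sample xss is made up for illustration. It also shows that nesting the appends to the left or to the right gives the same list, so the difference the next paragraph is about is purely one of cost.

```haskell
module Main where

import Control.Monad (join)

-- Made-up sample standing in for three subdirectory listings.
xss :: [[Int]]
xss = [[1, 2], [3], [4, 5, 6]]

main :: IO ()
main = do
  print (join xss)          -- [1,2,3,4,5,6]
  print (xss >>= id)        -- [1,2,3,4,5,6]: join is (>>= id)
  print (concat xss)        -- [1,2,3,4,5,6]: for lists, (>>= id) is concat
  -- Both print [1,2,3,4,5,6]; only the nesting of (++) differs:
  print (foldr (++) [] xss) -- [1,2] ++ ([3] ++ ([4,5,6] ++ []))
  print (foldl (++) [] xss) -- (([] ++ [1,2]) ++ [3]) ++ [4,5,6]
```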

Suppose the root directory of the search contains k subdirectories whose contents have already been computed as the lists d₁, d₂, …, d_k. After applying join you get (roughly) (…(([] ++ d₁) ++ d₂) … ++ d_k). Since x ++ y takes time O(length x), each append walks everything accumulated so far, and the whole expression takes time

$$O\Big(|d_1| + (|d_1| + |d_2|) + \dots + (|d_1| + \dots + |d_{k-1}|)\Big) = O\Big(\sum_{j=1}^{k-1} \sum_{i=1}^{j} |d_i|\Big).$$

If there are n files in total, evenly distributed so that |d_i| = n/k, the sum is (n/k) · k(k−1)/2 = n(k−1)/2, so computing join takes O(n·k) time, and that is only for the first level of listFiles.
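
The cost argument can be made concrete with a small model that adds up the length of every left operand of (++). This is a sketch under the answer's own assumption that x ++ y costs length x; leftNestedCost and rightNestedCost are hypothetical names, and the even split of 40,000 files over 100 subdirectories is made up (the question gives only the total).

```haskell
module Main where

-- Cost model: fully evaluating x ++ y walks the left operand once,
-- so a single (++) costs length x.

-- Left-nested: (...(([] ++ d1) ++ d2) ... ++ dk).
-- The j-th append copies d1 ++ ... ++ d(j-1), so the total is
-- |d1| + (|d1|+|d2|) + ... + (|d1|+...+|d(k-1)|): the prefix sums,
-- excluding the last one (the final list is never copied again).
leftNestedCost :: [Int] -> Int
leftNestedCost sizes = sum (init (scanl (+) 0 sizes))

-- Right-nested: d1 ++ (d2 ++ (... ++ (dk ++ []))), as built by foldr (++) [].
-- Each dj is the left operand exactly once, so the total is n.
rightNestedCost :: [Int] -> Int
rightNestedCost = sum

main :: IO ()
main = do
  -- Hypothetical split: n = 40000 files spread evenly over k = 100 subdirectories.
  let sizes = replicate 100 400
  print (leftNestedCost sizes)  -- 1980000, i.e. n*(k-1)/2
  print (rightNestedCost sizes) -- 40000, i.e. n
```

With these made-up numbers the left-nested shape does roughly 50 times the work of the right-nested one, and the ratio (k−1)/2 grows with the number of subdirectories.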

This, I think, is the main performance problem with your solution.
