Editorial Team
Asked: May 23, 2026

I just finished reading “Programming Concurrency on the JVM” by Venkat Subramaniam. In that book, one of the author's examples is counting the file sizes in a directory tree. He shows implementations using no concurrency, using queues, using a latch, and using Scala actors. On my system, all the concurrent implementations (queues, latch, and Scala actors) run in under 9 seconds when iterating through my /usr directory (OS X 10.6.8, Core Duo 2 GHz, Intel G1 SSD 160 GB).

I’m learning Clojure and decided to port the Scala actor version to Clojure using agents. Unfortunately, I was averaging 11-12 seconds, which is significantly slower than the others. After spending DAYS pulling my hair out, I discovered that the following bit of code was the culprit (processFile is a function I send to the file-processing agent(s)):

(import '(java.io File))

(defn processFile
  [fileProcessor collectorAgent ^String fileName]
  (let [^File file-obj (File. ^String fileName)
        fileTotals (transient {:files 0, :bytes 0})]
    (cond
      (.isDirectory file-obj)
        (do
          (doseq [^File dir (.listFiles file-obj) :when (.isDirectory dir)]
            (send collectorAgent addFileToProcess (.getPath dir)))
          (send collectorAgent tallyResult *agent*)
          (reduce (fn [currentTotal newItem] (assoc! currentTotal :files (inc (:files currentTotal))
                                                                  :bytes (+ (:bytes currentTotal) newItem)))
                  fileTotals
                  (map #(.length ^File %) (filter #(.isFile ^File %) (.listFiles file-obj))))
          (persistent! fileTotals))

      (.isFile file-obj) (do (send collectorAgent tallyResult *agent*) {:files 1, :bytes (.length file-obj)}))))

You’ll notice I tried using type hints and a transient to improve performance, all to no avail. I replaced the above code with the following:

(defn processChildren
  [children]
  (loop [entries children files 0 bytes 0 dirs '()]
    (let [^File child (first entries)]
      (cond
        (not (seq entries)) {:files files, :bytes bytes, :dirs dirs}
        (.isFile child) (recur (rest entries) (inc files) (+ bytes (.length child)) dirs)
        (.isDirectory child) (recur (rest entries) files bytes (conj dirs child))
        :else (recur (rest entries) files bytes dirs)))))

(defn processFile
  [fileProcessor collectorAgent ^String fileName]
  (let [{files :files, bytes :bytes, dirs :dirs} (processChildren (.listFiles (File. fileName)))]
    (doseq [^File dir dirs]
      (send collectorAgent addFileToProcess (.getPath dir)))
    (send collectorAgent tallyResult *agent*)
    {:files files, :bytes bytes}))

This version performed on par with, if not faster than, the Scala version, and its algorithm is almost identical to the one used in the Scala version. I had simply assumed that the functional approach to the algorithm would work just as well.

So... this long-winded question boils down to the following: why is the second version faster?

My hypothesis is that, although the first version's map/filter/reduce over the directory's contents is more “functional” than the second version's rather imperative processing, it is much less efficient because the directory's contents are iterated through multiple times (the first version even calls .listFiles on the same directory twice). Since filesystem I/O is slow, the entire program suffers.
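One way to test this hypothesis without giving up the functional style is to list the directory once and tally everything in a single reduce. The sketch below assumes the same {:files, :bytes, :dirs} shape as processChildren in the second version; the function name tally-directory is hypothetical, and it deliberately leaves out the agent sends. Whether it matches the second version's timing has not been measured here.

```clojure
(ns example.tally
  (:import (java.io File)))

(defn tally-directory
  "Lists dir exactly once and, in a single pass, counts regular files,
  sums their lengths, and collects subdirectories.
  Returns {:files n, :bytes n, :dirs [File ...]}."
  [^File dir]
  (reduce (fn [acc ^File child]
            (cond
              (.isFile child)      (-> acc
                                       (update :files inc)
                                       (update :bytes + (.length child)))
              (.isDirectory child) (update acc :dirs conj child)
              :else                acc))
          {:files 0, :bytes 0, :dirs []}
          ;; .listFiles returns nil for a non-directory; reduce treats nil as empty.
          (.listFiles dir)))
```

The reduce walks the File array directly, so there is no intermediate map/filter sequence and only one filesystem listing per directory.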

Assuming I am right, is it not then safe to say that any recursive filesystem algorithm should prefer an imperative approach for performance?

I’m a total beginner at Clojure so feel free to rip my code to shreds if I’m doing something stupid or non-idiomatic.

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
    Editorial Team
    Answered: May 23, 2026 at 2:48 pm

    I’ve edited the first version to make it more readable. I have a few comments, but no conclusively helpful statements:

    1. You added transients and type hints with no real evidence as to what was slowing things down. It’s entirely possible to slow things down dramatically with careless application of these operations, so it’s a good idea to profile to find out what’s actually slowing things down. Your choices seem reasonable, but I’ve removed the type hints that were obviously having no effect (e.g., the compiler needs no hint to know that (File. …) yields a File object).

    2. Clojure (indeed, any Lisp) strongly prefers names like some-agent to someAgent. Because of the prefix syntax, there’s no risk that a clueless compiler will parse - as subtraction, so we can afford more readable, hyphenated names.

    3. You include calls to a bunch of functions that you don’t define here at all, like tallyResult and addFileToProcess. Presumably they perform fine since you’re using them in the performant version, but by not including them you’ve made it difficult for anyone else to poke around at it and see what speeds things up.

    4. Consider send-off instead of send for I/O-bound operations: send uses a bounded thread pool to avoid swamping your processor, while send-off uses a pool that grows as needed, which suits actions that block. Here this probably doesn’t matter, since you’re only using one agent and its actions are serialized, but in future you’ll run into cases where it matters.
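    To make point 1 concrete: rather than adding type hints by guesswork, you can ask the compiler to report every interop call it has to resolve by reflection, and add hints only there. This is a minimal sketch; the namespace and the functions file-length and lengths are hypothetical examples, not code from the question.

```clojure
(ns example.reflection
  (:import (java.io File)))

;; Report every Java interop call the compiler cannot resolve statically.
;; This must be set before the forms below are compiled.
(set! *warn-on-reflection* true)

;; No warning: with path hinted as a String, (File. path) resolves to the
;; File(String) constructor, and the compiler knows the result is a File,
;; so .length needs no extra hint.
(defn file-length [^String path]
  (.length (File. path)))

;; The ^File hint is needed here: without it, f has an unknown type and
;; compiling this function would print a reflection warning for .length.
(defn lengths [files]
  (map (fn [^File f] (.length f)) files))
```

    Hints that produce no change in these warnings are candidates for removal, as was done in the rewrite.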
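    As a rough illustration of point 4, here is a sketch in which the blocking filesystem call runs inside an agent action dispatched with send-off. The agent total-bytes and the functions record-size and tally-paths! are hypothetical stand-ins; they are not the question's tallyResult or addFileToProcess, which were not shown.

```clojure
(ns example.send-off
  (:import (java.io File)))

;; Hypothetical collector: an agent whose state is a running byte total.
(def total-bytes (agent 0))

(defn record-size
  "Agent action: adds the length of the file at path to the current total.
  It touches the filesystem, so it is dispatched with send-off below."
  [total ^String path]
  (+ total (.length (File. path))))

(defn tally-paths!
  "Queues one I/O-bound action per path and returns the agent's state once
  all of them have run. send-off runs actions on a pool that can grow,
  whereas send uses a fixed-size pool sized from the CPU count, which
  blocking I/O can tie up."
  [paths]
  (doseq [p paths]
    (send-off total-bytes record-size p))
  ;; Block until every action dispatched above has completed.
  (await total-bytes)
  @total-bytes)
```

    In a standalone program, call (shutdown-agents) when you are finished with agents so the JVM can exit promptly.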

    Anyway, as promised, here is a more legible rewrite of your first version:

    (import '(java.io File))

    (defn process-file
      [_ collector-agent ^String file-name]
      (let [file-obj (File. file-name)
            file-totals (transient {:files 0, :bytes 0})]
        (cond (.isDirectory file-obj)
              (do
                (doseq [^File dir (.listFiles file-obj)
                        :when (.isDirectory dir)]
                  (send collector-agent addFileToProcess
                        (.getPath dir)))
                (send collector-agent tallyResult *agent*)
                (reduce (fn [current-total new-item]
                          (assoc! current-total
                                  :files (inc (:files current-total))
                                  :bytes (+ (:bytes current-total) new-item)))
                        file-totals
                        (map #(.length ^File %)
                             (filter #(.isFile ^File %)
                                     (.listFiles file-obj))))
                (persistent! file-totals))
    
              (.isFile file-obj)
              (do (send collector-agent tallyResult *agent*)
                  {:files 1, :bytes (.length file-obj)}))))
    

    Edit: You’re using transients incorrectly, because you throw away the result of your reduce. (assoc! m k v) is allowed to modify and return the m object, but it may instead return a different object if that’s more convenient or efficient. So you need something more like (persistent! (reduce ...)).
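    A minimal sketch of that fix, keeping the same map/filter/reduce pipeline as the rewrite above: the transient is threaded through reduce, and persistent! is called on whatever reduce returns rather than on the original transient. The function name tally-files is hypothetical, and it takes a collection of File objects instead of a directory.

```clojure
(ns example.transients
  (:import (java.io File)))

(defn tally-files
  "Counts the regular files in files and sums their lengths, returning
  {:files n, :bytes n}. The transient is only ever used through the value
  that assoc! (and therefore reduce) returns."
  [files]
  (persistent!
    (reduce (fn [totals len]
              ;; assoc! may return a different object than it was given,
              ;; so its return value becomes the next accumulator.
              (assoc! totals
                      :files (inc (:files totals))
                      :bytes (+ (:bytes totals) len)))
            (transient {:files 0, :bytes 0})
            (map #(.length ^File %) (filter #(.isFile ^File %) files)))))
```

    For example, calling it on (.listFiles some-dir) would tally only the regular files directly inside some-dir.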
