The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8688519
In Process

The Archive Base Latest Questions

Editorial Team
Asked: June 12, 2026

In short: in Clojure, is there a way to redefine a function from the standard sequence API (one that is not defined on any interface such as ISeq or IndexedSeq) for a custom sequence type I wrote?


1. Huge data files

I have big files in the following format:

  • A long (8 bytes) containing the number n of entries
  • n entries, each composed of 3 longs (i.e., 24 bytes)

2. Custom sequence

I want to have a sequence on these entries. Since I cannot usually hold all the data in memory at once, and I want fast sequential access on it, I wrote a class similar to the following:

(deftype DataSeq [id
                  ^long cnt
                  ^long i
                  cached-seq]
  clojure.lang.IndexedSeq

  (index [_]     i)
  (count [_]     (- cnt i))
  (seq   [this]  this)
  (first [_]     (first cached-seq))
  (more  [this]  (if-let [s (next this)] s '()))

  (next [_] (if (not= (inc i) cnt)
              (if (next cached-seq)
                (DataSeq. id cnt (inc i) (next cached-seq))
                (DataSeq. id cnt (inc i)
                          (with-open [f (open-data-file id)]
                             ; open a memory mapped byte array on the file
                             ; seek to the exact position to begin reading
                             ; decide on an optimal amount of data to read
                             ; eagerly read and return that amount of data
                          ))))))

The main idea is to read ahead a batch of entries into a list and then consume from that list. Whenever the cache is completely consumed and entries remain, the next batch is read from the file into a new cache list. Simple as that.

To create an instance of such a sequence, I use a very simple function like:

(defn ^DataSeq load-data [id]
  (next (DataSeq. id (count-entries id) -1 [])))
; count-entries is a trivial, memoized "open the file and read a long" function

As you can see, the format of the data allowed me to implement count very simply and efficiently.

3. drop could be O(1)

In the same spirit, I’d like to reimplement drop. The format of these data files allows me to reimplement drop in O(1) (instead of the standard O(n)), as follows:

  • if dropping fewer items than remain in the cache, just drop that many from the cache and be done;

  • if dropping more than cnt, just return the empty list;

  • otherwise, figure out the position in the data file, jump straight to that position, and read data from there.
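The three cases above can be sketched as a small pure function. This is only an illustration built on the file layout described earlier (an 8-byte header followed by 24-byte entries). The names header-bytes, entry-bytes, entry-offset and plan-drop are hypothetical, and "past the end" is interpreted here as the target index reaching cnt.

```clojure
;; Sizes taken from the file format described in the question.
(def header-bytes 8)   ; the leading long that holds n
(def entry-bytes 24)   ; 3 longs per entry

(defn entry-offset
  "Byte offset in the file of the entry at zero-based index idx."
  [idx]
  (+ header-bytes (* entry-bytes idx)))

(defn plan-drop
  "Hypothetical helper: decides how to drop n entries from a sequence that
  sits at index i of cnt entries and has `cached` entries left in its cache.
  Every branch is constant-time arithmetic; no entry is read here."
  [cnt i cached n]
  (let [target (+ i n)]
    (cond
      ;; Dropping past the last entry: the result is empty.
      (>= target cnt) {:action :empty}
      ;; The target is still inside the cache: drop from the cache only.
      (< n cached)    {:action :drop-from-cache :n n}
      ;; Otherwise compute where to seek in the file.
      :else           {:action :seek
                       :index  target
                       :offset (entry-offset target)})))
```

For instance, (plan-drop 100 0 10 50) asks for a seek to index 50, which sits at byte 8 + 24 * 50 = 1208.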

My difficulty is that drop is not implemented in the same way as count, first, seq, etc. Those functions each call a similarly named static method in clojure.lang.RT, which in turn calls my implementation above. drop does not: it never checks whether the sequence it is called on provides a custom implementation.

Obviously, I could provide a function named anything but drop that does exactly what I want, but that would force other people (including my future self) to remember to use it instead of drop every single time, which sucks.

So, the question is: is it possible to override the default behaviour of drop?
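One way to approximate this, without changing clojure.core/drop itself, is to shadow drop inside a namespace and dispatch through a protocol. The sketch below is only an illustration: the namespace example.fast-drop, the protocol FastDrop and the toy type OffsetRange are all hypothetical, and callers get the fast path only if they refer to this drop rather than the one in clojure.core.

```clojure
(ns example.fast-drop
  (:refer-clojure :exclude [drop]))

;; Hypothetical protocol for collections that can skip items cheaply.
(defprotocol FastDrop
  (fast-drop [coll n] "Returns coll without its first n items."))

(defn drop
  "Same argument order as clojure.core/drop. Uses fast-drop when coll
  supports it and falls back to the standard lazy drop otherwise."
  [n coll]
  (if (satisfies? FastDrop coll)
    (fast-drop coll n)
    (clojure.core/drop n coll)))

;; Toy stand-in for DataSeq: dropping only moves the start index.
(deftype OffsetRange [^long start ^long end]
  FastDrop
  (fast-drop [_ n]
    (OffsetRange. (min end (+ start (long n))) end))
  clojure.lang.Seqable
  (seq [_] (seq (range start end))))
```

With this in place, (drop 3 (OffsetRange. 0 5)) returns another OffsetRange without walking any items, while (drop 1 [1 2 3]) behaves exactly like the core function.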

4. A workaround (I dislike)

While writing this question, I’ve just figured out a possible workaround: make the reading even lazier. The custom sequence would just keep an index and postpone the read, which would happen only when first is called. The problem is that I’d need some mutable state: the first call to first would cause some data to be read into a cache, and all subsequent calls would return data from this cache. There would be similar logic in next: if there’s a cache, just next it; otherwise, don’t bother populating it, since that will be done when first is called again.

This would avoid unnecessary disk reads. However, this is still less than optimal — it is still O(n), and it could easily be O(1).

Anyways, I don’t like this workaround, and my question is still open. Any thoughts?

Thanks.

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on June 12, 2026 at 11:23 pm

    For the time being, I implemented the workaround I described above. It works by deferring the read until the first call to first, which stores the data in a local, mutable cache.

    Note that this version uses unsynchronized-mutable (to avoid volatile-reads on every call to first, next and more and a volatile-write on the first call to first). In other words: DON’T SHARE AMONG THREADS. To make it thread-safe, use volatile-mutable instead (which causes a small performance penalty). It could still cause multiple reads of the same data by different threads. To avoid that, change back to unsynchronized-mutable and be sure to use (locking this ...) when reading from or writing to the field cache.

    EDIT: after some (non-rigorous) tests, it seems that the overhead introduced by (locking this ...) is similar to the one introduced by unnecessary reads from disk (note that I’m reading from a fast SSD, which might have already cached part of the data). Therefore, the best thread-safe solution for now (and for my specific hardware) would be to use a volatile cache.

    (deftype DataSeq [id
                      ^long cnt
                      ^long i
                      ^{:unsynchronized-mutable true} cache]
      clojure.lang.IndexedSeq
    
      (index [_]    i)
      (count [_]    (- cnt i))
      (seq   [this] this)
      (more  [this] (if-let [s (.next this)] s '()))
      (next  [_]    (if (not= (inc i) cnt)
                      (DataSeq. id cnt (inc i) (next cache))))
      (first [_]
        (when-not (seq cache)
          (set! cache
                (with-open [f (open-data-file id)]
                  ; open a memory mapped byte array on the file
                  ; seek to the exact position to begin reading
                  ; decide on an optimal amount of data to read
                  ; eagerly read and return that amount of data
                )))
        (first cache)))
    

    What still bothers me is that I must use mutable state just to stop drop (i.e., “get out, you useless piece of data”) from reading from the disk…
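A possible variation, sketched here under assumptions rather than taken from the answer above, keeps the cache in a delay instead of a mutable field. A delay runs its body at most once, even when several threads deref it, so the deftype itself needs no set!. DelaySeq, read-chunk, the reads counter and the 4-entry chunk size are hypothetical stand-ins for the real file access, and the sketch assumes cnt is greater than zero.

```clojure
;; Counts simulated disk reads so the behaviour can be observed.
(def reads (atom 0))

(defn read-chunk
  "Hypothetical stand-in for the real file read: returns up to 4
  entries starting at index `from`. Entries are just their indices."
  [cnt from]
  (swap! reads inc)
  (doall (range from (min cnt (+ from 4)))))

(deftype DelaySeq [^long cnt ^long i cache] ; cache is a delay
  clojure.lang.ISeq
  (seq   [this] this)
  (first [_]    (first @cache))             ; forces at most one read
  (more  [this] (or (next this) '()))
  (next  [_]
    (when (< (inc i) cnt)
      (let [j    (inc i)
            ;; Reuse the cached chunk only if it has already been read.
            more (when (realized? cache) (next @cache))]
        (DelaySeq. cnt j
                   (if more
                     (delay more)
                     (delay (read-chunk cnt j))))))))

(defn delay-seq
  "Builds the sequence without touching the (simulated) disk."
  [cnt]
  (DelaySeq. cnt 0 (delay (read-chunk cnt 0))))
```

In this sketch, (first (drop 10 (delay-seq 20))) triggers a single simulated read, at index 10. The drop is still O(n) in steps, as with the mutable version, but the skipped entries are never read.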
