My question is, does the map/zipmap step (below) run the risk of consuming too much memory, if the .csv file were large enough?
I have a sequence-of-sequences returned from clojure-csv. The following steps are deliberately separated for clarity. In other words, I’d be combining some of these in production code.
; Process the .csv file
(defn fetch-csv-data
  "This function accepts a csv file name, and returns parsed csv data,
  or returns nil if file is not present."
  [csv-file]
  (let [csv-data (ret-csv-data csv-file)]
    csv-data))
(def bene-csv-inp (fetch-csv-data "benetrak_roster.csv"))
; Pull out the columns/keys, and
(def bene-csv-cols (map #(cstr/trim %1) (first bene-csv-inp)))
; create the keys.
(def bene-csv-keys (map #(keyword %1) bene-csv-cols))
; Make a sequence of just one of the keys:
(def test-ssns2 (map (fn [x] (:GIC-ID x))
                     (map #(zipmap gic-csv-keys %1) gic-csv-data)))
Thanks.
The only way this code will leak memory is that the defs hold the heads of the lazy sequences. If you replace them with functions that return the sequence, the head will exist only on the call stack and will be handled properly by lazy evaluation.

It’s a rough rule of thumb, though sometimes useful, to replace defs with defns when they contain infinite sequences, and to call them everywhere instead of reading them (unless you really want the caching benefits of a lazy sequence for multiple readers, and only a reasonable amount of data is going to be read from the sequence).

Using function calls instead of reading the defs here will almost certainly not have any effect on runtime once the HotSpot compiler finishes with it.
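As a rough sketch of that suggestion, the def-based steps from the question could be turned into functions along these lines. The names sample-rows, csv-keys and gic-ids are hypothetical, sample-rows is only an in-memory stand-in for the parsed clojure-csv output, and treating the first row as the header (skipping it with rest) is an assumption about the data:

```clojure
(require '[clojure.string :as cstr])

;; Hypothetical stand-in for the parsed CSV: a lazy seq of rows,
;; with the header row first. Real code would return the parser's output here.
(defn sample-rows []
  (cons [" GIC-ID " " NAME "]
        (map (fn [i] [(str i) (str "name-" i)]) (range 5))))

;; Build the keys from the header row. Only the header row is retained,
;; not the rest of the sequence.
(defn csv-keys [rows]
  (map (comp keyword cstr/trim) (first rows)))

;; Return a lazy seq of one column. Because this is a function and not a def,
;; no global var holds the head of the row sequence.
(defn gic-ids [rows]
  (let [ks (csv-keys rows)]
    (map #(:GIC-ID (zipmap ks %)) (rest rows))))
```

A caller would then consume the result directly, for example (doseq [id (gic-ids (sample-rows))] (println id)), so that rows already processed can be garbage collected as the sequence is walked.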