Being somewhat new to Clojure I can’t seem to figure out how to do something that seems like it should be simple. I just can’t see it. I have a seq of vectors. Let’s say each vector has two values representing customer number and invoice number and each of the vectors represents a sale of an item. So it would look something like this:
([ 100 2000 ] [ 100 2000 ] [ 101 2001 ] [ 100 2002 ])
I want to count the number of unique customers and unique invoices. So the example should produce the vector
[ 2 3 ]
In Java or another imperative language I would loop over each one of the vectors in the seq, add the customer number and invoice number to a set then count the number of values in each set and return it. I can’t see the functional way to do this.
Thanks for the help.
EDIT: I should have specified in my original question that the seq of vectors is in the tens of millions and each vector actually has more than just two values. So I want to go through the seq only once and calculate these unique counts (and some sums as well) in that single pass.
In Clojure you can do it almost the same way: first call `distinct` to get the unique values, then use `count` to count the results.

Note that this approach first builds the lists of first and second elements of the vectors (map first/second vectors) and then operates on each separately, thus iterating over the collection twice. If performance matters, you can do the same thing with iteration (see the `loop` form or tail recursion) and sets, just like you would in Java. To further improve performance you can also use transients. For a beginner, though, I would recommend the first way, with `distinct`.

UPD. Here’s a version with loop:
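As a minimal sketch of both approaches described in this answer (the two-pass `distinct` version and the single-pass `loop` version with transient sets), something like the following should work. The names `sales`, `unique-counts-distinct` and `unique-counts-loop` are made up for illustration, and each vector is assumed to be [customer-number invoice-number]:

```clojure
;; Hypothetical sample data: one [customer-number invoice-number] vector per sale.
(def sales [[100 2000] [100 2000] [101 2001] [100 2002]])

;; Two-pass version: walks the seq once per column.
(defn unique-counts-distinct [vectors]
  [(count (distinct (map first vectors)))
   (count (distinct (map second vectors)))])

;; Single-pass version: loop/recur carries the state explicitly,
;; with transient sets as temporary mutable accumulators.
(defn unique-counts-loop [vectors]
  (loop [vs    (seq vectors)
         custs (transient #{})
         invs  (transient #{})]
    (if vs
      (let [[cust inv] (first vs)]
        ;; conj! returns the transient to use from now on, so pass it along.
        (recur (next vs) (conj! custs cust) (conj! invs inv)))
      ;; Freeze the transients before counting.
      [(count (persistent! custs)) (count (persistent! invs))])))

(unique-counts-distinct sales) ;; => [2 3]
(unique-counts-loop sales)     ;; => [2 3]
```

Both functions return the same result; only the second one touches each vector exactly once.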
As you can see, there is no need for atoms or anything like that. First, you pass the state to each next iteration (the recur call). Second, you use transients as temporary mutable collections (read more on transients for details) and thus avoid creating a new object each time.
UPD2. Here’s a version with `reduce` for the extended question (with price):

Here we hold the intermediate results in a vector `[custs invs total]`, and on each step we unpack them, process them, and pack them back into a vector. As you can see, implementing such nontrivial logic with `reduce` is harder (both to write and to read) and requires even more code (in the `loop` version it is enough to add one more parameter for price to the loop args). So I agree with @ammaloy that for simpler cases `reduce` is better, but more complex things require lower-level constructs, such as the `loop`/`recur` pair.
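The `reduce` version described above could be sketched roughly as follows. This is only an illustration: it assumes each vector has the shape [customer-number invoice-number price] with integer prices, and the names `priced-sales` and `summarize` are hypothetical.

```clojure
;; Hypothetical sample data: [customer-number invoice-number price-in-cents].
(def priced-sales [[100 2000 1000] [100 2000 500] [101 2001 750] [100 2002 250]])

(defn summarize [vectors]
  (let [[custs invs total]
        (reduce (fn [[custs invs total] [cust inv price]]
                  ;; Unpack the accumulator, update each part, pack it back.
                  [(conj custs cust) (conj invs inv) (+ total price)])
                [#{} #{} 0]   ; initial state: two empty sets and a zero sum
                vectors)]
    [(count custs) (count invs) total]))

(summarize priced-sales) ;; => [2 3 2500]
```

The accumulator vector is rebuilt on every step, which is the extra packing and unpacking the paragraph above refers to.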