I have a dead simple Common Lisp question: what is the idiomatic way of removing duplicates from a list of strings?
remove-duplicates works as I’d expect for numbers, but not for strings:
* (remove-duplicates '(1 2 2 3))
(1 2 3)
* (remove-duplicates '("one" "two" "two" "three"))
("one" "two" "two" "three")
I’m guessing there’s some sense in which the strings aren’t equal, most likely because although “foo” and “foo” are apparently identical, they’re actually pointers to different structures in memory. I think my expectation here may just be a C hangover.
You have to tell remove-duplicates how it should compare the values. By default, it uses eql, which is not sufficient for strings. Pass equal as the :test function, i.e. add :test #'equal to the call.

(Edit to address the question from the comments): As an alternative to equal, you could use string= in this example. This predicate is (in a way) less generic than equal, and it might (could, probably, possibly, eventually…) thus be faster. A real benefit might be that string= can tell you if you pass a wrong value: comparing a number with a string using equal happily yields nil, whereas the same comparison using string= gives a type-error condition. Note, though, that comparing a string with a symbol using string= is perfectly well defined (string= and its friends are defined in terms of “string designators”, not strings), so type safety would go only so far here.

The standard eql predicate, on the other hand, is almost never the right way to compare strings. If you are familiar with the Java language, think of eql as using == while equal (or string=, etc.) calls the equals(Object) method. Though eql does some type introspection (as opposed to eq, which does not), for most Lisp types other than numbers and characters, eql boils down to something like a pointer comparison, which is not sufficient if you want to discriminate values based on what they actually contain, and not merely on where in memory they are located.

For the more Pythonic inclined, eq (and eql for non-numeric types) is more like the is operator, whereas equal is more like ==, which calls __eq__.
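To make the points above concrete, here is a minimal sketch. The names *words* and safe-string= are made up for illustration and are not part of the original answer; copy-seq is used only so that every string is guaranteed to be a fresh object, which mirrors what the question observed at the REPL.

```lisp
;; *words* is a hypothetical variable for this sketch. COPY-SEQ returns a
;; fresh string each time, so the two "two" entries are distinct objects.
(defparameter *words*
  (mapcar #'copy-seq '("one" "two" "two" "three")))

;; The default test is EQL, which compares object identity for strings,
;; so no element is considered a duplicate here.
(remove-duplicates *words*)                  ; => ("one" "two" "two" "three")

;; EQUAL compares strings by content.
(remove-duplicates *words* :test #'equal)    ; => ("one" "two" "three")

;; STRING= does the same for strings (and other string designators).
(remove-duplicates *words* :test #'string=)  ; => ("one" "two" "three")

;; Identity versus content on two fresh strings.
(eql   (copy-seq "foo") (copy-seq "foo"))    ; => NIL
(equal (copy-seq "foo") (copy-seq "foo"))    ; => T

;; EQUAL silently accepts mismatched types.
(equal 1 "foo")                              ; => NIL

;; A symbol is a string designator, so this is well defined. With the
;; standard reader settings the name of :foo is "FOO".
(string= "FOO" :foo)                         ; => T

;; Hypothetical helper: turns the TYPE-ERROR that STRING= is expected to
;; signal for a non-designator (such as a number) into a keyword.
(defun safe-string= (a b)
  (handler-case (string= a b)
    (type-error () :type-error)))

;; Typically returns :TYPE-ERROR in safe code.
(safe-string= 1 "foo")
```

If the order of the surviving elements matters, note that remove-duplicates keeps the last occurrence of each duplicate by default; passing :from-end t keeps the first one instead.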