
+11 votes

I was surprised to find that `distinct` throws an exception (once its lazy result is realized) when called on a set, e.g.:
`(distinct #{1 2 3})`

A workaround is to call `seq` on the set first: `(distinct (seq #{1 2 3}))`
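For a function that accepts any kind of collection, the workaround can be wrapped once and reused. `distinct-coll` below is a hypothetical helper name, not part of clojure.core. It simply calls `seq` before handing off to `distinct`, so sequential input keeps its order and a set yields each element once, in the set's own (unspecified) iteration order.

```clojure
(defn distinct-coll
  "Hypothetical helper: order-preserving distinct for any seqable coll.
  Calling seq first turns a set (or map) into a seq, which distinct
  can destructure."
  [coll]
  (distinct (seq coll)))

(distinct-coll [3 1 3 2 1]) ;=> (3 1 2)
(distinct-coll '(:a :b :a)) ;=> (:a :b)
(distinct-coll #{1 2 3})    ; each element once, in the set's seq order
(distinct-coll nil)         ;=> ()
```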

However, this seems like unnecessary ceremony when every other sequence function I can think of (`map`, `keep`, `reduce`, `first`, `some`, etc.) accepts sets.

Furthermore, the docstring for `distinct` states:
> Returns a lazy sequence of the elements of coll with duplicates removed.

Since `(coll? #{1 2 3})` returns `true`, this is confusing for users.

There has been some discussion of this issue, which came up in math.combinatorics, here and here, but without any real decision.

I propose that distinct should accept any seqable coll.

2 Answers

+2 votes

Sets are already distinct, so why would you call `distinct` on one?

If you had a function that took a collection, you wouldn't necessarily know whether it was a set or a vector, right? Not unless you tested for it within the function body.
I would rephrase "I propose that distinct should support sets internally." as simply "distinct should accept any seqable coll".
Stan is correct: my enclosing function accepts all collections, and it is important that I get distinct values whilst preserving the order. I have updated my question with your proposed wording, Alex. Thank you both.
I guess it's inadvertent that `distinct`, alone among sequence functions, does not accept a set. `distinct`'s step function uses destructuring to peek at the first item in the collection; destructuring uses `nth`; and `nth` does not work with sets. (`nth`'s docstring likewise calls the argument "coll", but enumerates the concrete types allowed.)
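The root cause described above can be reproduced without `distinct` at all: sequential destructuring expands into calls to `nth`, and `nth` rejects a set because it is neither indexed nor `Sequential`. A seq over the same set is `Sequential`, which is why calling `seq` first helps. `first-destructured` and `destructurable?` are hypothetical names used only for this illustration.

```clojure
(defn first-destructured
  "Hypothetical helper: peeks at the first item via sequential
  destructuring, the same technique distinct's step function uses.
  The binding expands to a call like (nth coll 0 nil)."
  [coll]
  (let [[f] coll] f))

(defn destructurable?
  "Returns true if coll can be sequentially destructured, false if
  nth throws UnsupportedOperationException for its type."
  [coll]
  (try
    (first-destructured coll)
    true
    (catch UnsupportedOperationException _ false)))

(coll? #{1 2 3})                 ;=> true
(sequential? #{1 2 3})           ;=> false
(destructurable? [1 2 3])        ;=> true
(destructurable? #{1 2 3})       ;=> false
(destructurable? (seq #{1 2 3})) ;=> true, a seq is Sequential
```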

Instead of ruminating deeply about the pros and cons of fixing only `distinct`, the matter could be addressed more fundamentally in `nth` itself, resolving this quirk far and wide. `nth` already tests several cases, the last being O(n) time for Sequential things. Why not add another case to cover seqable things? Unlike `contains?`, `nth` has no qualms about brute force. And because `nth` is the tool of destructuring, it would not be very brutish (destructuring most often reaches for only the first few members), and the benefits would be widespread.
This is not a good idea. `nth` is for indexed or ordered colls, and sets are neither. A better answer is for `distinct` to `seq` its input, producing a stable logical list view of the set, as all the other sequence functions do.
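To make the `nth` proposal concrete, here is a hypothetical `nth-or-walk` (not a real clojure.core function). It keeps `nth`'s behavior for every type `nth` already supports and only falls back to a linear walk over `(seq coll)` when `nth` would throw. This is a sketch of the idea under discussion, not how `nth` behaves today.

```clojure
(defn nth-or-walk
  "Hypothetical: like (nth coll index not-found), but for types nth
  rejects (sets, maps, ...) it walks (seq coll) instead, in O(n)."
  [coll index not-found]
  (try
    (nth coll index not-found)
    (catch UnsupportedOperationException _
      ;; A seq is Sequential, so nth on it is a linear walk.
      (nth (seq coll) index not-found))))

(nth-or-walk [:a :b :c] 1 nil) ;=> :b
(nth-or-walk {:a 1} 0 nil)     ;=> [:a 1]
(nth-or-walk #{42} 0 nil)      ;=> 42
(nth-or-walk #{42} 5 :none)    ;=> :none
```

The try/catch only keeps the sketch short; a real change would dispatch on type inside `nth`. Note that for a set, "index 1" depends on the set's iteration order, which is the objection raised in reply.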
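A minimal sketch of that suggestion, modeled on the lazy step function that `distinct`'s one-arity version uses (a step that destructures its argument). The only change is calling `seq` on the input before the first step, so destructuring always sees a seq or `nil`. `distinct-seqable` is a hypothetical name, not a proposed patch.

```clojure
(defn distinct-seqable
  "Sketch: like the one-arity clojure.core/distinct, but seqs its input
  first so any seqable coll (including sets and maps) is accepted."
  [coll]
  (let [step (fn step [xs seen]
               (lazy-seq
                 ((fn [[f :as xs] seen]
                    (when-let [s (seq xs)]
                      (if (contains? seen f)
                        (recur (rest s) seen)
                        (cons f (step (rest s) (conj seen f))))))
                  xs seen)))]
    ;; The change: pass (seq coll) rather than coll, so destructuring
    ;; (and therefore nth) only ever sees a seq or nil.
    (step (seq coll) #{})))

(distinct-seqable [1 2 1 3 2])      ;=> (1 2 3)
(distinct-seqable #{1 2 3})         ; each element once, in the set's seq order
(take 3 (distinct-seqable (range))) ;=> (0 1 2), still lazy
```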
0 votes
This should be fixed, since the transducer version works fine on sets, so the two arities behave differently.

(into [] (distinct) #{:a :b})
=> [:b :a]

The regular `distinct` implementation uses destructuring, which relies on `nth`.

And yes, we also have some util functions that take `coll` and are expected to work across all types of coll.
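For utility functions like these, the transducer arity offers another workaround until the lazy arity changes: it does not destructure its input, so it accepts sets, and `sequence` applies it lazily. `distinct-xf` is a hypothetical helper name.

```clojure
(defn distinct-xf
  "Hypothetical helper: distinct via the transducer arity, which does
  not destructure its input and therefore accepts sets."
  [coll]
  (sequence (distinct) coll))

(distinct-xf [1 2 1 3])       ;=> (1 2 3)
(distinct-xf #{:a :b})        ; :a and :b, in the set's seq order
(into [] (distinct) '(1 1 2)) ;=> [1 2]
```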