Welcome! Please see the About page for a little more info on how this works.

0 votes
in Collections by

Sorting a vector gives back an ArraySeq with O(n) gets instead of O(log N) gets. This means it can be more efficient to take a vector, sort, then turn it back into a vector.

Cause: {{sort}} works by copying the collection to be sorted into an array, calls {{Arrays/sort}} to sort it, and then returns a seq on the sorted array. The seq returned is an ArraySeq and doesn't implement Indexed.

Alternatives:

  1. Make ArraySeq (and primitive specializations thereof) implement Indexed, providing constant time lookup by index.
  2. Specialize sorting for different collection types
  3. ???

6 Answers

0 votes
by

Comment made by: ragge

Update description, attach patch.

0 votes
by

Comment made by: ragge

Added link to current patch.

0 votes
by

Comment made by: alexmiller

Another alternative to consider here is to have sort do something smarter.

0 votes
by

Comment made by: ragge

Having thought a bit more about the approach and implications of this I'm not sure this patch is a good idea at all. It makes a little bit sense for the particular case of sorting a vector, but on the other hand {{sort}} only promises to return a sorted sequence of given coll. Implementing {{Indexed}} for a sequence type just because the underlying data structure supports efficient lookup by index feels wrong. Like you suggest, effort is maybe better spent thinking about making sort smarter, which is a different issue, or just using sorted collections instead.

0 votes
by

Comment made by: hiredman

It seems like the best thing here would be to change sort to return a vector. Usages of sort in the middle of sequence pipelines will continue to work, but a sort followed by conj will break (I cannot recall an instance of this off hand, but I am sure they exist). Sorting seems to imply a fully realized collection, and vectors are the "strongest" realized collections that can be returned here.

Given the conservative nature of core, and the issue with conj ordering above, the next best thing might be to add a sortv similar to the existing mapv.

Another option might be to remove the call to seq, so sort returns the sorted array. This would actually be really useful because you can use Arrays.binarySearch. Calls to conj after a sort would then fail with an exception instead of conj to the "wrong" place.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1794 (reported by alexmiller)
...