Welcome! Please see the About page for a little more info on how this works.

0 votes
in Collections by

If a (mutable) Java collection that exposes it's backing array is passed to c.c/sort in multiple threads, the collection will be concurrently modified in multiple threads.

`
user=> (def q (java.util.concurrent.ArrayBlockingQueue. 1))

'user/q

user=> (future (loop [] (.add q 1) (.remove q 1) (recur)))

object[clojure.core$future_call$reify__4393 0x4769b07b {:status :pending, :val nil}]

user=> (take 3 (distinct (repeatedly #(sort q))))
((1) () nil)
`

Approach: Convert coll to a seq before converting it to an array, thus preserving the original collection.

Patch: 0001-CLJ-1763-make-sort-thread-safe.patch

Alternate approaches:

  1. Document in sort that, like Java arrays, Java collections backed by arrays are modified in-place.
  2. Change RT.toArray() to defensively copy the array returned from a (non-IPersistentCollection) Java collection. This has a number of potential ramifications as this method is called from several paths.
  3. For non-Clojure collections, could also use Collections.sort() instead of dumping to array and using Arrays.sort().

6 Answers

0 votes
by

Comment made by: alexmiller

The docstring says "If coll is a Java array, it will be modified. To avoid this, sort a copy of the array." which also seems like solid advice in this case.

Creating a sequence view of the input collection would significantly alter the performance characteristics.

0 votes
by

Comment made by: alexmiller

I guess what I'm saying is, we should not make the performance worse for persistent collections in order to make it safer for arbitrary Java collections. But it may still be useful to make it safer without affecting persistent collection performance and/or updating the docstring.

0 votes
by

Comment made by: bronsa

Alex, no additional sequence is being created, the seq call was already there

0 votes
by

Comment made by: alexmiller

Well, that's kind of true. The former use did not force realization of the whole seq, just the first element. That said, from a quick test the extra cost seems small on a set (vector seqs are actually faster due to their structure).

0 votes
by

Comment made by: stu

I think this should be a docstring change, if anything at all.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1763 (reported by bronsa)
...