Improve clojure.core/distinct perf by using transient set

Question

Improve clojure.core/distinct perf by using transient set

asked Dec 23, 2016 in Collections by jira

Current implementation of {{clojure.core/distinct}} uses persistent set. This patch improves performance of lazy arity by ~25%-30% and transducer by ~40%-50% by using transient set instead.

10 elements
(doall (distinct coll)) 5.773439 µs => 4.179092 µs (-27%)
(into [] (distinct) coll) 3.238236 µs => 1.943254 µs (-39%)

100 elements
(doall (distinct coll)) 67.725764 µs => 42.129993 µs (-37%)
(into [] (distinct) coll) 35.702741 µs => 16.495947 µs (-53%)

1000 elements
(doall (distinct coll)) 540.652739 µs => 399.053873 µs (-26%)
(into [] (distinct) coll) 301.423077 µs => 164.025500 µs (-45%)

10000 elements
(doall (distinct coll)) 3.439137 ms => 3.058872 ms (-11%)
(into [] (distinct) coll) 1.437390 ms => 848.277178 µs (-40%)

Benchmarking code: https://gist.github.com/tonsky/97dfe1f9c48eccafc983a49c7042fb21

13 Answers

jira · Answer 1 · 2016-12-23T14:52:40+0000

Comment made by: alexmiller

You can't remove the volatile - you still need that for safe publication in multi threaded transducing contexts.

jira · Answer 2 · 2016-12-23T17:50:22+0000

_Comment made by: tonsky_

[~alexmiller] How do you mean?

- I don’t update {{seen}} link because transient set can be mutated in-place
- Are transducers meant to be used from multiple threads? Because even existing implementation clearly has race condition. I imagine fixing that would be costly (we’ll need a synchronized section), so maybe it should be a specialized transducer that you use only when needed?

jira · Answer 3 · 2016-12-23T18:26:48+0000

Comment made by: alexmiller

Transient sets can NOT be mutated in place - you must use the return value.

Yes, transducers are used from multiple threads in (for example) transducer chans in core.async go blocks.

jira · Answer 4 · 2016-12-23T18:28:39+0000

Comment made by: alexmiller

I should also say transducers are not expected to be used from more than one thread at a time, so there are no race problems. But being used from multiple threads over time requires proper safe publication.

jira · Answer 5 · 2016-12-24T09:07:30+0000

_Comment made by: tonsky_

bq. But being used from multiple threads over time requires proper safe publication.

Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?

Does that also mean that {{partition-by}} and {{partition-all}} should be fixed (they use {{java.util.ArrayList}} which, being array of references, has no safe publication semantics)?

bq. Transient sets can NOT be mutated in place - you must use the return value.

I was thinking that {{clojure/core.clj}} and {{clojure.lang.ATransientSet.java}} are both part of Clojure internals, colocated, so can share a little bit of internal knowledge about each other. It seems safe to do that, because that knowledge does not leak outside, and, if at any point impl of ATransientSet would change, core.clj could be updated accordingly in the same release. I wouldn’t do that in any third-party library, of course.

jira · Answer 6 · 2016-12-24T15:13:54+0000

Comment made by: alexmiller

{quote}Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?{quote}

Transients require only that they are asked by no more than a single thread at a time and so are safe to use in a transducer. However, they should guarantee safe publication. core.async channels already do this as an artifact of their implementation, but other transducing contexts may not.

Transients should NEVER be used as "mutate in place", regardless of concurrency. While they will appear to "work" in some circumstances, this is never correct (eventually an update operation will return a new instance and if you are mutating in place, your data will then be missing). This is discussed and correct examples are shown at http://clojure.org/reference/transients.

{quote}Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?{quote}

That's something Rich and I are discussing but, probably.

jira · Answer 7 · 2016-12-24T18:56:19+0000

_Comment made by: tonsky_

[~alexmiller] Here’s quick test that shows that changes to transient set (which is nothing more but a wrapper around transient map) made in one thread are not always visible from another thread.

https://gist.github.com/tonsky/62a7ec6d539fc013186bee2df0812cf6

That means that if we try to use transients for e.g. distinct it will miss duplicate items

jira · Answer 8 · 2016-12-24T19:02:54+0000

Comment made by: tonsky

Removed transients from transducer arity of distincts because transducers might be accessed from multiple threads

jira · Answer 9 · 2016-12-24T19:12:28+0000

_Comment made by: tonsky_

Maybe that doc http://clojure.org/reference/transients should be updated re: transients are not safe to use from multiple threads because changes made by one thread are not necessarily visible to another. Even if they don’t compete

jira · Answer 10 · 2016-12-31T18:54:32+0000

commented Sep 26, 2024 by Paula Gearon

I looked at exactly this today, and was pleasantly surprised to see this issue already discussing it.
Since this issue was last discussed in 2019 the `contains?` function is now working on transients. Consequently, I've been testing the transducer with a transient set. However, the benchmarks are basically identical, which surprised me. I tried Nikita's original direct call:
`(.contains ^clojure.lang.ITransientSet @seen input)`
This does perform a little better. It seems that the overhead of `contains?` on a transient may be taking up the saved time of using `conj!` over `conj`.

The other thing to note is that I tried updating the single argument version to use:
`(sequence (distinct) coll)`
This is significantly faster than the `lazy-seq` version.

Is this something you're willing to consider a patch for? Happy to submit if you are.

commented Sep 26, 2024 by alexmiller

jira · Answer 11 · 2017-01-04T17:47:27+0000

_Comment made by: tonsky_

I have to admit my test was demonstrating something else: there were no proper thread isolation. So it was a concurrency issue, not “safe publication” issue. My current understanding is this:

bq. Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.

That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”.

That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless to volatility of the variables involved. In fact, we can remove all volatiles from transients implementation and probably make them faster, because, by asking “no more than one thread at a time” we enforce users to establish happens-before between sections, and that would give us all the safe publication guarantees we need.

Is my understanding correct? Am I missing something?

jira · Answer 12 · 2017-01-04T17:55:56+0000

Comment made by: tonsky

Also, long-living transients (e.g. in a transducers associated with a queue, for example) will hold a reference to a thread that created them. Is that a bad thing? Should we switch to boolean flag instead?

jira · Answer 13 · 2019-06-26T12:00:00+0000

Reference: https://clojure.atlassian.net/browse/CLJ-2090 (reported by tonsky)

Improve clojure.core/distinct perf by using transient set

Please log in or register to add a comment.

Please log in or register to answer this question.

13 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Categories

Improve clojure.core/distinct perf by using transient set

Please log in or register to add a comment.

Please log in or register to answer this question.

13 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions

Categories