Suggested improvements to clojure.core/distinct

Question

Suggested improvements to clojure.core/distinct

asked Sep 28, 2024 in Sequences by Paula Gearon

After looking at some performance improvements around clojure.core/distinct, I discovered that Nikita (@tonsky) had made suggestions about this back at the end of 2016, along with a code submission at CLJ-2090.

At the time, Alex made a comment about waiting until clojure.core/contains? supported transient sets, and the issue was pushed back. I note that this has now been addressed. However, in trying a few different approaches, I actually get better performance not using a transient at all, but instead checking if a conj has modified the object. This is done using a non-atomic deref/vreset, which begs the questions… is accessing a transducer supposed to be considered thread-safe? There are approaches to mitigate this, but there isn't a need if that's not a guarantee.

I discuss some of the approaches and provide some benchmarking in this project on Github. The fastest approach actually uses the ITransientSet interface directly, which I got the impression Alex would like to avoid.

commented Sep 28, 2024 by Noah

commented Sep 28, 2024 by Paula Gearon

commented Sep 28, 2024 by Noah

commented Sep 28, 2024 by Paula Gearon

commented Sep 30, 2024 by Noah

commented Oct 1, 2024 by Paula Gearon

For anyone who may be following this, I expanded the benchmarking, separating out the laziness change from the transducer variations. I also added a new variation for completeness.

To be honest, the best performing functions surprised me. I did not expect the persistent set to perform as well as it did, and I also did not expect the transient variations to perform worse for large sequences. However, these both made sense when I considered the shape of the data (large sequences with a small percentage of unique values).

I was also surprised that the transient set approach that skips the call to `contains?` performed as well as it did. While it avoids some of the code path of an earlier function, I am surprised that it didn't slow down similarly to the `distinct-transient` function.

Suggested improvements to clojure.core/distinct

Please log in or register to add a comment.

Please log in or register to answer this question.

0 Answers

Categories

Suggested improvements to clojure.core/distinct

Please log in or register to add a comment.

Please log in or register to answer this question.

0 Answers

Related questions

Categories