
0 votes
in Clojure by

clojure.core/some is implemented using first and recur next. It could be implemented using reduce and reduced to gain a significant speedup on large collections.
I benchmarked range, iterate, vector, sorted-set and a lazy-seq using criterium and got the following results:

|| || clojure.core/some || reduce some || :speed-up ||
| :iterate | 14.502145 ms | 3.994055 ms | 3.63x |
| :range | 16.949429 ms | 14.903065 ms | 1.14x |
| :vector | 23.706839 ms | 5.765865 ms | 4.11x |
| :lazy-seq | 28.723150 ms | 5.616475 ms | 5.11x |
| :set | 53.063608 ms | 17.419191 ms | 3.05x |

(The :range result is for clojure.lang.Range.)

Each collection was of size 1e7, and some was called with (some #(< 1e6 %) coll), consuming 1e6 elements from the collection before returning.

I ran these tests from an uberjar with the following Java opts:
java -Xmx3g "-Dclojure.compiler.direct-linking=true" -server

The reduce version of some I used and the benchmarking code can be found in this gist:
https://gist.github.com/petterik/63fad1126842ba4550dcb338328e2975
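
The reduce-based variant can be sketched like this (`reduce-some` is an illustrative name; the actual code is in the linked gist and may differ in details). `reduced` short-circuits the reduction as soon as the predicate returns a truthy value:

```clojure
;; Sketch of some implemented with reduce + reduced.
;; reduced wraps the result and makes reduce stop iterating immediately.
(defn reduce-some [pred coll]
  (reduce (fn [_ x]
            (when-let [r (pred x)]
              (reduced r)))
          nil
          coll))

(reduce-some #(< 1e6 %) (range (long 1e7)))
;; => true
```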

8 Answers

0 votes
by

Comment made by: petterik

I messed up the formatting in the description and I can't find how to edit it (possibly because I don't have permission?). Removing the dotted lines should be somewhat good enough.

0 votes
by

Comment made by: jafingerhut

I have bumped up the permissions on your JIRA account, since you are on the list of people who have signed a Clojure Contributor Agreement. You should be able to edit the ticket now.

If you have a proposed change to make to Clojure, I'd recommend creating a patch using the instructions found at the "Developing Patches" link on this page: https://dev.clojure.org/display/community/Contributing

0 votes
by

Comment made by: petterik

Sure, Andy. I'm initiating a move tomorrow, so I'll get to it next week.

0 votes
by
_Comment made by: petterik_

Attached CLJ-2308.patch - 20/Jan/18 5:51 PM:

Implemented clojure.core/some and clojure.core/every? with reduce and early termination with reduced.

To get this to work I had to:
* Use reduce1 instead of reduce
* Move deref and reduced above reduce1
* Take care of reduced values in reduce1
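
The reduce-based `every?` can be sketched in the same way (`reduce-every?` is an illustrative name, not the patched `clojure.core/every?` itself). Here `reduced` is used to bail out on the first falsey predicate result:

```clojure
;; Sketch of every? implemented with reduce + reduced:
;; stop at the first element that fails the predicate.
(defn reduce-every? [pred coll]
  (reduce (fn [_ x]
            (if (pred x)
              true
              (reduced false)))
          true
          coll))

(reduce-every? #(< % 1e3) (range 10))   ;; => true
(reduce-every? #(< % 1e3) (range 2000)) ;; => false
```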

I've benchmarked the reduce-based clojure.core/every? with collections of size 1e4 and pred #(< % 1e3):

|| || clojure.core/every? || reduce every? || :speed-up ||
| :iterate | 16.897504 µs | 8.273847 µs | 2.04x |
| :long-range | 15.197158 µs | 8.658177 µs | 1.76x |
| :vector | 27.534283 µs | 2.519784 µs | 10.93x |
| :lazy-seq | 25.457922 µs | 6.393590 µs | 3.98x |
| :set | 56.421657 µs | 17.143458 µs | 3.29x |

Since the results were good and some is so similar to every?, I went ahead and made the change for every? as well.

0 votes
by

Comment made by: alexmiller

Why are deref and deref-future etc moved in the patch? Or, bigger question, is there a smaller patch with same effect?

0 votes
by

Comment made by: petterik

Summary
I think it's worth investigating whether there is a smaller patch with the same effect. I'll see what I can do!

Goal
As the issue has been triaged, try to minimize the patch size.

Problem
some requires reduced support in the reducing function to be able to halt the reducing process early. There are two implementations of reduce in clojure.core: reduce, which implements reduced, and reduce1, which doesn't. The task is to arrange some, reduce1 and/or reduce so that some has access to a reduce function that implements reduced.

Declaring reduce via (declare reduce) was attempted and it didn't work IIRC (will validate this approach again).
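
One plausible reason the `(declare reduce)` approach fails (illustrative REPL session with made-up names; the actual failure mode may differ): a declared var is unbound until its `defn` runs, so any call that happens before then throws. Since `some` is used during the load of core.clj before `reduce` is defined, a forward declaration alone is not enough.

```clojure
;; A declared var can be referenced, but not called, before it is defined:
(declare f)
(defn g [x] (f x))

;; Calling (g 1) here would throw:
;; IllegalStateException: Attempting to call unbound fn: #'user/f

(defn f [x] (inc x))
(g 1) ;; => 2, but only because f was defined before the call
```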

Old approach
As some is defined above reduce in core.clj and thus has no access to it, the focus was on getting reduce1 to implement reduced. This required moving around deref, deref-future and a few others, which led to a pretty big patch.

New approaches
Minimize the patch size by either:
1a. Move some (and every?) below reduce
1b. Move some (and every?) down and reduce up.
1c. Move reduce above some (and every?).
2. Create some and some1, mirroring reduce/reduce1, where the reduce-based some would be defined below reduce.
3. ?

I'm not a fan of #2, as it creates more duplicated functions like reduce1. I'm hoping we can get a small patch with one of the #1 variants.

0 votes
by
_Comment made by: petterik_

Here's my progress report. I've outlined my thought process.

Please let me know if you're more interested in just metrics and I can make it less chatty.

*TL;DR*

Three patches:
1. Adds new `reduce2` which is redefined as `reduce` when possible. (CLJ-2308-2-reduce2.patch)
2. Implements reduced? and IReduceInit check in reduce1. (CLJ-2308-3-IReduceInit.patch)
3. Redefine `some` with `alter-var-root` once `reduce` has been defined. (CLJ-2308-4-alter-var-root.patch)

All patches are much smaller than the original patch.

*Context*

As I started to work on a new patch, I remembered why I had taken the original approach.

1. The distance in clojure/core.clj between `some` and `reduce` is 4,114 LOC and `some` is used 5 times within those LOC.
2. Moving `reduce` above or closer to `some` requires moving some of the `load` expressions, including `(load "core/protocols")`.

Given these two hurdles, I revisited the original patch and the question of why `deref` and `deref-future` had to be moved. I was able to create quite a small patch by:
1. Calling the `.deref` method on the Reduced object instead of the `deref` function.
2. Calling `RT/isReduced` instead of the `reduced?` function.
3. Moving just the `reduced` function above `some`.
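
A reduced-aware seq reduction using those interop calls might look like this (a simplified sketch with an illustrative name; the real reduce1 also has chunked-seq handling and an arity without an init value):

```clojure
;; Sketch: seq-based reduction that honors reduced via Java interop,
;; avoiding the not-yet-defined deref and reduced? functions.
(defn reduce1-sketch [f val coll]
  (loop [acc val
         s   (seq coll)]
    (if (clojure.lang.RT/isReduced acc)
      ;; unwrap the Reduced box directly via its .deref method
      (.deref ^clojure.lang.Reduced acc)
      (if s
        (recur (f acc (first s)) (next s))
        acc))))

(reduce1-sketch (fn [_ x] (if (< 5 x) (reduced x) x)) nil (range 100))
;; => 6
```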

When getting ready for a new patch, I first ran two benchmarks and found that `some` based on `reduce1` is slower in one case, and confirmed that `some` based on `reduce` is much faster. This finding is not very surprising, as `reduce1` iterates through collections with the seq API, with the chunked-seq optimization. What really makes `reduce` faster than `first`+`next` is the coll-reduce protocol and the IReduceInit interface.

*Approaches*

1. New `reduce2`:
Given this context, I first played with redefining `reduce1` as `reduce`, but this didn't work as the protocol extension cache code invoked by `reduce` calls `reduce1`. Redefining `reduce1` as `reduce` can make a call to `reduce` loop forever. As this problem seems quite hairy, I went for creating a new reducing function `reduce2` which is redefined as `reduce` using `alter-var-root`.

2. Implementing IReduceInit optimization in `reduce1`:
This approach was inspired by `clojure.core/set`, where we special-case implementations of IReduceInit. By optimizing for IReduceInit and chunked seqs in reduce1, we get some, but not all, of the reduce path optimizations.

3. Redefining `some` when `reduce` is available:
Redefining `some` with `alter-var-root` once `reduce` has been defined. This approach is the most isolated and performs well across the board.
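
A sketch of this approach, using a stand-in var so as not to touch `clojure.core/some` itself (`my-some` is an illustrative name):

```clojure
;; Stand-in for the early, seq-based some:
(defn my-some [pred coll]
  (when-let [s (seq coll)]
    (or (pred (first s))
        (recur pred (next s)))))

;; Once reduce has been defined, rebind the var root to the
;; reduce-based implementation:
(alter-var-root #'my-some
  (constantly (fn [pred coll]
                (reduce (fn [_ x]
                          (when-let [r (pred x)]
                            (reduced r)))
                        nil
                        coll))))

(my-some #(< 1e3 %) (range 2000)) ;; => true
```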

*Discussion*

Just adding `reduced` semantics to `reduce1` is not sufficient, as most of the performance benefit comes from IReduceInit and CollReduce.

Creating the new `reduce2` allows for opt-in reduce performance enhancements for all functions currently using `reduce1`. Every future change to functions from `reduce1` to `reduce2` would be isolated to only that function. The downside is that there would then exist another reducing function in clojure.core and it's not easy to see why it exists (docs/comment could help). In addition, the new reducing function adds one level of indirection as the function calling `reduce2` has to get the var root. If this is the way forward, adding `reduced` checks in `reduce1` would be nice to have, such that the behavior of `reduce2` is always consistent.

Implementing IReduceInit in `reduce1` has a large contact area, as it affects not only `some` but much of Clojure core. For some data structures it would be a performance win, but for others (e.g. java.lang.String) it would be a performance loss. If this is an interesting way forward, one could enumerate the slower paths and special-case them in `reduce1`.

Redefining `some` after `reduce` has been defined has the smallest footprint of the three approaches. It affects only user code and the usages of `some` that appear after `reduce` in core.clj (currently just `some-fn`). One could consider adding ^:redef to the `some` function, which (if I'm not mistaken) should allow all usages of `some` to benefit from the performance enhancements, even under direct linking.

*Benchmark data*

All approaches were tested using `criterium.core/bench`.

Versions explained:

||Patch||:version||description||
| N/A | 1.10.0 | Baseline. Clojure 1.10.0 with no changes. |
| N/A | reduce1-reduced | reduce1 implementing reduced check |
| CLJ-2308-3-IReduceInit.patch | reduce1-reduced+IReduceInit | reduce1 implementing reduced and IReduceInit check |
| CLJ-2308-2-reduce2.patch | reduce2 | reduce2, which is redefined as `reduce` |
| CLJ-2308-4-alter-var-root.patch | some-redefined | Redefines `some` to use `reduce` with `alter-var-root` |
| N/A | some-redefined+meta | Redefines `some` to use `reduce` with `alter-var-root` and adds ^:redef meta to `some` |
| N/A | REPL | Defines reduce based some in REPL running Clojure 1.10.0 |

Tests:
||Test id|| Form ||
|:range | (some #(< 1e3 %) (range (long 1e4))) |
|:iterate | (some #(< 1e3 %) (iterate inc 0)) |
|:string | (some #(= % \x) "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111x") |

Mean times:
||Test id || 1.10.0 || reduce1-reduced || reduce1-reduced+IReduceInit || reduce2 || some-redefined || some-redefined+meta || REPL ||
| :range | 29.439476 µs | 7.281439 µs | 4.958930 µs | 4.935221 µs | 5.030624 µs | 5.052136 µs | 5.636544 µs |
| :iterate | 36.640107 µs | 56.214323 µs | 6.622923 µs | 6.835455 µs | 7.014428 µs | 7.242155 µs | 6.660484 µs |
| :string | 6.236579 µs | 7.853746 µs | 7.645742 µs | 4.186791 µs | 4.170554 µs | 4.180583 µs | 1.898992 µs |
| Sum (µs) | 72.316162 | 71.349508 | 19.227595 | 15.957467 | 16.215606 | 16.474874 | 14.196020 |

All results were reproducible and consistent, except the REPL results where I could get as low as 3 µs for :range and :iterate, which signals to me that there's some JIT optimization going on sometimes. I suspect something similar is going on for the :string test in my REPL.

by
I notice that all the performance testing was done with collections of large size (at least 1e4, testing 1e3, and then 1e7, testing 1e6). How well does it perform for small collections (e.g., 1e2 testing 1e1)?
0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-2308 (reported by petterik)
...