_Comment made by: petterik_
Here's my progress report. I've outlined my thought process.
Please let me know if you're more interested in just metrics and I can make it less chatty.
*TL;DR*
Three patches:
1. Adds a new `reduce2`, which is redefined as `reduce` once `reduce` is available. (CLJ-2308-2-reduce2.patch)
2. Implements reduced? and IReduceInit check in reduce1. (CLJ-2308-3-IReduceInit.patch)
3. Redefine `some` with `alter-var-root` once `reduce` has been defined. (CLJ-2308-4-alter-var-root.patch)
All patches are much smaller than the original patch.
*Context*
As I started working on a new patch, I remembered why I had taken the original approach:
1. The distance in clojure/core.clj between `some` and `reduce` is 4,114 LOC and `some` is used 5 times within those LOC.
2. Moving `reduce` above or closer to `some` requires moving some of the `load` expressions, including {{(load "core/protocols")}}.
Given these two hurdles, I revisited the original patch and the question of why `deref` and `deref-future` had to be moved. I was able to create quite a small patch by:
1. Calling the {{.deref}} method on the Reduced object instead of the `deref` function.
2. Calling {{RT/isReduced}} instead of the `reduced?` function.
3. Moving just the `reduced` function above `some`.
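Points 1–3 can be sketched roughly like this (an illustration, not the actual patch; `seq-reduce-with-reduced` is my name here):

```clojure
;; Sketch only: a seq-based reduce loop that honors Reduced via interop,
;; so neither `deref` nor `reduced?` needs to be defined yet at this
;; point in core.clj.
(defn seq-reduce-with-reduced
  [f val coll]
  (loop [acc val, s (seq coll)]
    (if (clojure.lang.RT/isReduced acc)
      (.deref ^clojure.lang.Reduced acc)   ; point 1: interop .deref
      (if s
        (recur (f acc (first s)) (next s)) ; point 2: RT/isReduced above
        acc))))

;; Point 3: `reduced` itself has no dependencies, so it is easy to move
;; above `some`:
;; (defn reduced [x] (clojure.lang.Reduced. x))
```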
Before preparing a new patch, I ran two benchmarks and found that `some` based on `reduce1` is slower in one case, and confirmed that `some` based on `reduce` is much faster. This finding is not very surprising, as `reduce1` iterates through collections via the seq API with the chunked-seq optimization; what really makes `reduce` faster than `first`+`next` is the CollReduce protocol and the IReduceInit interface.
*Approaches*
1. New `reduce2`:
Given this context, I first played with redefining `reduce1` as `reduce`, but this didn't work as the protocol extension cache code invoked by `reduce` calls `reduce1`. Redefining `reduce1` as `reduce` can make a call to `reduce` loop forever. As this problem seems quite hairy, I went for creating a new reducing function `reduce2` which is redefined as `reduce` using `alter-var-root`.
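A minimal sketch of the `reduce2` pattern (the seq-based body below is a simplified stand-in for `reduce1`, not the patch itself):

```clojure
;; reduce2 starts out as a seq-based fallback (standing in for reduce1's
;; behavior at this point in core.clj) ...
(defn reduce2
  ([f coll]
   (if-let [s (seq coll)]
     (reduce2 f (first s) (next s))
     (f)))
  ([f val coll]
   (loop [acc val, s (seq coll)]
     (if s
       (recur (f acc (first s)) (next s))
       acc))))

;; ... and once the real `reduce` exists further down core.clj, the var
;; root is swapped, so every caller of reduce2 now runs `reduce`:
(alter-var-root #'reduce2 (constantly reduce))
```

Because callers dereference the var at call time, they pick up the new root — which is also the extra level of indirection this approach pays for.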
2. Implementing IReduceInit optimization in `reduce1`:
This approach was inspired by {{clojure.core/set}}, which special-cases implementations of IReduceInit. By optimizing for IReduceInit and chunked seqs in `reduce1`, we get some, but not all, of the `reduce` path optimizations.
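The shape of that special case, sketched (hypothetical name; {{clojure.core/set}} does the analogous `instance?` check):

```clojure
;; Sketch: fast path for collections that can reduce themselves,
;; seq-based fallback otherwise.
(defn reduce1-ireduceinit
  [f val coll]
  (if (instance? clojure.lang.IReduceInit coll)
    ;; Fast path: e.g. vectors and ranges iterate internally,
    ;; without allocating seqs.
    (.reduce ^clojure.lang.IReduceInit coll f val)
    ;; Fallback: first/next over a seq (e.g. strings, lazy seqs).
    (loop [acc val, s (seq coll)]
      (if s
        (recur (f acc (first s)) (next s))
        acc))))
```

A String is not IReduceInit, so it stays on the seq path — which may be why the :string row in the benchmarks below does not improve under this approach.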
3. Redefining `some` when `reduce` is available:
Redefining `some` with `alter-var-root` once `reduce` has been defined. This approach is the most isolated and performs well across the board.
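Roughly like this (a sketch: the early definition mirrors the current seq-based `some`, and I use `some-sketch` here to avoid shadowing clojure.core):

```clojure
;; Early in core.clj: the current seq-based definition.
(defn some-sketch [pred coll]
  (when-let [s (seq coll)]
    (or (pred (first s)) (some-sketch pred (next s)))))

;; After `reduce` is defined: swap in a reduce-based version that
;; short-circuits via `reduced` on the first truthy result.
(alter-var-root
 #'some-sketch
 (constantly
  (fn [pred coll]
    (reduce (fn [_ x]
              (when-let [r (pred x)]
                (reduced r)))
            nil
            coll))))
```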
*Discussion*
Just adding `reduced` semantics to `reduce1` is not sufficient, as most of the performance benefit comes from IReduceInit and CollReduce.
Creating the new `reduce2` allows for opt-in reduce performance enhancements for all functions currently using `reduce1`. Every future change to functions from `reduce1` to `reduce2` would be isolated to only that function. The downside is that there would then exist another reducing function in clojure.core and it's not easy to see why it exists (docs/comment could help). In addition, the new reducing function adds one level of indirection as the function calling `reduce2` has to get the var root. If this is the way forward, adding `reduced` checks in `reduce1` would be nice to have, such that the behavior of `reduce2` is always consistent.
Implementing IReduceInit in `reduce1` has a large contact area, as it affects not only `some` but much of Clojure core. For some data structures this change would be a performance win, but for others (e.g. java.lang.String) it would be a performance loss. If this is an interesting way forward, one could enumerate the slower paths and special-case them in `reduce1`.
Redefining `some` after `reduce` has been defined has the smallest footprint of the three approaches. It only affects user code and the usages of `some` that come after `reduce` in core.clj (currently just `some-fn`). One could consider adding ^:redef to the `some` function, which (if I'm not mistaken) should allow all usages of `some` to benefit from the performance enhancements, even under direct linking.
*Benchmark data*
All approaches were benchmarked using {{criterium.core/bench}}.
Versions explained:
||Patch||:version||description||
| N/A | 1.10.0 | Baseline. Clojure 1.10.0 with no changes. |
| N/A | reduce1-reduced | reduce1 implementing reduced check |
| CLJ-2308-3-IReduceInit.patch | reduce1-reduced+IReduceInit | reduce1 implementing reduced and IReduceInit check |
| CLJ-2308-2-reduce2.patch | reduce2 | reduce2, which is redefined as `reduce` |
| CLJ-2308-4-alter-var-root.patch | some-redefined | Redefines `some` to use `reduce` with `alter-var-root` |
| N/A | some-redefined+meta | Redefines `some` to use `reduce` with `alter-var-root` and adds ^:redef meta to `some` |
| N/A | REPL | Defines a reduce-based `some` in a REPL running Clojure 1.10.0 |
Tests:
||Test id|| Form ||
|:range | (some #(< 1e3 %) (range (long 1e4))) |
|:iterate | (some #(< 1e3 %) (iterate inc 0)) |
|:string | (some #(= % \x) "1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111x") |
Mean times:
||Test id || 1.10.0 || reduce1-reduced || reduce1-reduced+IReduceInit || reduce2 || some-redefined || some-redefined+meta || REPL ||
| :range | 29.439476 µs | 7.281439 µs | 4.958930 µs | 4.935221 µs | 5.030624 µs | 5.052136 µs | 5.636544 µs |
| :iterate | 36.640107 µs | 56.214323 µs | 6.622923 µs | 6.835455 µs | 7.014428 µs | 7.242155 µs | 6.660484 µs |
| :string | 6.236579 µs | 7.853746 µs | 7.645742 µs | 4.186791 µs | 4.170554 µs | 4.180583 µs | 1.898992 µs |
| Sum | 72.316162 µs | 71.349508 µs | 19.227595 µs | 15.957467 µs | 16.215606 µs | 16.474874 µs | 14.19602 µs |
All results were reproducible and consistent, except the REPL results where I could get as low as 3 µs for :range and :iterate, which signals to me that there's some JIT optimization going on sometimes. I suspect something similar is going on for the :string test in my REPL.