I happened across the performance benchmark here and I was curious why Clojure was getting beaten by Java.
So I tossed it into the profiler (after modifying their version to use unchecked math, which didn't help) and nothing showed up. Hmm. So I decompiled it and found this:
// Decompiling class: leibniz$calc_pi_leibniz
import clojure.lang.*;

public final class leibniz$calc_pi_leibniz extends AFunction implements LD
{
    public static double invokeStatic(final long rounds) {
        final long end = 2L + rounds;
        long i = 2L;
        double x = 1.0;
        double pi = 1.0;
        while (i != end) {
            final double x2 = -x;
            final long n = i + 1L;
            final double n2 = x2;
            pi += Numbers.divide(x2, 2L * i - 1L);
            x = n2;
            i = n;
        }
        return Numbers.unchecked_multiply(4L, pi);
    }

    @Override
    public Object invoke(final Object o) {
        return invokeStatic(RT.uncheckedLongCast(o));
    }

    @Override
    public final double invokePrim(final long rounds) {
        return invokeStatic(rounds);
    }
}
So it looks like the double/long boundary is costing us at least a method lookup, maybe in Numbers.divide?
So I just coerce everything to double (even our index variable):
(def rounds 100000000)

(defn calc-pi-leibniz2
  "Eliminate mixing of long/double to avoid clojure.lang.Numbers invocations."
  ^double
  [^long rounds]
  (let [end (+ 2.0 rounds)]
    (loop [i 2.0 x 1.0 pi 1.0]
      (if (= i end)
        (* 4.0 pi)
        (let [x (- x)]
          (recur (inc i) x (+ pi (/ x (dec (* 2 i))))))))))
leibniz=> (c/quick-bench (calc-pi-leibniz rounds))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 575.352216 ms
Execution time std-deviation : 10.070268 ms
Execution time lower quantile : 566.210399 ms ( 2.5%)
Execution time upper quantile : 588.772187 ms (97.5%)
Overhead used : 1.884700 ns
nil
leibniz=> (c/quick-bench (calc-pi-leibniz2 rounds))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 158.509049 ms
Execution time std-deviation : 759.113165 µs
Execution time lower quantile : 157.234899 ms ( 2.5%)
Execution time upper quantile : 159.205374 ms (97.5%)
Overhead used : 1.884700 ns
nil
Any ideas why the Java implementation is not paying the same penalty for division? [Both versions are implemented with unchecked-math at :warn-on-boxed.]
I also tried a variant with fastmath's primitive math operators and it actually got slower. So far nothing has beaten coercing the loop index i
into a double (which I would normally never do).
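
In case it helps anyone dig into this, here is a rough Java sketch of the two loop shapes side by side: one with a long index and a long denominator (like the decompiled output above) and one with everything as double (like calc-pi-leibniz2). The class and method names are mine and purely illustrative; this is not the Java code from the benchmark, and the timing in main is crude wall-clock timing, not a proper benchmark.

```java
// Hypothetical sketch: LeibnizSketch, piMixed and piAllDouble are illustrative
// names, not taken from the benchmark repository.
final class LeibnizSketch {

    // Same shape as the decompiled Clojure loop: long index, so the long
    // denominator has to be converted to double before each division.
    static double piMixed(long rounds) {
        final long end = 2L + rounds;
        double x = 1.0;
        double pi = 1.0;
        for (long i = 2L; i != end; i++) {
            x = -x;
            pi += x / (2L * i - 1L); // long denominator is widened to double here
        }
        return 4.0 * pi;
    }

    // Same shape as calc-pi-leibniz2: the loop index itself is a double.
    static double piAllDouble(long rounds) {
        final double end = 2.0 + rounds;
        double x = 1.0;
        double pi = 1.0;
        for (double i = 2.0; i != end; i += 1.0) {
            x = -x;
            pi += x / (2.0 * i - 1.0);
        }
        return 4.0 * pi;
    }

    public static void main(String[] args) {
        final long rounds = 100_000_000L;
        // Crude wall-clock timing with no warmup control; treat the numbers
        // as a rough hint only.
        long t0 = System.nanoTime();
        double a = piMixed(rounds);
        long t1 = System.nanoTime();
        double b = piAllDouble(rounds);
        long t2 = System.nanoTime();
        System.out.printf("mixed:      %.10f in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("all-double: %.10f in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

If the two Java loops time about the same on your machine while the two Clojure versions do not, that would point at how the Clojure version gets compiled rather than at the long-to-double conversion itself. For numbers you can trust, a harness like JMH would be the better tool.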