Abstract
The {{locking}} macro is about 2x slower than a Java {{synchronized}} block, even though all the {{locking}} macro does is emit {{monitorenter}} and {{monitorexit}} opcodes.
This problem can be solved by routing the body through a Java {{synchronized}} block instead of emitting {{monitorenter}} and {{monitorexit}} directly.
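For reference, {{locking}} is currently defined in clojure.core roughly as follows, expanding into explicit {{monitor-enter}} / {{monitor-exit}} calls wrapped in a try/finally:
{code}
(defmacro locking
  "Executes exprs in an implicit do, while holding the monitor of x.
  Will release the monitor of x in all circumstances."
  {:added "1.0"}
  [x & body]
  `(let [lockee# ~x]
     (try
       (monitor-enter lockee#)
       ~@body
       (finally
         (monitor-exit lockee#)))))
{code}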
The discussion thread on the dev mailing list is: https://groups.google.com/forum/#!topic/clojure-dev/dJZtRsfikXU
Related: CLJ-1472 is related to this ticket. Its purpose is different, but the changes needed to solve each problem are the same, so the patches on both tickets are almost identical.
BENCHMARKS
I made two sample programs to verify this problem. Each one starts many threads and updates a Map from those threads, to artificially create a highly contended situation.
The Java sample simply uses {{Thread}} and a {{synchronized}} block around a {{HashMap}}.
On the Clojure side, I made two samples.
In the first sample, I used an {{atom}} to hold an associative (map) and updated it with {{swap!}} and {{assoc}}.
In the second sample, I used {{volatile!}} to hold a {{java.util.HashMap}} and used {{locking}} on the volatile reference before updating the {{HashMap}}.
Java Sample:
https://github.com/tyano/MultithreadPerformance
Clojure Sample:
https://github.com/tyano/clj-multithread-performance
{quote}
How to run the programs is described on the pages linked above.
{quote}
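For illustration, a minimal sketch of the two Clojure strategies described above might look like the following. This is not the actual benchmark code from the repositories; the {{run-threads}} helper and all parameters are illustrative.
{code}
(import 'java.util.HashMap)

(defn run-threads
  "Start n threads, each calling (f thread-index), and wait for all of them."
  [n f]
  (let [threads (mapv (fn [t] (Thread. ^Runnable (fn [] (f t)))) (range n))]
    (run! (fn [^Thread th] (.start th)) threads)
    (run! (fn [^Thread th] (.join th)) threads)))

(defn bench-atom
  "Sample B: an atom holding a persistent map, updated with swap!/assoc."
  [n-threads n-updates]
  (let [m (atom {})]
    (run-threads n-threads
                 (fn [t]
                   (dotimes [i n-updates]
                     (swap! m assoc [t i] i))))
    (count @m)))

(defn bench-locking
  "Sample C: a volatile! holding a java.util.HashMap, updated under locking."
  [n-threads n-updates]
  (let [m (volatile! (HashMap.))]
    (run-threads n-threads
                 (fn [t]
                   (dotimes [i n-updates]
                     (locking @m
                       (.put ^HashMap @m [t i] i)))))
    (.size ^HashMap @m)))

;; e.g. (time (bench-atom 8 100000)) vs (time (bench-locking 8 100000))
{code}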
Results on my machine (macOS 10.13.1, Java 1.8.0_144, 3.1 GHz Intel Core i5, Clojure 1.9.0):
A. Java sample: 6,006ms
B. Clojure - atom with associative: 18,984ms
C. Clojure - locking on a HashMap: 15,883ms
B (Clojure's atom with swap!) is slower than the Java sample, but I can understand why. Updating an associative creates a new object. Of course it uses a persistent map, so it should perform better than copying the entire Map, but it will still be slower than updating a plain java.util.Map instance directly (as the Java sample does). And swap! might retry the update function repeatedly under contention. So this result is understandable to me.
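For example, {{swap!}} may invoke its update function more than once when the compare-and-set fails under contention. A quick way to observe this (illustrative only, not part of the benchmark) is to count the invocations:
{code}
(defn count-swap-calls
  "Run n-threads futures, each doing n-updates swap!s on a shared atom,
  and count how many times the update function was actually invoked."
  [n-threads n-updates]
  (let [a     (atom {})
        calls (java.util.concurrent.atomic.AtomicLong.)]
    (->> (range n-threads)
         (mapv (fn [t]
                 (future
                   (dotimes [i n-updates]
                     (swap! a (fn [m]
                                (.incrementAndGet calls)
                                (assoc m [t i] i)))))))
         (run! deref))
    ;; :fn-calls can exceed :updates because of CAS retries
    {:updates  (* n-threads n-updates)
     :fn-calls (.get calls)}))
{code}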
But I think the result of C (locking on a HashMap), 15,883ms, is TOO SLOW.
The {{locking}} macro just emits {{monitorenter}} and {{monitorexit}} directly, which is nearly the same as what a {{synchronized}} block does, so the result should be close to the Java sample's 6,006ms.
INVESTIGATION
I suspect that the bytecode generated for {{locking}} is not the same as what {{synchronized}} generates.
Ticket CLJ-1472 also points in this direction: it indicates that the bytecode {{locking}} generates is not the same as what the JDK compiler generates.
My suspicion: {{locking}} generates different bytecode, and the Java runtime cannot optimize that bytecode well.
Instead of generating the opcodes directly, what if the {{locking}} macro wrapped the macro body in a fn and simply called a Java method that invokes the supplied fn inside a {{synchronized}} block? The runtime might optimize that code better.
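To make the idea concrete, here is a rough sketch of the approach. This is not the attached patch; {{LockingHelper}} and {{lockAndRun}} are hypothetical names used only for illustration.
{code}
;; Hypothetical Java helper (assumed to be compiled alongside Clojure):
;;
;;   public class LockingHelper {
;;       public static Object lockAndRun(Object lockee,
;;                                       java.util.concurrent.Callable<Object> body)
;;               throws Exception {
;;           synchronized (lockee) {   // a plain Java synchronized block
;;               return body.call();
;;           }
;;       }
;;   }
;;
;; A locking macro built on such a helper. Clojure fns already implement
;; java.util.concurrent.Callable, so the body can simply be wrapped in (fn [] ...):
(defmacro locking
  "Executes exprs in an implicit do, while holding the monitor of x.
  Will release the monitor of x in all circumstances."
  [x & body]
  `(LockingHelper/lockAndRun ~x (fn [] ~@body)))
{code}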
I made such a patch to the {{locking}} macro (attached to this ticket) and ran sample C again with the patched Clojure:
THE RESULT WAS: 6,988ms!
SUMMARY
The current implementation of the {{locking}} macro has a performance problem: the bytecode the {{locking}} macro generates is not optimized well by the Java runtime, so it is about 2x slower than a Java {{synchronized}} block.
The patch attached to this ticket solves this problem.