The locking macro is 2x slower than a synchronized block in Java, although all the locking macro does is generate monitorenter and monitorexit directly.
This problem can be solved by using a Java synchronized block instead of generating those opcodes directly.
The thread on the dev mailing list is: https://groups.google.com/forum/#!topic/clojure-dev/dJZtRsfikXU
CLJ-1472 is related to this ticket. The purpose of CLJ-1472 is different from this ticket's, but the changes needed to solve each problem are the same, so the patches for both tickets are almost identical.
I made 2 sample programs to verify this problem. Each one starts many threads and updates a Map from those threads, to artificially create a highly contended situation.
In the Java sample, I simply use Thread and a synchronized block on a HashMap.
For Clojure, I made 2 samples.
In the 1st sample, I used an atom to hold an associative (Map) and used assoc to update the associative.
In the 2nd sample, I used volatile! to hold a java.util.HashMap and used locking on the volatile reference before updating the HashMap.
How to run the programs is described on the pages above.
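The Java sample described above can be sketched roughly as follows. This is a minimal illustration and not the actual sample program: the class name SynchronizedMapBench, the method run, and the thread and update counts are all assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Java sample; names and counts are assumptions.
class SynchronizedMapBench {
    // Starts `threads` threads; each performs `updatesPerThread` puts into one
    // shared HashMap, guarding every put with a synchronized block.
    // Returns the final number of entries in the map.
    static int run(int threads, int updatesPerThread) {
        final Map<Integer, Integer> map = new HashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    int key = id * updatesPerThread + i; // distinct key per update
                    synchronized (map) {                 // all threads contend on this monitor
                        map.put(key, i);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int size = run(8, 100_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("entries=" + size + " elapsed=" + elapsedMs + "ms");
    }
}
```

The absolute timings from such a sketch depend on the machine and JVM, so only the relative difference between the Java and Clojure variants is meaningful.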
Results on my machine (macOS 10.13.1, Java 1.8.0_144, 3.1 GHz Intel Core i5, Clojure 1.9.0):
A. Java sample: 6,006ms
B. Clojure - atom with associative: 18,984ms
C. Clojure - locking on a HashMap: 15,883ms
B (atom and swap! in Clojure) is slower than the Java one, but I can understand why. Updating an associative creates a new object. Of course it uses PersistentMap, so it should perform better than creating a full copy of the Map instance, but it will still be slower than updating a simple java.util.Map instance directly (as the Java sample does). Also, swap! might retry the update repeatedly. So this result is understandable to me.
But I think the result of C (locking on a HashMap), 15,883ms, is TOO SLOW.
The locking macro just generates monitorenter and monitorexit directly, which is nearly the same as what a synchronized block does, so the result should be close to the result of the Java sample (6,006ms).
I suspect that the bytecode generated by locking is not the same as what a Java synchronized block compiles to.
Ticket CLJ-1472 also supports my suspicion. That ticket indicates that the bytecode locking generates is not the same as what the JDK generates.
locking generates different bytecode, and the Java runtime cannot optimize the generated bytecode well.
Instead of generating opcodes directly, if the locking macro wrapped the macro body in a Fn and just called a Java method that invokes the supplied Fn inside a synchronized block, the runtime might be able to optimize the code well.
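The idea above, a Java method that invokes a supplied function inside a synchronized block, could look roughly like this. It is only a sketch and is not the code of the attached patch: the class name LockingHelper and the method name locking are hypothetical, and java.util.function.Supplier is used here as a stand-in for a Clojure Fn so that the example compiles without Clojure on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper; NOT the code of the patch attached to the ticket.
final class LockingHelper {
    private LockingHelper() {}

    // Runs `body` while holding the monitor of `lockee` and returns its result.
    // Because this is a plain synchronized block, javac emits the
    // monitorenter/monitorexit pair together with the exception handler that
    // releases the monitor if `body` throws.
    static <T> T locking(Object lockee, Supplier<T> body) {
        synchronized (lockee) {
            return body.get();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // The macro body would be wrapped in a function like this lambda.
        Integer previous = locking(map, () -> map.put("k", 1));
        System.out.println("previous=" + previous + " current=" + map.get("k"));
    }
}
```

With this shape, the macro would only need to expand to a call of the helper with the lock target and the wrapped body, leaving the monitor handling to code compiled by javac.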
I made such a patch for the locking macro (attached to this ticket) and ran sample C again with the patched Clojure:
THE RESULT WAS: 6,988ms !
The current implementation of the locking macro has a performance problem. The bytecode that the locking macro generates is not optimized well by the Java runtime, so it performs 2x slower than a synchronized block in Java.
The patch attached to this ticket can solve this problem.