Abstract

The locking macro is about 2x slower than Java's synchronized block, even though all the locking macro does is emit the monitorenter and monitorexit opcodes.
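
For reference, the locking macro in clojure.core (as of Clojure 1.9) expands roughly like this; monitor-enter and monitor-exit are the special forms that compile to those opcodes:

(defmacro locking
  [x & body]
  `(let [lockee# ~x]
     (try
       (monitor-enter lockee#)
       ~@body
       (finally
         (monitor-exit lockee#)))))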

This problem can be solved by delegating to a Java synchronized block instead of emitting monitorenter and monitorexit directly.

The thread on the clojure-dev mailing list is: https://groups.google.com/forum/#!topic/clojure-dev/dJZtRsfikXU

Related to

CLJ-1472 is related to this ticket. Its purpose is different from this ticket's, but the changes needed to solve each problem are the same, so the patches for the two tickets are almost identical.

BENCHMARKS

I wrote two sample programs to verify this problem. Each one starts many threads that update a Map concurrently, so that a highly contended situation occurs artificially.

The Java sample simply uses Thread and a synchronized block on a HashMap.

On the Clojure side I made two samples, sketched below.
In the first, an atom holds an associative (map) and it is updated with swap! and assoc.
In the second, a volatile! holds a java.util.HashMap, and locking is taken before the HashMap is updated.
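
A minimal sketch of the two update strategies (the actual benchmark code lives in the repositories linked below and may differ in detail):

;; 1st sample: an atom holding a persistent map, updated with swap!/assoc.
(def state (atom {}))

(defn update-with-atom! [k v]
  (swap! state assoc k v))

;; 2nd sample: a volatile! holding a java.util.HashMap, mutated under locking.
(def map-holder (volatile! (java.util.HashMap.)))

(defn update-with-locking! [k v]
  (let [^java.util.HashMap m @map-holder]
    (locking m
      (.put m k v))))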

Java Sample:
https://github.com/tyano/MultithreadPerformance

Clojure Sample:
https://github.com/tyano/clj-multithread-performance

How to run the programs is described on the pages above.

Results on my machine (macOS 10.13.1, Java 1.8.0_144, 3.1 GHz Intel Core i5, Clojure 1.9.0):

A. Java sample: 6,006ms
B. Clojure - atom with associative: 18,984ms
C. Clojure - locking on a HashMap: 15,883ms

B (Clojure atom and swap!) is slower than the Java sample, but I can understand why. Updating an associative creates a new object. Of course it uses a persistent map, so it should perform better than copying the whole map, but it will still be slower than mutating a plain java.util.Map instance in place (as the Java sample does). In addition, swap! may retry the update repeatedly under contention. So this result is understandable to me.

But I think the result of C (locking on a HashMap, 15,883ms) is far too slow.

The locking macro just emits monitorenter and monitorexit directly, which is nearly the same as what a synchronized block does, so the result should be close to the Java sample's (6,006ms).

INVESTIGATION

I suspect that the bytecode generated for locking is not the same as what synchronized generates.
Ticket CLJ-1472 reinforces this suspicion: it indicates that the bytecode locking generates differs from what the JDK compiler emits.

My suspicion: locking generates different bytecode, and the Java runtime cannot optimize that bytecode well.
If, instead of emitting the opcodes directly, the locking macro wrapped its body in a fn and simply called a Java method that invokes the supplied fn inside a synchronized block, the runtime might optimize the code better.
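
As a sketch of that shape (hypothetical names only, not the attached patch): suppose a small Java helper class LockingHelper with a static method runInSynchronized(Object lock, Callable f) whose body is just synchronized (lock) { return f.call(); }. Since Clojure fns implement java.util.concurrent.Callable, the macro could then delegate to it:

;; Hypothetical sketch -- LockingHelper is an assumed Java helper class,
;; not something that exists in Clojure or in the attached patch.
(defmacro locking-via-synchronized
  [x & body]
  `(LockingHelper/runInSynchronized ~x (fn [] ~@body)))

The body becomes a closure, and the monitor handling happens inside a real Java synchronized block, so the JIT sees the bytecode pattern that javac normally produces for synchronized.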

I made such a patch to the locking macro (attached to this ticket) and ran sample C again with the patched Clojure:

THE RESULT WAS: 6,988ms!

SUMMARY

The current implementation of the locking macro has a performance problem: the bytecode it generates is not optimized well by the Java runtime, so it is about 2x slower than Java's synchronized block.

The patch attached to this ticket solves this problem.

1 Answer

Reference: https://clojure.atlassian.net/browse/CLJ-2285 (reported by tyano)
...