The locking macro fails bytecode verification on Graal native-image and ART runtime

Question

The locking macro fails bytecode verification on Graal native-image and ART runtime

asked Jul 23, 2014 in Clojure by jira
closed Oct 13, 2020 by alexmiller

Android ART runs compile time verification on bytecode and fails on any usage of the locking macro. The error looks like that seen in CLJ-1829 (in that case clojure.core.server/stop-server calls the locking macro):

10-16 14:49:26.801 2008-2008/? E/AndroidRuntime: java.lang.VerifyError: Rejecting class clojure.core.server$stop_server because it failed compile-time verification (declaration of 'clojure.core.server$stop_server' appears in /data/app/com.clojure_on_android-1/base.apk)

Cause: From discussion on an Android issue (https://code.google.com/p/android/issues/detail?id=80823), it seems this is due to more strictly enforcing the "structural locking" provisions in the JVM spec (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.11.10).

It appears that the mixture of monitor-enter, monitor-exit, and the try/finally block in locking create paths that ART is flagging as not having balanced monitorenter/monitorexit bytecode. Particularly, monitorenter and monitorexit themselves can throw (on a null locking object). The Java bytecode does some tricky exception table handling to cover these cases which (afaict) is not possible to do without modifying the Clojure compiler.

Approach: One possible approach is to have locking invoke the passed body in the context of a synchronized block in Java. This avoids the issue of the tricky bytecode by handing it over to Java but has unknown performance impacts.

Patch: clj-1472-3.patch

See also: Examination of the bytecode as compared to a java synchronized block shows up a number of differences:
https://gist.github.com/AdamClements/2ae6c4919964b71eb470

Screened by: Alex Miller - I'm marking this screened as I think it is a viable approach that fixes the issue and due to the infrequency of its use, I'm not really that concerned about it being a performance problem. I will flag that I think another way to handle this would be to make locking a special form with compiler support, but I'm not sure whether that's worth doing or not, so I will leave that to Rich to decide.

closed with the note: fixed

41 Answers

Page:

jira · Answer 1 · 2015-12-21T21:33:35+0000

Comment made by: alexmiller

I added a new clj-1472.patch that fixes a few minor details I didn't like about the prior patch. However, it is still essentially the same change so I have retained the original author attribution.

jira · Answer 2 · 2015-12-21T22:14:27+0000

Comment made by: gshayban

alex did you upload the right patch?

jira · Answer 3 · 2015-12-21T22:58:30+0000

Comment made by: alexmiller

Ah, no messed that up. Will fix.

jira · Answer 4 · 2015-12-22T09:54:43+0000

Comment made by: bronsa

Alex, it's probably worth making that a ^:once fn**

jira · Answer 5 · 2015-12-22T19:05:56+0000

Comment made by: gshayban

From Android:
{quote}
Android doesn't run Java bytecode, it runs Dex bytecode. The dexdump output of what your class translates to is interesting.

Neither is the JVMS interesting. Android isn't a Java Virtual Machine. We follow the JLS, but not the JVMS (how could we, when we don't run Java bytecode). As such, all appeals against it are irrelevant. We try to be compatible to the spirit of the JVMS wrt/ Dex bytecode, but if your source isn't Java, there are no guarantees.

Now, the verifiers were (and probably still are) broken, even against our (pretty bad) spec, and regrettably we aren't very consistent. In Marshmallow, for example, a lot of the code we can't correctly verify wrt/ structured locking is rejected as a VerifyError, which is not in the spirit of the JVMS. In the next release, this will be relaxed, however, and postponed to an actual check while running the code.

Regrettably there isn't anything we can do about old releases, you'll have to work around any issues. :-( I'll try to take a look at your class when I find the time.
{quote}

Sounds like making a workaround in Clojure is the least of all evils.

jira · Answer 6 · 2016-01-12T16:30:30+0000

Comment made by: alexmiller

Our current belief is that Android is at fault here, so backlogging this for now.

jira · Answer 7 · 2018-08-13T20:58:36+0000

Comment made by: gshayban

GraalVM/native-image is also complaining about monitorenter/exit balancing. Fixed it by mimicing what javac does: https://github.com/ghadishayban/clojure/commit/8acb995853761bc48b62190fe7005b70da692510

jira · Answer 8 · 2018-08-13T21:57:06+0000

Comment made by: alexmiller

Ghadi, if that's a viable fix, I'm interested in a patch for that.

jira · Answer 9 · 2018-08-14T16:23:17+0000

Comment made by: gshayban

If someone could assist me with Android testing infra, I'd be game.

In javac's emission of synchronized, it installs a catch handler with the exception type "any". The linked commit catches Exception. If desired, we can emit the any handler (I don't know what that means yet exactly -- Throwable + anything else in the future?) by calling GeneratorAdapter.visitTryCatch with null as the target class.

jira · Answer 10 · 2018-08-14T16:56:26+0000

Comment made by: adamclements

Have you tried applying the existing clj-1472-2.patch to see if that fixes GraalVM? I think we'd originally reached a place where people were happy with the attached patch (correct me if I'm wrong Alex) but it was decided that the JVM implementation is king so the problem was with Android for not conforming to that rather than with Clojure. It could be that decision is up for reconsideration if the same problem has reared its head elsewhere (I'd still like to see this patched myself).

This was an age ago now, but I remember trying all different combinations of clojure try/catch/finally/throw I could think of and not being able to get the emitted bytecode to conform to the spec for synchronised without altering clojure's code generation which was far too deep down to be practical - hence the above patch which relies on javac to generate the bytecode around the synchronisation instead of using monitor-enter and monitor-exit. I'd be concerned with your linked implementation that it might still generate wrong bytecode in some situations much like the existing implementation does even though it looks perfectly fine and apparently correct at the Clojure code level.

jira · Answer 11 · 2018-08-14T18:47:51+0000

Comment made by: gshayban

I hadn't tried it but I'm sure clj-1472-2.patch would work from a feasability perspective. It has a perf downside: it can cause "profiling pollution" from the JVM's perspective and confuse lock optimization. (Most locks are uncontended)

I'd rather fix the emission to be more like javac, iff that works for Android.

jira · Answer 12 · 2018-12-12T15:25:00+0000

Comment made by: eraserhd

I've tried the diff Ghadi linked to, but there are issues (including discarding the locked expression value). I fixed that, but after research, I've realized that:
1. This doesn't cover exceptions thrown by monitor-exit itself, which appears to be required. We can't recur in a catch, and we can't catch 'any', so we can't generate this bytecode with just Clojure constructs.
2. The bytecode emitted by Clojure's finally includes a store and a load uncovered by exception handling, so we can't just add exception handling inside the monitor-exit code. I'm pretty sure we can't remove the store and load.

So, unless I'm missing something, the only way to pass a structural locking check is to make locking a special form.

jira · Answer 13 · 2018-12-12T17:10:57+0000

Comment made by: adamclements

The already attached clj-1472-2.patch passes the structural locking tests.

After I did much exhaustive testing and comparing of generated bytecode back in 2014 this was the only viable approach I could find that didn't require instructing a new special form which reimplemented the poorly documented spec exactly to emit exactly the right bytecode, which seemed fragile to me.

I'd be interested to see some profiling numbers to see how big this claimed performance impact is, especially given how rare usage of the locking macro actually is in clojure (I could only find a few examples in the whole ecosystem when testing. Most people use the other concurrency primitives or java interop)

jira · Answer 14 · 2018-12-12T18:27:40+0000

Comment made by: alexmiller

-3 patch rebases on master, no semantic changes, attribution retained

The locking macro fails bytecode verification on Graal native-image and ART runtime

Please log in or register to add a comment.

41 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Categories

The locking macro fails bytecode verification on Graal native-image and ART runtime

Please log in or register to add a comment.

41 Answers

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Please log in or register to add a comment.

Related questions

Categories