
+3 votes
in Compiler by

It appears that the bytecode generated by the compiler is not 100% deterministic, even when all inputs (tools, dependencies, compiler options, code) remain stable. This can be observed using the following script, which just compiles the same code in a loop until the results of two successive runs differ:

#!/usr/bin/env bash

set -euo pipefail

compile() {
    mkdir -p classes/curr

    clojure -Sdeps '{:paths ["src" "classes/curr"]}' \
            -M -e "(binding [*compile-path* \"classes/curr\"] (compile 'foo) nil)"

    if [ -d "classes/prev" ]; then
        diff <(cd "classes/prev" && sha256sum * | sort -k2) \
             <(cd "classes/curr" && sha256sum * | sort -k2)
    fi
}

run() {
    rm -rf classes/prev classes/curr
    compile
    local n=1
    while compile; do
        echo $n
        rm -rf classes/prev
        mv classes/curr classes/prev
        n=$(($n+1))
    done
}

run

Where src/foo.clj contains the following code (adapted from some real-world code in Aleph where I first encountered this issue):

(ns foo)

(defn bar []
  (let [a 1
        b 2
        c (delay 3)
        {:keys [foo bar baz qux bla frob]} {:foo "ha"
                                            :bar 4}]
    #(clojure.lang.ArraySeq/create (into-array [a b @c bar]))))

I ran that script with OpenJDK 11.0.15+10 and Clojure CLI 1.11.1.1149 (so Clojure 1.11.1) on Linux 5.15.59. After a few dozen iterations, the result is something like this:

2,3c2,3
< 57496515c08ffd087a1f3e3e0d6e420c291b27a15d883c93cdad5de1c2cd8bf6  foo$bar$fn__145.class
< f6d5832ee0ee590056911b70da99b525d93d5b0280feb8e9d34e2f214de5dedd  foo$bar.class
---
> eff4dac36986b909c7dacb63b87f2033dc5be62ee5e0b2a3a2a4207e79a77c41  foo$bar$fn__145.class
> 3a9b9f33e4eeed8b80810a02dbeb0e4d72fac83d496409e5cf7c6ff78fa36ff5  foo$bar.class

Diffing the disassembled class files like this:

$ diff -u <(javap -l -c -s -private classes/prev/foo\$bar\$fn__145.class) <(javap -l -c -s -private classes/curr/foo\$bar\$fn__145.class)

Results in:

@@ -3,23 +3,23 @@
   java.lang.Object c;
     descriptor: Ljava/lang/Object;
 
-  long a;
-    descriptor: J
-
   long b;
     descriptor: J
 
   java.lang.Object bar;
     descriptor: Ljava/lang/Object;
 
+  long a;
+    descriptor: J
+
   public static final clojure.lang.Var const__0;
     descriptor: Lclojure/lang/Var;
 
   public static final clojure.lang.Var const__1;
     descriptor: Lclojure/lang/Var;
 
-  public foo$bar$fn__145(java.lang.Object, long, long, java.lang.Object);
-    descriptor: (Ljava/lang/Object;JJLjava/lang/Object;)V
+  public foo$bar$fn__145(java.lang.Object, long, java.lang.Object, long);
+    descriptor: (Ljava/lang/Object;JLjava/lang/Object;J)V
     Code:
        0: aload_0
        1: invokespecial #16                 // Method clojure/lang/AFunction."<init>":()V
@@ -28,13 +28,13 @@
        6: putfield      #18                 // Field c:Ljava/lang/Object;
        9: aload_0
       10: lload_2
-      11: putfield      #20                 // Field a:J
+      11: putfield      #20                 // Field b:J
       14: aload_0
-      15: lload         4
-      17: putfield      #22                 // Field b:J
+      15: aload         4
+      17: putfield      #22                 // Field bar:Ljava/lang/Object;
       20: aload_0
-      21: aload         6
-      23: putfield      #24                 // Field bar:Ljava/lang/Object;
+      21: lload         5
+      23: putfield      #24                 // Field a:J
       26: return
     LineNumberTable:
       line 4: 0
@@ -46,10 +46,10 @@
        3: invokevirtual #35                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
        6: checkcast     #37                 // class clojure/lang/IFn
        9: aload_0
-      10: getfield      #20                 // Field a:J
+      10: getfield      #24                 // Field a:J
       13: invokestatic  #43                 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
       16: aload_0
-      17: getfield      #22                 // Field b:J
+      17: getfield      #20                 // Field b:J
       20: invokestatic  #43                 // Method clojure/lang/Numbers.num:(J)Ljava/lang/Number;
       23: getstatic     #46                 // Field const__1:Lclojure/lang/Var;
       26: invokevirtual #35                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
@@ -58,7 +58,7 @@
       33: getfield      #18                 // Field c:Ljava/lang/Object;
       36: invokeinterface #49,  2           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
       41: aload_0
-      42: getfield      #24                 // Field bar:Ljava/lang/Object;
+      42: getfield      #22                 // Field bar:Ljava/lang/Object;
       45: invokestatic  #55                 // Method clojure/lang/Tuple.create:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Lclojure/lang/IPersistentVector;
       48: invokeinterface #49,  2           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
       53: checkcast     #57                 // class "[Ljava/lang/Object;"

As you can see, the order of the closed-over locals has changed between the two runs: the fields a, b and bar, and with them the constructor parameters, are emitted in a different order. The chances of this happening increase when the system is under heavy load (e.g. when running Clojure's full test suite at the same time), which leads me to believe that it's somehow related to memory allocation.

While inspecting the relevant code in Compiler.java, I think I might have found the culprit: the CLEAR_SITES map uses LocalBinding instances as keys, but that class doesn't implement a deterministic hashCode method. It thus falls back to the default Object implementation, which AFAIUI relies on the internal memory address and is therefore not guaranteed to be stable across runs. With an implementation like the following, I have so far been unable to reproduce the issue:

public int hashCode(){
    return Util.hashCombine(idx, sym.hashCode());
}
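
To illustrate the general mechanism, here is a minimal Java sketch. IdentityBinding, StableBinding and HashOrderDemo are hypothetical stand-ins and not the actual classes from Compiler.java; Objects.hash stands in for Util.hashCombine, and a java.util.HashMap is used instead of Clojure's persistent maps. It only shows that the iteration order of a hash map follows the keys' hash codes, so a hash derived from field values yields a repeatable order, while an identity-based hash gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; NOT the classes in Compiler.java.
final class HashOrderDemo {

    // No hashCode override: falls back to Object's identity-based hash,
    // which is not guaranteed to be the same from one JVM run to the next.
    static class IdentityBinding {
        final int idx;
        final String name;
        IdentityBinding(int idx, String name) { this.idx = idx; this.name = name; }
    }

    // hashCode derived only from field values, so it is the same in every run.
    // equals is deliberately left as identity, mirroring the proposed change.
    static final class StableBinding extends IdentityBinding {
        StableBinding(int idx, String name) { super(idx, name); }
        @Override public int hashCode() { return Objects.hash(idx, name); }
    }

    // Inserts the bindings into a HashMap and returns their names in iteration order.
    static List<String> iterationOrder(List<? extends IdentityBinding> bindings) {
        Map<IdentityBinding, Boolean> closedOver = new HashMap<>();
        for (IdentityBinding b : bindings) closedOver.put(b, Boolean.TRUE);
        List<String> names = new ArrayList<>();
        for (IdentityBinding b : closedOver.keySet()) names.add(b.name);
        return names;
    }

    // Builds fresh binding objects for the locals from the example function.
    static List<IdentityBinding> locals(boolean stable) {
        String[] names = {"c", "a", "b", "bar"};
        List<IdentityBinding> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++)
            out.add(stable ? new StableBinding(i, names[i]) : new IdentityBinding(i, names[i]));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("identity hash order: " + iterationOrder(locals(false)));
        System.out.println("stable hash order:   " + iterationOrder(locals(true)));
    }
}
```

When run repeatedly in fresh JVMs, the second line should always be the same, whereas nothing guarantees that for the first one, even if it often happens to match in a small single-threaded program like this.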

Now, I haven't read anywhere that the compiler cares much about reproducible builds. But given that it otherwise appears to be solid in that regard, and the fix would be a pretty small change to make it even more solid, I thought it might make sense to bring it up.

1 Answer

0 votes
by

This is not a high-priority goal for us, which is not to say we wouldn't consider a change if it was solving an actual problem for someone.

by
Thanks for the quick response! FWIW, here's our motivating case: At the moment, we deploy a rather large application to multiple servers in the form of uberjars with AOT-compiled class files. These weighed around 90M when I wrote the main comment (shrunk down to 60M after cutting out some unused heavy transitive deps in the meantime). For developing certain features, it's helpful to be able to deploy changes to a test cluster from a dev machine, ideally in small increments for a tight feedback loop. However, since many people are working from home these days and upstream bandwidth is often limited there, this kind of transfer can be excruciatingly slow.

That's why we are looking for ways to speed this process up. One way we investigated was to just deploy the AOT'ed class files directly, i.e. leaving the uberjar out of the equation. The deployment mechanism would then be able to transmit only those files which the server doesn't yet have, based on their contents (think `rsync -c`). The easiest way to achieve this was to just compile the whole application as before (i.e. without re-using the previous build results locally). Usually this should lead to very small differences relative to the previous build, since e.g. dependencies (which make up the bulk of the payload) rarely change. But due to the mentioned non-determinism, we still ended up with more changes than necessary.
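
A rough Java sketch of such a content-based comparison, i.e. working out which class files would actually need to be transmitted. ChangedClassFiles, manifest and changed are hypothetical names made up for this illustration; they are not part of rsync or of any real deployment tool, and SHA-256 is just an assumed choice of checksum.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of any real deployment tool.
final class ChangedClassFiles {

    // Maps each regular file under root (by relative path) to the hex SHA-256 of its contents.
    static Map<String, String> manifest(Path root) throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                MessageDigest sha = MessageDigest.getInstance("SHA-256");
                StringBuilder hex = new StringBuilder();
                for (byte b : sha.digest(Files.readAllBytes(p))) hex.append(String.format("%02x", b));
                result.put(root.relativize(p).toString(), hex.toString());
            }
        }
        return result;
    }

    // Returns the files in curr that are new or whose content hash differs from prev.
    static Set<String> changed(Map<String, String> prev, Map<String, String> curr) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : curr.entrySet())
            if (!e.getValue().equals(prev.get(e.getKey()))) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ChangedClassFiles <prev-dir> <curr-dir>");
            return;
        }
        Map<String, String> prev = manifest(Path.of(args[0]));
        Map<String, String> curr = manifest(Path.of(args[1]));
        changed(prev, curr).forEach(System.out::println);
    }
}
```

With fully deterministic compiler output, only files whose source (or dependencies) changed would show up in the result; with the non-determinism described above, unrelated class files such as foo$bar.class can appear as well.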

FTR: I realize that viable alternatives to achieve the stated goal exist. There just were additional considerations which made us investigate this approach first :-)