
0 votes
in Compiler

It turns out Clojure creates a class file per function. As a result, you get too many .class files.

My question is, why does it generate that many .class files instead of creating a class per namespace and having all functions inside that .class file?

At least for AOT compilation, wouldn't it be better to merge those .class files? I'd like to know the technical reason behind the current design.

3 Answers

+2 votes
Best answer

Clojure's compiler is not a whole-program compiler. The unit of compilation is a top-level form. So even when you say "compile this file", what actually happens is that each top-level form in that file is compiled on its own, and the compiler can only "see" a single top-level form at a time.
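One way to observe this form-at-a-time behavior is the sketch below. It assumes a default REPL or script load, and `not-yet-defined-helper` is a made-up name used only for illustration.

```clojure
;; Hypothetical name: nothing called not-yet-defined-helper exists yet.
;; Compiling a form that refers to it fails, because the compiler only sees
;; this one form plus the vars that earlier forms have already created.
(def before-declare
  (try
    (eval '(fn [] (not-yet-defined-helper 1)))
    :compiled
    (catch Exception _ :unresolved)))

;; declare creates the var (without a value), so later forms can refer to it.
(declare not-yet-defined-helper)

(def after-declare
  (try
    (eval '(fn [] (not-yet-defined-helper 1)))
    :compiled
    (catch Exception _ :unresolved)))
```

Under those assumptions, before-declare should end up as :unresolved and after-declare as :compiled, which is why forward references in a file need declare.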

Functions are closures, so they are not just code (like a static method); they are code plus closed-over values (like an instance method, where the closed-over values are fields on the instance).

For example:

(fn [x] (fn [y] [x y]))

This code has an inner and outer function, and compiles to two classes. Every time you invoke the outer function, you get a new instance of the inner function's class that is passed the value of x when it is constructed, and it stores x in some instance field.
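A minimal sketch of that behavior at the REPL; the var names `outer`, `pair-with-1`, and `pair-with-2` are chosen here only for illustration.

```clojure
(def outer (fn [x] (fn [y] [x y])))

;; Each call to outer constructs a new instance of the inner function's class,
;; capturing its own value of x.
(def pair-with-1 (outer 1))
(def pair-with-2 (outer 2))

(def result-1 (pair-with-1 :a)) ;; => [1 :a]
(def result-2 (pair-with-2 :a)) ;; => [2 :a]

;; Both closures are instances of the same (inner) class...
(def same-class? (= (class pair-with-1) (class pair-with-2))) ;; => true
;; ...but they are distinct objects holding different closed-over values.
(def same-object? (identical? pair-with-1 pair-with-2)) ;; => false
;; The outer function is an instance of a different class.
(def outer-is-inner-class? (= (class outer) (class pair-with-1))) ;; => false
```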

The other issue with using static methods to implement fns is that fns are first-class values and static methods are not. You cannot pass a static method as an argument, etc. There are some first-class representations of static methods available: reflection's Method, and also the MethodHandles that came with invokedynamic. But reflection is considered slow, and MethodHandles weren't available when Clojure was first created. In either case, neither representation can be made to implement Clojure's function interface, IFn, so a significant amount of work would be required to make them work (likely reworking how Clojure calls functions to use invokedynamic).
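To make "first-class value that implements IFn" concrete, here is a small sketch; `add` and `apply-twice` are illustrative names, not anything from Clojure itself.

```clojure
(defn add [a b] (+ a b))

;; A Clojure function is an ordinary object that implements clojure.lang.IFn.
(def add-is-ifn? (instance? clojure.lang.IFn add)) ;; => true

;; Because it is an object, it can be passed around like any other value.
(defn apply-twice [f x] (f (f x)))
(def twice-result (apply-twice inc 1)) ;; => 3

;; A normal call such as (add 1 2) goes through IFn's invoke method;
;; the same method can be called explicitly via interop.
(def invoke-result (.invoke ^clojure.lang.IFn add 1 2)) ;; => 3
```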

In the general case, functions need to be able to close over values (instance fields) and need to have some first class representation (instances), so static methods are out.

That said, the Clojure compiler does do some optimizations where it can generate a static method for a function and turn a function call into a static method call in some cases. It still generates the instance method version (which just calls the static method), because the compiler can't always tell statically which function is being invoked, so all functions must support the same generic calling convention (being an instance that implements IFn).
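The generated methods can be inspected with reflection. This is only a sketch: `square` is an illustrative name, and the presence of a static method named invokeStatic is an implementation detail that I'd expect on Clojure 1.8 and later for a simple top-level defn, not something to rely on.

```clojure
(defn square [x] (* x x))

;; Collect the names of the methods declared on the class generated for square.
(def method-names
  (set (map #(.getName ^java.lang.reflect.Method %)
            (.getDeclaredMethods (class square)))))

;; The generic calling convention: an instance method named invoke.
(def has-invoke? (contains? method-names "invoke"))

;; Assumption: recent compilers also emit a static invokeStatic here,
;; which the instance invoke delegates to.
(def has-invoke-static? (contains? method-names "invokeStatic"))
```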

I've been working on an approach to https://clojure.atlassian.net/browse/CLJ-701, for which I haven't published any patches yet (it still doesn't work). It causes the compiler to more aggressively replace certain function definitions and calls with static methods, and that does completely get rid of generating a separate class for those functions. But it is extremely limited in scope: it only applies to functions that are immediately invoked where they are defined and whose definition doesn't escape. This wouldn't cover top-level defs, because a def makes the value of the function global, which is more or less the definition of escaping.

Thank you for the info!

0 votes

I think one issue is hot-loading: you want to be able to load or reload one function at a time. In Java, you cannot add or update a method on an existing class, but you can load a new class. So by having one class per function, we can load and reload one function at a time, and not just one namespace at a time.
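A rough sketch of what reloading a single function looks like, assuming default compiler settings (in particular, direct linking turned off); `greet` and `caller` are illustrative names.

```clojure
(defn greet [] "hello")
(defn caller [] (greet))

(def old-class (class greet))
(def before (caller)) ;; => "hello"

;; Re-evaluating the defn compiles a fresh class and points the var at a
;; new instance; caller itself is not recompiled.
(defn greet [] "hi")

(def after (caller)) ;; => "hi"
(def class-changed? (not= old-class (class greet))) ;; => true
```

Only greet was replaced here; the class backing caller is untouched, and caller picks up the new definition because it looks greet up through its var.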

I also remember reading that, to some extent, this is simply how it was originally built, and it evolved from there. I think there might be other ways to do it now, possibly using invokedynamic and method handles, though this is beyond my understanding. But even if newer JDKs offer that option, so much would need to be rewritten to change it that it's not realistic.

That's what I know, but I can't confirm this.

0 votes

I am curious as to what concrete side effect of 'too many class files' you are seeing? Larger jars, long require times, etc?

Which aspect of the large number of class files are you seeking to optimize?

How many class files are too many?

I'm trying to generate WASM code from JVM bytecode, so I need .class files to convert. The thing is, every function creates its own class(es), and the result is a huge number of files (1k+, depending on the project size), so the conversion becomes time-consuming. That's why I wanted to understand the reason behind generating a class for each function.