0 votes

Our main Clojure app at CircleCI takes ~90 seconds to load on my MacBook Pro (3.5 GHz i7). It takes a similar length of time to load in production in our Kubernetes pods.

This time causes us problems in multiple areas:

  • Developer productivity is reduced for local development
  • Our CI process is slow – our tests themselves take ~one minute to run, but the time to run the tests is closer to three minutes since the test suite loads the code first, and we have that 90 second load time.
  • Our time to deploy and our mean time to recovery both suffer – it takes about 20 minutes to deploy a new version of the app, as we do a rolling restart of Kubernetes pods, and each group of pods takes 90 seconds to start.

Our main app starts more slowly than our other apps, although even our more modern, leaner Clojure services still take ~30 seconds to boot. The slow load times of Clojure affect most of our teams and services, but the pain is felt most acutely by teams that work on the main app. It is slower because it has more dependencies than the other apps, since it has many subsystems embedded in a single Clojure app.

What strategies exist to improve load time?

In the past folks have suggested doing AOT compilation, but I've found that AOT compilation never works as advertised, and introduces strange bugs. It also requires that code is written in an "AOT-aware" fashion, including all libraries.

2 Answers

+2 votes
Best answer

For the rolling restart in production, doing AOT compilation as part of your application JAR-building process will speed that up (building the JARs will be substantially slower -- you have to pay the compilation time "penalty" somewhere).

We used to build our app uberjars as source but we also suffered from slow startup time in production, so we switched to AOT compilation during uberjar building and that lowered our production startup times from a minute or so on the worst apps to just a few seconds.
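
The answer does not say which build tool is used, so the following is only a sketch of what "AOT compilation during uberjar building" can look like, assuming tools.build. The file name `build.clj`, the paths, and the `my.app.main` namespace are hypothetical; the main namespace is assumed to declare `(:gen-class)` and a `-main` function.

```clojure
;; build.clj: a minimal sketch, assuming tools.build is available via a
;; :build alias in deps.edn. Paths and namespace names are hypothetical.
(ns build
  (:require [clojure.tools.build.api :as b]))

(def class-dir "target/classes")
(def uber-file "target/app-standalone.jar")
(def basis (delay (b/create-basis {:project "deps.edn"})))

(defn uber [_]
  ;; Start from a clean target directory so no stale .class files survive.
  (b/delete {:path "target"})
  ;; Copy sources and resources into the directory that becomes the JAR.
  (b/copy-dir {:src-dirs ["src" "resources"] :target-dir class-dir})
  ;; The AOT step: compile every namespace found under src to .class files.
  ;; This is where the compilation time is paid, once, at build time.
  (b/compile-clj {:basis @basis
                  :src-dirs ["src"]
                  :class-dir class-dir})
  ;; Package the compiled classes and all dependencies into one JAR.
  (b/uber {:class-dir class-dir
           :uber-file uber-file
           :basis @basis
           :main 'my.app.main}))
```

With a `:build` alias set up, this would typically be run as `clj -T:build uber`, and the pods would then start from the precompiled JAR instead of compiling source on every boot.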

As for CI, in order to run the tests, they have to be loaded and compiled so you're not really going to be able to avoid that time somewhere. Your only real option there would be to move to incremental testing and only run tests for code that has changed or depends on code that has changed. For us, that's one of the future promises of switching to Polylith since its built-in test runner figures that out based on a git tag -- but that benefit only comes once you have "everything" in Polylith.

Alex mentioned the docs about speeding up dev loading and I use that at work by having a local classes folder on my classpath and periodically kicking off a manual compile step for each of the main "apps" in our repo. That means that any code that has not changed since the last compilation will just use the existing .class files and not have to go through the Clojure compilation process at load time -- but any code that has changed will still need to be compiled as it is loaded.
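
As a rough illustration of that manual compile step (a sketch, not necessarily the exact workflow described above): it assumes a `classes` directory already exists at the project root and is on the classpath, for example through a deps.edn alias with `:extra-paths ["classes"]`. The namespace `my.app.main` is hypothetical.

```clojure
;; Run in a fresh REPL started with the "classes" directory on the classpath.
;; `compile` writes .class files for the named namespace and for the
;; namespaces it loads along the way, so point it at a top-level "app"
;; namespace. my.app.main is a hypothetical name.
(binding [*compile-path* "classes"]   ; "classes" is also the default value
  (compile 'my.app.main))
```

The next REPL started with `classes` on the classpath can then load the existing `.class` files for any namespace whose source has not changed since.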

It makes a big difference to initial load times when working in the REPL but over time it "drifts" as more and more code gets changed. However, at work, we tend to have fairly long-lived REPLs and eval all code as we modify it so the slow-first-load issue isn't as annoying and I don't bother with the classes thing all the time. And when I say "long-lived REPLs" I mean weeks and sometimes even months. My primary work REPL has been running for ten days at this point and the only reason it's that "short" is that I had a power outage and had to restart my development machine.

Thanks Sean.

Did you have many issues with code that was not AOT safe? Following Alex’s suggestions I’ve tried to do some AOT on our app, and I’m hitting a bunch of problems.

Some have been obscure: we have a namespace that defines two symbols, Identity->Foo and identity->foo (the only difference is in uppercase letters). These two symbol names munged to two files that collide on a case-insensitive file system (thanks Apple).

We also have a macro that expands to a form that calls `intern` to add new vars to the namespace. This isn’t compatible with AOT code - when the compiled classes are loaded the vars don’t exist.

Enabling AOT at this point in the app’s lifecycle is going to be a risk - it would require an audit of a lot of code.

We did not have any code that caused problems with AOT.

I will try to resist commenting on symbols that only differ in case, using intern, or def forms that evaluate at compile time to environment-specific values... :|

+1 vote

You are compiling all of your source every time you load. AOT compiling lets you do that once ahead of time, so that is the tool for the job. Many people use AOT for this purpose and have no issues with it, so I think you are just reacting to FUD here.

You can use it strategically at dev time too - see https://clojure.org/guides/dev_startup_time

I don’t know what you mean by AOT-aware.

When I say "AOT-aware" I mean that top level forms are evaluated when the code is compiled, not when the code is loaded. This means that, for example, any top-level `def` that pulls a value from the environment does so on the CI machine, not from the production machine.

I think most libs don't do that (because it's bad for the many people that AOT) and it's worth an issue if you find something that does.
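
One common way to keep environment lookups out of namespace loading (and therefore out of the build machine's compile step, which loads each namespace as it compiles it) is to wrap them in a `delay`. This is a sketch of that general pattern, not something proposed in the thread; the `DATABASE_URL` variable, the fallback value, and the function name are hypothetical.

```clojure
;; The environment is not read when this namespace is loaded or compiled,
;; only when the delay is first dereferenced at runtime.
;; DATABASE_URL is a hypothetical variable name.
(def db-url
  (delay (System/getenv "DATABASE_URL")))

(defn connect-string
  "Returns the configured database URL, or a local default when unset."
  []
  (or @db-url "jdbc:postgresql://localhost/dev"))
```

Callers use `(connect-string)` or `@db-url` instead of a plain var, so the value always comes from the machine the code is running on.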

Hi Alex, and thanks for your input here.

I’ve spent some time today looking at using AOT with fresh optimism. One problem that I’ve hit is how macros are tracked across compilation units.

If I compile a file that uses a macro defined in a different file, then as far as I can tell, my file is compiled to a class file that has no way of tracking the dependency on the macro.

If I subsequently modify the macro in the other file, the Clojure runtime has no way of tracking the change (it looks at the modified time stamp of the source and class files) and will load the outdated class file when I load my file.

This could cause problems in local dev if I pull code changed by another developer, and in CI if I attempt to cache compiled class files between builds.

I’ve not yet been able to calculate the gain in load time that we could get with AOT, since I’ve come across some code constructs that are not compatible with AOT, and I was not able to refactor these to be compatible with the AOT loading system in the time I have available.

Clojure will load the source file if it is newer than the class file. So when you "pull code", presumably you get a file time newer than what you have in cache, and the Clojure runtime is going to load that.

So it is certainly possible that you will sometimes have a stale cache and will require re-compilation or some amount of cache invalidation. You'll have the same kind of issue if you update your external deps. However, the majority of the time, the majority of the cache should remain stable.
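
The timestamp rule can be approximated in a few lines. This is only a model of the behaviour described above, not Clojure's actual loader code, and the function name is made up for illustration. It also shows why the macro case slips through: editing the macro's file does not change the timestamp of the file that uses it.

```clojure
(defn load-source?
  "Approximates the choice between a namespace's source file and its
  compiled class file. Arguments are last-modified times in milliseconds,
  or nil when the corresponding file does not exist."
  [src-modified class-modified]
  (or (nil? class-modified)                     ; nothing compiled yet
      (and (some? src-modified)
           (>= src-modified class-modified))))  ; source is at least as new

;; A pulled change to the namespace itself gives it a newer source file:
(load-source? 2000 1000)   ; source is loaded and recompiled
;; A change to a macro in another file leaves these timestamps untouched,
;; so the existing class, expanded with the old macro, is still chosen:
(load-source? 1000 2000)
```

In that second situation, deleting the affected `.class` files (or the whole `classes` directory) and recompiling is the blunt but reliable form of cache invalidation.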

Anecdotally, I've had people see load times for large apps drop by tens of seconds, or even by a minute or more. What you'll see depends heavily on the number of namespaces and how intense the use of macros is.