I've written a few libraries that depend on binary resources (e.g., native shared libraries) and others that depend on very large resources (e.g., the LLM weights for llama).
In principle, both types of dependencies can be satisfied by the deps tools without any changes, since the deps CLI can already fetch resources and add them to the classpath.
However, there are a few additional challenges that make using the deps CLI for large and/or binary resources awkward.
Large resources are Large
I've avoided using the deps machinery for large resources because there are several affordances missing:
a) no yes/no prompt before downloading dependencies.
It seems reasonable to download perhaps hundreds of megabytes of dependencies when invoking the CLI, but it doesn't feel right for the CLI to download gigabytes of data dependencies without asking first.
b) no progress indicator when downloading dependencies
Downloading dependencies is usually pretty quick, but it's nice to have a progress indicator if several gigabytes of dependencies are being downloaded.
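A rough Java sketch of both affordances, assuming the tool already has the download's `InputStream` and its `Content-Length` (reported as -1 when unknown). The class name, the 1 GiB threshold, and the `[y/N]` prompt wording are hypothetical choices for illustration, not part of the deps CLI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;
import java.util.Scanner;
import java.util.function.LongConsumer;

// Hypothetical helper: ask before very large downloads and show progress while copying.
class LargeDownload {
    // Hypothetical threshold above which the user is asked first (1 GiB).
    static final long CONFIRM_THRESHOLD_BYTES = 1L << 30;

    // Prompt only when the size is known (not -1) and exceeds the threshold.
    static boolean needsConfirmation(long contentLength, long threshold) {
        return contentLength > threshold;
    }

    // Reads one line; only "y" or "yes" (any case) counts as consent.
    static boolean confirm(Scanner stdin, String name, long bytes) {
        System.out.printf("%s is %.1f GB. Download? [y/N] ", name, bytes / 1e9);
        String answer = stdin.hasNextLine() ? stdin.nextLine().trim().toLowerCase(Locale.ROOT) : "";
        return answer.equals("y") || answer.equals("yes");
    }

    // Copies in to out in 64 KiB chunks, reporting the running byte count after each chunk.
    static long copyWithProgress(InputStream in, OutputStream out, LongConsumer onProgress)
            throws IOException {
        byte[] buf = new byte[64 * 1024];
        long copied = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            copied += n;
            onProgress.accept(copied);
        }
        return copied;
    }

    // Renders a fixed-width bar such as "[#####     ]  50%".
    static String progressBar(long done, long total, int width) {
        long d = Math.min(done, total); // guard against a server under-reporting the size
        int filled = (int) (d * width / total);
        int pct = (int) (d * 100 / total);
        return "[" + "#".repeat(filled) + " ".repeat(width - filled) + "] "
                + String.format("%3d%%", pct);
    }

    // Wires the pieces together; returns false if the user declined.
    static boolean download(InputStream body, long contentLength, OutputStream out,
                            Scanner stdin, String name) throws IOException {
        if (needsConfirmation(contentLength, CONFIRM_THRESHOLD_BYTES)
                && !confirm(stdin, name, contentLength)) {
            return false;
        }
        copyWithProgress(body, out, done -> {
            if (contentLength > 0) {
                // "\r" redraws the bar in place on most terminals.
                System.out.print("\r" + progressBar(done, contentLength, 40));
            }
        });
        System.out.println();
        return true;
    }
}
```

One design choice worth noting: a download of unknown size is never prompted here; a real tool might prefer to ask in that case too.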
Native dependencies are often large
Native dependencies can range from a few megabytes to a gigabyte or more (e.g., the Chromium Embedded Framework).
Native dependencies are platform/OS dependent
I think there's some Maven magic that helps with this, which I haven't had a chance to look into. In any case, the artifacts a program needs depend on the platform it runs on. Including the artifacts for every platform usually works, but it wastes resources and may not work in some cases.
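Maven's mechanism here is the artifact classifier: one group/artifact/version can be published as several jars distinguished by a classifier string. A tool could pick the right jar by mapping the JVM's `os.name` and `os.arch` system properties to a classifier. The `linux-x86_64` naming in this Java sketch is a hypothetical convention; real projects each choose their own classifier names.

```java
import java.util.Locale;

// Hypothetical helper: derive a platform classifier such as "linux-x86_64".
class PlatformClassifier {
    // Normalizes os.name values such as "Linux", "Mac OS X", "Windows 11".
    static String os(String osName) {
        String n = osName.toLowerCase(Locale.ROOT);
        if (n.startsWith("linux")) return "linux";
        if (n.startsWith("mac") || n.startsWith("darwin")) return "macos";
        if (n.startsWith("windows")) return "windows";
        throw new IllegalArgumentException("unsupported OS: " + osName);
    }

    // Normalizes os.arch values; the JVM reports "amd64" on Linux/Windows
    // but "x86_64" on Intel macOS, and "aarch64" for 64-bit ARM.
    static String arch(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64":
                return "x86_64";
            case "aarch64":
            case "arm64":
                return "aarch64";
            default:
                throw new IllegalArgumentException("unsupported arch: " + osArch);
        }
    }

    static String classifier(String osName, String osArch) {
        return os(osName) + "-" + arch(osArch);
    }

    // Classifier for the JVM this code is running on.
    static String current() {
        return classifier(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

Throwing on unknown platforms, rather than silently falling back, makes a missing native artifact fail loudly at resolution time instead of at load time.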
Shared libraries must be extracted from jars before loading
Many FFI libraries (like JNA) will do this for you, but each FFI library implements it in its own ad hoc way, and it may or may not work depending on whether the shared library is standalone or depends on other shared libraries.
The deps tools could potentially help by extracting shared libraries in a uniform, agreed-upon way.
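To illustrate what a uniform extraction step might look like, here is a minimal Java sketch: copy each of a platform's libraries out of the classpath into one directory, then load them with `System.load` (which requires an absolute path). The class and method names are hypothetical. Placing dependent libraries side by side and loading dependencies first are common workarounds, but whether they are enough depends on how each library was linked (for example, its rpath on Linux or its install name on macOS).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: extract shared libraries from the classpath and load them.
class NativeExtractor {
    // Copies each named library into dir, in order, using open to obtain its bytes.
    // Everything lands in the same directory so dependents sit next to their dependencies.
    static List<Path> extract(Function<String, InputStream> open, List<String> names, Path dir)
            throws IOException {
        Files.createDirectories(dir);
        List<Path> extracted = new ArrayList<>();
        for (String name : names) {
            Path target = dir.resolve(name);
            // try-with-resources tolerates a null resource; we reject it explicitly.
            try (InputStream in = open.apply(name)) {
                if (in == null) {
                    throw new IOException("missing native resource: " + name);
                }
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            extracted.add(target);
        }
        return extracted;
    }

    // Loads libraries in list order, so dependencies should be listed first.
    // System.load requires an absolute path.
    static void loadAll(List<Path> libs) {
        for (Path lib : libs) {
            System.load(lib.toAbsolutePath().toString());
        }
    }
}
```

In a real jar, `open` might be `name -> NativeExtractor.class.getResourceAsStream("/natives/linux-x86_64/" + name)`, where that resource layout is again a hypothetical convention. Passing a function instead of hard-coding the classloader keeps the extraction step independent of where the bytes come from (a jar, a Maven cache, or a local build).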
Precompiled Linux shared libraries barely work
It's possible to precompile shared libraries for macOS and Windows that are compatible with a wide range of systems and "just work". On Linux, there are lots of caveats. Generally, it seems you can get something that mostly works if you have a standalone shared library and compile it with Zig (e.g., llmdb), but compatibility drops sharply when there are multiple dependent shared libraries that aren't built with Zig (e.g., graphviz).
Native dependencies should be compiled from source?
Some package managers ship precompiled binaries (e.g., conda), but it seems like many package managers support compiling from source (e.g., pip, MacPorts, Homebrew).
Compiling from source may help with some of the following:
- Compiling from source sidesteps many of the challenges of providing native dependencies on Linux.
- Native deps could be git deps!
- Some devs really dislike precompiled binaries and strongly prefer compiling from source.
Obviously, supporting compiling from source has its own challenges.
The above covers many of the problems with large and/or binary resources. Here are some concrete use cases:
- clj-cef: wraps the Chromium Embedded Framework. The framework itself is about 0.5-1.5 GB depending on the platform and includes both shared libraries and large resource files. It also requires an additional small shared library.
- clj-graphviz: wraps the graphviz C libraries. It's extra tricky because there are multiple shared libraries, and each shared library has additional dependencies (e.g., libpng).
- llama.clj: wraps the llama.cpp library for running LLMs locally. It would also be nice to add specific LLM weights as dependencies, but they can be very large (from a few gigabytes up to hundreds of gigabytes). Further, llama.cpp can be compiled with GPU support, but I haven't figured out how to do that portably without an option to compile from source.
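For cases like clj-graphviz, where shared libraries depend on each other, one piece of a uniform approach would be computing a load order in which every library comes after its dependencies. A minimal depth-first topological sort in Java, assuming the dependency graph is already known; the library names in the usage note are hypothetical, not graphviz's actual ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: order shared libraries so dependencies load first.
class LoadOrder {
    // deps maps each library to the libraries it needs; libraries that only
    // appear as dependencies are treated as having no dependencies of their own.
    static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> visiting = new HashSet<>();
        // Sorted iteration makes the output deterministic.
        for (String lib : new TreeSet<>(deps.keySet())) {
            visit(lib, deps, done, visiting, result);
        }
        return result;
    }

    private static void visit(String lib, Map<String, List<String>> deps, Set<String> done,
                              Set<String> visiting, List<String> result) {
        if (done.contains(lib)) return;
        // Seeing a library again while its own dependencies are being visited means a cycle.
        if (!visiting.add(lib)) {
            throw new IllegalStateException("dependency cycle at " + lib);
        }
        for (String dep : deps.getOrDefault(lib, List.of())) {
            visit(dep, deps, done, visiting, result);
        }
        visiting.remove(lib);
        done.add(lib);
        result.add(lib); // appended only after all of its dependencies
    }
}
```

For example, with a hypothetical graph where `wrapper` needs `core` and `png`, and `core` needs `png`, the order comes out as `png`, `core`, `wrapper`, which could then be fed to a loader that calls `System.load` in sequence.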
I know there's a lot of overlap with existing tools, so maybe the deps CLI could integrate with them:
- fetching native deps: pip, conda, Homebrew, MacPorts, apt, Scoop, etc.
- fetching data dependencies: Hugging Face, DVC (https://dvc.org/)
Anyway, I've made multiple attempts at finding a good approach for including native and/or large dependencies, and all of the options still feel pretty awkward.