
+1 vote
in core.cache

I’m trying to reconcile core.cache behavior: what the docstring and code suggest versus what I observe actually happening. I am referring to this line:

https://github.com/clojure/core.cache/blob/99fbd669a429bc800ff32b64b21cf82b6b486f06/src/main/clojure/clojure/core/cache/wrapped.clj#L43

> value-fn (and wrap-fn) will only be called (at most) once even in the
> case of retries, so there is no risk of cache stampede.

It seems quite trivial to get value-fn called more than once when the cache is under contention. I’ve emulated this behavior in the REPL, as follows:

(require '[clojure.core.cache.wrapped :as cache])
=> nil
(def c (cache/fifo-cache-factory {}))
=> #'user/c
(pmap (fn [_] (cache/lookup-or-miss c :foo (fn [k] (println "miss") k))) (range 10))
miss
miss
miss
miss
miss
miss
miss
miss
=> (:foo :foo :foo :foo :foo :foo :foo :foo :foo :foo)

In this case, the miss function was invoked 8 out of 10 times when threads were competing. I see this same behavior in the real world, too.

What I don't fully understand is why this happens, given that swap! protects the call to c/through-cache. I'm guessing it has to do with how swap! resolves conflicts. But if so, doesn't that mean the docstring is wrong?

This is reproducible for me in both fifo and ttl caches. I haven't tried the others.

I learned more about what is happening, so I am updating here for posterity.  

I also found a solution to my particular problem.  I will restate it below as a general problem statement, in case it is a valid use case to consider supporting.

So the fundamental problem is related to how swap! resolves conflicts.  I demonstrated this with the following code:

(def d (atom 0))
(pmap (fn [_] (swap! d (fn [x] (println "x:" x) (inc x)))) (range 10))
x: x: 0
0x: x:x: 0

x: x: 011

 0
x: 1

x: 2x:
 1
x: 1
x: 2
x: x:3
x: x: 2
4x: x: 24x:

x:  2

x: x: 5 2
x: 5

5
2x:
x:x:x:  6
6
 x: 5
76x:
x:
 7
7
x:x:  8
8
x: 9
=> (1 10 2 9 4 3 6 5 8 7)

The end result was correct (10 values incremented), but it took over 30 applications of the swap function to get there, with many duplicates of the internal state (e.g., "0").  As far as I can tell, swap! applies the function optimistically and then does a compare-and-set!; when another thread has changed the atom in the meantime, the compare-and-set! fails and the function is re-applied.
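
Conceptually, I think of swap! as roughly this compare-and-set! retry loop (a model only, not Clojure's actual implementation):

```
;; Rough model of swap!: apply f optimistically, then compare-and-set!.
;; If another thread changed the atom in the meantime, the CAS fails
;; and f is applied again with the fresh value.
(defn swap-model [a f]
  (loop []
    (let [old @a
          new (f old)]               ; f may run more than once under contention
      (if (compare-and-set! a old new)
        new
        (recur)))))
```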

If we transpose this to the cache, there are cases where parallel invocations of c/through-cache see has? = false and run the miss branch, and are then re-run when the update conflicts.
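
In other words, the update looks roughly like the following (a simplified model of what the wrapped lookup-or-miss effectively does, not the library's actual source; c/has?, c/hit, and c/miss are from clojure.core.cache):

```
(require '[clojure.core.cache :as c])

;; Simplified model only: the miss branch (and therefore value-fn) runs
;; inside the function handed to swap!, so a swap! retry re-runs it.
(defn lookup-or-miss-model [cache-atom k value-fn]
  (c/lookup
    (swap! cache-atom
           (fn [cache]
             (if (c/has? cache k)
               (c/hit cache k)
               (c/miss cache k (value-fn k)))))  ; re-executed on retry
    k))
```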

This is fine, though I would still argue that, minimally, the docstring is confusing and should at least call out that the at-most-once guarantee only holds for single-threaded execution.

I will make a second comment with a general problem statement and potential solutions to the problem.
# Problem Statement

I need to implement cache semantics in front of certain IO operations.  These IO operations are not mutating, but they are relatively expensive compared to the overhead of cache lookup.  

An example would be the OpenID JSON Web Key Set (JWKS) protocol: Parallel HTTP requests coming into a cluster of Clojure HTTP servers may carry a JSON Web Token (JWT), which needs to be verified as signed by a trusted issuer.  The JWKS protocol provides a mechanism to retrieve public keys from the trusted issuer for this verification.  The ratio of requests to keys can be billions to 1; thus, it is highly conducive to caching in some form.

I have built this caching to date using clojure.core.cache.wrapped, where the cache essentially holds promises to JWKS responses, keyed by JWT key-id.  The basic idea is that when a new JWT key-id is encountered, the cache misses and starts an HTTP request to the IDP to retrieve the key, caching a promise.  The overhead to kick off the request is lightweight, even if the round-trip response may take a few milliseconds.

Clients of the cache may deref this promise in whatever way makes sense.  Long-resolved keys deref immediately.  Keys in the middle of being resolved may have multiple clients waiting on the promise.
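
For reference, the shape of this approach is roughly the following sketch, where fetch-jwks-async! is a hypothetical stand-in for the real HTTP call and the TTL is arbitrary:

```
(require '[clojure.core.cache.wrapped :as cache])

;; TTL cache of promises, keyed by JWT key-id (10-minute TTL as an example).
(def jwks-cache (cache/ttl-cache-factory {} :ttl (* 10 60 1000)))

;; Hypothetical: kicks off the JWKS HTTP request on another thread and
;; returns the promise immediately; the response is delivered later.
(defn fetch-jwks-async! [key-id p]
  (future (deliver p {:kid key-id :key "..."}))
  p)

(defn lookup-key
  "Returns a promise of the JWKS response for key-id."
  [key-id]
  (cache/lookup-or-miss jwks-cache key-id
                        (fn [kid] (fetch-jwks-async! kid (promise)))))

;; Clients deref in whatever way makes sense, e.g. with a timeout:
;; (deref (lookup-key "kid-123") 5000 ::timeout)
```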

The problem I currently have is the inefficiency caused by core.cache's internal use of swap! when the cache is highly contended.  In the above scenario, a given key-id typically expires for all clients at around the same time.  Thus, API clients tend to refresh their JWTs around the same time and present a new, yet-to-be-cached key to the system.

The net result is a stampede from multiple threads for the same key, thus putting substantial pressure on a specific entry in the cache.  Multiple value-fns are called to kick off a JWKS operation for the same key ID, even though the response is stable and we only need one.  The system still behaves correctly with the current behavior, but it is less efficient because we issue multiple redundant IO operations to get the same key ID.

One could argue that this IO is "side-effecting" and thus incompatible with swap!.  While I agree that mutating side effects would certainly be a problem, these "read-only" side effects are not incompatible with the general notion of caching, so it would be ideal if the caching library could support them by eliminating the stampeding re-entrance into the value-fn for the same key.

I have been able to work around the current behavior by wrapping the lookup-or-miss with a construct such as:

```
(require '[clojure.core.cache.wrapped :as cache])

(defn lookup-or-miss
  "Like cache/lookup-or-miss, but takes a lock on a miss so that
  value-fn is not invoked concurrently for contended keys."
  [cache-atom e value-fn]
  (or (cache/lookup cache-atom e)
      (locking cache-atom
        (cache/lookup-or-miss cache-atom e value-fn))))
```
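
This is essentially a double-checked pattern: the initial lookup keeps cache hits lock-free, and the lock only serializes concurrent misses (at the cost of serializing misses for unrelated keys behind the same monitor).
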
However, perhaps it makes sense to consider a contribution to the upstream library, assuming others see value in solving the stated problem.

Here are a few thoughts on this front:

1) Introduce locking either in the current wrapped implementation or as an option/alternative for those who need stronger guarantees.
2) Re-implement the clojure.core.cache.wrapped internals with something that supports at-most-once semantics by moving away from swap! in favor of something like volatile! + locking (see the sketch below).
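
For illustration, option 2 could look roughly like the following sketch (not a worked-out design; eviction bookkeeping such as c/hit is omitted, and cached nil/false values would need c/has? handling):

```
(require '[clojure.core.cache :as c])

;; Sketch only: hold the immutable cache value in a volatile! and
;; serialize every update with locking, so value-fn runs at most once
;; per missed key.
(defn lookup-or-miss-locking
  [cache-vol k value-fn]
  (if-let [v (c/lookup @cache-vol k)]
    v
    (locking cache-vol
      ;; Re-check under the lock: another thread may have filled it in.
      (if-let [v (c/lookup @cache-vol k)]
        v
        (let [v (value-fn k)]
          (vswap! cache-vol c/miss k v)
          v)))))

;; Usage:
;; (def jwks (volatile! (c/fifo-cache-factory {})))
;; (lookup-or-miss-locking jwks :foo (fn [k] (println "miss") k))
```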

1 Answer

+1 vote

The 1.2.0-SNAPSHOT should address this. Please try it and confirm whether it does.
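
For anyone trying it, a deps.edn along these lines should pull in the snapshot (the snapshot repository URL here is an assumption; use whichever repository hosts core.cache snapshots):

```
{:deps {org.clojure/core.cache {:mvn/version "1.2.0-SNAPSHOT"}}
 ;; Assumed snapshot repository -- adjust if snapshots are published elsewhere.
 :mvn/repos {"sonatype-snapshots"
             {:url "https://oss.sonatype.org/content/repositories/snapshots/"}}}
```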

...