Share your thoughts in the 2024 State of Clojure Survey!

Welcome! Please see the About page for a little more info on how this works.

0 votes
in core.memoize by

Problem: Function A is memoized. Function B, among other things, calculates a valid function call of A without ever invoking A. We want to utilize this effect in B to further cut calculations of A. In case that we expect B to perform faster than A we also want callers of A to wait when a corresponding B is running.

A desired operation how to do this in B would be
(spoof A args calc)

Where calc is a fn of zero args that produces a value equal to (apply A args). This allows the caller to put exactly the amount of work into calc that he expects to perform faster than A.

Spoof returns what calc returns or in case of a cache hit what is cached for args. I. e. if an invocation (A args) happened before the call of (spoof A args calc), calc won't be invoked.

In cases where we already have the result of (A args) and don't want to be blocked by a concurrent invocation of (A args) a desired operation would be

(spoof-unsynchronized A args val)

it returns val immediately and updates the cache of A soon. It doesn't have to block invocations of (A args) since the unsynchronized spoof is a very fast operation that is subject to a race condition with invocations of (A args) anyway. (So a swap! on the internal cache state should suffice).

11 Answers

0 votes
by

Comment made by: seancorfield

This doesn't feel like something that should be part of the core.memoize library but it should be possible to build something like this on top of it (and it isn't right now, correct?).

Would the minimum change necessary to allow this be to expose the underlying cache (so you could invoke deref/swap! against the cache directly)?

0 votes
by

Comment made by: lgs32a

For the unsynchronized case yes, For the synchronized (interesting) case maybe with nasty hacking, I'd have to look at the code.

I disagree that this functionality should not be built in. Its quite general and allows very flexible, user definied memoization implementations.

The design recommended above is stolen from Google Guavas cache.

https://google.github.io/guava/releases/23.0/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-

It allows memoization like this. You just call
(.get c (link: arg1] (fn [) "calc for (link: arg1)")) ;; notice how easy it is to implement a custom memoizer with just that facility!

The necessary logic that runs fn synchronized (once) is in our case implemented in core.memoize. So if core.memoize would expose a similar facility I believe that would be a very generic API that allows much more stuff to be built on top (custom memoizer impls).

It also has .put which handles the unsynchronized case.

0 votes
by

Comment made by: seancorfield

Interesting. My next question was going to be "where did you get the idea for this?".

The underlying machinery in {{core.memoize}} works in a similar way, with a * } function that handles the thunk etc, but it's the {{build-memoizer}} function that encapsulates all of that, along with the {{PluggableMemoizer}} type.

I'll have to give this some thought. The cache itself is attached to the metadata for the function object returned from {{build-memoizer}} so, technically, it's all already exposed if you wanted to program this yourself.

0 votes
by

Comment made by: lgs32a

My understanding is that I would need to invoke private through* with a patched (args ignoring) fn.

Can't really call that exposed :)

0 votes
by

Comment made by: lgs32a

And I'd also need to copypaste the code following line 209 handling a rare timing issue.

0 votes
by

Comment made by: lgs32a

Or did I misunderstand this and you suggested if I wanted to program a patch?

0 votes
by

Comment made by: seancorfield

I meant: the raw cache is right there in the metadata if you wanted to check if an argument list key existed and/or update the cache with a new value via the thunk you talk about above. You can invoke {{core.cache/through}} (or {{core.cache/through-cache}} more likely) directly on that cache, with a thunk (albeit of one argument, which you could just ignore):

(swap! (:clojure.core.memoize/cache (meta the-fn-var)) clojure.core.cache/through-cache [arg1] (fn [_] "calc for [arg1]"))

Any user of TTL cache has to potentially deal with the edge case of eviction during access (unless they don't care about getting {{nil}} back) so, yes, a {{spoof}} function that does the above swap! would potentially have to retry the thunk invocation -- whether you also updated the cache at that point would be up to you (i.e., what behavior do you want for subsequent calls if a cache eviction occurred during your {{spoof}} call?).

The point is that {{spoof}} requires the actual cache to do its job. The memoized function only has access to that cache by virtue of being a closure, so {{spoof}} has to operate on the {{Var}} holding the memoized function.

Does that clarify what I'm suggesting?

0 votes
by

Comment made by: lgs32a

Spoof would also utilize to use internal through* for its locking facilities in d-lay. It might require minor adjustments to support a thunk.

The design recommended above intentionally takes a memoized function so that it can extract ::cache.

Regarding the TTL fix I have to look at it again. Please let me if you are interested in a patch with spoof/spoof-aysnc...

0 votes
by

Comment made by: seancorfield

I hate the names but I'd consider a patch that exposed enough of the machinery to allow users to build their own versions outside the library more easily.

0 votes
by

Comment made by: lgs32a

Did choose the names to fit the current design of the lib. Where Guava gives you powerful primitives and claims generic low level names, this lib is written in the style where it makes strong assumptions about what the user wants to do with it. It makes you have a memoized function with a cache attached right away.

This is why I have chosen names that only describe why you would use them /additionally/. There might be a better word than "spoof" - I'm not a native speaker.

core.cache only exposes the datastructures of caches, but it doesn't expose a proper reference type to cache calculations (instead of values), like Guava does. I suspect that the core.cache authors missed to do this because they thought that Clojures reference types would suffice. core.memoize implements such a reference type for itself as an internal. I'm unsure whether exposing this functionality, which has a good amount of usecases outside of memoization would be at the right place in this lib. Hence the naming and wrapping around memoized functions to clearly suggest that the functionality is supposed in the scope of memoization.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CMEMOIZE-24 (reported by lgs32a)
...