Welcome! Please see the About page for a little more info on how this works.

+1 vote
in Clojure by

The bug CLJ-2094 demonstrates an edge case with protocols stemming from functions like partial eagerly evaluating their arguments and thus retaining outdated references. I ran into this recently and found that the ticket had been declined because of "normal Clojure evaluation rules".

I think there's an easy fix for protocols that adds very little overhead and allows protocol-related functions to work in both eager and lazy evaluation contexts. Would the team be interested in a patch for this?

by
Perhaps you can provide a bit more information about the "easy fix" and the impact it might have on existing code so that folks have a bit more insight into what the patch might actually involve?
by
agree with Sean, can you share the idea?
by
edited by
lol in the past y’all generally prefer I check before I start slinging patches, so I thought I’d post a gut check before shoving code in your face.

I’ve signed the CLA:

https://github.com/NoahTheDuke/clojure/pull/4

In short: deref the var on the protocol in the couple of protocol functions that look at `:imps` so any updates are seen.
by
In my benchmarks with criterium, it's roughly +5ns per invocation.
by
If I move the calls to the inner-most points of use, the performance cost is lessened as well.

1 Answer

0 votes
by

This question is more of a proposed solution than a problem and that makes it difficult for me to file it somewhere as a ticket. I'm not sure if CLJ-2094 is the actual problem you care about or if it's something else? If something else, please share.

Trying to put words to the suggested solution here... When finding a protocol method match, look up the current protocol value via the var (which may have been updated with new extensions), rather than using the protocol value that was passed (as it may no longer represent the current state).

But the protocol already has mechanisms to reset its state when extensions are added to the protocol, so I think that problem has already been largely solved. The only way the protocol state would not match the current state is that the protocol state you have is no longer connected. And in this case, I think that's because it is the protocol function var holds a function instance, which has a cache, which has the protocol, but the function instance itself is old (the var holds a new function instance).

So you are using a dead function, but doing a lookup during method invocation to recover the live function's equivalent state. Adding a var lookup in the invocation call path is I think a non-starter as the whole protocol design is trying to avoid that (that's why we have a cache which can be dumped on modification). The alternate solution here is to push var lookup of the function to the caller. And that's essentially what you get with an anonymous function implementation of the spec in CLJ-2094 (vs partial).

Again, I'm not sure what problem we're actually trying to solve here. In CLJ-2094, the problem was that the function was captured by partial. The solution is - don't capture the value. We could also consider some alternative to partial that retained the var lookup - you could imagine a partial-var that took a var rather than a function evaluated from a symbol. The workaround is to use wrap in a function to delay the evaluation. Is this enough of a problem to warrant something new? Don't know.

by
Thanks for the detailed description of the situation, Alex. I appreciate the clarity of your word choices.

> Again, I'm not sure what problem we're actually trying to solve here. In CLJ-2094, the problem was that the function was captured by partial. The solution is - don't capture the value.

The problem I'm trying to solve is to make using dead protocol functions work the same as live protocol functions. It's not obvious that protocols, which can be changed by side-effecting functions (`extend`), aren't actually reliant on global state. This can lead to subtle errors when defining protocols in one namespace and extending them in other namespaces if there the protocol is evaluated in between requiring the later namespaces. (Which is to say that at my job we have run into this issue.)

Compare to multimethods and spec, which do have global state and aren't reliant on the order that they're evaluated:

    (defmulti example (fn [a b] [(:type a) (:type b)]))

    (def partial-multi-2094 (partial example {:type :foo}))

    (defmethod example [:foo :bar] [a b] {:a a :b b})

    (partial-multi-2094 {:type :bar :extra :keys})
    ;=> {:a {:type :foo} :b {:type :bar :extra :keys}}

    (require '[clojure.spec.alpha :as s])

    (def partial-valid-2094 (partial s/valid? ::clj-2094))

    (s/def ::clj-2094 (fn [x] true))

    (partial-valid-2094 :a)
    ;=> true

> The solution is - don't capture the value.

Obviously, this is the current solution and one that works in many other instances (such as functions themselves). I think this is a place where the inconsistency causes undue pain and could be positively changed.

You said that derefing the var is a no-go because of the effort that has been put into the method cache. Is your fear that my proposed changes would be used in the generated methods or similar? My proposed change only affects the protocol helper functions (`extends?`, `extenders`, `find-protocol-method`, and `satisfies?`) and doesn't interact with the base protocol method builders, so the existing method cache would continue to work as they do currently.

Thanks again for taking the time to write your detailed comment. I really value the effort you put in to these discussions.
by
> The problem I'm trying to solve is to make using dead protocol functions work the same as live protocol functions.

But why are you doing this? I don't think this is the problem. (I'm not saying there isn't a problem, but I don't think we've found the real thing yet, keep going.)

> It's not obvious that protocols, which can be changed by side-effecting functions (`extend`), aren't actually reliant on global state.

"aren't"? I think "are", because of the side effects

> This can lead to subtle errors when defining protocols in one namespace and extending them in other namespaces if there the protocol is evaluated in between requiring the later namespaces.

I think THIS is actually starting to get at the problem, which is really a repl process thing. Let's try to say it explicitly... sometimes helpful to break into objective and obstacle.

Objective: re-evaluate protocol definition at the repl (why? namespace reloading? redefinition?) and existing objects that extend the protocol should continue to "work" (can we be more specific)

Obstacle: I think this needs more digging, we should try to be specific (and I think when it is, the alternatives and solutions will be much clearer). what happens? why? I think it is something like the objects extending the protocol no longer appear to extend the protocol and protocol methods can't be invoked anymore.
by
>  Objective: re-evaluate protocol definition at the repl (why? namespace reloading? redefinition?) and existing objects that extend the protocol should continue to "work" (can we be more specific)

Not quite the issue at hand (tho it's also a valid issue with a similar source). This is not about re-evaluating the protocol definition, but evaluating the protocol var or protocol function var in functions like `partial` or in let bindings. Because it's deref'd to the underlying object on evaluation, any further calls to `extend-protocol` don't propagate to that earlier reference.

Given:

```
(defprotocol CLJ-2094
  (execute [this bar]))

(defrecord Foo [])

(def partial-extends (partial extends? CLJ-2094))
(def partial-execute (partial execute (->Foo)))
```
If I ever call `extend-protocol` to implement `CLJ-2094` on `Foo`, `partial-extends` returns false and `partial-execute` will throw "not implemented" exceptions:

```
(extend-type Foo
  CLJ-2094
  (execute [this bar] true))

user=> (partial-extends Foo)
false
user=> (partial-execute :bar)
Execution error (IllegalArgumentException) at user/eval145$fn$G (REPL:1).
No implementation of method: :execute of protocol: #'user/CLJ-2094 found for class: user.Foo
```

to recap

Objective: extending protocols should propagate to (non-var) references of the protocols (and to all of the protocol helper functions as well).

Objections:
* Deref'ing the var stored on `:var` is going to be slower than the current code path.
* This problem is actually more generally about references vs vars (cf. `(def fns {:key1 some-func :key2 other-func})` vs `(def fns {:key1 #'some-func :key2 #'other-func})`).
* `extend-protocol`'s global state isn't actually global (unlike spec's global registry).
* `partial` is actually just a bad function and should be avoided lol.

How does that look?
by
I don't agree that is a desirable objective (but agree has nothing to do with protocols). :)
by
If you don't mind, why don't you think it's a desirable objective?
by
The mechanism for global state in Clojure is vars (which are used for multimethods, protocols, and the spec registry). There is great value in allowing vars to be dereferenced to instance values that are independent from global state (ie, this is a feature not a bug). You have the opportunity to do both right now depending on whether you use `#foo` or `foo`. I don't think we want `foo` to resolve to an instance that maintains a back reference to the var that it came from - you have then taken your simple value and complected it with state.

There are competing interests at the repl during development where you want existing instances to "see" new definitions and I think we will revisit some use cases for that purpose. Perhaps we will end up back here at the end of that process but it seems to run against the general Clojure grain.
by
I agree with what you say here, but there are a couple inconsistencies that I think deserve a little attention.

1) Multimethods and Spec both have a version of "global state" that persists across dereferences. I don't know how they're implemented so I can't speak to why, I just know that it works as I expect:
```
(defmulti mm :type)
(def partial-mm (partial mm))
(defmethod mm :foo [_] ::bar)

user=> (partial-mm {:type :foo})
:user/bar

(require '[clojure.spec.alpha :as s])
(def partial-valid? (partial s/valid? ::foo))
(s/def ::foo map?)

user=> (partial-valid? {})
true
```
Given that `extend` (and the other macros) don't require you to store the modified protocol, it's surprising that only one of the three "global state" doesn't work through dereferences.

2) Accepting that the global state behavior of protocols (through `extend` etc) is intentional, the protocol helper functions don't work on vars, meaning it's not possible to rely on the age-old `#'foo` trick:
```
(defprotocol CLJ-2094
  (execute [this bar]))

(defrecord Foo [])

(def partial-extends (partial extends? #'CLJ-2094))
(def partial-execute (partial #'execute (->Foo)))

user=> (partial-execute :bar)
true
user=> (partial-extends Foo)
Execution error (NullPointerException) at user/eval189 (REPL:1).
```
Here I'm passing a var of the protocol function to `partial`, which allows it to work as other functions work. This is expected behavior. However, passing a var to `extends?` (and the other protocol helper functions) throws NPEs because they won't deref a var.

Maybe the protocol helper functions could be modified to work on vars as well, which would give an avenue for such situations?
...