Share your thoughts in the 2024 State of Clojure Survey!

Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by

Currently {{Reflector.getMethods}} performs expensive logic that includes {{java.lang.reflect.Method}} copying.
See: https://github.com/clojure/clojure/blob/b8607d5870202034679cda50ec390426827ff692/src/jvm/clojure/lang/Reflector.java#L373

In our application I see the following back-traces:

at Reflector.copyMethods at Reflector.invokeInstanceMethod at ...

These kind of backtraces are second top consumers of all the heap allocation.

JDK cannot cache {{Methods}} / {{Fields}} since they are mutable (e.g. user can call {{setAccessible}} here and there).
However, for the purposes of Clojure, I believe it should be fine to cache {{Methods}} and {{Fields}}.

What do you think?
E.g. WeakHashMap<Class, WeakReference<List<Method>>> or more sophisticated structure to account {{String name}}.

15 Answers

0 votes
by

Comment made by: alexmiller

If you are seeing Reflector as a hot spot in your application, you should probably turn on * } and use type hints to avoid reflection.

0 votes
by
_Comment made by: vladimirsitnikov_

Do you mean there is absolutely no reason to use reflection in Clojure ever?
I do understand that if developer gives enough type hints the reflection would go away.

However:
1) I just do not know if it is easily doable (in other words, if it is possible at all, maintainable, etc)
2) I'm not sure if "always use type hints" is considered a best practice. For instance, [warn-on-reflection|https://clojuredocs.org/clojure.core/*warn-on-reflection*] documentation page says nothing like "always use type hints"
3) Caching {{copyMethods}} seems to be a low-hanging fruit here, so it would shave cpu cycles for those who omitted type hints

PS. I'm a java performance engineer, not a Clojure engineer (as in "my Clojure knowledge is somewhere near {{(+ x y)}}"), so I kindly beg on your forgiveness for me not doing RTFM.
0 votes
by

Comment made by: alexmiller

No, I'm saying that if reflection is a hotspot in your application, usually it's worth investing a few minutes to add type hints in those hotspot areas and this is common advice for Clojure apps. Once that minimal work is done, few Clojure apps are bound by reflection.

Caching seems like an easy solution until you consider all of the management aspects. How does the cache get cleaned? Are the instances mutable and able to be reused? Are there cases where class loaders or code reloading create unexpected side effects? What are the concurrency effects of putting a shared resource in the invocation path? What is the memory impact of a cache and is it configurable?

Those are all things that would need to be investigated, meaning that this is not low-hanging fruit.

0 votes
by

Comment made by: mikera

Patch for simple caching of Reflector.getMethods calls for small arities

0 votes
by

Comment made by: mikera

I created a small patch to add very simple (fixed size, 1 element for each arity) caching for Reflector.getMethods calls. The aim is to keep this super simple to avoid issues like concurrency effects and having a variable-sized cache.

This helps a small amount in my tests (about 15-20%) on reflection calling the same method in a loop, which is probably the common case where people actually care about reflection performance.

Performance could certainly be improved further due to the fact that I think most of the overhead is actually is the invokeMatchingMethod, but that is an orthogonal issue. This patch opens the way for further performance optimisation in that area.

;; clojure 1.8.0-RC3
user=> (let (link: v (identity 1)) (time (dotimes (link: i 1000) (.doubleValue v))))
"Elapsed time: 1.598779 msecs"

;; with cached arities
user=> (let (link: v (identity 1)) (time (dotimes (link: i 1000) (.doubleValue v))))
"Elapsed time: 1.359888 msecs"

0 votes
by

Comment made by: vladimirsitnikov

(link: ~mikera), I wonder if your patch results in a performance regression for concurrent workload.

You've created a single point of contention as lots of threads will try to update {{private static InstanceMethodCache[] instanceMethodCache}} entries, so it will hit both "true sharing" and "false sharing" problems.

Should {{instanceMethodCache}} be final and written in capital letters?

0 votes
by

Comment made by: alexmiller

This ticket needs a better problem definition. That is: "I am doing ____" (with an example) that shows Reflector.getMethods() as the bottleneck.

If I guess at what the problem is, I remain unconvinced that this is the best solution.

A ThreadLocal is likely the cache solution with the lowest concurrency impact.

0 votes
by

Comment made by: mikera

This shouldn't have any noticeable concurrency impact: no locking is required for this very simple approach. Most of the time it is simply an unlocked read from an array on the heap, the Java memory model is enough to guarantee correct behaviour. That's cheaper than even a threadlocal, e.g. there's some evidence here that this is 10-20x faster: http://stackoverflow.com/questions/609826/performance-of-threadlocal-variable

At the very least, any concurrency impact is so tiny it will be dwarfed by the benefits of often avoiding the getMethods calls, which are expensive. The cost of array access is a few nanoseconds compared to the cost of getMethods which appears from the benchmark above to be a few hundred nanoseconds.

The worst concurrency case I can think of is the case where two different threads are calling getMethods on different methods at a high rate and these calls are perfectly interleaved so that they always invalidate the cache. But even in that case, it's probably not measurably worse than the current code.

@Vladimir yes, insntanceMethodCache could be final. Might help the JVM very marginally, I guess.

@Alex, I proposed this patch because it is an improvement over what is currently there, I certainly don't think it will be the "best possible solution". In the spirit of open source and making incremental progress, I'd like you to consider accepting it, even if this issue stays open for future consideration. This is also linked to clj-1866, I'm trying to make the "fast path" for reflection better in a few different ways. If you'd rather have a single large patch with a whole bunch of improvements I can certainly do that, I has under the impression that smaller, more "obvious" patches would be easier for you to review but happy either way.

0 votes
by

Comment made by: vladimirsitnikov

(link: ~mikera), you are missing the point.

Please check here: http://shipilev.net/blog/2014/jmm-pragmatics/#_benchmarks, slide 77/100 "SC-DRF: Writes"

{quote}Alexey Shipilev: This reinforces the idea that data sharing is what you should avoid in the first place, not volatiles{quote}

Having {{ThreadLocal}} cache would eliminate "shared update" problem.

{quote}This ticket needs a better problem definition. That is: "I am doing ____" (with an example) that shows Reflector.getMethods() as the bottleneck{quote}
That is true.
My particular case was somehow resolved by development team.
I just thought some basic cache would make Clojure do the right thing by default and require less type specialization written manually.

0 votes
by

Comment made by: alexmiller

I see a lot of "should" type statements in there. The whole point is that no change like this is going to go in until we know that there aren't impacts. But more importantly I'm not even going to mark it triaged until it's a good ticket that starts with a problem statement.

0 votes
by

Comment made by: mikera

Alex, what do you mean by "know there aren't impacts"? That seems an absurd position on the face of it, it is a perfectly acceptable trade-off to allow minor regressions in rare corner cases if you are improving the fast path / common case significantly.

Also, this definitely isn't a standard that is universally applied for changes to Clojure. Plenty of changes go into Clojure which cause performance regressions in other areas, you only need to look at Andy Fingerhut's excellent benchmarking efforts here to see that: https://jafingerhut.github.io/clojure-benchmarks-results/Clojure-expression-benchmark-graphs.html

Problem statement is IMHO obvious for anything performance related: "Performance of X is sub-optimal, which hurts users who are doing X." If you want a new ticket / changed description that says something like that then I'm happy to do it, but that seriously just feels like bureaucratic box ticking. Please consider this as constructive feedback on your contribution process.

What exactly (i.e. which benchmarks) do you need to see as a valid demonstration of improvement in performance-related issues like this?

0 votes
by

Comment made by: alexmiller

Similar to my comments in CLJ-1866, the title of this ticket is "Reflector.getMethods should be cached". That is again a solution, not a problem. What I'm looking for is a title like "Repeated reflection in a loop is slow" and a description that starts with some example code demonstrating the problem. Without a good problem statement, I cannot triage this ticket. I may still consider the priority of the problem to be low enough that it's not worth triaging at this time - I'll withhold judgement though until the ticket is improved.

The fact that prior changes have had unexpected performance impacts only lends additional credence to my suggestion that this (performance-oriented) ticket should validate its claims. You have added code, which makes the "miss" path of this code slower than it was before. How much slower? It should make the "hit" path faster - how much faster? In typical code, how often do we encounter hit vs miss paths? My presumption is that the example will demonstrate a case where the hit path is common. These are the kinds of questions I, as a screener, must ask to evaluate any proposed solution.

Additionally, you are introducing concurrency concerns and some additional work is required to verify both correctness (the current patch has visibility issues) and that you have not introduced contention or memory issues. These are typical problems for any caching-related optimization and I could point you to any number of prior tickets that have wrestled with them.

0 votes
by

Comment made by: mikera

Thanks Alex for explaining your concerns.

I agree that a problem-oriented approach to patches is better, so I suggest the following:
- We close both this issue and clj-1866
- I create a separate problem-focused ticket for reflection performance
- I'll benchmark the changes as whole for a number of different cases
- You'll triage the patch on the assumption that we can demonstrate noticeable improvement in the common cases, all tests pass as before, and no major regressions occur in the corner cases (concurrent access, frequent cache misses etc.)

If you want a problem-oriented issue then I don't think it makes much sense to create separate tickets / patches for each "solution" (although some OSS projects choose to do it that way, they usually have have a much more streamlined process for minor changes / optimisations which probably doesn't suit the Clojure dev process)

Agreed?

0 votes
by

Comment made by: alexmiller

As I said, we would prefer small focused tickets and patches, rather than one big patch.

I will reiterate that I'm not convinced doing any of this work makes sense if the scenario is one where a type hint would solve the problem.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1784 (reported by alex+import)
...