

I have read that Clojure sequences "cache" their values as they are realized (here and here for example). This often seems to come up when discussing the relative merits of using transducers for transforming values rather than chaining sequence functions, as the latter approach creates "intermediate cached sequences".

However, I have also come across the advice "don’t hold on to your head" when dealing with sequences, because holding a reference to the beginning of the sequence prevents GC from freeing up memory used to hold previously seen values in the sequence.

At my current level of understanding these ideas appear to conflict. If sequences cache their results, then why does it matter whether you hold the head? Does caching not imply that all the realised elements are held onto anyway? Or looked at the other way, why would sequences cache their realised values if the GC is free to clean them up?

I know I must be missing something obvious here, but I would really appreciate an explanation.

1 Answer


A realized sequence is a chain of objects in memory, with each link in the chain pointing to a value and to the next link, until you eventually reach a function object representing the remaining unrealized portion of the sequence. Holding a reference to any link in the realized chain causes the rest of the chain from that point forward to be strongly held, so it cannot be gc'ed.
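A minimal sketch of such a chain, using a hand-rolled lazy generator (the name `nums-from` is mine, not from the answer):

```clojure
;; Each lazy-seq cell, once realized, holds its value and a pointer to the
;; next cell; the unrealized tail is just a thunk (function object).
(defn nums-from [n]
  (lazy-seq (cons n (nums-from (inc n)))))

(def s (nums-from 0))

;; Forcing the 3rd element realizes links 0, 1, and 2; because s points at
;; the head, all three realized links stay strongly reachable.
(first (drop 2 s))  ;; => 2
```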

If you are only pointing at the unrealized "end" of the chain, then all of the links "behind" you can be collected. When you "hold the head", that pointer is to the beginning of the chain, which forces all N links from the beginning up to the unrealized point to be held in memory.

For example:

(def r (repeat 100000000 "abcdef"))   ;; r holds a strong reference to the head
(count r)                             ;; realizes the whole seq while r pins every link

versus:

(count (repeat 100000000 "abcdef"))   ;; no retained reference to the head

which can walk the seq, allowing the links behind the counter to be gc'ed. (Note that range is the typical thing people use for these kinds of examples, but it has an optimized count for the common cases.)

Thanks Alex, that all makes sense. The bit I still don't quite understand is why seqs are said to be "cached". If there is no ref to the previously realised values (as in your second example), then in what scenario are the previously realised values reused (i.e. a cache hit)? Or is the term cache here not intended to imply reuse of the values, just that they are still in memory until a GC pass?
I believe the answer is that the caching here matters when you _do_ hold onto the head: each element must be calculated on the first traversal through the sequence, but if you then traverse it a second, third, etc. time while holding onto the head, no recalculation is done on the later traversals. The values are cached by consuming memory to avoid the need to recalculate them.
"Cache" here just means that the links in the chain retain the value computed at that point in the chain.
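A quick REPL sketch of that caching (the counter atom is my addition, just to make recomputation observable):

```clojure
;; Count how many times the mapping fn actually runs.
(def calls (atom 0))
(def s (map (fn [x] (swap! calls inc) x) [1 2 3]))

(doall s)   ;; first traversal: each element is computed
@calls      ;; => 3
(doall s)   ;; second traversal: the realized links are reused
@calls      ;; => 3, nothing recomputed
```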
Thank you both. So is it then fair to say that the potential performance concern with chained seq funcs is that the intermediate sequences can cause more frequent GC passes to clear the realised values (in the typical case where previous values are not referenced explicitly)?
Yes, nested sequence functions will create N layers of nodes (vs 1 layer for transducer reduction), which means N times as much garbage, and thus N times as much collection.
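For concreteness, a small sketch of the two styles (my example, not from the thread):

```clojure
;; Chained seq fns: (map inc ...) and (filter even? ...) each build an
;; intermediate lazy sequence that becomes garbage after the reduce.
(reduce + (filter even? (map inc (range 10))))           ;; => 30

;; Transducer version: one pass over the input, no intermediate
;; sequences allocated.
(transduce (comp (map inc) (filter even?)) + (range 10)) ;; => 30
```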