We used seque to manage a large database migration and observed a possible memory leak that required us to restart it several times. We iterated through hundreds of queries to extract data via reduce, each using its own seque, but eventually ran out of memory. I believe it could have been related to seque and its use of agents, which was predicted to leak memory in CLJ-1125. We have not since tried a migration without seque for comparison; instead, I turned my attention to seque itself and found possibly related problems.
seque uses an agent to offer items to its buffer. Agents have a memory leak where the conveyed bindings of a send are held by the executing Thread (the most relevant being *agent*). This means that even if the seque is garbage collected, its agent persists if the thread is part of a cached thread pool. The agent usually holds realized items from the producing seq (such as the first item that failed to be offered to the buffer).
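The mechanism described above can be sketched in isolation, without seque or garbage collection. This is a minimal sketch, not code from the original report: stale-agent-binding-demo is a hypothetical helper, and the single-thread executor is an assumption made so that the action and the later check are guaranteed to run on the same thread. It sends one action via send-via, then submits a plain Callable (which bypasses binding conveyance) to the same thread and checks whether *agent* is still visible there.

```clojure
(import '(java.util.concurrent Executors ExecutorService Callable))

(defn stale-agent-binding-demo
  "Hypothetical helper (not part of clojure.core). Runs one agent action on a
  single-thread executor, then inspects *agent* on that same thread from a
  plain Callable that does not go through binding conveyance."
  []
  (let [ex   (Executors/newSingleThreadExecutor)
        a    (agent nil)
        done (promise)]
    (try
      ;; Inside the action, *agent* is bound to the agent being run.
      (send-via ex a (fn [_] (deliver done (identical? a *agent*)) :ran))
      (let [bound? (deref done 5000 :timeout)
            ;; The executor has one thread, so this Callable runs on the
            ;; thread that executed the action, after the action finished.
            seen   (.get (.submit ^ExecutorService ex ^Callable (fn [] *agent*)))]
        {:bound-during-action?  bound?
         :visible-after-action? (identical? a seen)})
      (finally
        (.shutdown ex)))))

(prn (stale-agent-binding-demo))
```

If the conveyed binding frame is left installed on the thread after the action completes, as described above, :visible-after-action? would be expected to come back true, which is the same strong reference that keeps the agent reachable from a pooled thread.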
I believe this demonstrates seque leaking memory (Clojure 1.12.0):
(let [pool-size 500]
  (defn expand-thread-pool! []
    (let [p (promise)]
      (mapv deref (mapv #(future (if (= (dec pool-size) %) (deliver p true) @p))
                        (range pool-size))))))

(let [_ (expand-thread-pool!) ;; increases likelihood of observing leak
      ready (promise)
      strong-ref (volatile! (Object.))
      weak-ref (java.lang.ref.WeakReference. @strong-ref)
      the-seque (volatile! (seque 1 (lazy-seq
                                      (let [s (repeat @strong-ref)]
                                        (deliver ready true)
                                        s))))]
  @ready
  (vreset! strong-ref nil)
  (vreset! the-seque nil)
  (System/gc)
  (doseq [i (range 10)
          :while (some? (.get weak-ref))]
    (prn "waiting for gc...")
    (Thread/sleep 1000)
    (System/gc))
  (prn (if (nil? (.get weak-ref))
         "garbage collection successful"
         "seque memory leak!!")))
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"seque memory leak!!"
This can be reproduced with agents alone. Once an agent has executed an action, a strong reference to the agent persists via *agent* in the cached thread that executed it. Here we observe that the agent is not garbage collected if it has executed an action on a cached thread pool (Clojure 1.12.0):
(let [_ (expand-thread-pool!)
      strong-ref (volatile! (agent nil))
      weak-ref (java.lang.ref.WeakReference. @strong-ref)]
  ;#_#_ ;;uncomment this and the agent is freed
  (send-off @strong-ref vector)
  (doseq [i (range 10)
          :while (not (vector? @@strong-ref))]
    (Thread/sleep 1000))
  (vreset! strong-ref nil)
  (System/gc)
  (doseq [i (range 10)
          :while (some? (.get weak-ref))]
    (prn "waiting for gc...")
    (Thread/sleep 1000)
    (System/gc))
  (prn (if (nil? (.get weak-ref))
         "garbage collection successful"
         "agent memory leak!!")))
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"waiting for gc..."
;"agent memory leak!!"
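Both reproductions above detect the leak the same way: drop the strong reference, then poll a WeakReference while requesting garbage collection. That polling step could be factored out as below. This is only a sketch: collected-within? is a hypothetical name, and because System/gc is a request rather than a guarantee, a false result means "not collected within the attempts made", not proof of a leak.

```clojure
(defn collected-within?
  "Hypothetical helper. Requests a GC up to (inc attempts) times, sleeping
  sleep-ms between tries. Returns true as soon as the weak reference has been
  cleared, false if it is still set after the last attempt."
  [^java.lang.ref.WeakReference weak-ref attempts sleep-ms]
  (loop [remaining attempts]
    (System/gc)
    (cond
      (nil? (.get weak-ref)) true
      (zero? remaining)      false
      :else (do (Thread/sleep (long sleep-ms))
                (recur (dec remaining))))))

;; Usage: an object that is still strongly referenced is never cleared.
(let [held (Object.)
      wr   (java.lang.ref.WeakReference. held)]
  (prn (collected-within? wr 2 10))
  ;; Using held here keeps the local alive across the call above.
  (some? held))
```

With such a helper, the tail of each reproduction would reduce to a single call like (collected-within? weak-ref 10 1000) followed by the final prn.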