
The standard pmap function does not let me specify my own window size. It always uses the number of available CPUs plus 2:

(+ 2 (.. Runtime getRuntime availableProcessors))

I wonder why there is no way to pass a custom N. As a workaround, I have a function like this:

(defn pmap+
  "
  Like pmap but accepts a custom size of a parallel
  window. Lazy. Takes only one collection of arguments
  at the moment.
  "
  [n func items]
  (lazy-seq
   (let [[head tail]
         (split-at n items)]
     (when (seq head)
       (let [futures
             (for [item head]
               (future (func item)))]
         (concat
          (->> futures
               doall
               (map deref))
          (pmap+ n func tail)))))))

This is helpful when dealing with third-party services via HTTP API. My question is, can we have a pmap with an optional N parameter, or maybe add a new function?

Thank you,
Ivan

1 Answer


pmap+ won't saturate the whole CPU - it can potentially wait on the last future from a batch instead of letting the execution advance in a way where that hanging future becomes the first in a window. So effectively it's not using a parallel window but rather a chunk.
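To make the chunk-versus-window difference concrete, here is a minimal sketch of a window-based variant. It is modeled on the single-collection arity of clojure.core/pmap, with the look-ahead passed in as `n` instead of being derived from the CPU count. The name `pmap-window` is hypothetical, not something in core. One caveat: for chunked inputs such as vectors or ranges, `map` realizes elements in chunks of up to 32, so more than `n` futures may be started at once.

```clojure
(defn pmap-window
  "Like pmap for a single collection, but with a caller-supplied
  look-ahead n. Hypothetical helper, not part of clojure.core."
  [n f coll]
  (let [;; lazily wrap each call in a future; nothing starts yet
        rets (map #(future (f %)) coll)
        ;; vs is the seq of futures still to be consumed,
        ;; fs is the same seq advanced by n elements
        step (fn step [[x & xs :as vs] fs]
               (lazy-seq
                (if-let [s (seq fs)]
                  ;; realizing fs starts the future n ahead of x,
                  ;; then we block only on the head of the window
                  (cons (deref x) (step xs (rest s)))
                  ;; fewer than n futures remain: just drain them
                  (map deref vs))))]
    (step rets (drop n rets))))
```

Each time the consumer takes one result, the window advances by one element, so a slow call at the end of a group does not hold back the start of the next group the way it does in pmap+.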

You're probably aware, but there's a library that has a better pmap: https://github.com/clj-commons/claypoole/blob/master/src/clj/com/climate/claypoole.clj#L406

Depending on what you need pmap+ for, another alternative is an input queue + a pool of workers + an output queue, e.g. via java.util.concurrent.Executors and ExecutorCompletionService.
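As a rough sketch of the executor-based approach, assuming an eager result in completion order is acceptable: a fixed pool of `n` threads acts as the workers, and ExecutorCompletionService supplies the output queue. The name `pool-map-unordered` is hypothetical.

```clojure
(import '(java.util.concurrent Executors ExecutorCompletionService))

(defn pool-map-unordered
  "Eagerly applies f to every item of coll on a pool of n threads.
  Returns a vector of results in completion order, not input order.
  Hypothetical helper for illustration."
  [n f coll]
  (let [pool (Executors/newFixedThreadPool n)
        ecs  (ExecutorCompletionService. pool)]
    (try
      (let [;; submit every item and count how many tasks were queued
            submitted (reduce (fn [cnt item]
                                (.submit ecs (fn [] (f item)))
                                (inc cnt))
                              0
                              coll)]
        ;; take blocks until the next task finishes; get returns its
        ;; value or throws ExecutionException if the task failed
        (vec (repeatedly submitted #(.get (.take ecs)))))
      (finally
        ;; always release the worker threads
        (.shutdown pool)))))
```

For example, `(pool-map-unordered 8 fetch urls)` would keep at most 8 calls in flight, where `fetch` and `urls` stand in for your own HTTP function and inputs. If input order matters, calling `.invokeAll` on the pool with a collection of functions returns the futures in submission order instead.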

Neither is lazy in the way clojure.core/pmap is, but for a reason. There have been plenty of discussions on Slack (I haven't checked, but it feels like they happen around once every 2 months) describing why combining laziness with parallelism is not a good thing to do.

by
edited by
"Chunk" is the right word indeed, thank you. But I still cannot see how it answers the question. Why can't I pass a custom chunk size? Sometimes I need just a bit more than the number of CPUs. It could be an optional parameter. Plugging in an extra library is the last resort, I think.
by
Given that whenever pmap is mentioned on Slack, people tend to go "pmap is pretty much never the answer", my sense is that pmap was a bit of a mistake in Clojure core and, rather than add knobs and dials to it that might further encourage its use, the desired approach is to push people away from pmap and into using the underlying Java standard library and interop.
by
However, when `pmap` _is_ the right answer, it's such a good answer. I'd agree that in a server context, `pmap` might very often be the wrong answer, but for quick and dirty one-off scripts, it's very convenient, or dare I say, easy, in contrast to dropping down to the underlying primitives, which, being simple, are not that easy.
...