Where can I find information that explains the difference in behavior below?

user=> (clojure-version)
"1.12.5"
user=> (def iter1 (eduction (filter even?) (range)))
#'user/iter1
user=> (def iter2 (eduction (filter zero?) (range)))
#'user/iter2
user=> (take 1 iter1)
(0)
user=> (take 1 iter2)
;;; Pressed Ctrl-C due to no response

This only happens when the input source is infinite, which confuses me because I believe filter is stateless. Of course, it works fine when transducers aren't used.

user=> (take 1 (filter even? (range)))
(0)
user=> (take 1 (filter zero? (range)))
(0)

1 Answer

Best answer

take converts its collection argument into a seq, and making a seq out of an eduction is done via the java.lang.Iterable route. That route moves the data around in chunks of size 32. So (take ... (eduction ...)) will try to fetch 32 elements regardless of the number passed to take.

So, it's not about transducers, it's about how eduction and converting it into a seq work.

You'll see the same behavior with sequence - but that's because it creates a chunked sequence internally.
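
A rough way to observe this read-ahead is to count how many source elements pass through the xform when only one is requested. This is a minimal sketch: the names `seen` and `counting` are made up for illustration, the source is deliberately finite so the experiment terminates, and the exact count may depend on the Clojure version.

```clojure
;; Hypothetical helper names; not part of clojure.core.
(def seen (atom 0))

;; A transducer that counts every element flowing through it.
(def counting (map (fn [x] (swap! seen inc) x)))

;; Finite source, so the experiment always terminates.
(def result (doall (take 1 (eduction counting (range 100)))))

(println "result:" result)
(println "elements pulled from the source:" @seen)
```

If the counter ends up well above 1, the extra elements were pulled while building the first chunk, not because `take` asked for them.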

ago by
Thank you for your reply. I now understand that chunked reading is applied when reading from `eduction`/`sequence`. I had believed it was applied when reading from the input source.

I will be careful when dealing with infinite input sources.

This is a bit off-topic, but with the current implementation a chunked read does not seem to complete unless it can extract at least 33, not 32, matching elements from an infinite input source.

    user=> (def src3 (concat (repeat 31 0) (repeat 0)))
    #'user/src3
    user=> (def src4 (concat (repeat 31 0) (mapcat #(vector % %) (range))))
    #'user/src4
    user=> (def src5 (concat (repeat 31 0) (range)))
    #'user/src5
    user=> (take 32 src3)
    (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
    user=> (take 32 src4)
    (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
    user=> (take 32 src5)
    (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
    user=>
    user=> (take 1 (eduction (filter zero?) src3))
    (0)
    user=> (take 1 (eduction (filter zero?) src4))
    (0)
    user=> (take 1 (eduction (filter zero?) src5))
    ;;; ---> out of memory
ago by
Good observation. That's probably because of this line: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L542. If the conditions were swapped, it would work, I think.
ago by
Ahhh, I see. :-)
ago by
edited ago by
I've given it some thought.

It looks like this behavior occurs whenever reading reaches the last chunk of matching elements produced from an infinite input source. So I suspect that simply swapping the conditions won't solve the problem.

    user=> (def src1 (concat (repeat 64 0) (repeat 1)))
    #'user/src1
    user=> (take 40 (sequence (filter zero?) src1))
    ;;; ---> out of memory

As a safety workaround when using `sequence`/`eduction`, I think we should either avoid infinite input sources or, where possible, use `take` inside the xform so that only a finite amount of input is processed.
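
The `take`-inside-the-xform idea can be sketched as follows. The var names (`bounded`, `first-match`, `via-into`) are hypothetical, and the second and third forms are an additional assumption on my part rather than something proposed in this thread: they reduce over the transformation directly, so no seq (and therefore no chunk) is built.

```clojure
;; Bound the pipeline inside the xform: `(take 1)` ends the
;; transformation as soon as one element has passed the filter,
;; so the chunk builder sees a finite source.
(def bounded
  (doall (take 1 (eduction (filter zero?) (take 1) (range)))))

;; Alternative (assumption, not from the thread): reduce the eduction
;; directly and stop with `reduced` instead of converting it to a seq.
(def first-match
  (reduce (fn [_ x] (reduced x)) nil (eduction (filter zero?) (range))))

;; `into` with an xform also reduces rather than building a seq.
(def via-into
  (into [] (comp (filter zero?) (take 1)) (range)))

(println bounded first-match via-into)
```

Note that `first` would not help here, because it also converts the eduction into a seq.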

Given the design of chunked reading, perhaps that’s just the way it is.
...