Welcome! Please see the About page for a little more info on how this works.

+3 votes
in Transducers by

Hi,

I'm learning more about transducers, but coming up with some deadends.

What I'm looking to understand, is whether they are appropriate to use (in my use-case, but also as a learning exercise) and whether what I am doing is right/efficient/maybe-there-is-a-better-way :-)

Let' say I have a data stucture thus:

 (def trip {:tripData 
            {:segments [{:dataPoints 
                         [{:location {:lat 1 :lng 2}} 
                          {:location {:lat 3 :lng 4}}]}]}})

There could be hundreds/thousands of dataPoints with each dataPoint having a single location.

I want to efficiently extract out only the lat and lng into a single collection and turn it into a string. I came up with this:

 (def xf
   (comp
    (mapcat :dataPoints)
    (map :location)
    (map (fn [{lat :lat lng :lng}] (str lat " " lng)))))

then evaluated via:

(def lat-lng (into [] xf (->> trip :tripData :segments)))

And I get back something like this:

["1 2" "3 4"]

Which I can then do (for the purposes of my exercise), this:

 (clojure.string/join ", " lat-lng)

to obtain, finally, this:

"1 2, 3 4"

Which is all fine and dandy :-)

However, given my inexperience with transducers, I'm left wondering if there is a different/better way. For example, a way, within the comp xf, to turn the data into the string and joined at the end instead of using clojure.string/join.

I also discovered, I can do this too, without the use of transducers:

 (def lat-lng-2 (->> trip
                     :tripData
                     :segments
                     (mapcat :dataPoints)
                     (map :location)
                     (map (fn [{lat :lat lng :lng}] (str lat " " lng)))))

Which, given the clojure.string/join, ends up with the same result.

However, it's my understanding that you can't use a map keyword (i.e., :tripData, :segments) as part of a comp as keywords are not transducers.

I'm at a loss on how to make this efficent/better whilst learning how I can use transducers.

I would appreciate some help/guidance/feedback!

Thank you.

3 Answers

+2 votes
by

Expanding a bit on Tom's answer about doing it better with string builder reducer, here is how it can be done cleaner:

(def xf
  (comp
   (mapcat :dataPoints)
   (map :location)
   (map (fn [{:keys [lat lng]}] (str lat " " lng)))
   (interpose ", ")))

(defn string-builder-rf
  ([] (StringBuilder.))
  ([^StringBuilder ret] (.toString ret))
  ([^StringBuilder acc in]
   (.append acc in)))

(transduce xf string-builder-rf (-> trip :tripData :segments))
; => "1 2, 3 4"

2 main changes:
- using (interpose ", ") transducer to mimic separator behavior of clojure.string/join. If you use resulting transducer with into, it will return this: ["1 2" ", " "3 4"]. now all is left to do is efficiently build a string from it.
- using proper string builder reducing function, providing all 3 arities because all of them are needed: 0-arity to create a string builder, 2-arity for actual reducing step that will be called with "1 2", ", " and "3 4" consecutively, and completing 1-arity to transform string builder into a string

That's going to be more efficient since you avoid creating intermediate vector that you pass to clojure.string/join

by
edited by
Thank you! I like the ability to create my own transducer there and the rationale for doing so! Thank you very much! :-)

Re-reading the signature of the transducer - pretty amazing stuff! Clear!
+1 vote
by
edited by

In the second pipeline example, using ->>, you are doing a similar operation as the into version. Unpacking the trip by getting the association with :tripData, then that map's association with :segments. The difference is that in the into version, you're supplying a similar-looking pipeline (described via compfor reasons not detailed here) into effectively a fancy loop that traverses the thing you're reducing (the :segments vector), and transforms the elements accordingly inside the loop prior to applying an implicit reducing function (here it's likely conj, or really conj! since into will use a transient vector to build up more efficiently). You can think of it as all those functions in comp being applied (in a sense) per element during the reduction. Compare this with the second example:

In the second one based on lazy sequences, you piped the :segments vector into a map that constructed a lazy sequence based on mapping and concatenating the application of :dataPoints over the input. Then that was mapd again (also producing another lazy sequence, depending traversing the output of mapcat), and again. So, there's actually a sort of stack of 3 dependent lazy sequences there. To traverse the output of the pipeline, I have to traverse the last lazy seq, which then traverses its dependent predecessor, which traverses the mapcat. There's a sort of fire drill (in the old days, when they used to form bucket brigades to manually get water from a source to a fire, you'd have to shuffle buckets down a line of people, with the last person dumping it on the fire). There's some overhead in the sequence based version, since each sequence has to allocate some intermediate objects, thunks, and force evaluation as elements are accessed. This isn't dire, but there is overhead, and that overhead grows the more sequences you stack on.

We don't have that overhead with the into variant because we never create the intermediate lazy sequences. Instead, we just apply a bunch of functions (rather than creating, forcing, and caching the elements of multiple dependent sequences). Eliminating this overhead can produce significant savings.

Note: you can eliminate the need to build up a vector (as into does) if you use transduce. You can also mix and match sequences and transducers via sequence and eduction:

(sequence (eduction (map inc) (range 10)))

Transducers are cool both for efficiency, but also since they're pretty general and fit in nicely with sequences and core.async channels. There's a lot of utility there.

However, it's my understanding that you can't use a map keyword (i.e., :tripData, :segments) as part of a comp as keywords are not transducers.

You can (as demonstrated) map the keyword, ala

(map :blah)

which yields a transducer.

I'm at a loss on how to make this efficent/better whilst learning how I can use transducers.

You might see if using a version of transduce where you build the string up ala clojure.string/join is faster than creating the vector and sending it to clojure.string/join.
Ideally, you'd define a reducing function...

(let [res (->> trip
               :tripData
               :segments
               (transduce xf (completing (fn [^java.lang.StringBuilder acc x]
                                           (doto acc (.append sep) (.append (str x)))))
                          (java.lang.StringBuilder.)))]
  ;;get rid of the first comma
  (doto res (.deleteCharAt 0) str))

This should replicate what clojure.string/join is doing internally with its string builder, without creating any intermediate collection (like a vector).

by
Thank you Tom, lots of very useful information there!
+1 vote
by

In xforms (https://github.com/cgrand/xforms) there is a x/str transducing context to directly build a string from a transduction:

user=> (x/str 
         (comp
           (x/for [{:keys [dataPoints]} %
                   {{:keys [lat lng]} :location} dataPoints]
             (str lat " " lng))
           (interpose ", "))
         (-> trip :tripData :segments))
"1 2, 3 4"
by
Hi!

Thank you for that! Very interesting!
...