Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by

Important Perf Note the new impl is faster for collections that are custom-reducible but not chunked, and is also faster for large numbers of bindings. The original implementation is hand tuned for chunked collections, and wins for larger chunked coll/smaller binding count scenarios, presumably due to the fn call/return tracking overhead of reduce. Details are in the comments.
Screened By
Patch doseq.patch

`
user=> (def a1 (range 10))

'user/a1

user=> (doseq [x1 a1 x2 a1 x3 a1 x4 a1 x5 a1 x6 a1 x7 a1 x8 a1] (do))
CompilerException java.lang.ClassFormatError: Invalid method Code length 69883 in class file user$eval1032, compiling:(NO_SOURCE_PATH:2:1)
`

While this example is silly, it's a problem we've hit a couple of times. It's pretty surprising when you have just a couple of lines of code and suddenly you get the code length error.

15 Answers

0 votes
by

Comment made by: hiredman

reproduces with jdk 1.8.0 and clojure 1.6

0 votes
by

Comment made by: bronsa

A potential fix for this is to make doseq generate intermediate fns like for does instead of expanding all the code directly.

0 votes
by

Comment made by: gshayban

Existing doseq handles chunked-traversal internally, deciding the
mechanics of traversal for a seq.  In addition to possibly conflating
concerns, this is causing a code explosion blowup when more bindings are
added, approx 240 bytes of bytecode per binding (without modifiers).

This approach redefs doseq later in core.clj, after protocol-based
reduce (and other modern conveniences like destructuring.)

It supports the existing :let, :while, and :when modifiers.

New is a stronger assertion that modifiers cannot come before binding
expressions.  (Same semantics as let, i.e. left to right)

valid:  (link: x coll :when (foo x))
invalid: (link: :when (foo x) x coll)

This implementation does not suffer from the code explosion problem.
About 25 bytes of bytecode + 1 fn per binding.

Implementing this without destructuring was not a party, luckily reduce
is defined later in core.
0 votes
by

Comment made by: jafingerhut

For anyone reviewing this patch, note that there are already many tests for correct functionality of doseq in file test/clojure/test_clojure/for.clj. It may not be immediately obvious, but every test for 'for' defined with deftest-both is a test for 'for' and also for 'doseq'.

Regarding the current implementation of doseq: it in't simply that it is too many bytes per binding, it is that the code size doubles with each additional binding. See these results, which measures size of the macroexpanded form rather than byte code size, but those two things should be fairly linearly related to each other here:

`
(defn formsize [form]
(count (with-out-str (print (macroexpand form)))))

user=> (formsize '(doseq [x (range 10)] (print x)))
652
user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y)))
1960
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z)))
4584
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w)))
9947
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p)))
20997
`

Here are results for the same expressions after Ghadi's patch doseq.patch dated June 25 2014:

user=> (formsize '(doseq [x (range 10)] (print x))) 93 user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y))) 170 user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z))) 247 user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w))) 324 user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p))) 401

It would be good to see some performance results with and without this patch, too.

0 votes
by

Comment made by: stu

In the tests below, the new impl is called "doseq2", vs. the original impl "doseq"

`
(def hund (into [] (range 100)))
(def ten (into [] (range 10)))
(def arr (int-array 100))
(def s "superduper")

;; big seq, few bindings: doseq2 LOSES
(dotimes [_ 5]
(time (doseq [a (range 100000000)])))
;; 1.2 sec

(dotimes [_ 5]
(time (doseq2 [a (range 100000000)])))
;; 1.8 sec

;; small unchunked reducible, few bindings: doseq2 wins
(dotimes [_ 5]
(time (doseq [a s b s c s])))
;; 0.5 sec

(dotimes [_ 5]
(time (doseq2 [a s b s c s])))
;; 0.2 sec

(dotimes [_ 5]
(time (doseq [a arr b arr c arr])))
;; 40 msec

(dotimes [_ 5]
(time (doseq2 [a arr b arr c arr])))
;; 8 msec

;; small chunked reducible, few bindings: doseq2 LOSES
(dotimes [_ 5]
(time (doseq [a hund b hund c hund])))
;; 2 msec

(dotimes [_ 5]
(time (doseq2 [a hund b hund c hund])))
;; 8 msec

;; more bindings: doseq2 wins bigger and bigger
(dotimes [_ 5]
(time (doseq [a ten b ten c ten d ten ])))
;; 2 msec

(dotimes [_ 5]
(time (doseq2 [a ten b ten c ten d ten ])))
;; 0.4 msec

(dotimes [_ 5]
(time (doseq [a ten b ten c ten d ten e ten])))
;; 18 msec

(dotimes [_ 5]
(time (doseq2 [a ten b ten c ten d ten e ten])))
;; 1 msec
`

0 votes
by

Comment made by: gshayban

Hmm, I cannot reproduce your results.

I'm not sure whether you are testing with lein, on what platform, what jvm opts.

Can we test using this little harness instead directly against clojure.jar? I've attached a the harness and two runs of results (one w/ default heap, the other 3GB w/ G1GC)

I added a medium and small (range) too.

Anecdotally, I see doseq2 outperform in all cases except the small range. Using criterium shows a wider performance gap favoring doseq2.

I pasted the results side by side for easier viewing.

`
core/doseq doseq2
"Elapsed time: 1610.865146 msecs" "Elapsed time: 2315.427573 msecs"
"Elapsed time: 2561.079069 msecs" "Elapsed time: 2232.479584 msecs"
"Elapsed time: 2446.674237 msecs" "Elapsed time: 2234.556301 msecs"
"Elapsed time: 2443.129809 msecs" "Elapsed time: 2224.302855 msecs"
"Elapsed time: 2456.406103 msecs" "Elapsed time: 2210.383112 msecs"

;; med range, few bindings:
core/doseq doseq2
"Elapsed time: 28.383197 msecs" "Elapsed time: 31.676448 msecs"
"Elapsed time: 13.908323 msecs" "Elapsed time: 11.136818 msecs"
"Elapsed time: 18.956345 msecs" "Elapsed time: 11.137122 msecs"
"Elapsed time: 12.367901 msecs" "Elapsed time: 11.049121 msecs"
"Elapsed time: 13.449006 msecs" "Elapsed time: 11.141385 msecs"

;; small range, few bindings:
core/doseq doseq2
"Elapsed time: 0.386334 msecs" "Elapsed time: 0.372388 msecs"
"Elapsed time: 0.10521 msecs" "Elapsed time: 0.203328 msecs"
"Elapsed time: 0.083378 msecs" "Elapsed time: 0.179116 msecs"
"Elapsed time: 0.097281 msecs" "Elapsed time: 0.150563 msecs"
"Elapsed time: 0.095649 msecs" "Elapsed time: 0.167609 msecs"

;; small unchunked reducible, few bindings:
core/doseq doseq2
"Elapsed time: 2.351466 msecs" "Elapsed time: 2.749858 msecs"
"Elapsed time: 0.755616 msecs" "Elapsed time: 0.80578 msecs"
"Elapsed time: 0.664072 msecs" "Elapsed time: 0.661074 msecs"
"Elapsed time: 0.549186 msecs" "Elapsed time: 0.712239 msecs"
"Elapsed time: 0.551442 msecs" "Elapsed time: 0.518207 msecs"

core/doseq doseq2
"Elapsed time: 95.237101 msecs" "Elapsed time: 55.3067 msecs"
"Elapsed time: 41.030972 msecs" "Elapsed time: 30.817747 msecs"
"Elapsed time: 42.107288 msecs" "Elapsed time: 19.535747 msecs"
"Elapsed time: 41.088291 msecs" "Elapsed time: 4.099174 msecs"
"Elapsed time: 41.03616 msecs" "Elapsed time: 4.084832 msecs"

;; small chunked reducible, few bindings:
core/doseq doseq2
"Elapsed time: 31.793603 msecs" "Elapsed time: 40.082492 msecs"
"Elapsed time: 17.302798 msecs" "Elapsed time: 28.286991 msecs"
"Elapsed time: 17.212189 msecs" "Elapsed time: 14.897374 msecs"
"Elapsed time: 17.266534 msecs" "Elapsed time: 10.248547 msecs"
"Elapsed time: 17.227381 msecs" "Elapsed time: 10.022326 msecs"

;; more bindings:
core/doseq doseq2
"Elapsed time: 4.418727 msecs" "Elapsed time: 2.685198 msecs"
"Elapsed time: 2.421063 msecs" "Elapsed time: 2.384134 msecs"
"Elapsed time: 2.210393 msecs" "Elapsed time: 2.341696 msecs"
"Elapsed time: 2.450744 msecs" "Elapsed time: 2.339638 msecs"
"Elapsed time: 2.223919 msecs" "Elapsed time: 2.372942 msecs"

core/doseq doseq2
"Elapsed time: 28.869393 msecs" "Elapsed time: 2.997713 msecs"
"Elapsed time: 22.414038 msecs" "Elapsed time: 1.807955 msecs"
"Elapsed time: 21.913959 msecs" "Elapsed time: 1.870567 msecs"
"Elapsed time: 22.357315 msecs" "Elapsed time: 1.904163 msecs"
"Elapsed time: 21.138915 msecs" "Elapsed time: 1.694175 msecs"
`

0 votes
by

Comment made by: gshayban

It's good that the benchmarks contain empty doseq bodies in order to isolate the overhead of traversal. However, that represents 0% of actual real-world code.

At least for the first benchmark (large chunked seq), adding in some tiny amount of work did not change results signifantly. Neither for (map str (link: a))

`
(range 10000000) => (map str [a])
core/doseq
"Elapsed time: 586.822389 msecs"
"Elapsed time: 563.640203 msecs"
"Elapsed time: 369.922975 msecs"
"Elapsed time: 366.164601 msecs"
"Elapsed time: 373.27327 msecs"
doseq2
"Elapsed time: 419.704021 msecs"
"Elapsed time: 371.065783 msecs"
"Elapsed time: 358.779231 msecs"
"Elapsed time: 363.874448 msecs"
"Elapsed time: 368.059586 msecs"

`

nor for intrisics like (inc a)

`

(range 10000000)
core/doseq
"Elapsed time: 317.091849 msecs"
"Elapsed time: 272.360988 msecs"
"Elapsed time: 215.501737 msecs"
"Elapsed time: 206.639181 msecs"
"Elapsed time: 206.883343 msecs"
doseq2
"Elapsed time: 241.475974 msecs"
"Elapsed time: 193.154832 msecs"
"Elapsed time: 198.757873 msecs"
"Elapsed time: 197.803042 msecs"
"Elapsed time: 200.603786 msecs"
`

I still see reduce-based doseq ahead of the original, except for small seqs

0 votes
by

Comment made by: gshayban

A form like the following will not work with this patch:

(go (doseq [c chs] (>! c :foo)))

as the go macro doesn't traverse fn boundaries. The only such code I know is core.async/mapcat**, a private fn supporting a fn that is marked deprecated.

0 votes
by

Comment made by: gshayban

I see #'clojure.core/run! was just added, which has a similar limitation

0 votes
by

Comment made by: richhickey

Please consider Ghadi's feedback, esp re: closures.

0 votes
by

Comment made by: gshayban

The current expansion of a doseq (link: 1) under a go form is less than ideal due to the amount of control flow. 14 states in the state machine vs. 7 with loop/recur

(link: 1) Comparison of macroexpansion of (go ... doseq) vs (go ... loop/recur)
https://gist.github.com/ghadishayban/639009900ce1933256a1

0 votes
by

Comment made by: bronsa

Related: CLJ-77

0 votes
by

Comment made by: bronsa

The general solution to this would be to automatically split methods when too large, using something like https://bitbucket.org/sperber/asm-method-size

0 votes
by

Comment made by: gshayban

Example doseq impl and macroexpansion that does not suffer exponential bytecode growth. It also doesn't use any lambdas, so suitable for core.async.
https://gist.github.com/ghadishayban/fe84eb8db78f23be68e20addf467c4d4
It uses an explicit stack for the seqs/bindings.
It does not handle any chunking or modifiers yet.

0 votes
by
Reference: https://clojure.atlassian.net/browse/CLJ-1322 (reported by arcatan)
...