Share your thoughts in the 2024 State of Clojure Survey!

Welcome! Please see the About page for a little more info on how this works.

0 votes
in core.async by
Hi all, bit of a tricky bug to report here... We're seeing some problems with using core.async in ClojureScript on Safari 7. Our application is built around a large event loop that blocks on a message from one of many channels that correspond to user activity or API calls. The problem seems to lie within this event loop - we are using alts! to pull a message out of any available channel, but sometimes logging shows that we reach alts! and never unblock. However, with a little more logging, I can see that there are subsequent writes to one of the channels in the list of channels passed to alts!, so I'm not really sure what's going on.

That's the high level overview, now on to some code.

Our main event loop is as follows:


  (log "Entering main event loop.")
  (go
    (while true
      (log "alts! channel hashes: " (map hash (:channels @app)))
      (let [[message channel] (alts! (seq (:channels @app)))]
        (log "alts! unblocked, calling our process-message")
        (swap! app process-message message channel)
        (log "process-message completed, looping"))))


{{process-message}} here is our a function internal to our application, but I don't think it's details are necessarily important. In the scenario where Safari gets stuck, the log looks like:


[Log] process-message completed, looping (main.js, line 62)
[Log] alts! channel hashes:  (16 12 19 33) (main.js, line 82)
[Log] Socket connected. (socket.js, line 309)
[Log] put! to channel with hash  19 (socket.js, line 86)
[Log] The message is [:metronome [:staff [{:description nil, :deletable true, :email nil, :isAdmin true, :isTrainer false, :telephone nil, :name "Fynder Admin", :picture nil, :userId 1} {:description nil, :deletable fa... (socket.js, line 87)
[Log] put! callback gave us true (socket.js, line 89)
[Debug] Metronome: staff data decoded. put! complete.: 12.282ms (socket.js, line 93)
[Log] put! to channel with hash  19 (socket.js, line 86)
[Log] The message is [:metronome [:class-types [{:deletable false, :picture nil, :name "CycleCore", :id 2, :description "CycleCore is a 55-minute dual workout concept that combines 30 minutes of intense cardiovascular ... (socket.js, line 87)
[Log] put! callback gave us true (socket.js, line 89)
[Debug] Metronome: class-types data decoded. put! complete.: 1.288ms (socket.js, line 93)
[Log] put! to channel with hash  19 (socket.js, line 86)
[Log] The message is [:metronome [:locations [{:studios [{:deletable false, :name "Kensington", :id 1, :locationId 1, :description "Studio (11a) sits just off Stratford Road in Stratford Studios. To find us, just pass ... (socket.js, line 87)
[Debug] Metronome: locations data decoded. put! complete.: 0.884ms (socket.js, line 93)


Note that we see a log entry for "alts! channel hashes", but we never seen "alts! unblocked". However, note the list of hashes passed to alts!. Channel 19 is mentioned, but subsequently we put! to channel 19... yet we still don't get unblocked. Something that also strikes me as suspicious, is that while we're blocked at alts!, two calls to put! have succeeded *immediately*, for a channel that is bounded to contain only one element at a time. Maybe I'm misunderstanding something, but I wouldn't expect the immediate-put callback to be invoked more than once. Interestingly that last put! doesn't invoke the callback.

Unfortunately, reproduction of this bug is reasonably difficult. I can somewhat reliably reproduce it by quitting Safari, re-opening it, and navigating to the dev server. About 1 in 15 attempts get stuck in this fashion. I wondered if it was something to do with Safari's MessageChannel implementation - you can see in the log entries where nexttick.js calls its callback, which seems to be how dispatch is working in my browser.

I'd be very happy to help provide any more information that's useful, but this problem is now outside my ability to debug it. While the code is proprietary, I'd be happy to temporarily add people to the Github project in order to try and get this fixed. We have development APIs servers that you can point at, so it should be just a case of running {{lein cljs}}.

I've attached our code for our Socket.io wrapper and our main event loop. Sadly I do not yet have a minimal test-case - I wouldn't really know where to begin.

6 Answers

0 votes
by

Comment made by: ocharles

I went deep into the guts of the Google Closure library and changed {{getSetImmediateEmulator_}} to:

`
goog.async.nextTick.getSetImmediateEmulator_ = function() {
// Fall back to setTimeout with 0. In browsers this creates a delay of 5ms
// or more.
return function(cb) {

goog.global.setTimeout(cb, 0);

};
};
`

and I haven't been able to get it stuck. So maybe MessageChannel has problems in Safari...

0 votes
by

Comment made by: gshayban

Hi Oliver, seems like a race, and we'll figure this out.

Would you mind compare running upon 0.1.319.0-6b1aca-alpha vs 0.1.346.0-17112a-alpha ?

alts! should be passed an indexed collection/vector btw. Passing a seq wouldn't cause this bug, just something to note.

0 votes
by

Comment made by: ocharles

Hi Ghadi,

0.1.319.0-6b1aca-alpha is what the initial report was against - I should have mentioned that. So 0.1.319.0-6b1aca-alpha does get stuck.

0.1.346.0-17112a-alpha however does not get stuck, which is odd - as I'm sure I tried upgrading to this! I've tried on two Macs that are normally problematic, and they didn't get stuck once. I'm pushing this out to more of our testers and will see what happens.

0 votes
by

Comment made by: ocharles

Aha, I knew it wouldn't be that easy! Upon releasing this to production, it immediately froze again. The dev server runs with very different optimisations though, so I'm going to build a production release and serve that locally - will see what happens there.

0 votes
by
_Comment made by: ocharles_

Yep, definitely a problem with optimisations. Here is my Shadow Build configuration


(ns fynder.shadowbuild
  (:require [clojure.java.io :as io]
            [shadow.cljs.build :as cljs]))

(defn define-modules [state]
  (-> state
      (cljs/step-configure-module :cljs '[cljs.core clojure.walk clojure.string cljs.reader cljs.core.async] #{})
      (cljs/step-configure-module :test-support '[inflections.core no.en.core enfocus.bind fynder.winchan] #{:cljs})
      (cljs/step-configure-module :devel '[fynder.devel] #{:cljs})
      (cljs/step-configure-module :admin '[fynder-admin.main] #{:cljs})
      (cljs/step-configure-module :trainer '[fynder-trainer.main] #{:cljs})
      (cljs/step-configure-module :mobile '[fynder-mobile.main] #{:cljs})
      (cljs/step-configure-module :sweatybetty '[fynder-sweatybetty.main] #{:cljs})
      (cljs/step-configure-module :loader '[fynder-loader.loader] #{:cljs})))

(defn dev
  "build the project, wait for file changes, repeat"
  [& args]
  (let [state (-> (cljs/init-state)
                  (cljs/enable-source-maps)
                  (assoc :optimizations :advanced
                         :pretty-print false
                         :work-dir (io/file "target/cljs-work")
                         :public-dir (io/file "resources/dev")
                         :public-path "")
                  (cljs/step-find-resources-in-jars)
                  (cljs/step-find-resources "src/cljs/")
                  (cljs/step-finalize-config)
                  (cljs/step-compile-core)
                  (define-modules))]
    ;; compile, flush, reload, repeat
    (loop [state state]
      (let [new-state (try
                        (-> state
                            (cljs/step-compile-modules)
                            (cljs/flush-unoptimized)
                            (cljs/wait-and-reload!))
                        (catch Throwable t
                          (prn [:failed-to-compile t])
                          (.printStackTrace t)
                          (cljs/wait-and-reload! state)))]
        (recur new-state)))))

(defn prod
  "build the project, wait for file changes, repeat"
  [& args]
  (-> (cljs/init-state)
      (cljs/enable-emit-constants)
      (cljs/enable-source-maps)
      (assoc :optimizations :advanced
             :pretty-print false
             :work-dir (io/file "target/cljs-work")
             :public-dir (io/file "resources/prod")
             :externs ["react/externs/react.js"
                       "externs/base32.js"
                       "externs/jquery.js"
                       "externs/react_addons.js"
                       "externs/fastclick.js"
                       "externs/socket.io.js"
                       "externs/moment.js"
                       "externs/papa.js"
                       "externs/markdown.js"
                       "externs/xss.js"
                       "externs/facebook.js"
                       "externs/checkout.js"])
      (cljs/step-find-resources-in-jars)
      (cljs/step-find-resources "src/cljs/")
      (cljs/step-finalize-config)
      (cljs/step-compile-core)
      (define-modules)
      (cljs/step-compile-modules)
      (cljs/closure-optimize)
      (cljs/flush-to-disk)
      (cljs/flush-modules-to-disk)))


If I run in the dev profile, then I can't get it stuck. If I switch over to the production profile (and serve the result of lein publish with python's SimpleHTTPServer), then Safari does get stuck.
0 votes
by
Reference: https://clojure.atlassian.net/browse/ASYNC-97 (reported by alex+import)
...