Welcome! Please see the About page for a little more info on how this works.

0 votes
in Clojure by

My code is spending most of its time scoring bisections: determining how many edges of a graph cross from one set of nodes to the other.

Assume bisect is a set of half of a graph's nodes (ints), and edges is a list of (directed) edges [ [n1 n2] ...] where n1,n2 are also nodes.

(defn tstBisectScore
  "number of edges crossing bisect"
  ([bisect edges]
   (tstBisectScore bisect 0 edges))

  ([bisect nx edge2check]
   (if (empty? edge2check)
     nx

     (let [[n1 n2] (first edge2check)
           inb1 (contains? bisect n1)
           inb2 (contains? bisect n2)]
       (if (or (and inb1 inb2)
               (and (not inb1) (not inb2)))
         (recur bisect nx (rest edge2check))
         (recur bisect (inc nx) (rest edge2check))))

     )))

The only clues I have via sampling the execution of this code (using VisualVM) shows most of the time spent in clojure.core$empty_QMARK_, and most of the rest in clojure.core$contains_QMARK_. (first and rest take only a small fraction of the time.)

Any suggestions as to how I could tighten the code?

1 Answer

0 votes
by

(empty? x) is just (not (seq x)), and that seq can potentially be costly. If you know for sure that edge2check is always, for example, a vector, then you can replace that (empty? edge2check) with (zero? (count edge2check)). Or an even lower-level (.isEmpty ^java.util.List edge2check). But given that you call rest on it, it stops being a vector after recur, so you'd have to use subvec instead of rest. Or, alternatively, invert the order of the elements in that vector and call peek and pop instead of first and rest, respectively.

Similar thing with contains? - you can replace (contains? bisect x) with (.contains ^clojure.lang.IPersistentSet bisect x) if you always know for sure that bisect is a persistent set.

by
edited by
thanks very much @Eugene, these seem like excellent suggestions.  i implemented both your peek/pop and IPersistentSet ideas:

   (defn tstBisectScore2
     ([bisect edges]
      (tstBisectScore2 bisect 0 (vec edges)))
   
     ([bisect nx edge2check]
      (if (zero? (count edge2check))
        nx
   
        (let [[n1 n2] (peek edge2check)
              inb1 (.contains ^clojure.lang.IPersistentSet bisect n1)
              inb2 (.contains ^clojure.lang.IPersistentSet bisect n2)]
          (if (or (and inb1 inb2)
                  (and (not inb1) (not inb2)))
            (recur bisect nx       (pop edge2check))
            (recur bisect (inc nx) (pop edge2check))))
   
        )))

but timing old vs. new shows only slight (6%) improvement?

   (time (dotimes [i 100000]
           (hpclj.test/tstBisectScore1 rp g6fe)))
   "Elapsed time: 6103.62275 msecs"

   (time (dotimes [i 100000]
           (hpclj.test/tstBisectScore2 rp g6fe)))
   "Elapsed time: 5732.2645 msecs"

i know timing clojure has fine-points (startup?), so maybe this simple experiment is insufficient?  thanks again in any case.
by
For benchmarks, it's better to use something like Criterium. But that beside, that call to `vec` is probably what's taking time now. What I meant in my original answer, is that `g6fe` should be a vector in the first place, if possible.
by
no, the `g6fe` comes as list of edges (from the loom graph package).  i've not played with Criterium before but will try that.
by
curiouser and curiouser!  criterium makes your ideas seem slightly (3%) SLOWER:

   (criterium/quick-bench (tstBisectScore1 rp g6fe))
   
   Evaluation count : 11148 in 6 samples of 1858 calls.
                Execution time mean : 54.087476 µs
       Execution time std-deviation : 387.196475 ns
      Execution time lower quantile : 53.651538 µs ( 2.5%)
      Execution time upper quantile : 54.485682 µs (97.5%)
                      Overhead used : 1.942216 ns
   
   (criterium/quick-bench (tstBisectScore2 rp g6fe))
   
   Evaluation count : 11262 in 6 samples of 1877 calls.
                Execution time mean : 55.667037 µs
       Execution time std-deviation : 1.035787 µs
      Execution time lower quantile : 54.642948 µs ( 2.5%)
      Execution time upper quantile : 57.269671 µs (97.5%)
                      Overhead used : 1.942216 ns
by
As I said, it's that `vec`. You traverse over a collection of N elements not once but twice because of it. I've seen your question on SO and one of the answers suggested replacing `rest` with `next` and then checking that the coll is not nil - I think that's what you should do instead of using `vec` if you can't create a vector in the first place at all.
by
sorry for the cross-posting(https://stackoverflow.com/questions/73847837/can-i-make-this-clojure-code-scoring-a-graph-bisection-more-efficient in case anyone stumbles across this) i've experimented with the solution over there, and indeed the next vs rest solution seems best.
by
Just a note that `empty?` has changed in 1.12.0-alpha1 to leverage `counted` colls and thus avoiding seq'ing a coll needlessly, so that might actually help the original code.
...