

This is a question about the intersection of three different Clojure aspects:

  1. Collection functions treat nil as an empty collection. Functions like filter, map, and even assoc happily accept nil for coll.
  2. if and derived macros (when, and, or etc.) treat nil as a falsey value.
  3. empty? coerces its input to a seq, which leads to unnecessary allocations, and it's idiomatic to check for non-emptiness using (seq xs) instead of (not (empty? xs)). This is a common source of confusion, because the intent of (not (empty? xs)) feels clearer than that of seq.
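
To make the three points concrete, here is a small sketch that only uses clojure.core functions. The var names are made up for illustration.

```clojure
;; 1. Collection functions treat nil like an empty collection.
(def mapped   (map inc nil))      ; an empty lazy seq
(def filtered (filter odd? nil))  ; an empty lazy seq
(def assoced  (assoc nil :a 1))   ; a map containing :a 1

;; 2. nil is falsey, but an empty collection is not.
(def nil-branch       (if nil :truthy :falsey))
(def empty-vec-branch (if []  :truthy :falsey))

;; 3. seq returns nil for both nil and empty collections,
;;    which is why (seq xs) works as a non-emptiness check.
(def seq-of-nil   (seq nil))
(def seq-of-empty (seq []))
(def seq-of-full  (seq [1 2]))
```

Note that (if [] ...) takes the truthy branch, which is exactly why a plain (if xs ...) only works when empty collections have already been replaced by nil.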

With that said, I feel like consciously representing every empty collection in an application as nil might be useful, because this:

(let [xs (get-non-empty-coll-or-nil)]
  (if xs
    (do-stuff-with xs)
    (do-nothing)))

...is more performant and clear than this:

(let [xs (get-possibly-empty-coll)]
  (if (seq xs)
    (do-stuff-with xs)
    (do-nothing)))

Two downsides I see to this approach:

  1. having to use (fnil conj []) or (fnil conj #{}) instead of conj to ensure collections are vectors/sets, because conj-ing to nil creates a list, which I personally almost never use.
  2. having to run all incoming collections on the boundaries of a system through not-empty.
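
Both downsides can be sketched in a few lines. The helper names conj-vec, conj-set and normalize-input are hypothetical, and the :items key is only an assumption about what incoming data might look like.

```clojure
;; conj onto nil produces a list, so items end up in reverse order.
(def as-list (conj nil 1 2))

;; fnil swaps in a default collection when the first argument is nil.
(def conj-vec (fnil conj []))
(def conj-set (fnil conj #{}))

(def as-vec   (conj-vec nil 1 2)) ; starts from [] because coll was nil
(def as-set   (conj-set nil 1))   ; starts from #{} because coll was nil
(def appended (conj-vec [1] 2))   ; a non-nil coll is used unchanged

;; not-empty returns nil for an empty collection and the collection itself otherwise.
(def normalized-empty (not-empty []))
(def normalized-full  (not-empty [1 2]))

;; Hypothetical boundary function: normalize one known key of incoming data.
(defn normalize-input [m]
  (update m :items not-empty))
```

Every boundary would need something like normalize-input for each collection-valued key, which is the maintenance cost the second point describes.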

What do you think?

1 Answer


In my opinion, you are overthinking it. I would write my functions in readable and simple ways, and if those end up returning nil or empty seq in the case of an empty coll, so be it. Unless I know the function will be used in a specific context which really needs it to be nil or empty seq, I wouldn't go out of my way to make it one or the other. It would be whichever naturally falls out of the simplest and clearest implementation I found.
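
As a rough illustration of what "whichever naturally falls out" can look like, core functions already differ in whether they give back nil or an empty seq. The describe function below is a hypothetical consumer, not something from the question.

```clojure
;; filter returns an empty lazy seq (truthy); wrapping it in seq gives nil.
(def evens-lazy (filter even? [1 3 5]))
(def evens-seq  (seq (filter even? [1 3 5])))

;; rest returns an empty seq, next returns nil.
(def rest-result (rest [1]))
(def next-result (next [1]))

;; A consumer that checks with seq does not care which one it receives.
(defn describe [xs]
  (if (seq xs) :has-items :empty))
```

A caller written like describe behaves the same for nil, an empty lazy seq, or an empty vector, so the producer does not have to commit to one representation.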

The performance perspective is similar. This is a micro-optimization, and to enforce it as you mentioned, you might need to sacrifice the readability and simplicity of the code, such as when you now need to deal with conj and similar functions. It also seems like it would be pretty hard to maintain that pattern consistently, since it depends only on good intentions. So I think you're overthinking it here as well. As before, I wouldn't concern myself with such a detail unless I were in an absolutely performance-critical use case, had exhausted all other approaches, and my profiling indicated that getting rid of seq for emptiness checks in my conditions could shave off a reasonable amount of time. Only then would I bother, and still only for the profiled functions that showed hot spots around it.

...