+39 votes
in Clojure by

I regularly write `(some #(when (pred %) %) ...)` and often mistakenly write `(some #(pred %) ...)` instead. The mistaken form returns the predicate's result (e.g. `true`) rather than the matching element. I think it would be worth having `(some #(when (pred %) %) ...)` as a built-in in clojure.core.

Suggested name: `first-by`. Feel free to suggest other names; I'll list them here.

```
(first-by #(= (:id %) 2) [{:id 1} {:id 2} {:id 3}]) ;;=> {:id 2}
```
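Here is a minimal sketch of `first-by` as a plain wrapper around the some + when idiom. It assumes nothing beyond clojure.core. The name and docstring are hypothetical, since no such function exists in core. The sketch also shows why the mistaken form is easy to miss: it still returns something truthy.

```clojure
(defn first-by
  "Hypothetical: returns the first item in coll for which (pred item) is
  logical true, or nil if there is none."
  [pred coll]
  ;; `when` turns a truthy predicate result back into the item itself.
  (some #(when (pred %) %) coll))

(def rows [{:id 1} {:id 2} {:id 3}])

(first-by #(= (:id %) 2) rows) ;;=> {:id 2}

;; The easy mistake: without `when`, some returns the predicate's value.
(some #(= (:id %) 2) rows)     ;;=> true
```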

Statistics: in my local .m2 directory, I find 198 `(some #(when ...) ...)` forms and 1503 `(some #(foo ...) ...)` forms where `foo` is not `when`.

Total some + fn usage: 1701, of which 198 (about 11.6%) have the some + when shape.

Program that found these usages:

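```
(ns grasp
  (:require
   [clojure.spec.alpha :as s]
   [grasp.api :as g]))

;; Matches (some (fn* [...] (when ...)) coll). The reader expands
;; #(when ...) to (fn* [p1__N#] (when ...)), hence both 'fn and 'fn*.
(s/def ::some+when
  (s/cat :some #{'some}
         :fn (s/spec (s/cat :fn #{'fn 'fn*}
                            :args vector?
                            :when (s/spec (s/cat :when #{'when} :whatever (s/* any?)))))
         :coll any?))

;; Keep only forms that conform to the spec, with the location they came from.
(defn keep-fn [{:keys [spec expr uri]}]
  (let [conformed (s/conform spec expr)]
    (when-not (s/invalid? conformed)
      {:expr expr
       :uri uri})))

;; The first argument is the path (or classpath) to search;
;; prints the number of matching forms.
(defn -main [& args]
  (let [classpath (first args)
        matches (g/grasp classpath ::some+when {:keep-fn keep-fn})]
    (prn (count matches))))
```

```
;; deps.edn
{:deps {io.github.borkdude/grasp {:mvn/version "0.0.3"}}}
```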

On grep.app, I find that about 8% of `some` usages have the some + when shape.

7 Answers

+7 votes
by

I'd prefer something like `(first xform coll)`, as suggested in a comment on CLJ-2056. The idea was raised there, but I don't think it was fully discussed at the time.

It's more general than `some` or `first-by`. `(some <pred> <coll>)` is roughly `(first (keep <pred>) <coll>)`. The two differ when `<pred>` returns `false`, because `keep` only drops `nil`. `(first-by <pred> <coll>)` is just `(first (filter <pred>) <coll>)`.
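`clojure.core/first` has no transducer arity today. This sketch therefore checks the equivalences above with the sequence arities of `filter` and `keep` instead. The last pair shows where `some` and `first` + `keep` diverge: a predicate that returns `false` behaves differently.

```clojure
;; first-by shape: the first element satisfying the predicate.
(first (filter even? [1 3 4 5]))                    ;;=> 4

;; some shape: the first non-nil predicate result. The two agree here
;; because the #(when ...) predicate returns nil, never false.
(some #(when (even? %) (* 10 %)) [1 3 4 5])         ;;=> 40
(first (keep #(when (even? %) (* 10 %)) [1 3 4 5])) ;;=> 40

;; Divergence: even? returns false for 1, and keep retains false.
(some even? [1 2])                                  ;;=> true
(first (keep even? [1 2]))                          ;;=> false
```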

It could also be more efficient than `some` or `first-by`. When you combine it with other sequence functions, all the steps can be fused into a single transducer, like:

```
(first (comp (map #(* % %))
             (filter #(> % 100)))
       (range))
```

instead of:

```
(first-by #(> % 100) (map #(* % %) (range)))
```
by
Note that I don't want to base `first-by` on `filter` because of chunking:

https://twitter.com/borkdude/status/1567525617549152257
by
`(first xform coll)` would not be affected by chunking, though, because it would use the transducer-returning arity of `filter`.

Huge difference:

```
(first (filter <pred>) <coll>)   ; transducer arity of filter: no chunking
;; vs
(first (filter <pred> <coll>))   ; lazy-seq arity: may realize a whole chunk
```
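A small experiment makes the difference visible. It assumes Clojure 1.7 or later, where `(range 100)` produces a chunked sequence that also reduces directly. A counting predicate records how many elements each approach inspects before returning the first match. `count-pred-calls` is a hypothetical helper written only for this comparison.

```clojure
(defn count-pred-calls
  "Hypothetical helper: calls (f pred) with a predicate that matches x >= 5
  and counts its own invocations; returns [result invocation-count]."
  [f]
  (let [calls (atom 0)
        pred  (fn [x] (swap! calls inc) (>= x 5))]
    ;; Vector elements evaluate left to right, so the count is read after f runs.
    [(f pred) @calls]))

;; Lazy-seq arity: filter processes (range 100) one chunk at a time, so the
;; predicate runs over the whole first chunk (32 elements for range).
(count-pred-calls (fn [pred] (first (filter pred (range 100)))))
;;=> [5 32]

;; Transducer arity: reduced stops the reduction at the first match.
(count-pred-calls (fn [pred] (transduce (filter pred)
                                        (completing (fn [_ x] (reduced x)))
                                        nil
                                        (range 100))))
;;=> [5 6]

;; some + when walks the seq one element at a time.
(count-pred-calls (fn [pred] (some #(when (pred %) %) (range 100))))
;;=> [5 6]
```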
by
Yeah, if we implement it like this:

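```
(defn first' [xf coll]
  ;; The reducing step ignores the accumulator and stops at the first item
  ;; that makes it through xf; with no match, the init value nil is returned.
  (transduce xf (completing (fn [_ x] (reduced x))) nil coll))
```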

it will not cause any chunking. In fact, it doesn't even realize the sequence at all.
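Below, the `first'` definition from this comment is applied to the fused pipeline from the first comment, followed by a few illustrative calls with `filter` and `keep`. The example inputs are made up for illustration. `(range)` with no arguments is infinite, so the first call returns only because `reduced` stops the reduction.

```clojure
(defn first'
  "Returns the first item produced by transducer xf over coll, or nil."
  [xf coll]
  (transduce xf (completing (fn [_ x] (reduced x))) nil coll))

;; Fused pipeline: the first square above 100 (11 * 11).
(first' (comp (map #(* % %)) (filter #(> % 100))) (range)) ;;=> 121

;; first-by and some expressed through the same function.
(first' (filter even?) [1 3 4 5])                           ;;=> 4
(first' (keep #(when (even? %) (* 10 %))) [1 3 4 5])        ;;=> 40

;; No match: the reduction runs to the end and returns the init value.
(first' (filter neg?) [1 2 3])                              ;;=> nil
```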
by
Ah, `(completing reduced)`, genius! Much more concise than what I was suggesting based on xforms.
by
Sorry, I meant `(completing (fn [_ x] (reduced x)))`. Fixed.
+2 votes
by

FWIW, a similar proposal has been rejected before: https://clojure.atlassian.net/browse/CLJ-2056

by
Alex’s response is interesting, and I do see his point.

That said, while I agree that linear searches are inefficient, that only matters for larger datasets. My most common use for this operation is processing short sequences, e.g. lines of text. Building an indexed structure for a single lookup still needs a linear pass over the data, and it is much less efficient for short seqs anyway (compare with ArrayMap, which Clojure uses for small maps).
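Here is a sketch of the trade-off described above, using hypothetical data and helper names. The linear search stops at the first match. Building a map index has to walk the whole collection before the first lookup can even happen.

```clojure
(def records
  [{:id 1 :text "alpha"} {:id 2 :text "beta"} {:id 3 :text "gamma"}])

(defn find-by-id
  "Hypothetical: linear search that stops at the first record with a matching :id."
  [id coll]
  (some #(when (= id (:id %)) %) coll))

(defn index-by-id
  "Hypothetical: builds an {id record} map, which walks the whole collection."
  [coll]
  (into {} (map (juxt :id identity)) coll))

;; One lookup: the linear search inspects two records here, while the
;; index visits all three before get runs.
(find-by-id 2 records)          ;;=> {:id 2 :text "beta"}
(get (index-by-id records) 2)   ;;=> {:id 2 :text "beta"}
```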

If a different structure were always the right way, then we wouldn’t have a dozen people immediately upvoting this ticket, with several of us commenting that we do this all the time.
+2 votes
by

I see the argument that allowing or even supporting linear searching might lead people to use the wrong data structure to begin with.

However, a simple linear search through a small collection is often exactly the right thing to do, and it is always a bit awkward to do this in Clojure. I think this is the bad kind of awkwardness, the kind that leads to unclear code and bugs.

When I need it, I usually implement find-if from Common Lisp in some utility namespace, because it makes the code very clear:

```
(find-if odd? xs)
```
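Here is one possible utility-namespace version, sketched with `reduce` and `reduced`. It is hypothetical and covers only the basic behaviour of Common Lisp's FIND-IF, not its keyword options. Because it reduces the collection directly instead of going through a lazy seq, it never builds intermediate chunks.

```clojure
(defn find-if
  "Hypothetical sketch of Common Lisp's FIND-IF, without its :key, :start,
  :end and :from-end options: returns the first item in coll for which
  (pred item) is logical true, or nil."
  [pred coll]
  ;; The accumulator is ignored; `reduced` stops the reduction at the
  ;; first match, so with a match it also terminates on an infinite (range).
  (reduce (fn [_ x] (when (pred x) (reduced x))) nil coll))

(find-if odd? [2 4 5 7]) ;;=> 5
(find-if odd? [2 4 6])   ;;=> nil
(find-if odd? (range))   ;;=> 1
```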

+1 vote
by

Just a note: I'd prefer something like `when-valid` in core instead of `first-by`:

```
(defn when-valid [pred]
  #(when (pred %) %))
```

This can be composed with `some`:

```
(some (when-valid pos?) [-1 0 2]) ;;=> 2
```

It can also be used with `some-fn`:

```
((some-fn (when-valid string?) (when-valid pos?)) "foo") ;;=> "foo"
((some-fn (when-valid string?) (when-valid pos?)) -5)    ;;=> nil
```
by
My latest version of such a function is `select`:

```
(defn select
  "Returns `x` if `(pred x)` is logical true, else `nil`.
   Returns function #(select % pred) in case of 1-arity."
  ([pred]
   #(select % pred))
  ([x pred]
   (when (pred x) x)))

(some-> x (select number?) inc)   ; x = 41 gives 42, x = "a" gives nil

(keep (select pos?) xs)           ; xs = [-1 0 2 3] gives (2 3)
```
+1 vote
by

Using cgrand/xforms, I normally solve this with `x/some`, as in `(x/some (filter <pred>) coll)`, which works as long as the item I want to return isn't nil.

A potential implementation based on the xforms code could look like:

```
(defn rf-first
  "Reducing function that returns the first value."
  ([] nil)
  ([x] x)
  ([_ x] (reduced x)))

(defn xf-first
  "Processes coll through the specified xform and returns the first value."
  [xform coll]
  (transduce xform rf-first nil coll))

(xf-first (filter <pred>) coll)
(xf-first (filter even?) [1 3 4 5]) ;;=> 4
```
by
How well does this play with chunking? some + when does not chunk.
by
Where in this do you see chunking?
by
Nowhere, makes sense to me now.
0 votes
by

I'm doing the same every day! Upvoted.

by
I've personally called this function "find" and defined it that way. IIRC, that's what I've seen it called in other languages and functional libraries. I think "first" is part of the solution, but it's not the name that comes to mind: the operation is filter + first, with both as equal parts, so I feel the name should describe the combination of the two.
by
I’ve gone with `ffilter` since the other way I’ve done it is by wrapping `filter` with `first`
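Below are two hypothetical definitions matching the names suggested in these comments. `clojure.core/find` already exists (it returns the map entry for a key), so a utility namespace that defines its own `find` has to exclude the core var. The namespace name `my.util` is made up. An `ffilter` built on `filter` gives the same answer in the common case. On a chunked seq, however, it may evaluate the predicate over a whole chunk.

```clojure
(ns my.util
  ;; clojure.core/find already exists (it returns the map entry for a key),
  ;; so a namespace defining its own `find` must exclude the core var.
  (:refer-clojure :exclude [find]))

(defn find
  "Hypothetical: returns the first item in coll for which (pred item) is
  logical true, or nil."
  [pred coll]
  (some #(when (pred %) %) coll))

(defn ffilter
  "Hypothetical: first + filter. Same result as `find` unless the matching
  item is false; on a chunked seq, filter may call pred on a whole chunk."
  [pred coll]
  (first (filter pred coll)))

(find odd? [2 4 5 7])     ;;=> 5
(ffilter odd? [2 4 5 7])  ;;=> 5
```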
0 votes
by

As mentioned in another answer, this has been considered and declined in the past. I don't think anything has changed since then.

Anecdotes about this are not particularly useful, but data is. If you can use grep.app or grasp or something to collect usage data that would be helpful. Also useful is having a list of existing similar functions in common utility libs and their usage (and how those impls differ if they do).

Separately, it may be useful to approach this by studying a corpus of usages to see whether there is a deeper or more interesting commonality or problem that may have other, alternative solutions.

...