Clojure Q&A - Recent questions tagged string

Missing stateful string accumulator function (for transduce target)

Fri, 15 Sep 2023 19:04:56 +0000

If you have a transducer chain that produces a stream of strings, you might want to accumulate them into a single string at the end. Using an existing function like str as the reducing function adds a lot of overhead: it builds a new intermediate string at every step, because there is no stateful accumulator you can maintain.

What you really want is a StringBuilder that accumulates, and emits the string at finalization.

String joining could then be something like:

(transduce (interpose ",") str! coll)

extensible with more transducers as needed.

cgrand's xforms lib has something like this: https://github.com/cgrand/xforms/blob/2079b74271b858b6a91dcb87bc58f3b93ea0b19c/src/net/cgrand/xforms/rfs.cljc#L145-L147
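A minimal sketch of such an accumulator for Clojure on the JVM is shown here. The name str! is taken from the example above; this definition is an assumption for illustration, not a core function and not a copy of the xforms implementation.

```clojure
(defn str!
  "Hypothetical reducing function (not in clojure.core): accumulates into a
  mutable StringBuilder and emits a String in the completion step."
  ([] (StringBuilder.))                           ; init: fresh accumulator
  ([^StringBuilder sb] (.toString sb))            ; completion: emit the string
  ([^StringBuilder sb x] (.append sb (str x))))   ; step: append in place

(transduce (interpose ",") str! ["a" "b" "c"])
;; => "a,b,c"
```

Because the accumulator is mutable, every call to transduce has to start from a fresh StringBuilder, which is what the zero-arity provides.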

Can `clojure.string/join` avoid empty strings?

Wed, 25 May 2022 21:28:25 +0000

I'm trying to use the join function to combine some strings into a single string, but I'd like to skip empty strings.

For example, if I join ["" "" "xpto"] with "-" as the separator, the result is "--xpto", which I find weird and would like to avoid.

To do this, I wrote a utility function that removes the empty strings first:

(defn join-not-empty
  [separator coll]
  (clojure.string/join separator (remove empty? coll)))

I'd like to know whether it makes sense to offer this behavior as an option in the join function.
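For comparison, here is a small sketch of both behaviors side by side. The "-" separator is inferred from the output quoted above, and join-not-blank is a hypothetical variant (not part of clojure.string) that also drops whitespace-only strings.

```clojure
(require '[clojure.string :as str])

;; Current behavior: empty strings still contribute a separator.
(str/join "-" ["" "" "xpto"])
;; => "--xpto"

(defn join-not-empty
  "Joins coll with separator, skipping empty strings (and nils)."
  [separator coll]
  (str/join separator (remove empty? coll)))

(defn join-not-blank
  "Hypothetical variant: also skips strings that contain only whitespace.
  Assumes every element is a string or nil."
  [separator coll]
  (str/join separator (remove str/blank? coll)))

(join-not-empty "-" ["" "" "xpto"])
;; => "xpto"
(join-not-blank "-" ["a" "  " "b"])
;; => "a-b"
```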

Add limit third argument of split to split-lines

Sun, 24 May 2020 20:09:51 +0000

The signature for clojure.string function split-lines is:

(split-lines s)

And the signature for split is:

(split s re)
(split s re limit)

I think it would be nice to add this overload that accepts a limit to split-lines as well, just as it is for split:

(split-lines s limit)

I often need to split on lines without dropping trailing empty lines, which requires passing -1 as the limit. In those cases, I always have to switch back to split and re-implement split-lines.
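The workaround looks roughly like this. The function name split-lines-with-limit is hypothetical, and the pattern is an assumption based on the documented behavior of split-lines (splitting on \n or \r\n).

```clojure
(require '[clojure.string :as str])

(defn split-lines-with-limit
  "Hypothetical two-argument variant of split-lines.
  A negative limit keeps trailing empty strings."
  [s limit]
  (str/split s #"\r?\n" limit))

(str/split-lines "a\nb\n\n")
;; => ["a" "b"]
(split-lines-with-limit "a\nb\n\n" -1)
;; => ["a" "b" "" ""]
```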

Transducers and Maps

Tue, 24 Sep 2019 20:14:24 +0000

Hi,

I'm learning more about transducers, but running into some dead ends.

What I'm looking to understand is whether they are appropriate to use (in my use case, but also as a learning exercise) and whether what I am doing is right/efficient/maybe-there-is-a-better-way :-)

Let's say I have a data structure like this:

 (def trip {:tripData 
            {:segments [{:dataPoints 
                         [{:location {:lat 1 :lng 2}} 
                          {:location {:lat 3 :lng 4}}]}]}})

There could be hundreds/thousands of dataPoints with each dataPoint having a single location.

I want to efficiently extract out only the lat and lng into a single collection and turn it into a string. I came up with this:

 (def xf
   (comp
    (mapcat :dataPoints)
    (map :location)
    (map (fn [{lat :lat lng :lng}] (str lat " " lng)))))

then evaluated via:

(def lat-lng (into [] xf (->> trip :tripData :segments)))

And I get back something like this:

["1 2" "3 4"]

I can then do this (for the purposes of my exercise):

 (clojure.string/join ", " lat-lng)

to obtain, finally, this:

"1 2, 3 4"

Which is all fine and dandy :-)

However, given my inexperience with transducers, I'm left wondering if there is a different/better way. For example, is there a way, within the comp xf, to turn the data into a string and join it at the end instead of using clojure.string/join?
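One possible approach, sketched under assumptions: put (interpose ", ") at the end of the comp and reduce into a StringBuilder with transduce. The reducing function string-rf is a hypothetical helper written for this example, not a core function.

```clojure
(def trip {:tripData
           {:segments [{:dataPoints
                        [{:location {:lat 1 :lng 2}}
                         {:location {:lat 3 :lng 4}}]}]}})

(def xf
  (comp
   (mapcat :dataPoints)
   (map :location)
   (map (fn [{:keys [lat lng]}] (str lat " " lng)))
   (interpose ", ")))                             ; separator between items

(defn string-rf
  "Hypothetical reducing function that builds a String via a StringBuilder."
  ([] (StringBuilder.))                           ; init
  ([^StringBuilder sb] (.toString sb))            ; completion
  ([^StringBuilder sb x] (.append sb (str x))))   ; step

(def lat-lng-str
  (transduce xf string-rf (get-in trip [:tripData :segments])))
;; lat-lng-str => "1 2, 3 4"
```

This avoids building the intermediate vector, since each piece is appended directly to the accumulator.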

I also discovered, I can do this too, without the use of transducers:

 (def lat-lng-2 (->> trip
                     :tripData
                     :segments
                     (mapcat :dataPoints)
                     (map :location)
                     (map (fn [{lat :lat lng :lng}] (str lat " " lng)))))

After applying clojure.string/join, this gives the same result.

However, it's my understanding that you can't use a map keyword (i.e., :tripData, :segments) as part of a comp as keywords are not transducers.
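A keyword on its own is indeed not a transducer, but it is a function, so it can be wrapped in map or mapcat. The sketch below assumes the input is a collection of trips (here a one-element vector); xf-from-root is a name made up for this example.

```clojure
(def trip {:tripData
           {:segments [{:dataPoints
                        [{:location {:lat 1 :lng 2}}
                         {:location {:lat 3 :lng 4}}]}]}})

(def xf-from-root
  (comp
   (map :tripData)        ; keyword lookup lifted into a transducer
   (mapcat :segments)     ; one trip -> many segments
   (mapcat :dataPoints)   ; one segment -> many data points
   (map :location)))

(into [] xf-from-root [trip])
;; => [{:lat 1, :lng 2} {:lat 3, :lng 4}]
```

For a single trip, navigating with (get-in trip [:tripData :segments]) before applying the transducer is simpler and does the same job.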

I'm at a loss on how to make this efficient/better whilst learning how I can use transducers.

I would appreciate some help/guidance/feedback!

Thank you.

Add text block literal, or raw string literal, or unescaped string literal.

Tue, 03 Sep 2019 22:46:52 +0000

Problem:

Having had to include some JavaScript, XML and HTML inside my Clojure code here and there, I find it pretty annoying and error-prone to have to escape quotes. The same holds when scripting and running shell commands, where you can get into hairy escaping scenarios.

Solution:

Add a string literal which can be adapted to contain any sort of string without the need for escaping.

Suggestions:

Text blocks

Some other languages offer something called a text block where you can write a string using triple or more quotes, where all characters are then allowed:

(println """
         This " is allowed,
         and no need to escape it.
         """)

Text blocks often come with additional features: the first and last newlines aren't part of the string, and the position of the triple quote in the source code delineates where each line of the string begins. Thus the above code prints:

This " is allowed,
and no need to escape it.

and not:

         This " is allowed,
         and no need to escape it.
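The block semantics described above can be emulated today on an ordinary (escaped) string. The helper strip-indent below is a hypothetical sketch: it assumes indentation is made of spaces and uses the smallest indentation among non-blank lines, which is one common rule rather than the one any particular language mandates.

```clojure
(require '[clojure.string :as str])

(defn strip-indent
  "Hypothetical helper: drops a blank first and last line, then removes the
  common leading spaces from the remaining lines."
  [s]
  (let [lines   (str/split-lines s)
        lines   (if (str/blank? (first lines)) (rest lines) lines)
        lines   (if (and (seq lines) (str/blank? (last lines)))
                  (butlast lines)
                  lines)
        indents (->> lines
                     (remove str/blank?)
                     (map #(count (re-find #"^ *" %))))
        indent  (if (seq indents) (apply min indents) 0)]
    ;; Blank lines may be shorter than the indent, so clamp the cut point.
    (str/join "\n" (map #(subs % (min indent (count %))) lines))))

(println (strip-indent "
         This \" is allowed,
         and no need to escape it.
         "))
```

Note that the quote still has to be escaped here, which is exactly the inconvenience a dedicated literal would remove.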

Text blocks are visually neat, since they align nicely in the source code, but they are whitespace dependent. Clojure has so far been a whitespace-independent language, meaning whitespace does not matter, and I think it would be best to keep it that way. Hence the next two suggestions.

Raw strings

Sometimes the text block without the "block" features is known as a raw string literal:

(println """This " is allowed,
and no need to escape it.
Also support multi-line, but
not the "block" style of text blocks.""")

Thus:

(println """
         This " is allowed,
         and no need to escape it.
         """)

Prints:

         This " is allowed,
         and no need to escape it.

Unlike for Text Blocks.

If you need a triple quote, just make the delimiter a quadruple quote:

""""This """ is now allowed as well.""""

The issue with raw strings is that, if you use, say, two quote characters as your delimiter:

""This is a raw " string!""

But want a quote character to be at the very beginning or the very end of the string:

"""{{hello}}"""

I want the string: "{{hello}}", not {{hello}}, but the raw string cannot disambiguate the two, as now it thinks this is a triple-quote delimiter.

One solution is to allow an escaped quote only at the beginning or end:

""\"{{hello}}\"""

But not in the middle:

""\"{{he\llo}}\"""

This is the string: "{{he\llo}}"

So the character \ is taken literally everywhere, except when it is followed by a quote at the very beginning or the very end of the string, where it acts as an escape.

I still don't find this ideal. There are too many rules, and there are still cases where an escape is required.

Unescaped string (my favorite)

The idea here is to allow any string to be used as the delimiter. Thus given whatever possible string we want to nest inside our Clojure code, we can always find a string which is not contained in it to use as our delimiter.

Let's say a reader macro #text is added, which expects the following form to be a regular string naming the delimiter of the raw string that comes after it:

(println #text "|" |"{{hello}}"|)

Would print:

"{{hello}}"

The first arg to #text tells it what the delimiter for the following raw string should be. That way, you absolutely never need an escape sequence inside the raw string. For any given string, you can find a delimiter string not contained in it to handle it properly.
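To illustrate the scanning rule only, here is a sketch of how the contents between two occurrences of the delimiter could be extracted from source text. The function read-delimited is hypothetical and works on a plain string; it is not a reader extension, and #text itself does not exist in Clojure.

```clojure
(require '[clojure.string :as str])

(defn read-delimited
  "Hypothetical: returns the text between the first occurrence of delim in
  src and the next occurrence after it, with no escape processing."
  [delim src]
  (let [open  (str/index-of src delim)
        from  (some-> open (+ (count delim)))
        close (some->> from (str/index-of src delim))]
    (when-not close
      (throw (ex-info "Unterminated raw string" {:delimiter delim})))
    (subs src from close)))

(println (read-delimited "|" "|\"{{hello}}\"|"))
;; prints: "{{hello}}"
```

The only rule is that the delimiter must not occur inside the contents, which is the property the proposal relies on.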

A crazy thought I had with this approach, just throwing it out there: if you use a sufficiently random string as the delimiter, it could be a weird way to protect against forms of injection:

(println #text "xIBgdSl4TCCOIdqdMu9G" xIBgdSl4TCCOIdqdMu9G
Can't nobody guess the delimiter to escape the string context :p
xIBgdSl4TCCOIdqdMu9G)

Thank You