Share your thoughts in the 2024 State of Clojure Survey!

Welcome! Please see the About page for a little more info on how this works.

0 votes
in Printing by
edited by

is a serial representation of a clojure map unambiguously defined? if i compute CID of 'pr-str' of a clojure map, is it unique?

by
What is CID?
by
it is a digest of the content.
say SHA-256, or whatever digest function, doesn't matter.
i want to know if a clojure map may have non-unique hashes when fed to a digest function.
by
```(sha-256 (pr-str {:one 1 :two 2}))```

now i imagine a map being a set of key-value vectors

```#{ [:one 1] [:two 2]  }```

i have written :one first in my code.
but does it always come before :two?

if i run the above code, will it always return a single unique value?
by
i imagine clojure maps are sets equipped with a special `bind` or `>>=` operator that makes 'invoking' the set with one of its elements return whatever was bound to that element.

({:one 1 :two 2} :one) => 1
by
https://www.reddit.com/r/Clojure/comments/foqt7o/when_and_why_would_you_want_to_use_sorted_maps/

i guess i will be using `sorted-map` for the sake of 'serialization predictability' when the map's universally unique identity is at stake.

it so happens that JSON is also unordered.
by
https://groups.google.com/g/golang-nuts/c/opEBtevDCyI

hmmm. it feels a bit weird that data structures that are supposed to live across runtimes are not comparable in their serial form without further info about serialization protocol.
by
i'll probably use vectors instead.
by
(This message assumes that you want to calculate a hash/digest function D with the property that for any two unordered sets s1, s2, if (= s1 s2) is true, then (= (D s1) (D s2)) is also true.
 Similarly for unordered Clojure maps.)

Note that it is perfectly possible to develop a digest/hash function that is deterministic for unordered objects like unordered sets and Clojure maps.  In fact, `clojure.core/hash` is one such function.  Such a function must always produce the same result regardless of the order of its elements (for a set) or key/value pairs (for a Clojure map).  That puts restrictions on how the function is calculated, and many functions used for digests are _not_ appropriate for that purpose.

If you want to use a hash/digest function where changing the order of elements given to it as input causes the output of that function to change, then I would suggest you are not going to get a result that you will want to use.

2 Answers

+1 vote
by
selected by
 
Best answer

Maps (other than sorted maps) are unordered and may not print in the same order depending on Clojure version, JVM version, and whatever customizations you may have installed in the Clojure print system (which is open to modification and extension).

The same instance of a map in a runtime will always print its elements in the same order - that's the only guarantee you have.

by
could the same thing be said of the following?

'(into [] a-map)

in other words, is the 'vector representation of an unordered map' unspecified?
by
This is the same question - what order are map elements visited in to add to the vector? There is no defined order.
0 votes
by

if i compute CID of 'pr-str' of a clojure map, is it unique?
i want to know if a clojure map may have non-unique hashes when fed to a digest function.

It depends on the digest, not on the map, the fact that it comes from Clojure, or the fact that you've used pr-str.
By definition, any reasonable digest is a lossy function, so it always allows for conflicts, even if probability for some digests is minuscule.

i have written :one first in my code.
but does it always come before :two?

The order of a hash set/map is undefined. But it is the same for the same object.
However, it can be different for different objects even if the content is the same:

user=> (mapv hash [0 0.])
[0 0]
user=> (pr-str (hash-map 0 0 0. 0))
"{0 0, 0.0 0}"
user=> (pr-str (hash-map 0. 0 0 0))
"{0.0 0, 0 0}"

if i run the above code, will it always return a single unique value?

As per the above, it's not guaranteed. It can never be guaranteed for a lossy function such as a digest.

i imagine clojure maps are sets equipped with a special bind or >>= operator that makes 'invoking' the set with one of its elements return whatever was bound to that element.

That doesn't seem to be related to the rest of the question. But no, there's nothing special about sets or maps in Clojure in this regard, and Clojure doesn't have operators.
Sets and maps in Clojure are, among some other things, callable - as simple as that.

by
edited by
digest collision is astronomically unlikely. so that's not what i'm worried about.
i'm worried about the obverse case of 'double digest result' which may be as likely as a coinflip coming up heads for a map with two entries.
by
edited by
"That doesn't seem to be related to the rest of the question. But no, there's nothing special about sets or maps in Clojure in this regard, and Clojure doesn't have operators.
Sets and maps in Clojure are, among some other things, callable - as simple as that."

i was just describing a theoretical model of what is going on. it is the best my imagination can work with given my limited knowledge of clojure's internals & performance concerns.

i cognize a purely theoretical aspect to Clojure that is not pre-occupied with performance or implementation or persistence constraints. something like a 'data-informed set-theory'.
...