

Is it not possible to create a tagged literal for a clojure.core.Vec (the result of the vector-of function)? Creating a reader function for such a tag is trivial, and it works well enough with read-string and EDN readers. But the REPL always winds up holding a clojure.lang.PersistentVector.


This issue came about as I thrashed around looking for round-trip support for a hexstring of bytes, with support for idiomatic Clojure (clojure.lang.ISeq and immutability are high priorities).

  1. Despite the promise of (vector-of :byte ...), it's not possible to round-trip the byte data cleanly with clojure.core.Vec.
  2. With a tag reader and print-dup writer I can get round-trip ability for JVM native byte arrays, but they don't support clojure.lang.ISeq and they're mutable.
  3. Regular vectors are heterogeneous, and I can't safely appropriate print-dup for them to achieve my goals.

None of these options are attractive.
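Option 2 above (a tag reader plus a print-dup writer for JVM byte arrays) can be sketched as follows. This is a minimal sketch, not the OP's code: the tag `my/bytes` and the helpers `hex->bytes` and `bytes->hex` are hypothetical names. It shows the mechanism the rest of the thread is about, namely printing a value as a tagged literal and reading it back through `*data-readers*`.

```clojure
(defn hex->bytes
  "Hypothetical reader fn for a tag such as #my/bytes: parses pairs of hex
  digits into a JVM byte array. Assumes an even number of valid hex digits;
  a trailing odd digit would be silently dropped by partition."
  ^bytes [^String s]
  (byte-array
   (map (fn [pair]
          ;; "ff" parses to 255, and unchecked-byte wraps it to -1
          (unchecked-byte (Integer/parseInt ^String (apply str pair) 16)))
        (partition 2 s))))

(defn bytes->hex
  "Renders each byte as two lowercase hex digits."
  ^String [^bytes bs]
  ;; bit-and with 0xff maps a signed byte (-128..127) to 0..255
  (apply str (map #(format "%02x" (bit-and 0xff %)) bs)))

;; Register a print-dup method for byte arrays (JVM class "[B") that writes
;; them as the hypothetical tagged literal #my/bytes "...".
(defmethod print-dup (Class/forName "[B") [bs ^java.io.Writer w]
  (.write w (str "#my/bytes \"" (bytes->hex bs) "\"")))

(def printed
  (binding [*print-dup* true]
    (pr-str (hex->bytes "00ff7f80"))))
;; printed => "#my/bytes \"00ff7f80\""

;; Reading the printed form back calls hex->bytes via *data-readers*.
(def round-tripped
  (binding [*data-readers* (assoc *data-readers* 'my/bytes hex->bytes)]
    (read-string printed)))
```

The round trip works, but the result is still a mutable `byte[]` with no persistent-collection semantics, which is exactly the drawback the question points out.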

2 Answers


What do you mean by "But the REPL always winds up holding a clojure.lang.PersistentVector."? The REPL doesn't "hold" anything.

It appears that anything implementing IPersistentVector is caught during the analyze phase when evaluated in the REPL. If you define a data reader that yields a Vec, the Vec will be evaluated into a persistent vector containing boxed versions of the formerly primitive Vec contents. This doesn't happen if, as in Andy's example (and my initial hack on this in the clojureverse [thread](https://clojureverse.org/t/tagged-literal-for-clojure-core-vec-not-possible-for-clojure/6452/6)), you bind data readers and use read-string: you'll retain the Vec that's read, because the REPL never evaluates it as a VectorExpr. That doesn't solve the OP's original problem of having a data_readers.clj reader that produces a Vec that won't be eval'd into a vector. The closest I got was defining a custom type to bypass the vector check, so that it would fall through evaluation unchanged. There is a lingering problem with print-dup on that route, though.
Yes, my choice of words "winds up holding" was not very precise. In the REPL, reading a tagged literal directly (*not* with `read-string`) evaluates to a `clojure.lang.PersistentVector`. It's a bit baffling at first, because the printed representation of a `clojure.lang.PersistentVector` is the same as that of a `clojure.core.Vec`.
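The analyze-phase behaviour described in these comments can be reproduced without a data_readers.clj file by calling `eval` on a primitive vector directly, which is roughly what the REPL does with the value a data reader returns. A minimal sketch (the name `raw` is only for illustration):

```clojure
(def raw (vector-of :byte -1 0 1))

(type raw)                ;; => clojure.core.Vec
(type (eval raw))         ;; => clojure.lang.PersistentVector
(type (first (eval raw))) ;; => java.lang.Byte, a boxed element
(= raw (eval raw))        ;; => true: same elements, different class
```

Since `=` and the printed form both agree, only `type` (or `class`) reveals that evaluation replaced the Vec.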

Below is a sample REPL session that shows one way to define a data reader that returns a clojure.core.Vec object containing bytes, and then demonstrates in the REPL that the result really is of that type.

If you have tried something similar that gives you a clojure.lang.PersistentVector when you hoped for a clojure.core.Vec, sharing that REPL session in a follow-up comment might help us determine what is going on.

```
$ clojure
Clojure 1.10.1
user=> (defn first-non-hex-char [string]
  (re-find #"[^0-9a-fA-F]" string))
#'user/first-non-hex-char
user=> (defn hex-string-to-clojure-core-vec-of-byte [hex-string]
  (if-let [bad-hex-digit-string (first-non-hex-char hex-string)]
    (throw (ex-info (format "String that should consist of only hexadecimal digits contained: %s (UTF-16 code point %d)"
                            bad-hex-digit-string
                            (int (first bad-hex-digit-string)))
                    {:input-string hex-string
                     :bad-hex-digit-string bad-hex-digit-string}))
    (if (not (zero? (mod (count hex-string) 2)))
      (throw (ex-info (format "String contains odd number %d of hex digits.  Should be even number of digits."
                              (count hex-string))
                      {:input-string hex-string
                       :length (count hex-string)}))
      ;; There are likely more efficient ways to do this, if
      ;; performance is critical for you.  I have done no performance
      ;; benchmarking on this code.  This code is taking advantage of
      ;; JVM library calls I was already aware of.
      (let [hex-digit-pairs (re-seq #"[0-9a-fA-F]{2}" hex-string)
            byte-list (map (fn [two-hex-digit-str]
                             (.byteValue
                              (java.lang.Short/valueOf two-hex-digit-str 16)))
                           hex-digit-pairs)]
        (apply vector-of :byte byte-list)))))
#'user/hex-string-to-clojure-core-vec-of-byte
user=> (def bv1
  (binding [*data-readers*
            (assoc *data-readers*
                   'my.ns/byte-vec user/hex-string-to-clojure-core-vec-of-byte)]
    (read-string "#my.ns/byte-vec \"0123456789abcdef007f80ff\"")))
#'user/bv1
user=> bv1
[1 35 69 103 -119 -85 -51 -17 0 127 -128 -1]
user=> (type bv1)
clojure.core.Vec
user=> (type (bv1 0))
java.lang.Byte
```
So the wrinkle you didn't run into (and I didn't either when I first tried this) is that the OP is trying to bind a reader via data_readers.clj and have the reader's result left unevaluated by the REPL, that is, without using `read-string` to avoid evaluating the result. If you eval `bv1`, as the REPL would after reading it and finding a Vec, you should get back a persistent vector of boxed Bytes instead of the original Vec. So the challenge is to avoid evaluating the result; otherwise it is analyzed as a VectorExpr and coerced to an ordinary vector. I thought to wrap the primitive vec in a deftype to avoid the interface check (the wrapper is not an IPersistentVector), and then implement the interfaces for nth, count, seq, etc. That works, since the custom type falls through the evaluation checks for vectors, sets, etc. and is left alone as an object/constant (eval is effectively identity). Except then you run into `print-dup` complaints that we couldn't figure out. [original thread for context](https://clojureverse.org/t/tagged-literal-for-clojure-core-vec-not-possible-for-clojure/6452)
Tom, your assessment is spot on.  For interactive work at the REPL, I can't find a satisfactory solution.
I do not know whether you consider this a satisfactory solution, but you can prevent the evaluation by quoting the tagged literal when it is used in Clojure code. See the example REPL session in the README of this tiny GitHub project I created to demonstrate: https://github.com/jafingerhut/vec-data-reader
That's obvious, yet clever, lol. You're already tagging the literal; adding a quote is simple enough, and literally means "don't eval this". I would personally be okay with it as a workaround. I'm still curious whether the tagged literal route could be made to work without quoting, but more out of interest in Clojure peculiarities. Why don't we have to do this with #inst, for example? (Aside from the read-time form of #inst being a string, which is a constant rather than a plausible VectorExpr.)
You could try locally on your own copy of the Clojure implementation replacing a few lines of code in the `eval` and `emitExpr` methods of class `VectorExpr`.

For the `eval` method, it looks like it might be straightforward to call the same `empty` method that `clojure.core/empty` calls, so that the result has the same type as the original vector: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L3220-L3225

For the `emitExpr` method, it is less clear to me what can be done there: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L3227-L3244
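The idea behind the `eval` change can be illustrated with an ordinary Clojure analogy. This is not the compiler code, and both function names are hypothetical: collecting evaluated elements into `[]` always yields a PersistentVector, whereas starting from `(empty form)` keeps the original collection type. One caveat, assumed from reading `VectorExpr.parse` rather than confirmed here: a vector whose elements are all constants appears to be turned into a PersistentVector constant during parsing, before `eval` runs, so this change alone may be one of the partial suggestions.

```clojure
(defn eval-into-new-vector
  "Analogy for the current behaviour (an assumption about VectorExpr.eval):
  evaluate each element and collect the results into a PersistentVector."
  [form]
  (into [] (map eval form)))

(defn eval-into-empty-of-same-type
  "Analogy for the suggested change: start from (empty form), so the
  result keeps the collection type of the original vector."
  [form]
  (into (empty form) (map eval form)))

(type (eval-into-new-vector (vector-of :byte 1 2)))         ;; => clojure.lang.PersistentVector
(type (eval-into-empty-of-same-type (vector-of :byte 1 2))) ;; => clojure.core.Vec
(eval-into-empty-of-same-type [(list '+ 1 2)])              ;; => [3], elements still evaluated
```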

Some folks on the #clojure-dev channel on Clojurians Slack could figure out much more quickly than I could which kinds of changes might work and which definitely won't, and also whether my suggestions here are only partial. In any case, since such changes touch Clojure's compiler internals, I would personally not expect this to change in an officially released version of Clojure any time soon, or perhaps ever.
We don't have to quote #inst because the resulting value is self-evaluating in the Clojure compiler, I believe.  Vectors are not self-evaluating.  That is, sometimes they evaluate to an object with the same JVM class, sometimes they don't.  If vectors were self-evaluating in Clojure, then you could not write expressions like `(def v1 [(+ x 2) (some-other-call with args)])` and have the vector elements evaluated.
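The self-evaluation point can be checked directly with `eval`. A small sketch (the names `d` and `v` are illustrative only): the Date produced by `#inst` comes back as the very same object, while a vector has its elements evaluated and, even when every element is a constant, comes back as a different object.

```clojure
(def d #inst "2020-01-01T00:00:00.000-00:00")

;; A Date is treated as a constant: eval returns the identical object.
(identical? d (eval d))     ;; => true

;; A vector is not self-evaluating: its elements are evaluated.
(eval [(list '+ 1 2)])      ;; => [3]

;; So even a primitive vector of constants is rebuilt as a new object.
(let [v (vector-of :byte 1 2)]
  (identical? v (eval v)))  ;; => false
```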
Andy, here is the minimal example illustrating the problem:

```
$ cat deps.edn
{:deps {org.clojure/clojure {:mvn/version "1.10.1"}} :paths ["."]}

$ cat data_readers.clj
{bs user/read-bytes}

$ clojure -e "(defn read-bytes [bs] (apply vector-of :byte bs)) [(class (read-string \"#bs [-1 0 1]\")) (class #bs [-1 0 1])]"
#'user/read-bytes
[clojure.core.Vec clojure.lang.PersistentVector]
```
Thanks for the shorter example.  Putting a single quote before the `#bs` in your last expression should give you a `clojure.core.Vec` type.
Actually, it gives me another error, one that I have slammed my head into many times:

```
Can't embed object in code, maybe print-dup not defined: clojure.core$reify__8311@a3d9978
```
Yes, I get that, too.  Sorry for my incorrect suggestion -- I was giving my expectation based upon similar but not identical experiments that I did earlier without that error.

I do not know the difference yet, but note that the following code with your same deps.edn and data_readers.clj file contents does work:

```
$ clojure -e "(defn read-bytes [bs] (apply vector-of :byte bs)) (def bv1 '#bs [-1 0 1]) (type bv1)"
#'user/read-bytes
#'user/bv1
clojure.core.Vec
```
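The difference between the `def` that works and the quoted expression that fails can be reproduced without a data_readers.clj file, by building the forms by hand and calling `eval`. A minimal sketch; the explanation in the comments (top-level `def` forms are evaluated directly, while other top-level forms are first compiled into a fn, which forces the quoted Vec to be embedded in byte code) is an assumption about the compiler that fits the reify error above but is not confirmed in this thread.

```clojure
(def v (vector-of :byte -1 0 1))

;; Like `(def bv1 '#bs [-1 0 1])`: assumed to be evaluated without
;; compiling, so the quoted Vec is stored in the var unchanged.
(def the-var (eval (list 'def 'bv1 (list 'quote v))))
(type @the-var) ;; => clojure.core.Vec

;; Like `(class '#bs [-1 0 1])`: assumed to be compiled, so the Vec must be
;; embedded as a constant, which fails on its reify-based "am" field.
(def compile-result
  (try
    (eval (list 'class (list 'quote v)))
    (catch Exception _ :compile-error)))
compile-result ;; => :compile-error
```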
```
bytevector.core> (deftype blee [x])
bytevector.core.blee
bytevector.core> #=(bytevector.core.blee. #=(vector-of :byte 1 0 1))
Syntax error compiling fn* at (*cider-repl workspacenew\bytevector:localhost:59588(clj)*:1:8145).
Can't embed object in code, maybe print-dup not defined: clojure.core$reify__8311@31c365b
bytevector.core> #=(bytevector.core.blee. 2)
#object[bytevector.core.blee 0x6e95ec6b "bytevector.core.blee@6e95ec6b"]
```

It has to do with how print-dup typically works, via read-time eval (the `#=` syntax), I think.
Yep, Tom and I noted that behavior last week as well.  I imagine that the special form has some subtle impact on the evaluation.

It's a nice work-around in some situations, but not so much for REPL work.
The root cause of the "Can't embed object in code, maybe print-dup not defined" with an object that has "reify" and a bunch of hex digits in its printed representation, is the following combination of factors:

(1) Clojure primitive vectors are defined with deftype.

(2) Clojure's Compiler.java source file contains a Java method named emitValue with many cases for deciding how to embed a literal value in JVM byte code, and one of those cases covers all types defined via deftype. You can search that file for the first occurrence of "IType", a Java interface that all deftype-created types implement so that they can later be recognized as objects of a class created via deftype. When such an object is a literal inside Clojure code, emitValue attempts to create JVM byte code that can construct the original value when that byte code is later executed, and for deftype-created objects it always iterates through all fields of the object, emitting code for each field and its value.

(3) Clojure primitive vectors have a field "am", short for "array manager", which holds an object created with Clojure's "reify" macro. There is one such object for each primitive type, and it implements several Java methods used on the 'leaves' of the tree that represents a Clojure primitive vector. The JVM byte code for dealing with arrays of each primitive type is different, and Rich was probably going for run-time efficiency here: rather than detecting the primitive type on every operation, each vector holds an object that already has code for its primitive type baked into it.

(4) emitValue, when called with an object that is the return of a "reify" call, tries to call `RT.printString` on it, which would work if a `print-dup` method were defined to handle such objects, but in general objects returned by "reify" can have arbitrary references to other JVM objects with internal state, or can have internal state themselves, so there is no good general way to create a `print-dup` definition that handles all possible objects created by calling "reify".
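Factors (2) and (3) can be observed from the REPL. A minimal sketch, assuming (as in Clojure 1.10's gvec.clj) that Vec's fields are declared by deftype, so they are public and listed by the static `getBasis` method that deftype generates; no field names other than `am` are assumed.

```clojure
(def v (vector-of :byte -1 0 1))

;; Vec was created with deftype, so it implements the IType marker interface.
(instance? clojure.lang.IType v) ;; => true

;; The fields emitValue would walk for a deftype-created object.
(clojure.core.Vec/getBasis)

;; The array manager held in the "am" field: an instance of a class
;; generated by reify, which is not an IType itself.
(class (.-am v))
```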

What could be done about this?

There are probably many alternatives I haven't thought of, but here are a few potential approaches, most of which would require changing Clojure's implementation in some way.

(approach #1a)
Change Clojure's primitive vector implementation so that all of its field values are immutable values with printable representations, i.e. no objects returned from 'reify', nor any function references. Since primitive vectors are trees with O(log_32 n) depth, the representation created via emitValue would reflect that tree structure, but it seems like it could be made to work correctly. This would likely lead to some lower run-time performance of operations on primitive vectors, since there would need to be a "case" or other conditional code to handle the different primitive types in leaf nodes.

(approach #1b)
Create a new implementation of Clojure primitive vectors that uses deftype, but has the changes suggested in #1a above.  No changes to Clojure's implementation would be required, since it would be a 3rd party implementation that can make its own implementation choices.
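A minimal sketch of the #1b idea, combined with Tom's earlier suggestion of a wrapper type that is deliberately not an IPersistentVector. `ByteVec` and `byte-vec` are hypothetical names. The bytes are stored in an ordinary PersistentVector of Byte, a field value emitValue can already embed, instead of a clojure.core.Vec whose "am" field it cannot. It only implements count, nth and seq, so it is far from a complete vector (no conj, assoc, equality or custom printing).

```clojure
;; Hypothetical wrapper type. The bytes live in an ordinary PersistentVector
;; of java.lang.Byte. It deliberately does NOT implement IPersistentVector.
(deftype ByteVec [v]
  clojure.lang.Counted
  (count [_] (count v))
  clojure.lang.Indexed
  (nth [_ i] (nth v i))
  (nth [_ i not-found] (nth v i not-found))
  clojure.lang.Seqable
  (seq [_] (seq v)))

(defn byte-vec
  "Builds a ByteVec; `byte` throws for values outside -128..127, which keeps
  the contents homogeneous."
  [xs]
  (->ByteVec (mapv byte xs)))

(def bv (byte-vec [-1 0 1]))

;; Not an IPersistentVector, so eval treats bv as a constant and rebuilds it
;; from its field instead of converting it to a PersistentVector.
(type (eval bv))

;; Quoting, which failed for a clojure.core.Vec, also compiles here.
(type (eval (list 'quote bv)))
```

If this holds up, a data reader returning such a type should survive REPL evaluation without quoting. The trade-off is that every vector operation has to be implemented by hand, and, as noted later in this thread, implementing ISeq may interfere with controlling how the type prints.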

(approach #2)
Change the emitValue method in Compiler.java so that for deftype-created objects, it somehow checked whether there was a print-dup method for that object's class first, and used it if it was available, falling back to the current approach if there was not.  That would be somewhat tricky in this case, because Clojure primitive vectors implement the clojure.lang.IPersistentCollection interface, which already has a print-dup method that will not work for primitive vectors.  One possibility is not to simply call print-dup and see what happens, but to check whether the print-dup multimethod has an implementation for _exactly_ the class of the object one is trying to do emitValue on, e.g. clojure.core.Vec for primitive vectors.  Such an exact class check for multimethod implementations seems against the philosophy of multimethods in Clojure, and seems a bit hackish.

Another, cleaner variation on this idea would be to define a new "emittable" interface in Clojure's implementation; if a deftype-created class implemented it, emitValue would use the 'emit' method of that interface on objects of that class.

(approach #3)
Create a separate Clojure primitive vector implementation that does not use deftype, nor defrecord, and falls into the last "else" case of the long if-then-else daisy chain of Clojure's emitValue.  This seems difficult, or maybe impossible, to me, without changing the emitValue method, because it currently has a case for clojure.lang.IPersistentVector before the last "else", and it would be very strange to try creating a Clojure primitive vector implementation that did not implement that interface.

Of the ones I have thought about, approach #1b, or the last variant of approach #2, seem possibly workable.  #1b requires no changes to Clojure's implementation.  #2 definitely does.  Approach #3 probably isn't really a viable alternative, for reasons stated above.

More details can be found in this repo's README: https://github.com/jafingerhut/vec-data-reader
Andy, your analysis matches my experience, you express the problem well and you propose some reasonable solutions.  Thank you.

I seriously considered #1b (since the others are outside my weak Java skills) and even put together a trial implementation. One frustration I encountered is that, _whatever_ bottom type is used to store the data (in my case, a persistent vector of bytes with homogeneity enforced by my implementation of IPersistentVector/assocN), I needed my type to implement ISeq. But as soon as I did, my ability to control printing was gone. I'm sorry I don't remember any more details of that experiment... maybe I can resurrect it.

(Sidebar, perhaps related to your observation in #2, but an interesting tangent: by what means does _print-method_ for clojure.core.Vec get determined? This line (https://github.com/clojure/clojure/blob/master/src/clj/clojure/gvec.clj#L455) seems like the crux, but I don't understand how ::Vec is in play, and in fact when I (presumably) override Vec's print method I don't use the global hierarchy; I just defmethod for the class. My sneaking suspicion is that, like my attempts with my own type, it is never being called.)

My ultimate goal is to support a hexstring literal (REPL-compatible) reader and printer backed by some type that supports idiomatic Clojure operations on vectors.  A deftype backed by clojure.core.Vec seemed so tantalizingly close ...

Thank you again for your deep analysis of this.

[edited after a reading of your repo clarified exactly when and why I lose control of my deftype's printing]
I do not understand why the print-method in gvec.clj has a dispatch value of `::Vec` -- I would have expected it to be the class `clojure.core.Vec`, but I may not have all of the context necessary to understand why it is `::Vec`.

You can use `(methods print-method)` to see all dispatch values that have a `defmethod` defined for them -- it will be the keys of the map, all of which are class and interface names, with only the two in gvec.clj being keywords.  You can use `(get-method print-method <some-expression>)` to see which method would be called for a particular value.  If you want to know which dispatch value that corresponds to, you can either find it manually in the output of `(methods print-method)`, or you can write some code to find it for you.
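A minimal sketch of such code (the function name `dispatch-values-for` is hypothetical): it asks `get-method` which method would run for a dispatch value, then looks for the key or keys in `(methods mm)` that map to that same function object.

```clojure
(defn dispatch-values-for
  "Returns the dispatch values registered in multimethod mm whose method is
  the one get-method selects for dispatch value dv (nil if there are none)."
  [mm dv]
  (let [selected (get-method mm dv)]
    (seq (for [[k f] (methods mm)
               :when (identical? f selected)]
           k))))

;; Which defmethod prints strings?
(dispatch-values-for print-method String)

;; Which defmethod would print a primitive vector? print-method dispatches
;; on (class x) unless x has a keyword :type in its metadata.
(dispatch-values-for print-method clojure.core.Vec)
```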

I might have some time to look at a trial implementation of #1b, if you still have it around, in case I notice anything amiss, but no promises.
...