
in Java Interop by

Hello --

I'm learning Clojure and its Java interop stuff. I am trying to emulate this function:

 public String readAllCharsOneByOne(BufferedReader bufferedReader) throws IOException {
    StringBuilder content = new StringBuilder();
       
    int value;
    while ((value = bufferedReader.read()) != -1) {
        content.append((char) value);
    }
       
    return content.toString();
}

So far, I've been able to reason that the parts I need are:

(def myfile "/path/to/svenska_sample.txt")
(import '(java.io BufferedReader FileReader))
;; java.lang classes such as StringBuilder and Character are imported
;; automatically, so they need no explicit import.
 
(def a-FileReader (FileReader. myfile))
(def bufferedReader (BufferedReader. a-FileReader))
(def content (StringBuilder.))

which works, and lets me append characters one at a time at the REPL:

user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDe"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen "]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen t"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen ty"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typ"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typi"]

The file is a small UTF-8-encoded text file, created on Linux, with the following content:

❯ cat svenska_sample.txt

Den typiska impulsiva olycksfågeln är en ung man som kraschar flera bilar, och ofta skryter lite med det, i varje fall när han är tillsammans med sina vänner._. För dem har otur i det närmaste blivit en livsstil, och de råkar konstant ut för olyckor, stora som små. Olycksfåglar kallar vi dem. Hur många det finns kan ingen med säkerhet säga, för det finns inga konkreta definitioner på denna grupp, och heller ingen given avgränsning av den. Att de finns, råder det emellertid ingen tvekan om, varken på sjukhusens akutmottagningar eller i försäkringsbranschen

I wrote a function, with my brand new Clojure Java interop chops, that looks like this:

;; COMPILES - BUT CAN'T GET AROUND THE 0xFFFFFFFF BUG
(defn pt%% [file]
  (let [afr (FileReader. file)        ; instances of FileReader, BufferedReader, StringBuilder
        bfr (BufferedReader. afr)
        ct  (StringBuilder.)
        val (.read bfr)
        this-list (list afr bfr ct)]
    ;; (apply println this-list)
    (loop []
      (when (not (= val -1))
        (.append ct (Character/toChars (.read bfr))))
      (recur))
    ;; when finished...
    (.toString ct)))

but it borks with the following error:

user> (pt%% myfile)
Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF

What in the world could be causing this (NOTE: I am not a Java programmer)?
Here is a hex dump of the text file:

❯ cat svenska_sample.hexdump

00000000: 0a44 656e 2074 7970 6973 6b61 2069 6d70  .Den typiska imp
00000010: 756c 7369 7661 206f 6c79 636b 7366 c3a5  ulsiva olycksf..
00000020: 6765 6c6e 20c3 a472 2065 6e20 756e 6720  geln ..r en ung
00000030: 6d61 6e20 736f 6d20 6b72 6173 6368 6172  man som kraschar
00000040: 2066 6c65 7261 2062 696c 6172 2c20 6f63   flera bilar, oc
00000050: 6820 6f66 7461 2073 6b72 7974 6572 206c  h ofta skryter l
00000060: 6974 6520 6d65 6420 6465 742c 2069 2076  ite med det, i v
00000070: 6172 6a65 2066 616c 6c20 6ec3 a472 2068  arje fall n..r h
00000080: 616e 20c3 a472 2074 696c 6c73 616d 6d61  an ..r tillsamma
00000090: 6e73 206d 6564 2073 696e 6120 76c3 a46e  ns med sina v..n
000000a0: 6e65 722e 5f2e 2046 c3b6 7220 6465 6d20  ner._. F..r dem
000000b0: 6861 7220 6f74 7572 2069 2064 6574 206e  har otur i det n
000000c0: c3a4 726d 6173 7465 2062 6c69 7669 7420  ..rmaste blivit
000000d0: 656e 206c 6976 7373 7469 6c2c 206f 6368  en livsstil, och
000000e0: 2064 6520 72c3 a56b 6172 206b 6f6e 7374   de r..kar konst
000000f0: 616e 7420 7574 2066 c3b6 7220 6f6c 7963  ant ut f..r olyc
00000100: 6b6f 722c 2073 746f 7261 2073 6f6d 2073  kor, stora som s
00000110: 6dc3 a52e 204f 6c79 636b 7366 c3a5 676c  m... Olycksf..gl
00000120: 6172 206b 616c 6c61 7220 7669 2064 656d  ar kallar vi dem
00000130: 2e20 4875 7220 6dc3 a56e 6761 2064 6574  . Hur m..nga det
00000140: 2066 696e 6e73 206b 616e 2069 6e67 656e   finns kan ingen
00000150: 206d 6564 2073 c3a4 6b65 7268 6574 2073   med s..kerhet s
00000160: c3a4 6761 2c20 66c3 b672 2064 6574 2066  ..ga, f..r det f
00000170: 696e 6e73 2069 6e67 6120 6b6f 6e6b 7265  inns inga konkre
00000180: 7461 2064 6566 696e 6974 696f 6e65 7220  ta definitioner
00000190: 70c3 a520 6465 6e6e 6120 6772 7570 702c  p.. denna grupp,
000001a0: 206f 6368 2068 656c 6c65 7220 696e 6765   och heller inge
000001b0: 6e20 6769 7665 6e20 6176 6772 c3a4 6e73  n given avgr..ns
000001c0: 6e69 6e67 2061 7620 6465 6e2e 2041 7474  ning av den. Att
000001d0: 2064 6520 6669 6e6e 732c 2072 c3a5 6465   de finns, r..de
000001e0: 7220 6465 7420 656d 656c 6c65 7274 6964  r det emellertid
000001f0: 2069 6e67 656e 2074 7665 6b61 6e20 6f6d   ingen tvekan om
00000200: 2c20 7661 726b 656e 2070 c3a5 2073 6a75  , varken p.. sju
00000210: 6b68 7573 656e 7320 616b 7574 6d6f 7474  khusens akutmott
00000220: 6167 6e69 6e67 6172 2065 6c6c 6572 2069  agningar eller i
00000230: 2066 c3b6 7273 c3a4 6b72 696e 6773 6272   f..rs..kringsbr
00000240: 616e 7363 6865 6e0a                      anschen.
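As an aside, the hex dump looks like well-formed UTF-8 (for example, c3 a5 is the two-byte UTF-8 encoding of å), so the file's encoding is probably not the culprit. A Reader decodes bytes into chars, so .read returns one int per char (229 for å) rather than raw bytes, and it returns -1 only once the input is exhausted. A minimal sketch, using a StringReader in place of the file purely so it runs without the sample file; aring and read-codes are hypothetical names:

```clojure
(import '(java.io BufferedReader StringReader)
        '(java.nio.charset StandardCharsets))

;; The bytes c3 a5 from the hex dump, decoded as UTF-8, form a single char.
(def aring
  (String. (byte-array [(unchecked-byte 0xc3) (unchecked-byte 0xa5)])
           StandardCharsets/UTF_8))

;; Collect every int that .read returns, including the final -1.
(defn read-codes [s]
  (with-open [rdr (BufferedReader. (StringReader. s))]
    (loop [acc []]
      (let [c (.read rdr)]
        (if (= c -1)
          (conj acc c)               ; end of input: keep the -1 and stop
          (recur (conj acc c)))))))

(println (count aring))                       ; prints 1
(println (int (first aring)))                 ; prints 229
(println (read-codes (str "f" aring "g")))    ; prints [102 229 103 -1]
```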

I then changed the code (see below), following suggestions that the problem might be my use of recur.

And now it complains that a form that was working before, '(.append ...)', no longer works, and the same error remains.

user> (pt5 myfile)

Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF

(defn pt5 [file]
  (let [afr (FileReader. file)        ; instances of FileReader, BufferedReader, StringBuilder
        bfr (BufferedReader. afr)
        ct  (StringBuilder.)
        this-list (list afr bfr ct)]
    ;; (apply println this-list)
    (loop [val (.read bfr)]
      (when (not (= val -1))
        (.append ct (Character/toChars (.read bfr))))
      (recur val))
    ;; when finished...
    (.toString ct)))

Harder than it seemed at first sight...
Any help is greatly appreciated.
-- Hank

1 Answer

by

If you prefer an interactive chat for questions like this, the #beginners channel on the Clojurians Slack community at https://clojurians.slack.com is a good place to ask them. Either here or there, it can help to publish the longer code samples you are working on in a public place such as Github.com.

To your questions: your first function, pt%%, assigns a value to val before the loop begins, and val keeps that same value from then on. Thus, regardless of what is done inside the loop, it is an infinite loop unless an exception occurs, which is what happens in your case. I have not tried to diagnose exactly why that particular exception occurs, because of the infinite loop issue and val never changing.
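For what it's worth, the particular exception is what you would expect once the reader is exhausted: .read then returns -1, and Character/toChars rejects -1 because it is not a valid code point. The error message prints the int in hex, and -1 as a 32-bit two's-complement value is 0xFFFFFFFF. A small REPL-style sketch, assuming a one-character StringReader as a stand-in for the file (the def names are hypothetical):

```clojure
(import '(java.io BufferedReader StringReader))

;; A one-character reader standing in for the file.
(def rdr (BufferedReader. (StringReader. "D")))

(def first-read (.read rdr))   ; 68, the code for \D
(def eof-read (.read rdr))     ; -1: the input is exhausted

;; -1 viewed as an unsigned 32-bit hex value, as in the error message.
(def eof-hex (Integer/toHexString (int eof-read)))   ; "ffffffff"

;; Passing -1 to Character/toChars throws IllegalArgumentException,
;; the same exception type reported in the question.
(def eof-error
  (try
    (Character/toChars (int eof-read))
    nil
    (catch IllegalArgumentException e e)))

(.close rdr)
(println first-read eof-read eof-hex (class eof-error))
```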

pt5 has the same issue that val is always the same, because (recur val) occurs in a context where val is bound to the same value as when the loop began.
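To see why (recur val) never makes progress, it may help to recall that recur rebinds the loop's bindings, in order, to whatever values you pass it. A tiny sketch unrelated to files, where countdown is a hypothetical name; the loop advances only because recur is given new values:

```clojure
;; recur supplies the next values for [n acc]. Passing the same n back,
;; as (recur val) does in pt5, would never reach the stopping condition.
(defn countdown [start]
  (loop [n start
         acc []]
    (if (pos? n)
      (recur (dec n) (conj acc n))   ; new n, new acc
      acc)))                         ; stop and return what was collected

(println (countdown 3))   ; prints [3 2 1]
```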

In pt5, try using val as the argument to the toChars method, change the recur expression to (recur (.read bfr)), and see how that goes. And think about why that behaves differently from your version in the question.
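Putting the answer's suggestions together means passing val to toChars and recurring with (recur (.read bfr)). One further change appears necessary: the recur has to sit inside the when, because if it stays outside, the loop keeps calling .read after it has returned -1 and never ends. A minimal sketch under those assumptions; the helper name read-all-chars and the use of with-open to close the reader are illustrative additions, not part of the original code:

```clojure
(import '(java.io BufferedReader FileReader))

;; Clojure counterpart of readAllCharsOneByOne: read one char at a time
;; until .read returns -1, appending each one to a StringBuilder.
(defn read-all-chars [rdr]
  (let [ct (StringBuilder.)]
    (loop [val (.read rdr)]
      (when (not= val -1)
        (.append ct (Character/toChars val))  ; use the value just read
        (recur (.read rdr))))                 ; rebind val; only reached before EOF
    (.toString ct)))

;; pt5 with the fixes applied; with-open closes the BufferedReader
;; (and with it the underlying FileReader) once reading is done.
(defn pt5 [file]
  (with-open [bfr (BufferedReader. (FileReader. file))]
    (read-all-chars bfr)))
```

With this, (pt5 myfile) should return the whole text, including the leading newline. For reading an entire file into a string, Clojure's built-in slurp does the same job in a single call.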

by
Thanks. There's a longer discussion on the Google Clojure Group.
...