Hello --
I'm learning Clojure and its Java interop stuff. I am trying to emulate this function:
public String readAllCharsOneByOne(BufferedReader bufferedReader) throws IOException {
    StringBuilder content = new StringBuilder();
    int value;
    while ((value = bufferedReader.read()) != -1) {
        content.append((char) value);
    }
    return content.toString();
}
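For reference, here is a minimal self-contained sketch of how that method could be exercised on its own. The class name ReadAllChars, the static modifier, and the in-memory StringReader input are my assumptions for illustration; they are not part of the snippet I am emulating, and the real input is the file described below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical wrapper class, only so the method can be compiled and run alone.
public class ReadAllChars {

    // Same loop as the method above: read() returns one character as an int,
    // and the loop stops as soon as read() returns -1 (end of stream).
    public static String readAllCharsOneByOne(BufferedReader bufferedReader) throws IOException {
        StringBuilder content = new StringBuilder();
        int value;
        while ((value = bufferedReader.read()) != -1) {
            content.append((char) value);
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample instead of the real file (assumption for illustration).
        // \u00e5 is the letter "a with ring above", escaped to avoid source-encoding issues.
        String sample = "\nDen typiska impulsiva olycksf\u00e5geln";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            String result = readAllCharsOneByOne(reader);
            // Report whether the round trip preserved every character.
            System.out.println(result.equals(sample));
        }
    }
}
```

Note that in this version each value returned by read() is compared against -1 before it is appended, so the end-of-stream marker itself never reaches the StringBuilder.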
So far, I've been able to reason that the parts I need are:
(def myfile "/path/to/svenska_sample.txt")
(import java.io.BufferedReader)
(import java.io.FileReader)
(import java.lang.StringBuilder)
(import java.lang.Character)
(def a-FileReader (FileReader. myfile))
(def bufferedReader (BufferedReader. a-FileReader))
(def content (StringBuilder.))
This works when I evaluate it at the REPL one call at a time:
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDe"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen "]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen t"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen ty"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typ"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typi"]
The file is a small UTF-8 encoded text file, created on Linux, with the following content:
❯ cat svenska_sample.txt
Den typiska impulsiva olycksfågeln är en ung man som kraschar flera bilar, och ofta skryter lite med det, i varje fall när han är tillsammans med sina vänner._. För dem har otur i det närmaste blivit en livsstil, och de råkar konstant ut för olyckor, stora som små. Olycksfåglar kallar vi dem. Hur många det finns kan ingen med säkerhet säga, för det finns inga konkreta definitioner på denna grupp, och heller ingen given avgränsning av den. Att de finns, råder det emellertid ingen tvekan om, varken på sjukhusens akutmottagningar eller i försäkringsbranschen
I wrote a function, with my brand new Clojure Java interop chops, that looks like this:
;; COMPILES - BUT CAN'T GET AROUND THE 0xFFFFFFFF BUG
(defn pt%% [file]
  (let [afr (FileReader. file) ; instances of FileReader, BufferedReader, StringBuilder
        bfr (BufferedReader. afr)
        ct (StringBuilder.)
        val (.read bfr)
        this-list (list afr bfr ct)]
    ;; (apply println this-list)
    (loop []
      (when (not (= val -1))
        (.append ct (Character/toChars (.read bfr))))
      (recur))
    ;; when finished...
    (.toString ct)))
but it borks with the following error:
user> (pt%% myfile)
Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF
What in the world could be causing this (NOTE: I am not a Java programmer)?
Here is the hex dump of the text file:
❯ cat svenska_sample.hexdump
00000000: 0a44 656e 2074 7970 6973 6b61 2069 6d70 .Den typiska imp
00000010: 756c 7369 7661 206f 6c79 636b 7366 c3a5 ulsiva olycksf..
00000020: 6765 6c6e 20c3 a472 2065 6e20 756e 6720 geln ..r en ung
00000030: 6d61 6e20 736f 6d20 6b72 6173 6368 6172 man som kraschar
00000040: 2066 6c65 7261 2062 696c 6172 2c20 6f63 flera bilar, oc
00000050: 6820 6f66 7461 2073 6b72 7974 6572 206c h ofta skryter l
00000060: 6974 6520 6d65 6420 6465 742c 2069 2076 ite med det, i v
00000070: 6172 6a65 2066 616c 6c20 6ec3 a472 2068 arje fall n..r h
00000080: 616e 20c3 a472 2074 696c 6c73 616d 6d61 an ..r tillsamma
00000090: 6e73 206d 6564 2073 696e 6120 76c3 a46e ns med sina v..n
000000a0: 6e65 722e 5f2e 2046 c3b6 7220 6465 6d20 ner._. F..r dem
000000b0: 6861 7220 6f74 7572 2069 2064 6574 206e har otur i det n
000000c0: c3a4 726d 6173 7465 2062 6c69 7669 7420 ..rmaste blivit
000000d0: 656e 206c 6976 7373 7469 6c2c 206f 6368 en livsstil, och
000000e0: 2064 6520 72c3 a56b 6172 206b 6f6e 7374 de r..kar konst
000000f0: 616e 7420 7574 2066 c3b6 7220 6f6c 7963 ant ut f..r olyc
00000100: 6b6f 722c 2073 746f 7261 2073 6f6d 2073 kor, stora som s
00000110: 6dc3 a52e 204f 6c79 636b 7366 c3a5 676c m... Olycksf..gl
00000120: 6172 206b 616c 6c61 7220 7669 2064 656d ar kallar vi dem
00000130: 2e20 4875 7220 6dc3 a56e 6761 2064 6574 . Hur m..nga det
00000140: 2066 696e 6e73 206b 616e 2069 6e67 656e finns kan ingen
00000150: 206d 6564 2073 c3a4 6b65 7268 6574 2073 med s..kerhet s
00000160: c3a4 6761 2c20 66c3 b672 2064 6574 2066 ..ga, f..r det f
00000170: 696e 6e73 2069 6e67 6120 6b6f 6e6b 7265 inns inga konkre
00000180: 7461 2064 6566 696e 6974 696f 6e65 7220 ta definitioner
00000190: 70c3 a520 6465 6e6e 6120 6772 7570 702c p.. denna grupp,
000001a0: 206f 6368 2068 656c 6c65 7220 696e 6765 och heller inge
000001b0: 6e20 6769 7665 6e20 6176 6772 c3a4 6e73 n given avgr..ns
000001c0: 6e69 6e67 2061 7620 6465 6e2e 2041 7474 ning av den. Att
000001d0: 2064 6520 6669 6e6e 732c 2072 c3a5 6465 de finns, r..de
000001e0: 7220 6465 7420 656d 656c 6c65 7274 6964 r det emellertid
000001f0: 2069 6e67 656e 2074 7665 6b61 6e20 6f6d ingen tvekan om
00000200: 2c20 7661 726b 656e 2070 c3a5 2073 6a75 , varken p.. sju
00000210: 6b68 7573 656e 7320 616b 7574 6d6f 7474 khusens akutmott
00000220: 6167 6e69 6e67 6172 2065 6c6c 6572 2069 agningar eller i
00000230: 2066 c3b6 7273 c3a4 6b72 696e 6773 6272 f..rs..kringsbr
00000240: 616e 7363 6865 6e0a anschen.
I then changed the code (see below), following some suggestions that the problem might be my use of recur.
Now it complains about the '(.append etc..)' form that was working at the REPL, and the same error remains:
user> (pt5 myfile)
Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF
(defn pt5 [file]
  (let [afr (FileReader. file) ; instances of FileReader, BufferedReader, StringBuilder
        bfr (BufferedReader. afr)
        ct (StringBuilder.)
        this-list (list afr bfr ct)]
    ;; (apply println this-list)
    (loop [val (.read bfr)]
      (when (not (= val -1))
        (.append ct (Character/toChars (.read bfr))))
      (recur val))
    ;; when finished...
    (.toString ct)))
Harder than it seemed at first sight...
Any help is greatly appreciated.
-- Hank