Welcome! Please see the About page for a little more info on how this works.

0 votes
in IO by

I found a strange behavior in implementation of clojure.java.io/copy function


(defn copy
  "Copies input to output.  Returns nil or throws IOException.
  Input may be an InputStream, Reader, File, byte[], char[], or String.
  Output may be an OutputStream, Writer, or File.
  Options are key/value pairs and may be one of
    :buffer-size  buffer size to use, default is 1024.
    :encoding     encoding to use if converting between
                  byte and char streams.   
  Does not close any streams except those it opens itself 
  (on a File)."
  {:added "1.2"}
  [input output & opts]
  (do-copy input output (when opts (apply hash-map opts))))

Actual copying is implemented here when copying from an InputStream to an OutputStream.


(defmethod do-copy [InputStream OutputStream] [^InputStream input ^OutputStream output opts]
  (let [buffer (make-array Byte/TYPE (buffer-size opts))]
    (loop []
      (let [size (.read input buffer)]  ;;; XXX point 1
        (when (pos? size)               ;;; XXX point 2
          (do (.write output buffer 0 size)

Here .read function at point 1 is https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read(byte[])

The javadoc states following

Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
If the length of b is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at the end of the file, the value -1 is returned; otherwise, at least one byte is read and stored into b.

Meaning that return value -1 means end of stream, and return value 0 doesn't mean end of stream. However condition at point 2 in code above, stops recursion when .read returns value that is smaller than 1.

Now consider a case where .read returns this sequence of values in consecutive calls:

1024, 0, 1024, 201, -1

clojure.java/io copies only the first 1024 bytes, when the whole stream has 2249 bytes. Is this the intended behavior? Should the condition at point 1 be (not (neg? size)) ?

I posted this issue also to google groups: https://groups.google.com/forum/#!topic/clojure/XzpPPXXhgM4

2 Answers

+1 vote

I agree this is a bug and will create a jira for it.

I don't think this is a bug.

The critical point is this bit in the java-doc: "If the length of b is zero, then no bytes are read and 0 is returned"

Only if the buffer b has a length of zero, no bytes will be read. Unless you explicitly want an infinite loop if a buffer-size of 0 is passed, the implementation should not be changed in my opinion.
Javadoc says that 0 should not be returned, except if the input buffer has 0 length.

1. There are InputStream implementations that actually return 0 against JavaDoc. For example Apache commons-compress does this when extracting files from some specific ZIP files. This may be a bug but those things exist.
2. Other InputStream->OutputStream copy implementations read streams until `-1` is received. For example:
     -  https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1083
     - https://github.com/eclipse/jetty.project/blob/9706d70484863a014d3604e5e7cb4af40aa4cb1e/jetty-util/src/main/java/org/eclipse/jetty/util/IO.java#L161
3. clojure.java.io/copy has to handle `0` somehow and thats a gray area, neither way is specification wise better.
See my answer. The documentation is very clear that a) the method blocks until input data is available (or EOF or an exception) and b) that at least one byte is read and stored into b (unless EOF, exception, or buffer size is zero).

The Apache copyLarge() function won't work correctly for a buffer of size zero. The Jetty copy is fine because it explicitly uses a non-zero size buffer so that condition can never occur.
0 votes

I think you are misreading the documentation. The key parts here are:

This method blocks until input data is available, ... ...; otherwise,
at least one byte is read and stored into b.

So it will never read zero bytes except when the buffer size is zero.

The current code is correct to stop on zero -- which will occur (only) when the buffer passed in has size zero.

I observed this sequence of read return values in running code and using buffer size 1024. So here clojure's copy function stopped copying at zero and IOUtils ja Jetty copied the whole stream. InputStream was from apache commons-compress. So here the InputStream doesn't work according the JavaDoc.

      ...,1024,0,1024, ..., -1
And as I said: the Apache code is broken.
https://issues.apache.org/jira/browse/COMPRESS-491 Here is the issue in apache libs.