<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Clojure Q&amp;A - Recent questions in data.csv</title>
<link>https://ask.clojure.org/index.php/questions/contrib-libs/data-csv</link>
<description></description>
<item>
<title>Supporting more malformed files in clojure.data.csv/read-csv</title>
<link>https://ask.clojure.org/index.php/10499/supporting-more-malformed-files-in-clojure-data-csv-read-csv</link>
<description>&lt;p&gt;Imagine we have the following CSV file&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A,B,C
this is,&quot;a badly&quot; quoted, file
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When trying to parse this file with clojure.data.csv/read-csv, I get the following exception&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{:type java.lang.Exception
 :message &quot;CSV error (unexpected character:  )&quot;
 :at [clojure.data.csv$read_quoted_cell invokeStatic &quot;csv.clj&quot; 37]}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This file is clearly malformed, but I've seen a file like this in the wild so it would be nice if read-csv handled extra content after the quoted portion by parsing this to&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[&quot;this is&quot; &quot;a badly quoted&quot; &quot; file&quot;]    
&lt;/code&gt;&lt;/pre&gt;
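&lt;p&gt;A minimal sketch of the lenient behaviour described above, assuming no separator and no escaped quote appears inside the quoted portion. The names lenient-cell and lenient-line are hypothetical and not part of clojure.data.csv.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(require '[clojure.string :as str])

;; Hypothetical helper: drops the opening quote and the first closing
;; quote, keeping any extra content that follows the quoted portion.
(defn lenient-cell [cell]
  (if (str/starts-with? cell &quot;\&quot;&quot;)
    (str/replace-first (subs cell 1) &quot;\&quot;&quot; &quot;&quot;)
    cell))

;; Naive split on commas; this is only valid under the assumption
;; that quoted portions never contain a separator.
(defn lenient-line [line]
  (mapv lenient-cell (str/split line #&quot;,&quot; -1)))

(lenient-line &quot;this is,\&quot;a badly\&quot; quoted, file&quot;)
;; =&amp;gt; [&quot;this is&quot; &quot;a badly quoted&quot; &quot; file&quot;]
&lt;/code&gt;&lt;/pre&gt;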
&lt;hr&gt;
&lt;p&gt;Potential problem with this proposal:&lt;/p&gt;
&lt;p&gt;If there's a separator inside the quotes this becomes harder to interpret. e.g. &lt;/p&gt;
&lt;pre&gt;&lt;code&gt;this is,&quot;a, badly&quot; quoted, file
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;could be parsed to&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[&quot;this is&quot; &quot;a, badly quoted&quot; &quot; file&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or &lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[&quot;this is&quot; &quot;\&quot;a&quot; &quot; badly\&quot; quoted&quot; &quot; file&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While this second interpretation seems improbable to me, I'm not sure what the &quot;best effort&quot; interpretation strategy is in this case.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/10499/supporting-more-malformed-files-in-clojure-data-csv-read-csv</guid>
<pubDate>Mon, 19 Apr 2021 21:22:10 +0000</pubDate>
</item>
<item>
<title>Support CSV Injection Escape Mechanisms</title>
<link>https://ask.clojure.org/index.php/7009/support-csv-injection-escape-mechanisms</link>
<description>&lt;p&gt;CSVs generated using clojure.data.csv are susceptible to injection attacks. It would be a nice enhancement to have an option that escapes potentially dangerous cell values (such as those starting with =) on behalf of users.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(with-open [writer (io/writer &quot;out-file.csv&quot;)]
  (csv/write-csv writer
                 [[&quot;abc&quot; &quot;def&quot;]
                  [&quot;ghi&quot; &quot;=jkl&quot;]]))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See &lt;a rel=&quot;nofollow&quot; href=&quot;https://www.owasp.org/index.php/CSV_Injection&quot;&gt;https://www.owasp.org/index.php/CSV_Injection&lt;/a&gt;&lt;/p&gt;
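&lt;p&gt;A minimal sketch of what such an option might do, assuming the commonly suggested mitigation of prefixing a single quote to cells that begin with =, +, - or @. The names formula-prefixes, escape-cell and escape-rows are hypothetical and not part of clojure.data.csv. Note that this simple version also escapes legitimate values such as &quot;-5&quot;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;;; Hypothetical helpers, not part of clojure.data.csv.
(def formula-prefixes #{\= \+ \- \@})

(defn escape-cell
  &quot;Prefixes a single quote to string cells that start with a formula character.&quot;
  [cell]
  (if (and (string? cell)
           (contains? formula-prefixes (first cell)))
    (str &quot;'&quot; cell)
    cell))

(defn escape-rows [rows]
  (map #(mapv escape-cell %) rows))

;; The escaped rows can then be passed to csv/write-csv as usual.
(escape-rows [[&quot;abc&quot; &quot;def&quot;]
              [&quot;ghi&quot; &quot;=jkl&quot;]])
;; =&amp;gt; ([&quot;abc&quot; &quot;def&quot;] [&quot;ghi&quot; &quot;'=jkl&quot;])
&lt;/code&gt;&lt;/pre&gt;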
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7009/support-csv-injection-escape-mechanisms</guid>
<pubDate>Fri, 23 Feb 2018 17:50:58 +0000</pubDate>
</item>
<item>
<title>Adding namespace qualifiers in documentation</title>
<link>https://ask.clojure.org/index.php/7008/adding-namespace-qualifiers-in-documentation</link>
<description>&lt;p&gt;README.md has code examples that are missing namespace qualifiers. Adding them will make it easy to copy the code into a REPL and run it.&lt;br&gt;
In addition, there are a couple of fixes for misspelled words.&lt;br&gt;
The commit is in the following fork for reference:&lt;/p&gt;
&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://raw.githubusercontent.com/as17237/data.csv/master/README.md&quot;&gt;https://raw.githubusercontent.com/as17237/data.csv/master/README.md&lt;/a&gt;&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7008/adding-namespace-qualifiers-in-documentation</guid>
<pubDate>Sat, 11 Nov 2017 12:49:29 +0000</pubDate>
</item>
<item>
<title>Use Reducers/Transducers for better performance &amp; resource handling</title>
<link>https://ask.clojure.org/index.php/7004/reducers-transducers-better-performance-resource-handling</link>
<description>&lt;p&gt;One problem with the clojure.data.csv library is that it is built upon lazy sequences, which can lead to inefficiencies when processing large amounts of data. For example, even before any transformation is done, the baseline parsing of 1 GB of CSV data takes about 50 seconds on my machine. Other parsers available on the JVM can parse this quantity of data in less than 4 seconds.&lt;/p&gt;
&lt;p&gt;I'd like to discuss how we might port clojure.data.csv to use a reducer/transducer model, for improved performance and resource handling.  Broadly speaking I think there are a few options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Implement this as a secondary alternative API in c.d.csv leaving the existing API and implementation as is for legacy users.&lt;/li&gt;
&lt;li&gt;Replace the API entirely with no attempt at retaining backwards compatibility.&lt;/li&gt;
&lt;li&gt;Retain the same public API contracts, whilst trying to reimplement it underneath in terms of reducers/transducers.  Use transducers underneath but use &lt;code&gt;sequence&lt;/code&gt; to retain the current read-csv lazy-seq contract, whilst offering access to a new pure transducer/reducer-based API for non-legacy users or those who don't require a lazy-seq-based implementation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;1 and 3 are essentially the same idea, except that in 3 users get the benefit of a faster underlying implementation. There may also be other options.&lt;/p&gt;
&lt;p&gt;I think 3, if possible, would be the best option.&lt;/p&gt;
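&lt;p&gt;As a rough illustration of the shape option 3 could take (not a proposal for the actual implementation), the sketch below builds both a lazy and an eager entry point on one transducer. The names parse-line, rows-xf, read-rows-lazy and count-cells are hypothetical, and parse-line is deliberately simplified: it has no quoting support.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(require '[clojure.string :as str])

;; Hypothetical, simplified line parser: splits on commas only,
;; with no handling of quoted cells.
(defn parse-line [line]
  (str/split line #&quot;,&quot; -1))

;; One transducer that turns lines into rows of cells.
(def rows-xf (map parse-line))

;; Lazy-seq contract for legacy callers, built on the same transducer.
(defn read-rows-lazy [lines]
  (sequence rows-xf lines))

;; Eager, reducer-style access with no intermediate lazy sequences.
(defn count-cells [lines]
  (transduce (comp rows-xf (map count)) + 0 lines))

(count-cells [&quot;a,b&quot; &quot;c,d,e&quot;])
;; =&amp;gt; 5
&lt;/code&gt;&lt;/pre&gt;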
&lt;p&gt;Options 1 and 2 raise the question of whether to make no attempt at backwards compatibility or at improving the experience for legacy users.&lt;/p&gt;
&lt;p&gt;Before delving into the details of the reducer/transducer implementation, I'm curious what the core team thinks of exploring this further.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7004/reducers-transducers-better-performance-resource-handling</guid>
<pubDate>Thu, 15 Sep 2016 23:04:43 +0000</pubDate>
</item>
<item>
<title>Double quote at beginning of cell throws exception</title>
<link>https://ask.clojure.org/index.php/7007/double-quote-at-beginning-of-cell-throws-exception</link>
<description>&lt;p&gt;If a cell has a double quote (an escaped quotation mark) as the first characters in the cell then an exception is thrown.&lt;/p&gt;
&lt;p&gt;For example:&lt;br&gt;
(csv/read-csv &quot;this,\&quot;\&quot;that\&quot;\&quot;,the other&quot;)&lt;/p&gt;
&lt;p&gt;produces:&lt;br&gt;
Exception CSV error (unexpected character: t)  clojure.data.csv/read-quoted-cell (csv.clj:36)&lt;/p&gt;
&lt;p&gt;but this:&lt;br&gt;
(csv/read-csv &quot;this, \&quot;\&quot;that\&quot;\&quot;,the other&quot;)&lt;/p&gt;
&lt;p&gt;produces this correct output:&lt;br&gt;
([&quot;this&quot; &quot; \&quot;\&quot;that\&quot;\&quot;&quot; &quot;the other&quot;])&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7007/double-quote-at-beginning-of-cell-throws-exception</guid>
<pubDate>Fri, 05 Aug 2016 02:34:06 +0000</pubDate>
</item>
<item>
<title>Port data.csv to clojurescript</title>
<link>https://ask.clojure.org/index.php/7006/port-data-csv-to-clojurescript</link>
<description>&lt;p&gt;Make data.csv available for clojurescript users.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7006/port-data-csv-to-clojurescript</guid>
<pubDate>Mon, 11 Apr 2016 10:53:02 +0000</pubDate>
</item>
<item>
<title>Add project.clj for easier local development</title>
<link>https://ask.clojure.org/index.php/7005/add-project-clj-for-easier-local-development</link>
<description>&lt;p&gt;Add project.clj for easier local development&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7005/add-project-clj-for-easier-local-development</guid>
<pubDate>Thu, 07 Apr 2016 09:59:35 +0000</pubDate>
</item>
<item>
<title>Specify RFC4180 compatibility in README</title>
<link>https://ask.clojure.org/index.php/7003/specify-rfc4180-compatibilty-in-readme</link>
<description>&lt;p&gt;In the README it says: &quot;Follows the RFC4180 specification but is more relaxed.&quot;&lt;br&gt;
This is an oxymoron and confusing in other regards. E.g.: &lt;br&gt;
- What does &quot;relaxed&quot; mean? &lt;br&gt;
- If it is more &quot;relaxed&quot; than the specification, how can it follow it?&lt;br&gt;
- Does it follow the specification, or only parts of it?&lt;/p&gt;
&lt;p&gt;Problem: If I use this lib to generate CSV for a third party, can I say &quot;This is RFC4180-conformant CSV&quot; and feel safe with it? Or should I add &quot;but it is more relaxed&quot;? :)&lt;/p&gt;
&lt;p&gt;The task could be to add a more specific explanation, or a comparison table if necessary.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7003/specify-rfc4180-compatibilty-in-readme</guid>
<pubDate>Wed, 18 Mar 2015 13:09:42 +0000</pubDate>
</item>
<item>
<title>write-csv and quote? predicate</title>
<link>https://ask.clojure.org/index.php/7002/write-csv-and-quote-predicate</link>
<description>&lt;p&gt;In version 0.1.2 the quote? predicate is called after the object to be written into a cell has been converted into a string (see line 99). If the quote? predicate were applied to the object instead, write-csv could be called as follows:&lt;/p&gt;
&lt;p&gt;(write-csv&lt;br&gt;
  &quot;test.csv&quot;&lt;br&gt;
  [[1 &quot;text&quot;]&lt;br&gt;
   [2 &quot;text&quot;]]&lt;br&gt;
  :quote? string?)&lt;/p&gt;
&lt;p&gt;In the current version the predicate only ever sees strings, because every cell value has already been converted.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7002/write-csv-and-quote-predicate</guid>
<pubDate>Tue, 21 Oct 2014 09:02:37 +0000</pubDate>
</item>
<item>
<title>Allow read-csv to read files without quoting.</title>
<link>https://ask.clojure.org/index.php/6999/allow-read-csv-to-read-files-without-quoting</link>
<description>&lt;p&gt;I would like to be able to read files with the following format:&lt;br&gt;
  - '|' separated &lt;br&gt;
  - Unquoted, e.g. \&quot; can appear in the strings, in particular at the beginning, and not at the end.&lt;/p&gt;
&lt;p&gt;I need to set a null quote character (that is, no quote character at all), but this doesn't currently work.&lt;br&gt;
The following is a workaround, where a '.' is unlikely to appear as the first&lt;br&gt;
character of the string.&lt;/p&gt;
&lt;p&gt;  (csv/read-csv in-file :separator \| :quote \.)&lt;/p&gt;
&lt;p&gt;I would like to be able to be explicit:&lt;/p&gt;
&lt;p&gt;  (csv/read-csv in-file :separator \| :quote nil)&lt;/p&gt;
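&lt;p&gt;For comparison, a file with no quoting at all can be split without data.csv. This is a minimal sketch, assuming no cell contains the separator or a line break; read-unquoted is a hypothetical name, not part of clojure.data.csv. The result is lazy, so it has to be consumed while the underlying reader is still open.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper, not part of clojure.data.csv.
;; Assumes no cell contains the separator or a line break.
(defn read-unquoted
  &quot;Lazily splits each line of in on | and keeps empty trailing cells.&quot;
  [in]
  (map #(str/split % #&quot;\|&quot; -1)
       (line-seq (io/reader in))))
&lt;/code&gt;&lt;/pre&gt;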
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/6999/allow-read-csv-to-read-files-without-quoting</guid>
<pubDate>Thu, 29 May 2014 15:14:02 +0000</pubDate>
</item>
<item>
<title>read-csv can not handle white-space at end of line</title>
<link>https://ask.clojure.org/index.php/7000/read-csv-can-not-handle-white-space-at-end-of-line</link>
<description>&lt;p&gt;When whitespace is present after the closing \&quot;, read-csv throws a confusing error.&lt;br&gt;
It took me some time to notice it was a whitespace issue, as whitespace is ... not visible.&lt;/p&gt;
&lt;p&gt;See an example of the error below.&lt;/p&gt;
&lt;p&gt;=&amp;gt; (read-csv (java.io.StringReader. &quot;\&quot;a\&quot; &quot; ))&lt;br&gt;
Exception CSV error (unexpected character:  )  clojure.data.csv/read-quoted-cell (csv.clj:36)&lt;br&gt;
=&amp;gt; (read-csv (java.io.StringReader. &quot;\&quot;a\&quot;&quot; ))&lt;br&gt;
([&quot;a&quot;])&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7000/read-csv-can-not-handle-white-space-at-end-of-line</guid>
<pubDate>Fri, 24 May 2013 18:50:04 +0000</pubDate>
</item>
<item>
<title>pom.xml directives</title>
<link>https://ask.clojure.org/index.php/7001/pom-xml-directives</link>
<description>&lt;p&gt;If you build data.csv alone with the current pom.xml, you get a couple of warnings and the tests are not executed. With recent versions of Maven, these warnings can break the build.&lt;/p&gt;
&lt;p&gt;A fixed (I hope!) version is attached.&lt;/p&gt;
</description>
<category>data.csv</category>
<guid isPermaLink="true">https://ask.clojure.org/index.php/7001/pom-xml-directives</guid>
<pubDate>Fri, 10 Feb 2012 10:59:36 +0000</pubDate>
</item>
</channel>
</rss>