It would be great if the ClojureScript version of data.xml supported NodeJS! It seems data.xml uses DOMParser, which is available in browsers but not in NodeJS.

I looked at the original issue DXML-29, and NodeJS is mentioned there (using xmldom), but I guess support for NodeJS was never implemented in the end?

I had a look in the code and it seems xmldom is "swapped in" for DOMParser when running the ClojureScript tests, so based on that it should work? The tests run on Nashorn rather than NodeJS, though, and I couldn't work out where xmldom is coming from.

Possible solutions:

1) Use some logic in the code, e.g. "try create DOMParser, if error try create xmldom".
2) Use xmldom everywhere (apparently it's possible to use it outside NodeJS), but I don't know enough about it. Some info here: https://github.com/jindw/xmldom/wiki/How-to-use-xmldom-in-non-node.js-JavaScript-platforms-like-Rhino-or-SpiderMonkey

Maybe option 1) is preferable.
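
A rough sketch of what option 1) could look like on the JavaScript side is shown here in TypeScript. It is only an illustration under stated assumptions: `pickDomParser`, `XmlDomParser` and the two fake parser classes are hypothetical names invented for this example, not part of data.xml or xmldom. In a real project the fallback argument would be the DOMParser constructor exported by xmldom.

```typescript
// Minimal shape shared by a browser DOMParser and xmldom's DOMParser.
interface XmlDomParser {
  parseFromString(source: string, mimeType: string): unknown;
}

type ParserCtor = new () => XmlDomParser;

// Hypothetical helper: prefer a native DOMParser if the environment has one,
// otherwise use the constructor supplied by the caller (e.g. from xmldom).
function pickDomParser(
  fallback: ParserCtor,
  env: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): ParserCtor {
  const native = env["DOMParser"];
  return typeof native === "function" ? (native as ParserCtor) : fallback;
}

// Stand-ins so the sketch runs without a browser or the xmldom package.
class FakeNativeParser implements XmlDomParser {
  parseFromString(source: string): unknown {
    return { via: "native", source };
  }
}
class FakeXmldomParser implements XmlDomParser {
  parseFromString(source: string): unknown {
    return { via: "xmldom", source };
  }
}

// Environment without DOMParser (like NodeJS): the fallback is chosen.
const chosenOnNode = pickDomParser(FakeXmldomParser, {});
// Environment with DOMParser (like a browser): the native one is chosen.
const chosenInBrowser = pickDomParser(FakeXmldomParser, { DOMParser: FakeNativeParser });
```

Passing the fallback in as an argument sidesteps the "where does xmldom come from" question for the library itself: the downstream project decides which implementation to supply.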

Unknowns (at least to me): how to "export" the xmldom npm dependency so that it is available to downstream projects using the library.

9 Answers

Comment made by: bendlas

For this to happen, I'd like to enable the ClojureScript implementation for streaming xml, by using a sax.js parser.

Using js/DOMParser was a concession to its overwhelming performance in browsers, for small documents. It's not something that I'd like to bring to nodejs, where there isn't a native implementation and larger documents are likely to be processed.

I'm open to better support for working with various DOM classes, which also includes Java's org.w3c.dom, short of delivering an actual DOM parser alongside data.xml. If you want to use data.xml in nodejs right now, consider using xmldom + clojure.data.xml/element-data directly.

Comment made by: alza

Hi bendlas,

Streaming support with lazy seqs would be great!

I'm currently using data.xml with xmldom on Node, and it's quite slow for the ~10 MB XML documents I'm working with, presumably due to the string-only parsing in xmldom. It seems to be around 10x slower than the same code running on the JVM, where streams can be used.

What are the good options for a streaming Node parser to build on top of?

sax-js doesn't seem to support streaming.

Comment made by: bendlas

> sax-js doesn't seem to support streaming ..

I'm not sure I understand: SAX (the "Simple API for XML") is an event-based, streaming-style API, and looking at the sax-js README example, I would expect it to allow multiple .write calls with partial chunks before calling the final .close.
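
To make the "multiple .write calls with partial chunks" idea concrete, here is a toy push tokenizer in TypeScript. It is not sax-js and does not use its API: `ToyPushParser` and the event shape are hypothetical, and the tokenizer only understands `<tag>`, `</tag>` and text (no attributes, comments, CDATA or entities). The point is only that input which is split mid-tag or mid-text can be buffered until the rest arrives.

```typescript
type XmlEvent =
  | { kind: "open"; name: string }
  | { kind: "close"; name: string }
  | { kind: "text"; text: string };

// Toy push tokenizer (NOT sax-js): buffers incomplete input between writes.
class ToyPushParser {
  private buffer = "";
  private readonly emit: (event: XmlEvent) => void;

  constructor(emit: (event: XmlEvent) => void) {
    this.emit = emit;
  }

  // Accept one chunk and emit every event that is already complete.
  write(chunk: string): this {
    this.buffer += chunk;
    for (;;) {
      if (this.buffer.startsWith("<")) {
        const end = this.buffer.indexOf(">");
        if (end < 0) return this; // tag is split across chunks: wait for more
        const tag = this.buffer.slice(1, end);
        this.buffer = this.buffer.slice(end + 1);
        this.emit(
          tag.startsWith("/")
            ? { kind: "close", name: tag.slice(1) }
            : { kind: "open", name: tag },
        );
      } else {
        const next = this.buffer.indexOf("<");
        if (next < 0) return this; // text may continue in the next chunk
        this.emit({ kind: "text", text: this.buffer.slice(0, next) });
        this.buffer = this.buffer.slice(next);
      }
    }
  }

  // Signal end of input: flush trailing text, reject a dangling tag.
  close(): void {
    if (this.buffer.startsWith("<")) throw new Error("unterminated tag");
    if (this.buffer.length > 0) this.emit({ kind: "text", text: this.buffer });
    this.buffer = "";
  }
}

const tokenEvents: XmlEvent[] = [];
const toyParser = new ToyPushParser((event) => tokenEvents.push(event));
// The text "hello" and the closing tag are both split across chunks.
toyParser.write("<a>he").write("llo</").write("a>");
toyParser.close();
// tokenEvents: open "a", text "hello", close "a"
```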

> What are the good options for a streaming Node parser to build on top of?

All the various streaming parsers I've found were built on sax-js, so I'd go for that.

> Streaming support with lazy seqs would be great!

Here is the crux: lazy sequences are not really an option for IO in JS, since everything is non-blocking. Whereas in Java you can happily block your thread while waiting for input in a lazy seq's .next, you can't do that in JS. Presumably there are options for blocking IO in Node.js, but that would be Node-only and still horrible due to the single-threaded nature of Node programs. This is also the reason no XML pull API (StAX) exists for JS.

data.xml is built on StAX because it's a natural fit for lazy seqs and because this was the preferred way to do stream processing in Clojure back then. SAX, on the other hand, is a push model, which is a good fit for JS, since your program is driven by incoming IO.
Recently, Clojure has gained solid support for push streams in the form of transducers. I am hammocking on the possibilities for basing data.xml on transducers, so that StAX and SAX sources could be supported uniformly. As a bonus, this has the potential to make data.xml faster by reducing intermediate allocation.
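
The pull/push distinction described above can be sketched in a few lines of TypeScript. This is only an illustration: `pullSource`, `pushSource` and the event shape are hypothetical names, and the in-memory array stands in for a real IO source.

```typescript
type XmlEvent = { kind: "open" | "close" | "text"; value: string };

const sampleEvents: XmlEvent[] = [
  { kind: "open", value: "a" },
  { kind: "text", value: "hi" },
  { kind: "close", value: "a" },
];

// Pull (StAX-like): the consumer asks for the next event when it wants one.
// With a real IO source, next() would have to block until data arrives.
function* pullSource(events: XmlEvent[]): Generator<XmlEvent> {
  for (const event of events) yield event;
}

// Push (SAX-like): the source calls the handler whenever an event is ready,
// so the consumer never has to block while waiting for IO.
function pushSource(events: XmlEvent[], handler: (e: XmlEvent) => void): void {
  for (const event of events) handler(event);
}

// Pull: the consumer drives the loop.
const pulledKinds: string[] = [];
const iter = pullSource(sampleEvents);
for (let step = iter.next(); !step.done; step = iter.next()) {
  pulledKinds.push(step.value.kind);
}

// Push: the source drives the consumer's callback.
const pushedKinds: string[] = [];
pushSource(sampleEvents, (e) => pushedKinds.push(e.kind));
```

Both consumers see the same events in the same order; what differs is who controls the loop, which is why the pull style needs a thread that is allowed to wait.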

Comment made by: bendlas

I've started work on this here: https://github.com/clojure/data.xml/tree/sax

In case anybody wants to take a look and/or help.

Comment made by: alza

Hi bendlas,

I was just wondering if this new push approach will allow for efficiently skipping whitespace during the parsing? With the current ClojureScript xmldom implementation, I have to scan through the parsed XML again to remove it, which is yet another step that slows parsing down compared to the Clojure equivalent. The Clojure parser seems to support skipping whitespace during parsing via the "skip-whitespace" option (although I couldn't see this mentioned in the docs).

Thanks!

Alex.

Comment made by: bendlas

> I was just wondering if this new push approach will allow for efficiently skipping whitespace during the parsing?

Yes, as well as other transformations (e.g. https://dev.clojure.org/jira/browse/DXML-50)

On the sax branch, I've added a protocol called PushHandler (think variadic transducer) and added support for it to most relevant places in the code. You can see it in action here: https://github.com/clojure/data.xml/blob/ac9aa0f711861ee8152ddf18e89a18c1d3538b00/src/main/clojure/clojure/data/xml/js/push.cljs#L184

With this you can build super-efficient transducer-style XML transformers.
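
As a rough analogue of the transducer-style approach, the TypeScript sketch below composes small event transformers in front of a sink, so whitespace is dropped while events flow through rather than in a second pass. It is not the PushHandler protocol from the sax branch; `Transform`, `pipeline`, `skipWhitespace` and `upperCaseNames` are hypothetical names chosen for this example.

```typescript
type XmlEvent =
  | { kind: "open"; name: string }
  | { kind: "close"; name: string }
  | { kind: "text"; text: string };

type Handler = (event: XmlEvent) => void;
// A transform wraps a downstream handler, much like a transducer wraps
// a reducing function.
type Transform = (next: Handler) => Handler;

// Drop text events that contain only whitespace; pass everything else on.
const skipWhitespace: Transform = (next) => (event) => {
  if (event.kind === "text" && event.text.trim() === "") return;
  next(event);
};

// A second, unrelated step to show that transforms compose.
const upperCaseNames: Transform = (next) => (event) => {
  next(event.kind === "text" ? event : { ...event, name: event.name.toUpperCase() });
};

// Compose left to right: events pass through steps[0] first.
function pipeline(...steps: Transform[]): Transform {
  return (sink) => steps.reduceRight((next, step) => step(next), sink);
}

const seenEvents: XmlEvent[] = [];
const pushHandler = pipeline(skipWhitespace, upperCaseNames)((e) => seenEvents.push(e));

const inputEvents: XmlEvent[] = [
  { kind: "open", name: "a" },
  { kind: "text", text: "\n  " },
  { kind: "text", text: "hi" },
  { kind: "close", name: "a" },
];
inputEvents.forEach((e) => pushHandler(e));
// seenEvents: open "A", text "hi", close "A" (the whitespace-only text is gone)
```

No intermediate collection is built between the steps: each event is either forwarded or dropped as it arrives.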

Still there are many bits and pieces to do:

  • lots of cleanup and documentation and testing
  • API
  • streamline pull <-> push crossover and common architecture
  • port process.clj to this style of processing
  • Provide lazy pull interface on top of blocking nodejs streams

I'd be glad for any testing or feedback!

Comment made by: alza

Hi bendlas,

I'd be happy to help with "end user" testing of the top-level API. In fact, I have a real-world app that already uses data.xml in both Clojure and ClojureScript: https://github.com/digital-dj-tools/dj-data-converter

I'd be particularly interested in the "streamline pull <-> push crossover and common architecture" item, since I'll need to support the Clojure (pull) and ClojureScript (push) APIs in the same project.

Thanks!

Comment made by: bendlas

Awesome!

I've prototyped parsing on node with synchronous and asynchronous streams here: https://github.com/clojure/data.xml/commit/719af453ab2c90352cf72a84f2e42161bf3a8e49

Asynchronous streams can be made into a kind of reducible-to-promise event source.
Synchronous streams even allow the same lazy-seq based parser as on the JVM.

Both can already be parsed into an element tree.
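
For illustration, folding a flat stream of parse events into an element tree needs little more than a stack. The TypeScript sketch below is not the code from the linked commit; `buildTree`, `XmlElement` and the event shape are hypothetical, and attributes are left out to keep it short.

```typescript
type XmlEvent =
  | { kind: "open"; name: string }
  | { kind: "close"; name: string }
  | { kind: "text"; text: string };

// Hypothetical tree shape: a tag plus an ordered list of children.
interface XmlElement {
  tag: string;
  content: Array<XmlElement | string>;
}

function buildTree(events: Iterable<XmlEvent>): XmlElement {
  // A synthetic wrapper element collects the document's top-level nodes.
  const root: XmlElement = { tag: "#document", content: [] };
  const stack: XmlElement[] = [root];
  for (const event of events) {
    const parent = stack[stack.length - 1] as XmlElement;
    if (event.kind === "open") {
      const child: XmlElement = { tag: event.name, content: [] };
      parent.content.push(child);
      stack.push(child);
    } else if (event.kind === "close") {
      if (stack.length < 2 || parent.tag !== event.name) {
        throw new Error(`unexpected closing tag: ${event.name}`);
      }
      stack.pop();
    } else {
      parent.content.push(event.text);
    }
  }
  if (stack.length !== 1) throw new Error("unclosed element at end of input");
  const first = root.content[0];
  if (root.content.length !== 1 || typeof first !== "object") {
    throw new Error("expected exactly one root element");
  }
  return first;
}

// Events for <a><b>hi</b></a>
const tree = buildTree([
  { kind: "open", name: "a" },
  { kind: "open", name: "b" },
  { kind: "text", text: "hi" },
  { kind: "close", name: "b" },
  { kind: "close", name: "a" },
]);
```

Because `buildTree` only needs something iterable, the same fold works whether the events come from an array, a generator over a synchronous stream, or a buffer filled by a push handler.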

I'd also like to adapt this for XMLHttpRequest streams in the browser and update the API for it; then it can be merged.

I'll ping back here when that happens.

Reference: https://clojure.atlassian.net/browse/DXML-60 (reported by alex+import)
...