0 votes
in data.zip by

I want to select the content of an XML element named "Group" that is itself nested in an element named "Group", using zip-xml/xml1->. Instead of the inner "Group", the outer "Group" element matches, so the content of the outer element is returned. The selection does not work as I expect, and I suspect this might be a defect.

Please see the minimal example:

`
XML:

root:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Root>
  <Group>
    <Name>Outer</Name>
    <Group>
      <Name>Inner</Name>
    </Group>
  </Group>
</Root>

(zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text)
"Outer"
`

Leiningen project with unit-tests as attachment. Run: lein test

`
$ lein test
lein test :only zip-xml-bug.core-test/parsing-group-elements

FAIL in (parsing-group-elements) (core_test.clj:34)
selecting the name of inner
expected: (= "Inner" (zip-xml/xml1-> root :Root :Group :Group :Name zip-xml/text))
actual: (not (= "Inner" "Outer"))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
Tests failed.
`

11 Answers

Comment made by: bpeter

I found a "workaround", as you can see below and in the attachment zip-xml-bug-descent.tgz.

`
(defn descent=
[tagname]
(fn [loc]

    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))

(testing "selecting the name of inner using descent"

(is (= "Inner"
       (-> (zip-xml/xml1-> root :Root :Group (descent= :Group) :Name zip-xml/text)))))

`

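The problem in my case seems to be the first branch of the {{or}} expression in {{tag=}}, which matches the element itself: once the outer "Group" is the current node, it already satisfies the test and its children are never searched. I suspect this branch exists so that the root element can be selected. Is there any other need for it?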

`
(defn tag=
  [tagname]
  (fn [loc]
    (or (= tagname (:tag (zip/node loc)))
        (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
                (zf/children-auto loc)))))
`

Maybe there should be a {{self}} predicate instead?
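
To make the effect of that {{or}} concrete, here is a minimal model built only on clojure.zip. It is a sketch, not data.zip's implementation: {{child-locs}}, {{children-named}}, {{self-or-children-named}}, {{select}} and {{text-of}} are hypothetical names, the tree is hand-built instead of parsed, and the library's auto-descent metadata is ignored. Only the self-or-children shape of the {{tag=}} quoted in this comment is kept.

```clojure
(require '[clojure.zip :as zip])

;; Hand-built element tree with the nesting the report describes
;; (Root > Group > Group), so no XML parser is needed.
(def root
  (zip/xml-zip
   {:tag :Root
    :content [{:tag :Group
               :content [{:tag :Name :content ["Outer"]}
                         {:tag :Group
                          :content [{:tag :Name :content ["Inner"]}]}]}]}))

(defn child-locs
  "Immediate child locations of loc, left to right (hypothetical helper)."
  [loc]
  (when (zip/branch? loc)
    (take-while some? (iterate zip/right (zip/down loc)))))

(defn children-named
  "Child-only match, in the spirit of the descent= workaround."
  [tagname]
  (fn [loc]
    (filter #(and (zip/branch? %) (= tagname (:tag (zip/node %))))
            (child-locs loc))))

(defn self-or-children-named
  "Same shape as the quoted tag=: the current node is tested first."
  [tagname]
  (fn [loc]
    (or (= tagname (:tag (zip/node loc)))
        ((children-named tagname) loc))))

(defn select
  "Threads a list of locations through preds. A result of true keeps the
  location, a sequence replaces it, anything else drops it."
  [loc & preds]
  (reduce (fn [locs pred]
            (mapcat (fn [l]
                      (let [r (pred l)]
                        (cond (true? r) [l]
                              (sequential? r) r
                              :else [])))
                    locs))
          [loc]
          preds))

(defn text-of
  "Concatenated string content of a simple leaf element."
  [loc]
  (apply str (:content (zip/node loc))))

(def t= self-or-children-named)

;; Third step: the outer Group is the current node and already satisfies
;; (= :Group tag), so `or` returns true and no descent happens.
(def buggy
  (text-of (first (select root (t= :Root) (t= :Group) (t= :Group) (t= :Name)))))

;; A child-only match on the third step reaches the inner Group.
(def fixed
  (text-of (first (select root (t= :Root) (t= :Group) (children-named :Group) (t= :Name)))))
```

In this model {{buggy}} evaluates to "Outer" and {{fixed}} to "Inner", which matches the failing test in the report and the result of the workaround.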

Comment made by: shilder

This is a regression from 0.1.1 to 0.1.2.

The commit that changed the behaviour is https://github.com/clojure/data.zip/commit/c5d6ca25c128f9fe937b11505c7c9736cfa2dd9a

A simple test to check (this works in 0.1.1):

`
(def nestedxml
(parse-str "

1033"))

(deftest same-nested-tags
(is (= "1" (xml1-> nestedxml :area :area text)))
(is (= "033" (xml1-> nestedxml :area :unit text))))
`

Related bug is DZIP-3

Comment made by: bzg

For what it's worth, I've just been hit by this regression too.
I hope a proper fix can be released soon! Thanks in advance.

Comment made by: bpeter

My example does not work with 0.1.1 either, so I doubt it is just a regression. @Denis Shilov maybe you want to create another ticket for this.

Comment made by: pdlug

Any update on this? We're also hitting this bug.

Comment made by: pdlug

This worked for us with 0.2.0-alpha2 after copying the latest implementation of {{tag=}} and dropping the {{or}} part, as Benjamin Peter suggested. I'm not sure what the best fix is here. It seems tricky to accommodate the previous patch that lets {{tag=}} match the root. Matching descendants is clearly the more frequent case, so a {{root=}} predicate is perhaps preferable to introducing something like {{descendant=}} throughout all the xml-> matchers. Of course, a fix to {{tag=}} that supports both, which I'm not seeing, would be best, but that seems tricky.

Comment made by: bwstearns

Ran into the same thing just today. Posted this (https://stackoverflow.com/questions/46535423/cant-access-deeply-nested-xml-with-clojure-data-zip-xml) a bit ago, but now that I've found this I know I'm not alone/crazy.

@bpeter thanks for the workaround. Does anyone know if fixing this is making it into 0.1.2/is there anything I can do to help make that happen?

Comment made by: alexmiller

Someone needs to dig in to see if there is a solution that lets people do what they want in DZIP-3 and here in DZIP-6. Does the patch here re-break the case in DZIP-3?

If so, then more work needs to be done to either find a solution that works for both or to decide whether one of these cases is not valid and shouldn't be supported, or to add something that lets you do both.

Bumping up a notch, I'd love to have someone sign up to be an active maintainer for data.zip. I've helped out here on a drive-by basis, but I have no skin in this game. Given there are a bunch of (obviously) caring users here, it would be great to have help from one of you.

Comment made by: skuro

I actually think this issue is concrete evidence that we cannot have something that caters for DZIP-3 and DZIP-6 at the same time. If we stick with {{tag=}} meaning "current tag or child", it's simply impossible to support, out of the box, nested nodes with the same tag as their parent.

It is my personal opinion that checking properties of the current node happens much less frequently than descending the tree, so that the syntax sugar for keyword predicates is best reserved for checking children.

Attached is a patch that follows the path Paul Dlug suggested and:

  • rolls back the changes from DZIP-3
  • pulls the functionality required by DZIP-3 into a dedicated {{self=}} predicate
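
A rough sketch of that split, written against plain clojure.zip and not taken from the attached patch: {{child-tag=}} and {{walk}} are hypothetical stand-ins for the keyword step and for xml->, and {{self=}} here is only a guess at the shape of the proposed predicate.

```clojure
(require '[clojure.zip :as zip])

;; Hand-built tree with a Group nested in a Group, as in the report.
(def root
  (zip/xml-zip
   {:tag :Root
    :content [{:tag :Group
               :content [{:tag :Name :content ["Outer"]}
                         {:tag :Group
                          :content [{:tag :Name :content ["Inner"]}]}]}]}))

(defn child-tag=
  "Child-only matcher (hypothetical): returns the children of loc
  whose tag is tagname and never looks at loc itself."
  [tagname]
  (fn [loc]
    (when (zip/branch? loc)
      (->> (zip/down loc)
           (iterate zip/right)
           (take-while some?)
           (filter #(= tagname (:tag (zip/node %))))))))

(defn self=
  "Hypothetical predicate: keeps loc only if its own tag is tagname."
  [tagname]
  (fn [loc]
    (if (= tagname (:tag (zip/node loc))) [loc] [])))

(defn walk
  "Applies each step to every current location and concatenates the results."
  [loc & steps]
  (reduce (fn [locs step] (mapcat step locs)) [loc] steps))

;; Nested same-named tags: each keyword-like step descends exactly one level.
(def inner-name
  (->> (walk root (self= :Root) (child-tag= :Group) (child-tag= :Group) (child-tag= :Name))
       first
       zip/node
       :content
       (apply str)))

;; Checking the current (root) node stays possible through the explicit predicate.
(def root-matches? (boolean (seq (walk root (self= :Root)))))
(def root-mismatch? (empty? (walk root (self= :Group))))
```

With the two concerns separated, {{inner-name}} is "Inner" while the root can still be tested explicitly, at the cost of callers having to write {{self=}} where they previously relied on the keyword matching the current node.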

On a side note: as much as I'd love to help maintain the project, I'm afraid I cannot commit to any more than casual help.

Comment made by: nathan

Correction for Benjamin Peter's workaround:
The bottom part of his descent function got cut off:

`
(defn descent=
  "Returns a query predicate that matches the children of a node
  whose tag is named tagname."
  [tagname]
  (fn [loc]
    (filter #(and (clojure.zip/branch? %) (= tagname (:tag (clojure.zip/node %))))
            (clojure.data.zip/children-auto loc))))
`

Reference: https://clojure.atlassian.net/browse/DZIP-6 (reported by bpeter)
...