Welcome! Please see the About page for a little more info on how this works.

0 votes
in Collections by

See dev thread at http://groups.google.com/group/clojure-dev/browse_thread/thread/fdb000cae4f66a95.

Excerpted from Mark Engelberg's message of July 14, 2010 in that discussion thread:

It will only be possible to properly implement subseq on priority maps with a patch to the core. subseq achieves its polymorphic behavior (over sorted maps and sorted sets) by delegating much of its behavior to the Java methods seqFrom, seq, entryKey, and comparator. So in theory, you should be able to extend subseq to work for any new collection by customizing those methods.

However, the current implementation of subseq assumes that the values you are sorting on are unique, which is a reasonable assumption for the built-in sorted maps which sort on unique keys, but it is then impossible to extend subseq behavior to other types of collections. To give library designers the power to hook into subseq, this assumption needs to be removed.

Approaches:

  1. A simple way is to allow for dropping multiple equal values when subseq massages the results of seqFrom into a strictly greater than sequence.

  2. A better way is for subseq to delegate more of its logic to seqFrom (by adding an extra flag to the method call as to whether you
    want greater than or equal or strictly greater than). This will allow for the best efficiency, and provide the greatest possibilities for future extensions.

Note: subseq currently returns () instead of nil in some situations. If the rest of this idea dies we might still want to fix that.

Patch clj-428-code-v5.patch implements approach #2 above. It preserves the current behavior of returning () instead of nil.

Patch: clj-428-code-v5.patch tests in clj-428-tests-v5.patch

15 Answers

0 votes
by

Comment made by: importer

Converted from http://www.assembla.com/spaces/clojure/tickets/428

0 votes
by

Comment made by: jafingerhut

Patch clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v1.txt dated Apr 28 2013 was written by Mark Engelberg in July 2010, and was attached to a message he sent to the dev thread linked in the description. The approach he takes is described by him in that thread, copied here:


Meanwhile, to initiate discussion on how to modify subseq, I've attached a proposed patch. This patch works by modifying the seqFrom method of the Sorted interface to take an additional "inclusive" parameter (i.e., <= and >= are inclusive, < and > are not).

In this patch, I do not address one issue I raised before, which is whether subseq implies by its name that it should return a seq rather than a sequence (in other words nil rather than ()). If seq behavior is desired, it would be necessary to wrap a call to seq around the calls to take-while. But for now, I'm just making the behavior match the current behavior.

Although I think this is the cleanest way to address the extensibility issue with subseq, the change to seqFrom will break anyone who currently is overriding that method. I didn't see any such classes in clojure.contrib, so I don't think it's an issue, but if this is a concern, my other idea is to fix the problem entirely within subseq by changing the calls to next into calls to drop-while. I have coded this to confirm that it works, and can provide that alternative patch if desired.

I can also supply a patch that uses drop-while in clojure.core/subseq and rsubseq if such an approach is preferred to the one in this patch.

0 votes
by

Comment made by: jafingerhut

clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v2.txt dated Apr 28 2013 is same as clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v1.txt (soon to be deleted), except it adds tests for subseq and rsubseq, and corrects a bug in that patch. The approach is the same as described above for that patch.

0 votes
by

Comment made by: jafingerhut

Patch clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v3.txt dated May 1 2013 is the same as clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v1.txt, still with the bug fix mentioned for -v2, but with some unnecessary changes removed from the patch. The comments for -v1.txt on the approach still apply.

0 votes
by

Comment made by: jafingerhut

Patch clj-428-v4.diff is identical to clj-428-change-Sorted-seqFrom-to-take-inclusive-patch-v3.txt described in an earlier comment, except it updates some diff context lines so that it applies cleanly to the latest Clojure master as of today.

0 votes
by

Comment made by: jafingerhut

Files clj-428-code-v5.patch and clj-428-tests-v5.patch are identical to the previous -v4 attachment, except they apply cleanly to the latest Clojure master. The code and tests are in separate files since they have different authors, and it will be easier to update such patches in the future if they are kept separate.

0 votes
by

Comment made by: marco.m

Hello, any news ?

0 votes
by

Comment made by: alexmiller

As implemented, this change would be a breaking changed in Sorted, so I'd say that particular patch is a non-starter on those grounds alone.

But probably better to focus on the high level. Rather than pushing subseq and what it means, it would be better to think about some abstraction interface that captured something interesting beyond what is in Sorted. Whatever thought could be put into that would be worthwhile I think. In particular, it would be good to have an idea of 2 or 3 existing collections that could potentially benefit from this. If we can't find 2-3, then I think I would just backlog it until there's a persuasive case. In particular, this seems like there is some correspondence here between SortedSet -> NavigableSet in Java and maybe something analogous is called for.

0 votes
by

Comment made by: marco.m

Thanks Alex, really appreciate your reply and the thought you put into it :-)

0 votes
by

Comment made by: markengelberg

I believe that one important measure of a language is the degree to which its internal abstractions, used for its own data structures, are open for extension. For that reason, I have long believed this is an issue worth addressing. The assumption baked into subseq and rsubseq that Sorted collections can only have unique sorted values was surely an oversight in the initial code, one that has not only prevented priority-map from hooking into subseq and rsubseq, but also several clever data structures by Marczyk. I don't think there's something interesting beyond what is in Sorted that we need to try to understand -- the problem is that implementing Sorted's seqFrom interface is not currently sufficient for subseq and rsubseq, as they are currently defined, to provide their specified behavior. You can view this as either a deficiency in Sorted or a deficiency in subseq/rsubseq.

Marco, the latest version of clojure.data.priority-map contains a patched version of subseq and rsubseq that doesn't rely on changes to the underlying Sorted interface to fix its incorrect behavior on sorted collections with duplicate sorted values. Feel free to use that implementation with priority-map or any other similar data structures you might be building.

I can't speak for the Clojure team, but I would think that if someone were to take the initiative to do extensive benchmarking on the subseq and rsubseq in clojure.data.priority-map and establish that there is no performance hit on existing sorted Clojure data structures, then they might be more inclined to move those patched versions into Clojure's core, since there'd be a clear benefit with zero downside. (Or alternatively, if there is a performance hit with respect to Clojure's existing sorted data structures, figure out how to tweak it so there is no performance penalty, which should be easy to do). The needed changes to subseq and rsubseq were simple: where they currently change a <= subsequence into a < subsequence by calling rest to drop the first item, I merely changed them to call drop-while to drop all equal items off the front of the sequence, since there might be more than one. When this issue was first raised, core team members indicated they would rather see this fixed at the protocol level, but if that is no longer feasible, I would like to see this change to subseq and rsubseq considered. It seems like an easy way to fix this longstanding issue that prevents subseq and rsubseq from working with all data structures that fulfill Sorted's seqFrom implementation.

0 votes
by

Comment made by: alexmiller

Let me just mention that this is the first time I ever remember looking at this ticket as it largely predates my involvement in Clojure core. I know it's been years and all that and I'm trying to actually take an honest stab at thinking through this. But I'll probably ask some dumb questions as I rewalk whatever path you've been down.

I'm definitely in agreement wrt making good collection traits able to be utilized by impls both inside and outside Clojure core.

With respect to the feasiblity of changing the protocol, I think maybe you mistook my intent there. We can certainly change / extend the protocols, we just need to do it in a way that grows (leaving existing stuff in place) rather than breaks (by changing existing method signatures). My objection is to changing the signature of Sorted.seqFrom() which breaks all existing impls - that's just a non-starter. But it's totally ok to extend Sorted to Sorted2 (or whatever more meaningful name) with a new arity of seqFrom. The subseq code can then do the best thing if a coll is Sorted2 and fall back to something else if just Sorted. Various ways to do that, but you get the intent hopefully - existing stuff continues to work without change, new stuff is leveraged if available.

Backing way up, it seems like seqFrom and friends are not rich enough to cover all the things that might be desired across different data structures. I continue to think that NavigableSet has a lot of good prior thinking (not saying it's a direct answer, but it's really the same problem area) and it would be profitable to look a little more broadly at some of the other data structures that have been added in the intervening years to see what we could enable.

Really, the whole impl of subseq is kind of gross and relies on a lot of moving parts and assumptions. Maybe that whole subseq op should actually be given to the collection to the collection to do directly.

0 votes
by

Comment made by: markengelberg

The crux of the problem is that subseq allows <=, <, >=, > but seqFrom only allows <=, >= and then subseq "massages" the output of seqFrom into <, > in a way that drops up to one item, which doesn't work if there is more than one equal item. The reason I consider this a design flaw is that you can create an implementation of seqFrom that does exactly what the implementation requires (generating appropriate <= and >= subsequences) and subseq and rsubseq may still not work correctly.

Way back when, the thinking seemed to be, "Hey, this mismatch between seqFrom and subseq/rsubseq is kind of screwy. Let's let the collection do all the subseq/rsubseq logic in its implementation of seqFrom so that there is no need for massaging two possible subseqs into four possible."

It is possible to fix subseq and rsubseq to drop as many items as necessary so no change is required to the interface, but if the collection could do all the logic directly it may be possible for the collection to create <, > subsequences more efficiently than a fix at the subseq and rsubseq level.

I see your point about a Sorted2 interface.

0 votes
by

Comment made by: alexmiller

I think that is a helpful restatement of the issue. I feel like "fixing" subseq with the drop logic is just doing more papering over this disconnect when it would be more helpful (and more efficient) to allow collections to directly do what the user needs.

0 votes
by

Comment made by: alexmiller

So you could have for example a Subseqable interface and if subseq/rsubseq detect it, they just pass control to the collection.

0 votes
by
...