Hi all --

Chris (cmm) is arguing for a kind of "polymorphic convenience", focusing on a single operation where programmers do not care about specific properties of the collection at hand. The definition of "being a duplicate" is probably something like "a = b" and thus relies on the implementation of #=. I think that Christoph (ct) has a similar perspective here.

Vanessa (codefrau) is arguing for considering the broader context of what programmers are trying to achieve and how they think about the collection at hand. Here, "being sequenceable" seems to be connected to figuring out what "being a duplicate in a container" means. Thus, it would be strange if #withoutDuplicates worked but a subsequent "uniqueStuff first" raised an error in certain cases. Additionally, "being a duplicate" is tricky for more complex structures such as a Dictionary. There, "duplicate keys" are typically forbidden while "duplicate values" are okay. This makes #withoutDuplicates a bit more challenging to understand and use, depending on the kind of collection.

At the moment, programmers have to convert their collection to a sequenceable one before they can use #withoutDuplicates. This requires extra knowledge about the collection at hand. We have checks such as #isSequenceable to avoid unnecessary conversions. However, following Vanessa's train of thought, that extra knowledge would also be required if #withoutDuplicates were moved to the top. Specifically, programmers would have to think about the next operations they want to perform. Then, a conversion might be necessary anyway.
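
To make the status quo concrete, here is a minimal workspace sketch. The variable names and the Bag are only placeholders for illustration; the point is the #isSequenceable check followed by a conversion where needed:

```smalltalk
"Placeholder example: any non-sequenceable collection would do here."
| aCollection unique |
aCollection := Bag withAll: #(1 2 2 3).

"Convert only if the receiver is not already sequenceable."
unique := aCollection isSequenceable
	ifTrue: [aCollection withoutDuplicates]
	ifFalse: [aCollection asOrderedCollection withoutDuplicates].

"unique is sequenceable in both branches, so follow-up messages such as
 #first are understood. For the Bag, the element order is arbitrary."
unique first.
```

Note that the conversion already forces the programmer to decide what kind of result they want to continue working with.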

As I cannot see any improvement from changing the status quo in this regard, I would argue not to move #withoutDuplicates to the top but to keep it where it is.

Best,

Marcel

On 13.06.2023 05:22:53, Chris Muller <asqueaker@gmail.com> wrote:

Hi Christoph,

> I was just wondering why we only define #withoutDuplicates on SequenceableCollection, since it does not depend on the order of the receiver and could be applied to other types of collections as well. For instance, Set could override it with ^self copy, Bags would be naturally covered as well, etc.
>
> What do you think?

+1. IMO, for something as semantically abstract as "collections", the implementation of a method should drive the decision of where it resides. #size and #do: are the core methods of Collection. Any operation that can rely solely on those should reside in Collection.
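
As a rough illustration of that criterion, a version of #withoutDuplicates that relies only on #do: might look like the sketch below. This is hypothetical and not what is in the image today; in particular, answering an OrderedCollection is just one possible choice, and the Set override is the one Christoph suggested:

```smalltalk
"Hypothetical sketch: Collection >> withoutDuplicates"
withoutDuplicates
	"Answer the elements of the receiver without duplicates, using only #do:.
	 Elements are compared via #= (and #hash), as Set does."
	| seen result |
	seen := Set new.
	result := OrderedCollection new.
	self do: [:each |
		(seen includes: each) ifFalse: [
			seen add: each.
			result add: each]].
	^ result

"Hypothetical sketch: Set >> withoutDuplicates"
withoutDuplicates
	"A Set has no duplicates by definition."
	^ self copy
```

Since #do: on a Dictionary enumerates the values, such a method would remove duplicate values there.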

When useful abstract operations are needlessly stuck in subclasses like SequenceableCollection, that, itself, becomes a question of semantics. And of design and usability, too. Especially when one discovers they have a use for it in a non-Sequenceable, "Why?" becomes the always-distracting first question. Often, people will simply re-implement something else rather than take a detour to lobby to move it up or modify the core library, thus continuing to reinforce a false notion of, "See? It's not needed except for Sequenceables."

I lost a similar argument several years ago with #joinSeparatedBy:. I was generating and processing hundreds of thousands of query objects, each having a collection of named arguments (key / value pairs), whose names (keys) simply had to be unique. And because it absolutely did not matter in what order the arguments were printed on the output stream, a Dictionary was the obvious choice for that collection. Nevertheless, the naysayers argued that their own personal lack of context for such a use case meant, "random result order makes such a feature questionable".

It is, until it isn't. The only reason #do: on Dictionary makes sense (e.g., "should it enumerate the 'values', or the 'associations'?") is our own experience as Smalltalkers of using it. Every operation on Dictionaries has non-obvious semantics to anyone not familiar with Smalltalk. This is why the implementation should weigh most heavily in such decisions, with semantics and personal experiences being secondary weights.

Best,
Chris