[FIX] SortedCollectionFix-sr

Stephan Rudlof sr at evolgo.de
Fri Sep 27 13:22:41 UTC 2002


Dear Richard,

Richard A. O'Keefe wrote:
> I asked:
> 	> Why should addLast: or addFirst: be given a complete veto?
> 
> and gave implementations of #addFirst: and #addLast: which
> complain only when the operation would violate the class invariant.
> 
> Stephen Rudlof replied:
> 	The short answer
> 	----------------
> 	The current situation for SortedCollections
> 	- is inconsistent: >>addFirst: is forbidden, >>addLast: not;
> 
> That was understood and assumed in my question.
> 
> 	- semantically incorrect: using >>addLast: easily leads to a not sorted
> 	SortedCollection.
> 	
> That also was understood and assumed.
> 
> My question is not "what is broken about the present situation",
> but "why not allow these operations to proceed whenever they can
> do so without breaking the class invariant?"
> 

> To expand on this a little:  why not give programmers a way to build
> a sorted collection in linear time when they are able to provide data
> in the right order?


I'm not against this. But please don't use addLast: for this. Use something
like >>addLastAlreadyInOrder: instead. And something like
>>addAllAlreadySorted:, >>addAllProbablyAlreadySorted: or whatever for
adding whole collections.


> 	
> 	The longer answer
> 	-----------------
> 	
> 	Prenote: your suggestion is *semantically* correct. But there are
> 	psychological/practical reasons against introducing it.
> 	
> 	Why does a programmer use #addLast:? He wants to build a
> 	Collection in an explicit order.
> 	Why does a programmer use a SortedCollection? He wants to have a Collection,
> 	which automatically ensures an order of added elements.
> 	
> There are several attitudes a class designer might take.


> The attitude I prefer is
>  - operations should never be allowed to break class invariants

Agreed. (see also my responses to Andreas Raab)

>  - we should avoid introducing operations that often don't work

Agreed: this is my point regarding >>addLast: introduced for SortedCollections:
  it always works for OrderedCollections, but only in very special cases, if
applied to SortedCollections.

>  - but when we are stuck with an operation that CAN be allowed to
>    proceed some of the time we should not be strict bondage and
>    discipline nannies and stop sensible programmers doing sensible
>    things.

This is a contradiction to the previous point: 'proceed *some* of the time'
('*' by me) violates the goal to 'avoid introducing operations that often
don't work'.

> The attitude here is
>  - I am like God; I know exactly how programmers think and they mustn't.

I know what I expect from calling >>addLast:.

>    I can foresee every possible use and know which ones make sense.
>    (OK, this is a caricature, but it is a recognisable one.)
> 
> In this particular instance, it is possible for one and the same
> programmer, at a single instant in time,
> *BOTH* to want to build a collection in a particular order
> *AND* to want that collection to be a sorted collection.
> 
> For example, support I have a sorted collection of numbers,
> call it "fred":
>     fred := #(3 1 4 1 5 9 2 6 5) asSortedCollection.
> Now suppose I want to add 1 to every element.
> THIS IS AN ORDER-PRESERVING TRANSFORMATION.
> So
> 
>     jim := SortedCollection new.
>     fred do: [:each | jim addLast: each + 1].
> 
> will do the job, IF I am allowed to use addLast: when it is safe.
> (Yes, I know I can do fred+1, but the result belongs to the wrong class.)
> 


Good example!
Introduce and use something like >>addLastAlreadyInOrder: instead.


> 	Using >>addLast:  for a SortedCollection means to try to give an
> 	explicit order to something, which has an automatically 'built
> 	in' one.  This mostly is a programmer error.
> 
> I am not sure how to interpret "mostly" here.
> If it is an empirical statement about how often it does happen,
> then I would be intrigued to know where the evidence came from.


It's not an empirical statement. For me it is a convention to expect an
OrderedCollection if using >>addLast: (also see below).


> 
> In the case where the two orders AGREE, which is the only case my
> versions of #addFirst: and #addLast: allow, it is not an error.
> In the case (of as far as I am aware completely unknown frequency)
> where the two orders do not agree, the error will still be reported.
> 


> 	And it violates encapsulation:  I have to know, which built in
> 	order the SortedCollection has, to use >>addLast:  correctly
> 	along your suggestion.
> 
> No, the sorting order of a sorted collection is part of its ***PUBLIC***
> interface.  Note that SortedCollection>>sortBlock is in the 'accessing'
> category, not the 'private' category.


Agreed: 'encapsulation' is the wrong term here.

My point is: if I'd use >>addLast: I'd expect an OrderedCollection, *not* a
SortedCollection. And ANSI supports this kind of view.


> 
> 	By using >>add:  instead, there would be no problem.
> 	
> Yes there would.  There would be an *EFFICIENCY* problem.
> 
> Note that I too can claim a psychological reason why addLast: should
> be used in this case rather than add:.  The reason is that in Set
> #add: means "add only if absent", whereas whenever addLast: is allowed
> it will make the collection one element bigger.
> 
> 	So there remains the performance thing:  using
> 	SortedCollection>>add:  multiple times means, that each object
> 	has to be sorted in immediately.  It is probably more performant
> 	to have a private add method, which just adds an element, while
> 	leaving the SortedCollection in an inconsistent state, and
> 	>>reSort it afterwards, when all elements have been added.
> 
> "performant" is, to the best of my knowledge, not an English word.
> I certainly haven't been able to find it in any of the paper or
> electronic dictionaries I've just checked.  Try "faster".

I'm wondering, because Leo knows it:
http://dict.leo.org/?search=performant&searchLoc=0&relink=on&spellToler=standard&sectHdr=off&tableBorder=1&cmpType=relaxed&lang=en
But it may be wrong: you are the native speaker!

> 
> 	That has been my idea of introducing the 'bypass' method for
> 	clients which really know about the internals.

I don't like this (not exactly mine, see also my replies to Andreas Raab)
idea anymore.

> 	
> But the ordering used by a sorted collection is most definitely
> NOT a fact about its internals.  It is in some respects the most
> public thing there is about such an object.

Agreed.

> 
> Part of my point is that you don't *need* a bypass method when you
> have addLast: and could make it respect the class invariant.
> 
> 	BTW: In the ANSI standard (revision 1.9) *protocol* <SortedCollection>
> 	doesn't conform to protocol <OrderedCollection>, which is the only one
> 	defining >>addFirst: and  >>addLast:. This makes sense to me.
> 	
> The ANSI argument was shot down in smoke and billowing flames when I tried
> to apply it to #removeAll:, I don't see why it should have any greater force
> here.  Nothing in the ANSI standard prevents you *adding* methods to
> standard classes; if it did, almost all of the methods in Object would
> have to find another home.

Agreed.
But I stay at my assumption to expect an OrderedCollection if using
>>addLast:. And if you introduce methods *** with the same name *** in other
protocols you break conventions (and ANSI is also a convention).

> 	
> 	Implementation notes
> 	--------------------
> 	
> 	To the private SortedCollection>>addLastWithoutSorting: there could also be
> 	added
> 	- >>addWithoutSorting: and
> 	- >>addFirstWithoutSorting:;
> 	for - very few! - clients using the *internals*.
> 
> 	After using one of these methods >>reSort should be called to
> 	ensure consistency.  Note:  I haven't proved the implicit
> 	assumption here, that >>reSort'ing an already sorted
> 	SortedCollection is faster than adding elements in arbitrary
> 	order.
> 
> I suggest allowing methods to proceed that DON'T break the class
> invariant, and this is spurned as breaking encapsulation, even though
> the methods are concerned solely with matters in the public interface.
> 


> Instead, as a better way, Rudlof proposes adding methods that DO break
> the class invariant,

I've corrected this: but there *is* a client in danger breaking it (see
reply to Andreas Raab), since it uses internals. And I've just continued to
allow this instead of fixing this, too.

> and DON'T offer any efficiency advantage.

Haven't measured this (regarding these clients using internal knowledge
about SortedCollection).

<snipped a very interesting proposal to be handled in another reply>


Greetings,

Stephan
-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




More information about the Squeak-dev mailing list