[BUG] Collection>>removeAll:

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Aug 29 06:10:33 UTC 2002


Allen Wirfs-Brock <Allen_Wirfs-Brock at Instantiations.com> wrote:
	I must have missed the messages where you quoted the standard,
	but I did consult the actual text of the actual standard (not a
	draft).  The words are what they are and I see how you could
	interpret them such that they allow the position you are taking.

GOOD.  Thank you very much.

	However, as a very active participant in the drafting of the
	standard, I can tell you that it was not the committee's intent
	to require this!

At least in this country, when they interpret a law, judges are required
to interpret the text of the law itself, in the light of other laws and
how courts have interpreted them and the law in question.  But they are
NOT allowed to consider Parliament's intent.

The same is generally held to apply to programming language standards,
at least that's how it has been done in the case of C and Ada.  Indeed,
there is a very clear clash between the C89 standard and the express
intent of the people who wrote it, and the text of the standard was held
to be definitive, NOT the authors' intent.

The reason for this is obvious.  People trying to implement what the
standard says, and people trying to use the standard as reference
material to determine what they can expect, have in the past been
most unlikely to have access to the authors or any other way of determining
their intent other than the text itself.

Nor can we always appeal to prior art.  Sometimes a standard extends
the coverage of things.  That's the case in ANSI C, for example.
Before the ANSI standard, malloc(0) was undefined and so was free(NULL).
In many systems, they didn't work.  The ANSI C standard said they had to.

The prior art for #removeAll: may well have been broken, although you
will search textbooks and reference manuals in vain for any hint of this.
Someone with access to the ANSI Smalltalk standard discovering this could
reasonably conclude that it was the intent of the ANSI committee that the
bug be fixed.  Indeed, the _only_ thing about #removeAll: that makes the
"fixed" interpretation implausible is returning the argument instead of
the receiver, and someone noting that the return value is unspecified in
ANSI Smalltalk could again reasonably conclude that this removed the
only barrier to an exceptionless reading.

	This is an issue we missed.

And #addAll:, and whether #hash may be applied to cyclic objects, and ...

This is another reason why we have to work with the text,
rather than with the intent of the committee.

	If somebody had brought it up in committee we would have said
	something explicitly, one way or another.  And I can assure you
	that it is extremely likely that what would have been said is
	that the case where the receiver and the argument are the same
	object is undefined.

Oh, I believe you.  But it didn't happen.  We have to live with the
standard as it *is*, not as it might have been, and the standard as
it is does not allow any exception.  The only requirements are that
the receiver be of a type where #removeAll: is defined (the same types
where #remove: is defined) and that the argument be a collection.  No
other limitations are mentioned.

	In the standard "undefined" means that implementations can do
	anything they want and that programmers should not depend upon
	any specific result.  This would have been done to accommodate
	existing implementations (probably all of them, at the
	time) that "fail" in this case.

I was intimately involved with the Prolog standard.  (I was even invited
to be the UK editor, but my Australian employers wouldn't release my time
without reimbursement, and the LPA didn't have the money for that.)  The
Prolog standard made few if any concessions to existing implementations.
Some of the early drafts even broke fundamental operations, like insisting
that integer(X) signal an error for unbound X instead of failing.  As the
person who invented the predicate atom_chars/2 in 1984 and implemented it
in Quintus Prolog, I am still very upset that the ISO Prolog committee saw
fit to redefine it in a seriously incompatible way in the standard.

As noted above, although the ANSI C89 standard had as an explicit goal
to preserve as much existing code as possible, they still changed quite
a lot of things, and did make changes that broke code.  (For example, the
rules for mixing signed and unsigned integers changed.  I _still_ have to
deal with pre-standard code where I can't be sure which parts are affected
by that one and which are not.)

So people reading standards can never assume that the standard would not
have broken an existing implementation.

Other standards committees have taken the explicit view that they need
to worry about existing _users_ but not so much about existing
_implementations_.

I particularly note that a large chunk of Richard Harmon's work on
"ANSIfying" Squeak concerns the Date and Time classes.  The ANSI Smalltalk
committee were not content to define a minimal 'existing consensus'
standard there.  Some of Harmon's work concerns the collection classes.

Once again, a Squeaker looking at the standard _without_ access to the
committee's intent would have no reason to believe that the committee
would not have extended the coverage of a Collection method.

	The words that appear in the standard are what they are.
	However, you are using them to justify a position that was not
	intended by the collective authors.
	
Yes.  That's how you HAVE to read standards.  That's how the C, Ada,
Fortran, Pascal, Prolog, &c standards *DO* get read.  You do not,
indeed, you must not, consider the authors' intent.  You do have to
consider the normal use of language.  You are entitled to assume a
Gricean maxim "if there were an exception they would have told me."

I don't see any reason to be bound by the intent of the committee
in a case where you tell me they didn't _have_ any intent.  As for what
they _might_ have decided had they realised that there was an issue
that needed standardising, Allen Wirfs-Brock speaks with considerable
authority.  (And also about Smalltalk in general.)

Had the committee decided as he surmises they might have,
then we would be in possession of an explicit warning "don't do that",
just as we are in fact in possession of an explicit warning "don't
expect the return value to be anything in particular."

Sadly, they didn't.  So we have to deal with the standard as it is,
not as it might have been.

Perhaps one day there will be a revised standard.  Perhaps one day
there will be different people on the committee, and they will speak for
Smalltalk programmers rather than Smalltalk implementors.  Perhaps.
Perhaps.  Until "one day", we have to deal with the standard as it is,
not as it may become.

So yes, I unashamedly interpret the ANSI Smalltalk standard using the
same approach that one is supposed to use for other programming language
standards.  And yes, when a commonsense reading of a specification is
something that can be cheaply implemented without exceptions or corner
cases, and _especially_ when there turn out to be several different ways
to so implement it, some of which are cheaper than the existing way, I
think it perverse to adopt a reading with no basis in the text itself
so that implementors can refrain from fixing their bugs.
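To make the "cheap to fix" point concrete, here is a minimal sketch in Python (not Smalltalk, and not Squeak's actual code) of the aliasing problem behind `c removeAll: c`, together with two inexpensive ways around it. The class `OrderedBag` and all of its method names are hypothetical. The naive version walks the argument while removing from the receiver, which is the pattern that goes wrong when the two are the same object. In this Python model the failure is silent: the list iterator skips every other element. Each variant returns the argument, mirroring the traditional Smalltalk return value.

```python
class OrderedBag:
    """Hypothetical stand-in for a Smalltalk collection; not a real library class."""

    def __init__(self, items=()):
        self._items = list(items)

    def __iter__(self):
        # Iterates over the live storage, so removals during iteration are visible.
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def remove(self, item):
        # Like #remove:, fails if the item is absent (here via ValueError).
        self._items.remove(item)
        return item

    def remove_all_naive(self, other):
        # Pre-fix pattern: walk the argument while removing from the receiver.
        # If other is self, the iterator skips elements as the list shrinks.
        for item in other:
            self.remove(item)
        return other

    def remove_all_copying(self, other):
        # Fix 1: snapshot the argument first, so aliasing cannot matter.
        for item in list(other):
            self.remove(item)
        return other

    def remove_all_checking(self, other):
        # Fix 2: if the argument is the receiver, just empty it; this avoids
        # removing the elements one at a time.
        if other is self:
            self._items.clear()
            return other
        for item in other:
            self.remove(item)
        return other


if __name__ == "__main__":
    bag = OrderedBag([1, 2, 3, 4])
    bag.remove_all_naive(bag)
    print(list(bag))  # [2, 4]: half the elements survive

    bag = OrderedBag([1, 2, 3, 4])
    bag.remove_all_checking(bag)
    print(list(bag))  # []
```

The copying variant costs a temporary collection but needs no special case. The identity check costs one comparison and makes the self-removal case cheaper than the general one.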

In the case of C and Ada at least, there are continuing committees
whose job is to resolve questions about the standard.  If a vendor
and customer disagree about how to read the standard, they submit
their questions to the "interpretation" committee, and the answer
they receive is authoritative, becoming in effect an extension of
the standard.

Is there such a thing for Smalltalk?

It doesn't have to include the original standard authors.
Some overlap is desirable, but people who _don't_ know what was
intended, and so can more easily see what is actually there, are
also important.

Is there any work on a revised standard that might include a few more
classes?  Directory access methods that would make sense for UNIX,
Windows, and MacOS would be nice.  (For Quintus I designed Prolog
interfaces that would make sense for UNIX, Windows, MacOS, VMS, and
VM/CMS.  Not all of that was implemented.  VMS and CMS have both
changed since then, making it easier, if anything.)  Maybe a priority
queue collection?

I suppose we may never get agreement on how to handle Unicode in
Smalltalk, but it's nice to dream...
