About KCP and automatic initialize
Richard A. O'Keefe
ok at cs.otago.ac.nz
Wed Sep 17 03:00:06 UTC 2003
Daniel Vainsencher <danielv at netvision.net.il> wrote:
> The gist of your argument, IIUC, is that zero-argument
> initialization is often bad design, and therefore we shouldn't
> encourage it by making it easier. Therefore, we shouldn't make
> new call initialize by default, because that does make it
> easier.
You've missed an important point:
* the change affects one of the most basic operations in Squeak
in a way which is not only visible to every subclass of Object
but has the potential to break working code
* the change COULD be done in a way which is NOT likely to break
working code.
> I disagree - I believe this position doesn't do justice to two key
> distinctions.
> A. Zero argument initialization is sometimes good design, and means to
> sometimes correct ends should be provided.
Yes, but we HAVE means to the end "implement zero-argument initialisation
correctly". Perfectly good means. In fact we have several:
(1) Morph>>new sends #initialize already.
(2) Use lazy initialisation.
(3) Instead of using "super new initialize" send "self basicNew initialize"
and this completely avoids the double-initialisation bug.
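Options (2) and (3) might look like the sketch below, written in Squeak's file-in (chunk) format. The classes LazyFoo and EagerFoo and the category 'InitDemo' are hypothetical names invented for illustration; only #basicNew, #ifNil: and the class-definition message are standard.

```smalltalk
"Hypothetical classes, for illustration only."
Object subclass: #LazyFoo
    instanceVariableNames: 'x'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'InitDemo'!

!LazyFoo methodsFor: 'accessing'!
x
    "Option (2): lazy initialisation. x is set on first use,
     so LazyFoo needs no #new override at all."
    ^x ifNil: [x := false]! !

Object subclass: #EagerFoo
    instanceVariableNames: 'x'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'InitDemo'!

!EagerFoo methodsFor: 'initialize-release'!
initialize
    x := false! !

!EagerFoo methodsFor: 'accessing'!
x
    ^x! !

!EagerFoo class methodsFor: 'instance creation'!
new
    "Option (3): allocate with #basicNew, so no inherited #new
     gets a chance to send #initialize before we do."
    ^self basicNew initialize! !
```

A subclass of EagerFoo that overrides #initialize (calling "super initialize" first) can simply inherit this #new, and #initialize is still sent exactly once by it.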
> The meta-level-hiding argument applies here.
One of the common beginner mistakes is trying to put #new in the instance
side instead of the class side. I have now managed to find a version of
RB which loads (hooray HOORAY) and if there isn't a SmallLint rule checking
for #new and/or #new: on the instance side, I'll try to figure out how to
add one.
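Until such a rule exists, a rough workspace check is possible. The following is not a SmallLint rule, only a sketch that walks the classes in the image and reports instance-side definitions of #new or #new:. The variable name suspects is arbitrary, and some hits may be perfectly legitimate methods.

```smalltalk
"Workspace sketch: find classes that define #new or #new: on the instance side."
| suspects |
suspects := OrderedCollection new.
Smalltalk allClassesDo: [:each |
    #(#new #new:) do: [:selector |
        "includesSelector: looks only at methods defined in this class itself."
        (each includesSelector: selector)
            ifTrue: [suspects add: each name -> selector]]].
suspects do: [:assoc |
    Transcript show: assoc key; show: '>>'; show: assoc value; cr].
```

Because the loop visits classes rather than their metaclasses, the class-side #new methods that are supposed to exist are not reported.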
The distinction between instance methods and class methods in Smalltalk
is an important one, and it is _not_ the same as the distinction between
plain methods and static methods in Java. Students here coming to Smalltalk
from Java expect that "constructors" will be like instance methods (as Java
constructors do not have a 'static' prefix), and it is no kindness to
conceal the existence of the class side from them.
> I agree that zero argument initialization is often bad design. Even
> ignoring correctness arguments about class invariants, "new" as a
> creation interface is not intention revealing, and also not concise, if
> it then requires more method calls to configure the object for use.
Good. We agree. I go further: based on my examination of Squeak,
#new as creation interface is *almost always* bad design, not just "often".
> B. Design is one of the learning curves that beginners need to
> climb, in addition to technically correct usage of the
> language. However, those are separate curves.
Complete agreement.
> Someone could come to Smalltalk with a good design background,
> and decide that they actually need zero argument initialization,
> and this should not be made difficult in this language.
BUT IT ISN'T DIFFICULT NOW! Anyone who has troubled to read a Smalltalk
textbook knows how to do this. And don't forget, someone coming to
Smalltalk with a good design background is probably familiar with some
other OO language and is *expecting* to write his or her own "constructor"
method(s). If you tell them that the constructor method they should
define is called #initialize, you do them no kindness.
> Someone with less design knowledge might not realize at the
> outset that a zero-arg-init is bad for his case, and still reach
> that point by using the new and initialize as temporary means.
Instead of letting people struggle in deep waters, wouldn't it be better
to TELL them how to do things?
I've been teaching a 4th-year OO paper for six years now.
I've tried using Java, Eiffel, and Smalltalk. I've come to love
Smalltalk, and the students who have been willing to try it have
come to at least suspect that it _might_ be enjoyable once you get
past the huge *library* learning curve. Classes have never exceeded
a dozen, but it's given me some idea of what problems they meet.
"Why isn't #new working?" (because you put it on the instance side)
*has* been a common problem for my students.
"Why is #initialize called twice?" has NOT been a problem for them.
Maybe it's because of spending a lecture on how instances are/should be
created. Idunno.
While none of my students has been willing to use Eiffel for their
project work, I've been playing with Eiffel as long as I've been playing
with Smalltalk. Eiffel uses constructors (called creation methods).
A creation method is a perfectly ordinary method and can be called like
any other method, except that (a) it doesn't get to assume that the
class invariant is true and (b) when you create an object, you must in
the same form invoke a creation method.
The interesting thing is that Eiffel programmers don't seem to have an
equivalent of the double-initialize bug. Now you _can_ do
    class FOO
    creation make
    feature
        x: INTEGER
        make is do x := 1 end
    end -- FOO

    class BAR
    inherit FOO redefine make end
    creation make
    feature
        y: INTEGER
        make is do Precursor; y := 2 end
    end -- BAR
but this doesn't seem to lead to trouble. And I suspect that the reason
it doesn't lead to trouble is that there is only *one* method involved.
('Precursor' is like 'super foo', it can only be used to invoke an
ancestral version of the *same* method, not another method.)
I am no friend of superficial changes that leave the fundamental problem
untouched, and it seems to me that the really fundamental problem here is
*not* the fact that #new doesn't call #initialize, but that to create and
initialise objects, you need *TWO* methods, one on the class side (because
there is no instance yet, so it has to be something else that receives the
message) and one on the instance side (because the class does not have
unmediated access to the instance variables; only the instance can
initialise that).
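For readers who have not met it, the double-initialise bug that this two-method arrangement invites can be sketched as follows, again in file-in format. DemoBase, DemoDerived, initCount and the category 'InitDemo' are hypothetical names made up for this illustration.

```smalltalk
"Hypothetical classes showing how #initialize can run more than once."
Object subclass: #DemoBase
    instanceVariableNames: 'initCount'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'InitDemo'!

!DemoBase methodsFor: 'initialize-release'!
initialize
    "Count how many times initialisation runs on this instance."
    initCount := (initCount ifNil: [0]) + 1! !

!DemoBase methodsFor: 'accessing'!
initCount
    ^initCount! !

!DemoBase class methodsFor: 'instance creation'!
new
    ^super new initialize! !

DemoBase subclass: #DemoDerived
    instanceVariableNames: 'y'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'InitDemo'!

!DemoDerived methodsFor: 'initialize-release'!
initialize
    super initialize.
    y := 2! !

!DemoDerived class methodsFor: 'instance creation'!
new
    "The mistake: DemoBase class>>new has already sent #initialize,
     and that send was dispatched to DemoDerived>>initialize."
    ^super new initialize! !
```

Evaluating "DemoDerived new initCount" in a workspace should answer a number greater than 1, because the class-side #new and the instance-side #initialize are each overridden independently. Deleting DemoDerived class>>new removes the duplicate send.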
In order to get zero-argument initialisation right, you don't need to
understand esoteric aspects of the language known only to experts.
What you have to understand is some fairly fundamental features of Smalltalk.
Here goes:
(A) Every message must be sent to some object.
(B) You cannot send a message to an object that doesn't exist yet.
(C) If you want to create a new object, you therefore have to send
the creation message to some *other* object.
(D) The following ways of allocating an object exist:
- it may be prebuilt as part of the image (true, false, nil, &c)
- it may be constructed by special VM magic invoked through primitives;
There Be Dragons, Stay Away, You Have Been Warned.
- it may be allocated using #basicNew or #basicNew:
- but #basicNew and #basicNew: can only be called from inside a class.
(E) So at some point a message has to be sent to the class you want the
new object to be an instance of, asking it to allocate and initialise
the object.
(F) A newly allocated object has all its instance variables set to nil.
If your class invariant is true when all the instance variables are
nil, you may not need any initialisation at all.
(G) If your class invariant is NOT true when all the instance variables
are nil, you need some initialisation. If the parent class already
arranges for all the initialisation you need, you don't have anything
more to do.
(H) If that's not enough, which is often the case when you add more
instance variables, you will have to see to it that the new instance
variables are initialised too.
(I) But a class is one object, and an instance is another.
Instance variables are encapsulated inside objects.
There _are_ some low-level hooks that the system can use to crack
objects open and mangle their guts, but Those Are Huge Enormous
Extremely Angry Dragons Which Like To Destroy Images So Forget I
Ever Told You That. This means that if an object's instance variables
are to be initialised, the object *itself* must do that.
(J) An object won't do _anything_ unless someone sends it a message.
So if an object's instance variables need initialising, there has
to be an instance-side method to do this, and a class-side method
to call it. Initialisation _could_ involve many methods which the
creation method calls in some pattern, but it will normally involve
one.
(K) We have deduced that each class needs
- at least one class method (in the 'instance creation' category)
for allocating, initialising, and answering a new object.
- at least one instance method (in the 'initialize/release' category)
for initialising the instance variables.
(L) While a class needs to *have* such methods, it may not need to *define*
them if it can inherit them from an ancestor.
(M) Suppose we have this pattern of classes:
    Object ()
        Foo (x)
            Bar (y z)
Suppose also that we can initialise a Foo (and a Bar) without knowing
anything except what kind of object is wanted. (This is sometimes
true, but not as often as you might think.)
    Object>>new
        ^self basicNew
returns a new object. When this method is invoked by a descendant
class, it returns a newly allocated uninitialised object BELONGING
TO THE DESCENDANT CLASS.
If Foo is happy with x being nil, Foo doesn't need a #new method of
its own. The inherited one will do fine.
If Foo needs x to be initialised (let's say it must be initialised
to false), then we need an instance method to do that initialisation
and a class method to call that new method.
    "In the 'initialize/release' category for Foo"
    initialiseAsFoo
        x := false
    "In the 'instance creation' category for Foo *class*"
    new
        ^super new initialiseAsFoo
If Bar is happy with x being false and y and z being nil, it doesn't
need a #new method of its own. The one it inherits from Foo will do.
If Bar needs y and z to be initialised (let's say to 0 and ''),
then we need an instance method to do that initialisation.
    "In the 'initialize/release' category for Bar"
    initialiseAsBar
        y := 0.
        z := ''.
    "In the 'instance creation' category for Bar *class*"
    new
        ^super new initialiseAsBar
Let's work through this.
    Bar class>>new invokes
        Foo class>>new which invokes
            Object class>>new which
                allocates a new Bar with x, y, z nil and answers it
        and then invokes
        Foo>>initialiseAsFoo which
            initialises x to false
    and then invokes
    Bar>>initialiseAsBar which
        initialises y to 0 and z to ''.
Each initialiseAs{Whatever} method gets to work on an object which
has been properly initialised as an example of the superclass, and
finishes the initialisation for Whatever.
(N) We _could_ have used the same name for #initialiseAsFoo and
#initialiseAsBar but that would have been a really dumb thing to
do, for two reasons. First, it would be tricky to ensure that
the right method was called. Second, the two methods
have different purposes. #initialiseAsFoo is supposed to set an
uninitialised object up as a usable Foo. #initialiseAsBar is supposed
to set an object that has been initialised as a Foo up as a Bar. The
two methods have different preconditions and different postconditions,
so they deserve different names.
(O) There might be several ways to initialise an object,
so there might be several instance creation methods.
Suppose for example that a Bar might be initialised with y = 0
or with y = 1. In that case we might have
    Bar class>>
        new
            self shouldNotImplement.
        newZero
            ^super new initialiseAsBar: 0
        newOne
            ^super new initialiseAsBar: 1
    Bar>>
        initialiseAsBar: n
            y := n.
            z := ''.
If one of the variants was "normal" and the other "exceptional"
we might have (Bar new) for the normal case and (Bar newExceptional)
for the exceptional case. If there is no strong reason to regard one
way of initialising as more "natural" than the other, it would be
misleading to keep on using #new because people would never be sure
which version they were getting. In such a case, it is a good idea
to "cancel" the creation method you don't want, so that people don't
invoke it by accident.
Like I say, we _already_ have perfectly good ways to do zero-argument
initialisation without *any* changes to Object>>new at all. If people
are taught how to do it right, the #new-calls-#initialize change won't
help them at all.
Summary:
- there's an existing coding style which doesn't have the problem the
change is supposed to fix
- you can explain that coding style to people pretty simply; if someone
understands 'object' 'class' 'instance' 'method' and 'send' that's it.
- the change doesn't address the basic underlying cause of the problem,
which is the need for two methods. There really isn't much you _can_
do about that problem, because it is part of what makes Smalltalk so
good. All you _can_ do is teach people simply how to get it right.