About KCP and automatic initialize

Wed Sep 17 03:00:06 UTC 2003

Daniel Vainsencher <danielv at netvision.net.il> wrote:

	The gist of your argument, IIUC, is that zero-argument
	initialization is often bad design, and therefore we shouldn't
	encourage it by making it easier.  Therefore, we shouldn't make
	new call initialize by default, because that does make it
	easier.

You've missed two important points:
* the change affects one of the most basic operations in Squeak
  in a way which is not only visible to every subclass of Object
  but has the potential to break working code
* the change COULD be done in a way which is NOT likely to break
  working code.

	I disagree - I believe this position doesn't do justice to two key
	distinctions.
	A. Zero argument initialization is sometimes good design, and means to
	sometimes correct ends should be provided.

Yes, but we HAVE means to the end "implement zero-argument initialisation
correctly".  Perfectly good means.  In fact we have several:

(1) Morph>>new sends #initialize already.
(2) Use lazy initialisation.
(3) Instead of using "super new initialize" send "self basicNew initialize"
    and this completely avoids the double-initialisation bug.
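
To make (2) and (3) concrete, here is a rough sketch.  The class Counter,
its instance variable count, and the method bodies are made up for
illustration; they are not taken from the image.

    "(2) Lazy initialisation: Counter, 'accessing' category"
    count
        "Answer the count, setting it to 0 the first time it is asked for"
        ^count ifNil: [count := 0]

    "(3) Counter *class*, 'instance creation' category"
    new
        "Allocate with #basicNew so no inherited #new can initialise first"
        ^self basicNew initialize

    "(3) Counter, 'initialize/release' category"
    initialize
        count := 0

With (3), a subclass that needs more set-up would presumably override
#initialize, starting with "super initialize", and leave #new alone, so
that #initialize still runs exactly once.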

	The meta-level-hiding argument applies here.

One of the common beginner mistakes is trying to put #new in the instance
side instead of the class side.  I have now managed to find a version of
RB which loads (hooray HOORAY) and if there isn't a SmallLint rule checking
for #new and/or #new: on the instance side, I'll try to figure out how to
add one.

The distinction between instance methods and class methods in Smalltalk
is an important one, and it is _not_ the same as the distinction between
plain methods and static methods in Java.  Students here coming to Smalltalk
from Java expect that "constructors" will be like instance methods (as Java
constructors do not have a 'static' prefix), and it is no kindness to
conceal the existence of the class side from them.
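
For illustration, the mistake looks roughly like this.  Account, its
instance variable balance, and #initialiseAsAccount are hypothetical names.

    "WRONG: compiled on the instance side, as Account>>new"
    new
        ^super new initialiseAsAccount

    "RIGHT: compiled on the class side, as Account class>>new"
    new
        ^super new initialiseAsAccount

    "Account, 'initialize/release' category"
    initialiseAsAccount
        balance := 0

The method text is the same both times; only the side differs.  In the
wrong version "Account new" never reaches the method, because the receiver
is the class Account and lookup finds the inherited #new, which knows
nothing about #initialiseAsAccount.  The balance quietly stays nil.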

	I agree that zero argument initialization is often bad design. Even
	ignoring correctness arguments about class invariants, "new" as a
	creation interface is not intention revealing, and also not concise, if
	it then requires more method calls to configure the object for use.

Good.  We agree.  I go further:  based on my examination of Squeak,
#new as creation interface is *almost always* bad design, not just "often".

	B. Design is one of the learning curves that beginners need to
	climb, in addition to technically correct usage of the
	language.  However, those are separate curves.

Complete agreement.

	Someone could come to Smalltalk with a good design background,
	and decide that they actually need zero argument initialization,
	and this should not be made difficult in this language.

BUT IT ISN'T DIFFICULT NOW!  Anyone who has troubled to read a Smalltalk
textbook knows how to do this.  And don't forget, someone coming to
Smalltalk with a good design background is probably familiar with some
other OO language and is *expecting* to write his or her own "constructor"
method(s).  If you tell them that the constructor method they should
define is called #initialize, you do them no kindness.

	Someone with less design knowledge might not realize at the
	outset that a zero-arg-init is bad for his case, and still reach
	that point by using the new and initialize as temporary means.

Instead of letting people struggle in deep waters, wouldn't it be better
to TELL them how to do things?

I've been teaching a 4th-year OO paper for six years now.
I've tried using Java, Eiffel, and Smalltalk.  I've come to love
Smalltalk, and the students who have been willing to try it have
come to at least suspect that it _might_ be enjoyable once you get
past the huge *library* learning curve.  Class sizes have never exceeded
a dozen students, but that has given me some idea of what problems they meet.

"Why isn't #new working?" (because you put it on the instance side)
*has* been a common problem for my students.
"Why is #initialize called twice?" has NOT been a problem for them.
Maybe it's because of spending a lecture on how instances are/should be
created.  Idunno.

While none of my students has been willing to use Eiffel for their
project work, I've been playing with Eiffel as long as I've been playing
with Smalltalk.  Eiffel uses constructors (called creation methods).
A creation method is a perfectly ordinary method and can be called like
any other method, except that (a) it doesn't get to assume that the
class invariant is true and (b) when you create an object, you must in
the same form invoke a creation method.

The interesting thing is that Eiffel programmers don't seem to have an
equivalent of the double-initialize bug.  Now you _can_ do

    class FOO
    creation make
    feature
       x: INTEGER

       make is do x := 1 end
    end -- FOO

    class BAR
    inherit FOO redefine make end
    creation make
    feature
        y: INTEGER

        make is do Precursor; y := 2 end
    end -- BAR

but this doesn't seem to lead to trouble.  And I suspect that the reason
it doesn't lead to trouble is that there is only *one* method involved.
('Precursor' is like 'super foo', it can only be used to invoke an
ancestral version of the *same* method, not another method.)
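
For comparison, here is a sketch of how the double-initialise bug arises
in Smalltalk when both classes reuse the one name #initialize.  Foo and
Bar mirror the Eiffel classes above; the code is illustrative only.

    "Foo *class*"
    new
        ^super new initialize

    "Foo"
    initialize
        x := 1

    "Bar *class*"
    new
        ^super new initialize

    "Bar"
    initialize
        super initialize.
        y := 2

"Bar new" runs Foo class>>new, which sends #initialize to the new Bar.
That finds Bar>>initialize, which sets both x and y.  Then Bar class>>new
sends #initialize a second time, so both assignments happen again.  That
is harmless here, but not if #initialize registers the object somewhere
or acquires a resource.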

I am no friend of superficial changes that leave the fundamental problem
untouched, and it seems to me that the really fundamental problem here is
*not* the fact that #new doesn't call #initialize, but that to create and
initialise objects, you need *TWO* methods, one on the class side (because
there is no instance yet, so it has to be something else that receives the
message) and one on the instance side (because the class does not have
unmediated access to the instance variables; only the instance can
initialise that).

In order to get zero-argument initialisation right, you don't need to
understand esoteric aspects of the language known only to experts.
What you have to understand is some fairly fundamental features of Smalltalk.

Here goes:

(A) Every message must be sent to some object.
(B) You cannot send a message to an object that doesn't exist yet.
(C) If you want to create a new object, you therefore have to send
    the creation message to some *other* object.
(D) The following ways of allocating an object exist:
    - it may be prebuilt as part of the image (true, false, nil, &c)
    - it may be constructed by special VM magic invoked through primitives;
      There Be Dragons, Stay Away, You Have Been Warned.
    - it may be allocated using #basicNew or #basicNew:
    - but #basicNew and #basicNew: can only be called from inside a class.
(E) So at some point a message has to be sent to the class you want the
    new object to be an instance of, asking it to allocate and initialise
    the object.
(F) A newly allocated object has all its instance variables set to nil.
    If your class invariant is true when all the instance variables are
    nil, you may not need any initialisation at all.
(G) If your class invariant is NOT true when all the instance variables
    are nil, you need some initialisation.  If the parent class already
    arranges for all the initialisation you need, you don't have anything
    more to do.
(H) If that's not enough, which is often the case when you add more
    instance variables, you will have to see to it that the new instance
    variables are initialised too.
(I) But a class is one object, and an instance is another.
    Instance variables are encapsulated inside objects.
    There _are_ some low-level hooks that the system can use to crack
    objects open and mangle their guts, but Those Are Huge Enormous
    Extremely Angry Dragons Which Like To Destroy Images So Forget I
    Ever Told You That.  This means that if an object's instance variables
    are to be initialised, the object *itself* must do that.
(J) An object won't do _anything_ unless someone sends it a message.
    So if an object's instance variables need initialising, there has
    to be an instance-side method to do this, and a class-side method
    to call it.  Initialisation _could_ involve many methods which the
    creation method calls in some pattern, but it will normally involve
    one.
(K) We have deduced that each class needs
    - at least one class method (in the 'instance creation' category)
      for allocating, initialising, and answering a new object.
    - at least one instance method (in the 'initialize/release' category)
      for initialising the instance variables.
(L) While a class needs to *have* such methods, it may not need to *define*
    them if it can inherit them from an ancestor. 
(M) Suppose we have this pattern of classes:
    Object ()
      Foo (x)
        Bar (y z)
    Suppose also that we can initialise a Foo (and a Bar) without knowing
    anything except what kind of object is wanted.  (This is sometimes
    true, but not as often as you might think.)

    Object>>new
        ^self basicNew

    returns a new object.  When this method is invoked by a descendant
    class, it returns a newly allocated uninitialised object BELONGING
    TO THE DESCENDANT CLASS.

    If Foo is happy with x being nil, Foo doesn't need a #new method of
    its own.  The inherited one will do fine.

    If Foo needs x to be initialised (let's say it must be initialised
    to false), then we need an instance method to do that initialisation
    and a class method to call that new method.

    "In the 'initialize/release' category for Foo"
    initialiseAsFoo
        x := false

    "In the 'instance creation' category for Foo *class*"
    new
        ^super new initialiseAsFoo

    If Bar is happy with x being false and y and z being nil, it doesn't
    need a #new method of its own.  The one it inherits from Foo will do.

    If Bar needs y and z to be initialised (let's say to 0 and ''),
    then we need an instance method to do that initialisation.

    "In the 'initialize/release' category for Bar"
    initialiseAsBar
        y := 0.
        z := ''.

    "In the 'instance creation' category for Bar *class*"
    new
        ^super new initialiseAsBar

    Let's work through this.
    Bar class>>new invokes
        Foo class>>new which invokes
            Object class>>new which
                allocates a new Bar with x, y, z nil and answers it
        and then invokes
            Foo>>initialiseAsFoo which
                initialises x to false
    and then invokes
        Bar>>initialiseAsBar which
            initialises y to 0 and z to ''.

    Each initialiseAs{Whatever} method gets to work on an object which
    has been properly initialised as an example of the superclass, and
    finishes the initialisation for Whatever.  

(N) We _could_ have used the same name for #initialiseAsFoo and
    #initialiseAsBar but that would have been a really dumb thing to
    do, for two reasons.  First, it would be tricky to ensure that
    the right method was called, and second, the two methods
    have different purposes.  #initialiseAsFoo is supposed to set an
    uninitialised object up as a usable Foo.  #initialiseAsBar is supposed
    to set an object that has been initialised as a Foo up as a Bar.  The
    two methods have different preconditions and different postconditions,
    so they deserve different names.

(O) There might be several ways to initialise an object,
    so there might be several instance creation methods.
    Suppose for example that a Bar might be initialised with y = 0
    or with y = 1.  In that case we might have

    Bar class>>
      new
        self shouldNotImplement.
      newZero
        ^super new initialiseAsBar: 0
      newOne
        ^super new initialiseAsBar: 1
    Bar>>
      initialiseAsBar: n
        y := n.
        z := ''.

    If one of the variants was "normal" and the other "exceptional"
    we might have (Bar new) for the normal case and (Bar newExceptional)
    for the exceptional case.  If there is no strong reason to regard one
    way of initialising as more "natural" than the other, it would be
    misleading to keep on using #new because people would never be sure
    which version they were getting.  In such a case, it is a good idea
    to "cancel" the creation method you don't want, so that people don't
    invoke it by accident.
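
    As a rough sketch of the "normal" versus "exceptional" arrangement,
    reusing #initialiseAsBar: from (O) and assuming, purely for
    illustration, that y = 0 is the normal case:

    Bar class>>
      new
        "the normal case: y starts at 0"
        ^super new initialiseAsBar: 0
      newExceptional
        "the exceptional case: y starts at 1"
        ^super new initialiseAsBar: 1

    Note that #newExceptional sends "super new", not "self new".  With
    "self new" the object would be initialised as a Bar twice, first
    with 0 and then with 1.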

Like I say, we _already_ have perfectly good ways to do zero-argument
initialisation without *any* changes to Object>>new at all.  If people
are taught how to do it right, the #new-calls-#initialize change won't
help them at all.

Summary:
- there's an existing coding style which doesn't have the problem the
  change is supposed to fix
- you can explain that coding style to people pretty simply; if someone
  understands 'object' 'class' 'instance' 'method' and 'send' that's it.
- the change doesn't address the basic underlying cause of the problem,
  which is the need for two methods.  There really isn't much you _can_
  do about that problem, because it is part of what makes Smalltalk so
  good.  All you _can_ do is teach people simply how to get it right.