About KCP and automatic initialize

Wed Sep 17 05:34:34 UTC 2003

Roel Wuyts <wuyts at iam.unibe.ch> wrote:
	No, for classes where it makes no sense you would not implement 
	initialize (meaning everything is nill-ed, as it is now). ?! I do not 
	get in the whole of this discussion why everybody assumes that as soon 
	we implement an initialize on Object, everybody ***has*** to override 
	it to provide perfect good values. This is NOT the goal of the 
	proposition.

Sorry mate, you've got hold of the wrong end of the stick.
Nobody is assuming any such thing.

The reasoning goes like this:

    Creating an instance requires a class-side method and an instance-side
    method.  This is harder than requiring one method.  It requires people
    to know about the "meta-level" (= the class side).  Not only that,
    you find yourself worrying about inheritance in *both* methods, so
    unless you think clearly about what you are doing, it's a bit too easy
    to end up with initialisers being called twice.
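
To make the inheritance trap concrete, here is a rough workspace sketch,
assuming a Squeak-style image in which Object class>>new only allocates.
TallyBase, TallyChild, and the 'calls' variable are made up for
illustration; each class side innocently repeats the same idiom.

```smalltalk
"Hypothetical classes; evaluate the whole snippet as one do-it."
| base child |
base := Object subclass: #TallyBase
    instanceVariableNames: 'calls'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'KCP-Demo'.

"Instance side: count invocations instead of resetting state."
base compile: 'initialize
    calls := (calls ifNil: [0]) + 1'.
base compile: 'calls
    ^calls'.

"Class side of the parent: the usual idiom."
base class compile: 'new
    ^super new initialize'.

child := base subclass: #TallyChild
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'KCP-Demo'.

"Class side of the child: the same idiom, copied without thinking."
child class compile: 'new
    ^super new initialize'.

"Print-it: how many times did #initialize run for one instance?"
child new calls
```

Under that assumption the last line answers 2, not 1: the inherited
class-side method has already sent #initialize by the time
TallyChild class>>new sends it again.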

So far so good, we all agree about that.  But we continue:

    The causes, symptoms, and consequences of the problem have nothing to
    do with whether one uses a zero-argument initialiser or a multi-
    argument initialiser.  The *same* problem exists for any class-side
    instance creation method and its corresponding instance-side
    initialisation method.

With me so far?  OK.

    The #new-calls-#initialize proposal addresses the problem ONLY in
    the case of zero-argument construction, and then only when the
    zero-argument constructor is called #new.  It does nothing for
    classes whose zero-argument constructor is _not_ called #new
    (of which there are quite a lot) or for classes that require one
    or more arguments in their constructor/initialiser (of which there
    are even more).

I hope this is also acceptable to you.  Certainly the proponents of the
change have never claimed that it would help with anything other than #new.

Now comes the key step:

    The aim of #new-calls-#initialize is to permit a teaching strategy
    where people don't have to worry about the existence of the class side
    until fairly late; as far as they are concerned "SomeClass new" uses
    a magic keyword like "new SomeClass()" in Java.  All they have to worry
    about is defining or inheriting an #initialize method on the instance
    side.

Shall we stipulate that this _is_ a simpler model for beginners?
For anyone, come to that.

    If this strategy is successful, you'll get a generation of students
    who either don't know how to create objects by any means other than
    #new or who are unwilling to do so, because #new is what they were
    (originally) taught was the RIGHT way to do it.

    Because there _isn't_ any way to initialise most objects without
    providing some information, people who learn to create objects this
    way are being taught to expose incompletely initialised objects.
    This is a bad thing.

Does anyone want to argue that
    (Point new) x: 1; y: 2; yourself
is _better_ than
    Point x: 1 y: 2

That's objection 1.  Because the change only helps with #new and not with
any other instance creation method, you only get a payoff if you adopt a
style where you use, or teach students to use, #new all (ok, ok, nearly
all) the time.  And that's bad design.  Andreas Raab's suggestion would
solve the basic problem *without* this bad consequence.

But we go on.

    If you already have code which uses #new and #initialize,
    making this change to Object will mean that your #initialize
    will be called twice when previously it was called once.

Take LintRule as an example:

    LintRule class>>
    new
      ^super new initialize

    LintRule>>
    initialize

    BasicLintRule>>
    initialize
        super initialize.
        "stuff goes here"

    BlockLintRule>>
    initialize
        super initialize.
        "stuff goes here"

At the moment, BlockLintRule new will result in BlockLintRule>>initialize,
BasicLintRule>>initialize, and LintRule>>initialize each being called once.

Change Object class>>new to do ^self basicNew initialize, and each of
those methods will now be called TWICE.  But calling initialisers twice
was a problem the change was supposed to solve, not create!
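
The effect is easy to observe with a counter.  A minimal workspace
sketch, assuming a Squeak-style image; CountingRule and its 'count'
variable are hypothetical stand-ins for LintRule and its subclasses.

```smalltalk
"Hypothetical class; evaluate the whole snippet as one do-it."
| rule |
rule := Object subclass: #CountingRule
    instanceVariableNames: 'count'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'KCP-Demo'.

"Instance side: record each call to #initialize."
rule compile: 'initialize
    count := (count ifNil: [0]) + 1'.
rule compile: 'count
    ^count'.

"Class side: the idiom that existing code relies on."
rule class compile: 'new
    ^super new initialize'.

"Print-it: the number of times #initialize ran for this one instance."
rule new count
```

Where Object class>>new merely allocates, this answers 1.  Where it
also sends #initialize, the same unchanged code answers 2.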

In this case, calling the initialisers twice is fairly harmless,
and there's a simple fix (delete LintRule class>>new, and you can
also delete LintRule>>initialize).

If one of the initialize methods involved had a side effect,
it would be necessary to change LintRule class>>new to
^self basicNew initialize
instead of ^super new initialize.  Either way, it is necessary to hunt
down _each_ class that uses #initialize and decide what change is
necessary.
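
For the side-effect case, the repair might look like the following
workspace sketch.  It assumes a Squeak-style image; RegisteredRule and
its Registry class variable are invented for illustration, with "add
myself to a registry" standing in for any side effect that must happen
exactly once.

```smalltalk
"Hypothetical class; evaluate the whole snippet as one do-it."
| rule |
rule := Object subclass: #RegisteredRule
    instanceVariableNames: ''
    classVariableNames: 'Registry'
    poolDictionaries: ''
    category: 'KCP-Demo'.

"Class side: lazily created registry shared by all instances."
rule class compile: 'registry
    ^Registry ifNil: [Registry := OrderedCollection new]'.

"Instance side: an initialiser with a side effect."
rule compile: 'initialize
    self class registry add: self'.

"Class side: bypass whatever Object class>>new does."
rule class compile: 'new
    ^self basicNew initialize'.

rule new.

"Print-it: how many registrations did that one instance cause?"
rule registry size
```

On a first evaluation this answers 1 whether or not Object class>>new
sends #initialize, because basicNew never does.  The price is that
someone has to find each such class and make the edit by hand.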

The Kernel Cleaning Project will naturally fix all the occurrences
of this problem _in the system_.  The trouble is that, precisely because
the "super new initialize" idiom has been there to imitate, it is used
in code that is _outside_ the system, which will be affected by the
change but not fixed by the KCP.

	For example, I personally strongly prefer lazy initialization
	for most of my things, so I will only override the initialize
	method occasionely, just using it to initialize variables that I
	really want set.  But I want then the flexibility to just have
	to override this initialize method and use that feature.  It
	*adds* to my comfort.  And likewise for those classes for which
	you do not want or cannot use a parameterless initialize.  Do
	not use it.  <sigh>I do not get it.  I have the impression that
	we are not making ourselves clear here....

I'm afraid your impression is mistaken.  You have made yourselves quite
clear about what the proposal is, and that's how a few people know that
they don't like it.  What you don't get is that the change has two bad
effects.  By "fixing" one case only, it encourages that case, which ought
not to be encouraged because it leads to systematically exposing partly
initialised objects and putting responsibilities in the wrong place.
*And*, because it invokes an existing method, it changes the behaviour
of a lot of existing working code.

To put it bluntly, much of the code in existence that currently uses
#initialize (the "pattern" the change is supposed to recognise/support)
relies on Object *not* sending #initialize.

But there's something else.

    The claim that you can ignore the change to #new if you are not
    using a parameterless #initialize would only be true if you
    didn't call the #new inherited from Object.

    Since Object>>new hasn't been doing anything except making new
    objects, there has never been any reason for class designers to
    avoid it.  Suppose you had a class

    Object subclass: #SafePoint
        instanceVariableNames: 'x y'

    "accessors"
    x    ^x
    y    ^y
    "private"
    setX: anX y: aY    x := anX. y := aY. ^self
    initialize    self shouldNotImplement

    "class methods"
    new    self shouldNotImplement
    x: anX y: aY    ^super new setX: anX y: aY

    Using super new in a chain that bottoms out in Object class>>new
    is very common.  This used to work, because Object class>>new just
    made things.  Now plug in the change.

    KaBOOM!

    If you have "classes for which you do not want or cannot use a
    parameterless initialize" you *can't* follow Roel Wuyts's advice
    to "Do not use it".  Object class>>new will FORCE you to use it;
    every class MUST define or inherit a "safe" #initialize.

Is "once it is in the basic machinery it is *hard* to avoid" so
difficult to understand?

Of course you _can_ avoid it.

    SafePoint class>>x: anX y: aY ^self basicNew setX: anX y: aY

Just change "super new" to "self basicNew" in the instance creation
methods of the classes just below Object, and you're safe.  But be sure
you don't miss any!  And be d--n sure you never have a method called
#initialize.  And remember to look at _all_ your code.

In contrast, Andreas Raab's proposal has _no_ unhappy consequences for
any existing class (other than the classes that have to be changed to
implement it), and it deals with _all_ kinds of creation, not just
parameterless creation.