Behavior design options (was: Behaviors vs Modules)

Wed Feb 27 13:20:03 UTC 2002

Hi all,

> Well, let's look at different designs from simple to complex:
>
>  A :=  Just single-inheritance classes.  (today's Squeak)
>  B :=  A + default methods (each selector has a default method that will
> execute if the receiver's hierarchy does not define a method for that
> selector.  Default methods cannot access instance variables, they can
> only send messages).
>  C :=  A + protocols (a protocol is a group of selectors, providing
> selector namespace.  Each selector belongs to one and only one protocol.
> Selectors in different protocols can have the same name, but they are
> considered different selectors, meaning the sender has to choose which
> one he means if it's not obvious).
>  D :=  A + mixins (a mixin is a set of methods that may be shared among
> different classes.  Like default methods, mixin methods cannot access
> instance variables).
>  E :=  B + C.
>  F :=  C + D.
>  G :=  B + C + D.
>  H :=  Object roles (ie. PIE).
>  I :=  Multiple-inheritance classes.
>
> I encourage people to vote for the one they like best (with an
> explanation of course) and/or add new designs to this list.

I have been thinking about these design decisions a lot recently, and the
more I think about them, the more I like having a concept that supports
explicit grouping and sharing of related methods. This means that I would
vote for D or something that is built on top of it. In the following, I
discuss my reasons and explain why default methods (B) are not really an
alternative, though probably a good addition (possibly together with
protocols (C)).

My own programming experience (and also what I see when looking at arbitrary
OO code) shows that most non-trivial classes contain several different
aspects which are relatively independent of each other. Each of these
aspects consists of a set of methods which typically have the following
properties:

- Methods in this set are logically related and collaborate closely.
- Methods in this set are relatively independent of all the other methods in
the class.
- Methods in this set only access a subset of the instance variables of the
class.

In Squeak, these different aspects are sometimes indicated by
putting their methods into different method categories. As an example, let's
consider the class 'Class' (or rather 'ClassDescription'). There, we see method
categories such as 'method dictionary', 'instance variables', 'class
variables', 'inheritance hierarchy', etc. Each of these categories represents a
different aspect of a class, and they are largely orthogonal to each other
(e.g. changing the way the method dictionary is managed does not affect
instance variable access). The same thing also happens in the class Morph.
Every Morph combines many different aspects: it is a visual object that can
draw itself, it contains a collection of submorphs, it has/is a player, etc.
Accordingly, Morph has method categories called 'drawing' (and
'geometry'), 'submorphs-accessing', 'player', etc.

So, if we separate the different aspects of a class anyway, why not do it
right and provide appropriate support in the language kernel? The big
limitation of grouping related methods into method categories is that
categories serve only as documentation; they do not let us use groups
of methods as first-class entities (a method category can only exist as part
of the class where it is defined; it cannot be reused or replaced as a
whole).

Among other things, mixins (= explicit, first-class groups of related methods)
would have the following benefits:

(1) Sharing behavior without having to inherit from a base class.
On the one hand, this allows much more fine-grained sharing (we can share
individual aspects rather than having to inherit all the aspects of the base
class), and on the other hand, it allows sharing behavior from several
sources (not possible with single inheritance).
--> A lot less code duplication!!!

(2) Switch between different implementations of an aspect.
We can replace a certain aspect of a class by another (compatible) one just
by replacing a mixin. (This is much easier than having to override all the
methods of a method category!)

(3) Cleaner designs, smaller (less cluttered) entities, explicit
documentation.
Building a class from different mixins makes a design much cleaner, because
it explicitly shows:
- What kind of aspects a class consists of
- What protocol is supported by each individual aspect
- How these aspects are glued together
This makes a design much easier to understand in two ways: if you want to
look at a class briefly, you get a good idea of what it does just by
knowing which mixins it is built from, and if you want to look at it in more
detail, you get to the interesting points much faster because the methods
are better organized and mixins (at least the mixins I have in mind) would
show you exactly how the class and the mixins are glued together.
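To make points (1) and (2) concrete, here is a minimal sketch in Python. Squeak has no mixins, so this is only a model and not Squeak code. It assumes a mixin is simply a first-class dictionary of functions that send messages to self and never touch instance variables. `compose`, `make_glue` and all class and selector names are hypothetical. Two unrelated classes share one mixin, and a third swaps in a different implementation of the same aspect while keeping the glue code.

```python
def compose(name, mixins, glue):
    """Build a class from first-class mixins plus class-specific glue."""
    methods = {}
    for mixin in mixins:
        methods.update(mixin)  # no conflict handling in this sketch
    methods.update(glue)       # the class's own glue methods come last
    return type(name, (object,), methods)


# Two interchangeable implementations of a hypothetical "printing" aspect.
# Neither touches instance variables; both only send label() to self.
PlainPrinting = {"print_string": lambda self: self.label()}
BracketPrinting = {"print_string": lambda self: "<" + self.label() + ">"}


def make_glue(prefix):
    """Glue code: supplies the one selector the printing mixins rely on."""
    def init(self, n):
        self.n = n

    def label(self):
        return prefix + str(self.n)

    return {"__init__": init, "label": label}


# (1) Sharing: two unrelated classes reuse the same mixin without inheriting
#     from a common base class of their own.
Account = compose("Account", [PlainPrinting], make_glue("acct-"))
Sensor = compose("Sensor", [PlainPrinting], make_glue("sensor-"))

# (2) Swapping: same glue, different implementation of the printing aspect.
BracketAccount = compose("BracketAccount", [BracketPrinting], make_glue("acct-"))

print(Account(7).print_string())         # acct-7
print(Sensor(3).print_string())          # sensor-3
print(BracketAccount(7).print_string())  # <acct-7>
```

In this sketch, later dictionary entries silently win. A real design would need one of the conflict policies discussed later in this message.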

So, why do I prefer having explicit mixins over just having default methods,
which also allow sharing behavior (without inheritance)?

Default methods allow only *one* sharable implementation per
selector. For some selectors, this is all that is needed (e.g. 'select:
aBlock' or 'collect: aBlock' are (nearly) always implemented the same way),
but other selectors may be implemented in many different
ways. This is particularly the case if a programmer wants to create and
share different implementations of one protocol. As an example, a programmer
might want to create different policies for 'instance variable management',
'class variable management' and 'method dictionary management', and then
combine them freely in order to create one (or more) very specific kinds of
Class.
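Here is a rough model of design B, again in Python and purely hypothetical (`DEFAULTS`, `default_method` and `send` are invented names). Lookup first walks the receiver's class hierarchy and falls back to the table of default methods only if nothing is found there. Because the table has a single slot per selector, there is nowhere to keep a second, alternative implementation of the same selector.

```python
DEFAULTS = {}  # hypothetical global table: selector -> its single default method


def default_method(selector):
    """Register fn as THE default method for selector."""
    def register(fn):
        DEFAULTS[selector] = fn  # registering again would replace, not add
        return fn
    return register


def send(receiver, selector, *args):
    # 1. Normal lookup through the receiver's class hierarchy.
    for cls in type(receiver).__mro__:
        if selector in vars(cls):
            return vars(cls)[selector](receiver, *args)
    # 2. Fall back to the one default method, if there is one.
    if selector in DEFAULTS:
        return DEFAULTS[selector](receiver, *args)
    raise AttributeError(f"{type(receiver).__name__} does not understand {selector}")


@default_method("collect")
def collect(receiver, block):
    # Like a default method in design B, this only sends messages;
    # it never reads the receiver's instance variables.
    return [block(each) for each in send(receiver, "elements")]


class Bag:
    def __init__(self, items):
        self.items = list(items)

    def elements(self):
        return self.items


class Interval:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def elements(self):
        return list(range(self.start, self.stop + 1))

    def collect(self, block):  # a class may still override the default
        return tuple(block(each) for each in self.elements())


print(send(Bag([1, 2, 3]), "collect", lambda x: x * 10))  # [10, 20, 30]
print(send(Interval(1, 3), "collect", str))               # ('1', '2', '3')
```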

In my eyes, the biggest limitation of having only default methods is the
fact that there is no way of grouping them together into first-class
"behaviors". As pointed out in (2) and (3), such an explicit grouping of
methods raises the level of abstraction when we build classes, because it
allows us to compose classes from *a few* mixins (and some glue code)
instead of building them from *many* individual methods.
As a simple example, imagine that we would like to write several classes
that contain a rectangle aspect. With mixins, we would first write a
RectangleMixin that provides all the necessary methods (such as area,
extent, etc.). (Of course, we could also generate such a mixin from the
class Rectangle.) Looking at this mixin in the browser, we see that it
requires only four selectors (namely origin, origin:, corner, corner:) to
make it complete. Thus, when we import such a mixin into a class, the
browser immediately shows that these 4 selectors are the ones that are
necessary to glue the mixin and the class together. Once these selectors are
properly defined (in the case of the class Morph, we would use 'position' and
'extent' to do so), we can be sure that all the other methods of
RectangleMixin are working properly (since we already use them in other
classes). Later, when another programmer looks at our class, the browser
shows him that it contains the mixin RectangleMixin, and it also shows him
the glue code that is used to put the mixin into the class. This means that
this programmer immediately knows that the class understands the mixin
protocol (in Squeak, these are 88 selectors) and he can see how it is glued
into the class by looking at 4 selectors. In addition, he sees that the 88
selectors from RectangleMixin belong together and that he does not have to
consider them when debugging if anything goes wrong (instead, he should
focus on the glue methods).
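The RectangleMixin example can be modelled roughly as follows. This Python sketch rests on several assumptions. The setters origin: and corner: are spelled `set_origin` and `set_corner`, points are (x, y) tuples, and only a handful of the 88 selectors are shown. The four required selectors are declared by hand instead of being computed by a browser. The hypothetical glue maps them onto Morph-style position/extent state, and `use_mixin` refuses to build a class whose glue leaves a requirement open.

```python
def _width(self):
    return self.corner()[0] - self.origin()[0]


def _height(self):
    return self.corner()[1] - self.origin()[1]


def _translate_by(self, dx, dy):
    ox, oy = self.origin()
    cx, cy = self.corner()
    self.set_origin((ox + dx, oy + dy))
    self.set_corner((cx + dx, cy + dy))


# Hypothetical RectangleMixin: a few provided methods, plus the selectors it
# needs from any class that uses it (declared by hand in this sketch).
RectangleMixin = {
    "provides": {
        "width": _width,
        "height": _height,
        "extent": lambda self: (self.width(), self.height()),
        "area": lambda self: self.width() * self.height(),
        "translate_by": _translate_by,
    },
    "requires": {"origin", "set_origin", "corner", "set_corner"},
}


def use_mixin(name, mixin, glue):
    """Compose a class, refusing it if the glue leaves a requirement open."""
    missing = mixin["requires"] - set(glue)
    if missing:
        raise TypeError(f"{name} must define {sorted(missing)}")
    return type(name, (object,), {**mixin["provides"], **glue})


# Glue for a Morph-like class that stores position and extent instead.
def _init(self, position, extent):
    self._position, self._extent = position, extent


def _corner(self):
    return (self._position[0] + self._extent[0],
            self._position[1] + self._extent[1])


def _set_origin(self, p):
    cx, cy = self.corner()  # assumption: setting the origin keeps the corner fixed
    self._position = p
    self._extent = (cx - p[0], cy - p[1])


def _set_corner(self, p):
    self._extent = (p[0] - self._position[0], p[1] - self._position[1])


morph_glue = {
    "__init__": _init,
    "origin": lambda self: self._position,
    "corner": _corner,
    "set_origin": _set_origin,
    "set_corner": _set_corner,
}

Box = use_mixin("Box", RectangleMixin, morph_glue)
box = Box((10, 20), (30, 40))
print(box.area())                  # 1200
box.translate_by(5, 5)
print(box.origin(), box.extent())  # (15, 25) (30, 40)
```

Only the five glue entries are class-specific. Everything the mixin provides is reused unchanged.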

The only negative point of mixins is that there can be conflicts if
more than one mixin defines the same selector. However, as long as we don't try
to be too clever and use overly complicated algorithms to disambiguate these
conflicts, I don't think that this is a real problem. In fact, I would not
implement any disambiguation rule at all and would just apply one of these
policies:

a) Whenever someone wants to use two mixins with the same selector x, the
browser shows this selector in a special category "-- conflicts --" and
defines it as "self error: 'Mixin conflict!'" by default. This means that
the programmer has to resolve the conflict himself.

b) When two mixins define the same selector, they cannot be used in the same
class.
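Neither policy needs a disambiguation rule, which keeps both simple. In the hypothetical Python sketch below (`compose` and `MixinConflict` are invented names), policy (a) turns every conflicting selector into a stub that raises an error until the class's glue code overrides it. Policy (b) rejects the composition outright. The returned list plays the role of the "-- conflicts --" category.

```python
class MixinConflict(Exception):
    pass


def compose(name, mixins, glue, policy="stub"):
    """Combine mixins with no disambiguation rule at all.

    policy="stub":   (a) each conflicting selector becomes an error stub
                     until the class's glue code defines it;
    policy="reject": (b) mixins sharing a selector cannot be combined.
    """
    methods, conflicts = {}, set()
    for mixin in mixins:
        for selector, fn in mixin.items():
            if selector in methods:
                conflicts.add(selector)
            methods[selector] = fn
    if conflicts and policy == "reject":
        raise MixinConflict(f"{name}: conflicting selectors {sorted(conflicts)}")
    for selector in conflicts:
        def stub(self, *args, _selector=selector):
            raise MixinConflict(f"Mixin conflict! ({_selector})")
        methods[selector] = stub
    methods.update(glue)  # the programmer resolves conflicts here, by hand
    # The sorted list plays the role of the "-- conflicts --" category.
    return type(name, (object,), methods), sorted(conflicts)


Drawing = {"draw": lambda self: "drawn", "reset": lambda self: "drawing reset"}
Counting = {"count": lambda self: 0, "reset": lambda self: "counter reset"}

Gadget, open_conflicts = compose("Gadget", [Drawing, Counting], {})
print(open_conflicts)  # ['reset']

Fixed, _ = compose("Fixed", [Drawing, Counting],
                   {"reset": lambda self: "both reset"})
print(Fixed().reset())  # both reset
```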

In addition, I believe that such conflicts should not occur very often if
the mixins are designed and used properly. (A class should use mixins for
its different aspects, which are usually pretty orthogonal as far as their
terminology goes.) Nevertheless, protocols (C), which make all selectors
unique, would be a benefit here.
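For comparison, protocols (C) would sidestep the problem. The Python sketch below assumes that a class's method table is keyed by (protocol, selector) pairs; `ProtocolClass` and its `send` method are hypothetical. Two mixins can both define reset without clashing, and only the sender of an ambiguous selector has to name the protocol.

```python
class ProtocolClass:
    """Hypothetical class whose method table is keyed by (protocol, selector)."""

    def __init__(self, *mixins):
        self.table = {}
        for protocol, methods in mixins:
            for selector, fn in methods.items():
                # Same selector name in two protocols -> two distinct keys.
                self.table[(protocol, selector)] = fn

    def send(self, receiver, selector, *args, protocol=None):
        matches = [key for key in self.table
                   if key[1] == selector and protocol in (None, key[0])]
        if len(matches) != 1:
            raise LookupError(f"{selector}: {len(matches)} candidates, name a protocol")
        return self.table[matches[0]](receiver, *args)


gadget = ProtocolClass(
    ("drawing", {"draw": lambda r: "drawn", "reset": lambda r: "drawing reset"}),
    ("counting", {"count": lambda r: 0, "reset": lambda r: "counter reset"}),
)
obj = object()  # stand-in receiver; the sketch's methods ignore it
print(gadget.send(obj, "draw"))                        # drawn
print(gadget.send(obj, "reset", protocol="counting"))  # counter reset
```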

Ok, that's my opinion on this issue. Please note that I have not yet finished
bringing myself up to date on PIE (roles, perspectives, etc.), and
therefore my opinion may change...

Cheers,
Nathanael

> -----Original Message-----
> From: squeak-dev-admin at lists.squeakfoundation.org
> [mailto:squeak-dev-admin at lists.squeakfoundation.org]On Behalf Of Anthony
> Hannan
> Sent: Tuesday, 26 February 2002 07:54
> To: squeak-dev at lists.squeakfoundation.org
> Subject: Behavior design options (was: Behaviors vs Modules)
>
>
> Nathanael <n.schaerli at gmx.net> wrote:
> > I'm really curious what kind of a mixin concept you have in mind.
>
> Well, let's look at different designs from simple to complex:
>
>  A :=  Just single-inheritance classes.  (today's Squeak)
>  B :=  A + default methods (each selector has a default method that will
> execute if the receiver's hierarchy does not define a method for that
> selector.  Default methods cannot access instance variables, they can
> only send messages).
>  C :=  A + protocols (a protocol is a group of selectors, providing
> selector namespace.  Each selector belongs to one and only one protocol.
> Selectors in different protocols can have the same name, but they are
> considered different selectors, meaning the sender has to choose which
> one he means if it's not obvious).
>  D :=  A + mixins (a mixin is a set of methods that may be shared among
> different classes.  Like default methods, mixin methods cannot access
> instance variables).
>  E :=  B + C.
>  F :=  C + D.
>  G :=  B + C + D.
>  H :=  Object roles (ie. PIE).
>  I :=  Multiple-inheritance classes.
>
> I encourage people to vote for the one they like best (with an
> explanation of course) and/or add new designs to this list.
>
> My opinion:
> 	Ambiguities arise in designs C and higher that require even more
> design to disambiguate.  Protocols require senders to sometimes use prefixes.
> Mixins require the importing class to resolve conflicts if the mixin
> implements a selector already included in the class or another imported
> mixin.  Roles and multiple-inheritance have similar issues.
> 	So I vote for B.  I like B over A because it allows reuse of methods
> outside of the single-inheritance structure and gives Squeak some
> function orientation, which is closer to reality.  80% of all selectors
> in the image have only one method definition (only 20% are polymorphic)
> [1].  The default methods can be grouped by functionality (modules) that
> often intersect many classes, but are usually executed together.
> 	However, having no class or receiver type associated with default
> methods will make it hard to figure out which receivers they are
> intended for.  Protocols would solve this problem by associating a
> "protocol type" with the receiver.  But, beside the cost of occasional
> message prefixes, it would require us to come up with many new standard
> names (protocol names) one per receiver type per module.  For example,
> the "exceptions" module would contain additional methods for blocks,
> context, and general objects.  We wouldn't want to add #on:do: to the
> existing "blockEvaluation" protocol (#value, etc.), instead we would
> want to add it to a new "blockExceptions" protocol.  This proliferation
> of protocols would be overkill.  Instead, I propose we add the receiver
> as a variable in the method header, so we can name it something
> appropriate like we do with arguments.  For example, here is how I would
> define the default method #ifError:
>
> block ifError: errorHandler
> 	^ block on: Error do: errorHandler
>
> "block" is added to the front of the method header and replaces self in
> the code.  Of course this would be optional.  But if default methods
> include receiver variables then it will be easier to tell what kind of
> receiver is expected without actually tying it to a specific class or
> using protocols.  Receiver variables, like argument variables, serve as
> informal types.  Informal is a plus since formal types are overkill (as
> all Smalltalkers know).  Tools could filter or sort default methods by
> the receiver variable, and they can even trace senders and implementors
> to figure out which default methods are applicable to which classes.
> 	Default methods provide reuse of outside-the-hierarchy behavior like
> mixins, reducing the need for mixins.  So I would keep the behavior
> design simple with just functions and classes that may override those
> functions.
> 	Note, in my original post I proposed design E: default methods +
> protocols.  This was because I thought protocols could serve as modules.
>  But now I favor transactions/changesets as modules, so protocols are
> not worth their weight anymore.  (Double Note, I still haven't studied
> SmallScript or PIE yet, which may change my opinion again :-).
>
> Cheers,
> Anthony
>
> Footnotes [1]
> Calculation of % of polymorphic selectors (methods with the same
> selector are counted as polymorphic even if they are not intended to be
> polymorphic):
>   | poly all |
>   poly _ IdentitySet new.  "selectors defined in more than one class"
>   all _ IdentitySet new.   "every selector defined anywhere"
>   Smalltalk allBehaviorsDo: [:cls |
> 	cls methodDict keysDo: [:sel |
> 		(all includes: sel)
> 			ifTrue: [poly add: sel]    "seen before: counts as polymorphic"
> 			ifFalse: [all add: sel]]].
>   (poly size / all size) asFloat  "=> 0.224"
>