New subject: Please use accessors!

31 Jul 1999


      ...
** Original Sender: Helge Horch Helge.Horch@munich.netsurf.de
At 18:55 30.07.99 -0400, Chris Norton wrote:
...
I will admit to having used i-vars directly in the past.  At the time, it
seemed like a good way to ensure privacy, but upon reflection I decided that
this was silly.  Any developer can just add his/her own accessor.  So now my
practice is to create accessors, but label them as "private" and put them
into "private" method categories.  Since Smalltalk does not enforce privacy,
we have to assume that our fellows will adhere to our "code of conduct".
</Soapbox>
<Déjà vu> I like the way Kent Beck reasons about it. In [SBPP] he lists
*both* "Direct Variable Access" (p.89) and "Indirect Variable Access"
(p.91) as valid coding patterns, alluding to the obvious schizophrenia, and
mentioning encapsulation and dogmatism. ([SWS] is leaning towards
indirect-only, IIRC.) In his recent [GTBS], featuring a reprint of his 1993
SmalltalkReport column "To Accessor or Not to Accessor", I found his
leading remarks from today's view most interesting: "[...] It wasn't until
I rewrote the whole thing as patterns for the book [SBPP] that I realized
the key issue here is communication." (The column itself stresses the
importance of consistency in one's ways.)
His conclusive remark: "Anyway, if this one bugs you, ignore it, all except
the part about making accessors private by default."
Helge:
I am very glad you posted the Kent Beck quotes.  I was fully intending to do
the same, but for once I remembered to read to the end of the message list
before prematurely responding.
All:
I fully agree with KB's position (squarely on the fence, judging each
case on its merits).  But I would like to add the following observations:
1. Finding all references to an instance variable in a class hierarchy is
even easier than finding all senders of some message, because there
is no ambiguity with respect to which class implements the name
(when renaming methods, care must be taken to only rename those
message sends where the receiver will be an instance of a class
where the method was renamed).  Since this is so, changing the
name or structure of the slots of an object is easier than changing
its interface.  A design change that would require renaming or 
removing an instance variable would very likely also require
renaming or removing the associated accessor and mutator methods.
So the debate centers on those few cases where the accessors/mutators
would remain, but the instance variable(s) would not (e.g., the
Point example already mentioned).
2. All classes in Smalltalk can be changed by any programmer.  If an
instance variable offends thee, thou mayest rename or remove it.  Same
goes for methods.  Similarly, any programmer can add methods to a
class to access otherwise encapsulated instance variables--and can
also use #instVarAt: and #instVarAt:put: to defeat any message-based
encapsulation of an instance variable.  However, this does not mean
that one should give up on the attempt to use method-mediated 
encapsulation of state in order to enforce invariants and/or constraints.  
If object encapsulation had no value, then OOP would be pointless.
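
To make point 2 concrete, here is a minimal sketch, assuming GNU Smalltalk's bracketed class syntax (the thread does not name a dialect). DemoAccount and its selectors are hypothetical names invented for illustration; only #instVarAt:put: comes from the discussion above.

```smalltalk
"Hypothetical class: the balance is meant to change only through #deposit:."
Object subclass: DemoAccount [
    | balance |

    DemoAccount class >> new [
        ^self basicNew setBalance: 0
    ]

    setBalance: aNumber [
        <category: 'private'>
        balance := aNumber
    ]

    balance [
        ^balance
    ]

    deposit: amount [
        "Method-mediated invariant: only positive deposits are accepted."
        amount > 0 ifFalse: [^self error: 'deposit must be positive'].
        balance := balance + amount
    ]
]

| account |
account := DemoAccount new.
account deposit: 100.
"Reflection writes the first named instance variable directly,
 so the check in #deposit: never runs."
account instVarAt: 1 put: -50.
account balance printNl.
```

The last line prints -50: the invariant is easy to defeat on purpose, yet #deposit: still protects every caller who goes through the message interface, which is the value being argued for.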
3. Barbara Liskov, in her speech at OOPSLA '87 in Orlando, noted
that Smalltalk-style class inheritance (where any subclass has full
access to the internals of its superclasses) breaks encapsulation.
The issue here is that one source may provide a framework class
meant to be subclassed, another programmer may create a subclass,
which then breaks when the code is ported to a new version of the
framework where one or more instance variables (for example)
have been renamed or removed (perhaps due to a design change,
such as converting Point from cartesian to polar).  This is where
the issue of direct access to instance variables may bite the hardest.
On the other hand, the same issue is present with respect to
methods, whether public or private.
4. The Point example is actually bogus.  The right approach is to
have an abstract superclass with CartesianPoint and PolarPoint
subclasses.  There is no valid argument for defending against
some future wholesale replacement of the x and y instance 
variables of the CartesianPoint class.  Those instance variables 
would inherently have to be there.  One might want to rename them, 
but the same issue would exist with respect to accessor/mutator 
methods (only more strongly, given point #1 above).  The only
problem (unfortunately) is that a solution analogous to the one for 
Point/CartesianPoint/PolarPoint may not be applicable in all
cases.
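
The hierarchy suggested in point 4 might look roughly like this. It is a sketch only, again assuming GNU Smalltalk's bracketed syntax; AbstractPoint and the private setter selectors are hypothetical names, and only CartesianPoint and PolarPoint are taken from the text.

```smalltalk
"Hypothetical abstract superclass: defines the protocol, holds no state."
Object subclass: AbstractPoint [
    x [ ^self subclassResponsibility ]
    y [ ^self subclassResponsibility ]
    r [ ^self subclassResponsibility ]
]

AbstractPoint subclass: CartesianPoint [
    | x y |

    CartesianPoint class >> x: xValue y: yValue [
        ^self basicNew setX: xValue y: yValue
    ]

    setX: xValue y: yValue [
        <category: 'private'>
        x := xValue.
        y := yValue
    ]

    x [ ^x ]
    y [ ^y ]
    r [ ^((x * x) + (y * y)) sqrt ]
]

AbstractPoint subclass: PolarPoint [
    | r theta |

    PolarPoint class >> r: radius theta: angle [
        ^self basicNew setR: radius theta: angle
    ]

    setR: radius theta: angle [
        <category: 'private'>
        r := radius.
        theta := angle
    ]

    x [ ^r * theta cos ]
    y [ ^r * theta sin ]
    r [ ^r ]
]

"Clients send the same messages to either representation."
(CartesianPoint x: 3 y: 4) r printNl.
(PolarPoint r: 2 theta: 0) x printNl.
```

Each subclass keeps the instance variables its representation inherently needs, so there is nothing to defend against inside CartesianPoint itself.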
5. Beware lazy initialization. It has valid uses, but there are also
invalid ones, and traps for the unwary.
The best use of lazy initialization is for "caching" variables.  A 
caching variable holds a value that can be recomputed from other 
data at any time.  If lazy initialization is used to compute then cache 
the value of a variable, then setting the variable to nil at some
random moment should not change the behavior of the program
(other than time to execute).   If setting the variable to nil at 
any random moment would be incorrect, then the variable is 
NOT a caching variable.
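
A small sketch of the caching test just described, assuming GNU Smalltalk's bracketed syntax. DemoInvoice, #flushCache and the other selectors are hypothetical; the point is only that nilling the cached variable at any time leaves the answer unchanged.

```smalltalk
Object subclass: DemoInvoice [
    | amounts total |

    DemoInvoice class >> new [
        ^self basicNew initialize
    ]

    initialize [
        amounts := OrderedCollection new.
        total := nil
    ]

    add: anAmount [
        amounts add: anAmount.
        "Invalidate the cache whenever the underlying data changes."
        total := nil
    ]

    total [
        "Caching variable: always recomputable from amounts."
        ^total ifNil: [total := amounts inject: 0 into: [:sum :each | sum + each]]
    ]

    flushCache [
        total := nil
    ]
]

| invoice |
invoice := DemoInvoice new.
invoice add: 10; add: 20.
invoice total printNl.
invoice flushCache.
invoice total printNl.
```

Both print statements show 30. If flushing had changed the result, total would not have been a caching variable.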
Another case where lazy initialization is a good thing is class
variables, but only because of the likelihood that a class will
be filed in without the #initialize message being sent to it
(it's amazing how often this seems to happen).
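
For the class-variable case, a sketch under the same assumptions (GNU Smalltalk bracketed syntax, where "Names := nil." inside the class body declares a class variable). DemoPalette and #names are hypothetical.

```smalltalk
Object subclass: DemoPalette [
    "Class variable; nothing here depends on a class-side #initialize."
    Names := nil.

    DemoPalette class >> names [
        "Lazily built, so it works even if the class was filed in
         and nobody ever sent it #initialize."
        ^Names ifNil: [Names := #('red' 'green' 'blue')]
    ]
]

DemoPalette names printNl.
```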
Finally, lazy initialization is a necessary technique for solving 
certain transitive closure and resource usage problems (where 
fully initializing everything either never ends, or does way more 
work and/or resource allocation than is actually necessary).
One case where one would not use lazy initialization is for
setting the value of "identity variables," which serve to specify 
the identity of a domain object in its domain. "Identity" variables
are usually set once (usually when the object is first instantiated), 
and then never changed. An identity variable must be explicitly set, 
usually to a value specified by some source external to the object 
(e.g., the primary key of a domain object).  By their very nature,  
they should not be lazily initialized.  Using lazy initialization 
to set the social security number of aPerson just makes no sense, 
so this mistake is not common.  In fact, one can use the fact that
there is no good default value for a variable to spot an identity
variable.
State variables present a more difficult case. A "state"
variable is one that holds some changeable state of an
object, such as the mailing address of a Person. Lazy
initialization of "state" variables seems to work fine, 
until you need to return an object to its initial and/or 
default state.  One would like to do this by sending
the object the message #initialize, but it often turns
out that a) there is no such message, because lazy
initialization was used exclusively, or b) there is
such a message, but for a variety of reasons it does
not put the object into the desired state (either because
it was never fully implemented, or because it was not
kept in sync with the lazy initialization system).
Another problem is debugging.  When state variables 
are lazily initialized, then one can easily come to false 
conclusions about object state during a debugging
session, when "nil" doesn't really mean "nil".
But the worst problem arises when lazy initialization
causes a transitive closure and/or resource usage problem, 
instead of helping to solve one.  This is classic:  You
religiously send accessor messages to fetch state.  The
accessors use lazy initialization, and gleefully create
unneeded objects.  You asked for the object, there wasn't
one there, so it got created and returned to you by the oh-so-helpful
accessor (not!).  My favorite is #release code
that reinitializes an instance variable just so it can send
#release to its value (e.g., "self controller release").  
Don't go there.  (And this is one case where it is probably
better to access the instance variable directly).
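
The #release trap can be sketched as follows, assuming GNU Smalltalk's bracketed syntax. DemoView, DemoController, the created counter and both release selectors are hypothetical, added only so the wasted object creation is observable.

```smalltalk
Object subclass: DemoController [
    release [
        "Stand-in for real cleanup work."
        ^self
    ]
]

Object subclass: DemoView [
    | controller created |

    created [
        "How many controllers the lazy accessor has built so far."
        ^created ifNil: [0]
    ]

    controller [
        "Lazy accessor: builds a controller on first request."
        ^controller ifNil: [
            created := self created + 1.
            controller := DemoController new]
    ]

    releaseViaAccessor [
        "The trap: may create a controller just to release it."
        self controller release.
        controller := nil
    ]

    releaseDirectly [
        "Direct access: releases a controller only if one exists."
        controller isNil ifFalse: [controller release].
        controller := nil
    ]
]

| a b |
a := DemoView new.
a releaseViaAccessor.
a created printNl.
b := DemoView new.
b releaseDirectly.
b created printNl.
```

The first view reports 1 controller created purely in order to be released; the second reports 0.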
For these reasons, it is better to have an #initialize method 
that sets all state variables to their default values, sets all 
caching variables to nil, and leaves all identity variables 
unchanged.  The #new method of the root class in the 
hierarchy should then answer "self basicNew initialize" 
(#basicNew instead of "super new," because of the 
possibility that the superclass may someday be doing the 
same thing!).
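
One way to read that recommendation, as a sketch in GNU Smalltalk's bracketed syntax. DemoPerson, its variables and selectors are hypothetical, and the SSN shown is a placeholder.

```smalltalk
Object subclass: DemoPerson [
    | ssn address summaryCache |

    DemoPerson class >> new [
        "basicNew rather than 'super new', so #initialize runs exactly
         once even if a superclass later adopts the same idiom."
        ^self basicNew initialize
    ]

    DemoPerson class >> ssn: aString [
        ^self new setSsn: aString
    ]

    setSsn: aString [
        <category: 'private'>
        "Identity variable: set explicitly, once, from outside."
        ssn := aString
    ]

    initialize [
        "State variables get defaults, caching variables get nil,
         identity variables are left alone."
        address := 'unknown'.
        summaryCache := nil
    ]

    ssn [ ^ssn ]
    address [ ^address ]

    address: aString [
        address := aString.
        summaryCache := nil
    ]

    summary [
        ^summaryCache ifNil: [summaryCache := ssn, ' @ ', address]
    ]
]

| person |
person := DemoPerson ssn: '000-00-0000'.
person address: '1 Main St'.
person initialize.
person address printNl.
person ssn printNl.
```

After the explicit #initialize the address is back to 'unknown' while the SSN survives, so the object can be returned to its default state without losing its identity.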
6. I like the way that Self deals with this issue.  Very
elegant.  Kudos to David Ungar, et al.  But I must caution
that Smalltalk is not Self, and that always using accessors
and mutators to access state does not change Smalltalk 
into Self, and does not provide the same benefits.  Until
we change the language, we have to deal with what we
have, not with what we wish we had.
7. Whether Andres was wrong or right about direct 
instance variable access, it was not cool to come down
on him like a ton of bricks.  Andres, on the other hand,
should have simply questioned why the instance variables 
were not being accessed directly, instead of boldly asserting 
they should be.
We need to treat each other with respect and diplomacy. 
We are ambassadors for Smalltalk, and should act accordingly
when communicating in an open forum, one of whose purposes
is spreading the Smalltalk gospel.
Andres: you stepped on a hornet's nest with this issue.  Don't
let it bother you.  Score it as a learning experience, and be
assured it's not the only subject that may cause a religious
war to erupt among Smalltalkers.
8. Given the above, I would avoid being dogmatic on this
issue.  I think the fact that so many good Smalltalk programmers
still access instance variables directly, and that code that commits
this "sin" has lasted unchanged for so long, should cause one to
at least question whether or not it's all that sinful.
I think this is a good issue for the language designers to deal with.  
Those just trying to code in the Smalltalk of today should consider 
the pros and cons, formulate a consistent strategy, and then stick 
with it.
9. I really hadn't intended to write such a long sermon, but the
issue is complex and non-trivial.
Have a good weekend!
--Alan
