Please use accessors!

Sun Aug 1 00:40:49 UTC 1999

> ** Original Sender: "Carl Gundel" <morphic at hotmail.com>
> >From: Alan Lovejoy <sourcery at pacbell.net>
> >Reply-To: squeak at cs.uiuc.edu
> <big snip>
> >8. Given the above, I would avoid being dogmatic on this
> >issue.  I think the fact that so many good Smalltalk programmers
> >still access instance variables directly, and that code that commits
> >this "sin" has lasted unchanged for so long, should cause one to
> >at least question whether or not it's all that sinful.
> <snip>
> 
> Thanks for the brain dump Alan!  :-)
> 
> I understand the pragmatics of this issue well, but I feel compelled to make 
> a counterpoint about the above.  Code that accesses ivars directly can be 
> brittle.  In particular, code which uses both direct access and accessor 
> methods on the same ivars can behave in unpredictable ways.  Sometimes such 
> code survives for a very long time because it is brittle and hard to 
> understand, and no one wants to touch it for fear of breaking it.
> 
> I tend to defer writing code which accesses ivars directly until performance 
> tuning time.  I wish I never had to write such code because it hurt the 
> maintainability and extensibility of my design, but I live in the real world 
> (oh Bah!).
> 
> Thanks again!
> 
> -Carl

Poorly designed code will be brittle.  But I am convinced that blaming 
direct iVar access as the culprit is an oversimplification.

The programmer needs to have a firm grasp of the strategy required for
each variable.  Here are the heuristics I use (or at least, my probably
imperfect and incomplete verbal exposition of them):

	* If there are dependencies on the value of variable v1, such that 
	changes to the value of v1 require consequential action, then one
	should not assign values to v1 directly, but should use a mutation
	message.  

	* If a variable needs to be lazily initialized (perhaps because its
	value is computed by formula, or because its value should only
	be created when needed), then one should normally get the value
	using an access message.  But not always: #release, #postCopy,
	#initialize (this one is an especially interesting case, I won't go
	into it here), #unbindFromV1, #bindToV1, #invalidateV1 and
	#basicV1 are just some of the cases that come to mind where
	one might need to access v1 directly.

	* There are cases where accessor methods don't actually answer
	the value of an instance variable, but answer a copy of its value
	(to prevent unauthorized/stealth changes by outsiders to the value 
	of the object held by the variable).   If the variable really needs
	such protection, then perhaps there should not even be a private
	method that returns it (e.g., #basicV1).  In any case, there would
	be many cases where using the public "accessor" would not
	be approriate for methods internal to the class.  Note that 
	indiscriminately using access messages for all instance variables
	will lead to interesting results whenver an accessor is changed
	to a return a copy of the variable's value.  One needs to consider
	early in the design whether or not the public accessor will
	answer the variable's value itself, or only a copy thereof.

Then there is the infamous case of what I call "fraudulent conversion."
Here's an example: the instance variable is "filename."  The selector
of the accessor is #filename.  The value of the variable is (for example)
a Filename.  The accessor method answers "filename asString." (You
can invert the types, it's still "fraudulent conversion.")  Don't do this.
Instead, have accessors with selectors #filename and #filenameString
(for example).  One always answers the String, the other always
answers the Filename.  If the instance variable always contains
a String, it should be called "filenameString."  And don't let it
contain a String one moment, and a Filename the next.  Of course,
you don't need to have both #filename and #filenameString as
accessor methods if you uniformly deal only in Filenames, or
only in Strings (with respect to the instance variable and its

accessor methods).  This simply eliminates one common need
for using an accessor method (type conversion), because it's
a bad idea in the first place.  Do the type conversion when the
value of the variable is set by the mutator, or else don't name
the type coverting accessor the same as the instance variable.

The other requirement is to use abstract classes whenever and
wherever they make sense, as in the AbstractPoint/CartesianPoint/
PolarPoint case discussed previously.  Code as many methods
as possible in the abstract class, where there are no instance
variables and inappropriate instVar access cannot occur.  The
methods that remain are the ones that probably need to access
the instance variables.

Applying these heuristics results in code that will not be brittle 
due to bad instance variable usage, or so I have found in my own 
experience.

Note that the only strong case for using access messages instead
of reading the value of an instance variable directly involves
lazy initialization (assuming one has used the abstract class/
abstract method pattern appropriately).  If it would just not 
make sense to lazily initialize a variable, then why use an 
access message to get its value?  Accessing the instance
variable directly has a guaranteed semantics, does not require
the reader to browse the accessor method(s) to see what
will really happen, and will not break when the accessor
method is changed to answer a copy or type conversion
of the value of the instance variable. And it's faster, too.

Mutator methods are good because they make it easy to enforce
value constraints, ensure invariants, and take any consequential
action that may be required by a change in the value of a variable.
On the other hand, they don't have guaranted semantics, and so
require the reader to browse the code to see what the mutator
method(s) actually does (do).  It has happened more than once
that mutator methods in subclasses (that override a mutator method
in a superclass) either forget to call "super," or change the semantics 
in such a way that the code in the superclass breaks.  And they
may make it too easy for outsiders to mess with the internal state
of an object.  And they are slower. These things are two edged, 
and whether it is best to use them is very much a judgement call.

--Alan