** Original Sender: Helge Horch Helge.Horch@munich.netsurf.de
At 18:55 30.07.99 -0400, Chris Norton wrote:
I will admit to having used i-vars directly in the past. At the time, it seemed like a good way to ensure privacy, but upon reflection I decided that this was silly. Any developer can just add his/her own accessor. So now my practice is to create accessors, but label them as "private" and put them into "private" method categories. Since Smalltalk does not enforce privacy, we have to assume that our fellows will adhere to our "code of conduct".
</Soapbox>
<Déjà vu> I like the way Kent Beck reasons about it. In [SBPP] he lists *both* "Direct Variable Access" (p.89) and "Indirect Variable Access" (p.91) as valid coding patterns, alluding to the obvious schizophrenia, and mentioning encapsulation and dogmatism. ([SWS] is leaning towards indirect-only, IIRC.) In his recent [GTBS], featuring a reprint of his 1993 Smalltalk Report column "To Accessor or Not to Accessor", I found his leading remarks from today's view most interesting: "[...] It wasn't until I rewrote the whole thing as patterns for the book [SBPP] that I realized the key issue here is communication." (The column itself stresses the importance of consistency in one's ways.) His concluding remark: "Anyway, if this one bugs you, ignore it, all except the part about making accessors private by default."
Helge:
I am very glad you posted the Kent Beck quotes. I was fully intending to do the same, but for once I remembered to read to the end of the message list before prematurely responding.
All:
I fully agree with KB's position (squarely on the fence, judging each case on its merits). But I would like to add the following observations:
1. Finding all references to an instance variable in a class hierarchy is even easier than finding all senders of some message, because there is no ambiguity with respect to which class implements the name (when renaming methods, care must be taken to only rename those message sends where the receiver will be an instance of a class where the method was renamed). Since this is so, changing the name or structure of the slots of an object is easier than changing its interface. A design change that would require renaming or removing an instance variable would very likely also require renaming or removing the associated accessor and mutator methods. So the debate centers on those few cases where the accessors/mutators would remain, but the instance variable(s) would not (e.g., the Point example already mentioned).
2. All classes in Smalltalk can be changed by any programmer. If an instance variable offends thee, thou mayest rename or remove it. Same goes for methods. Similarly, any programmer can add methods to a class to access otherwise encapsulated instance variables, and can also use #instVarAt: and #instVarAt:put: to defeat any message-based encapsulation of an instance variable. However, this does not mean that one should give up on the attempt to use method-mediated encapsulation of state in order to enforce invariants and/or constraints. If object encapsulation had no value, then OOP would be pointless.
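To make the #instVarAt: point concrete, here is a small workspace sketch. It assumes a dialect in which x is the first declared instance variable of Point; the slot index is an assumption and may differ in your image.

```smalltalk
| p |
p := 3 @ 4.
"Read the first named instance variable without sending #x."
p instVarAt: 1.
"Overwrite it without going through any mutator."
p instVarAt: 1 put: 99.
p
```

Nothing in the class can veto either access, which is why privacy in Smalltalk rests on convention rather than enforcement.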
3. Barbara Liskov, in her speech at OOPSLA '87 in Orlando, noted that Smalltalk-style class inheritance (where any subclass has full access to the internals of its superclasses) breaks encapsulation. The issue here is that one source may provide a framework class meant to be subclassed, and another programmer may create a subclass that then breaks when the code is ported to a new version of the framework in which one or more instance variables (for example) have been renamed or removed (perhaps due to a design change, such as converting Point from Cartesian to polar). This is where the issue of direct access to instance variables may bite the hardest. On the other hand, the same issue is present with respect to methods, whether public or private.
4. The Point example is actually bogus. The right approach is to have an abstract superclass with CartesianPoint and PolarPoint subclasses. There is no valid argument for defending against some future wholesale replacement of the x and y instance variables of the CartesianPoint class. Those instance variables would inherently have to be there. One might want to rename them, but the same issue would exist with respect to accessor/mutator methods (only more strongly, given point #1 above). The only problem (unfortunately) is that a solution analogous to the one for Point/CartesianPoint/PolarPoint may not be applicable in all cases.
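A rough sketch of the hierarchy described in point 4 might look like the following. The name AbstractPoint, the category, and the choice of selectors are all made up for illustration, the class-definition message varies by dialect, and methods are shown in the usual Class>>selector display notation rather than file-in format.

```smalltalk
Object subclass: #AbstractPoint
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Points'

AbstractPoint>>x
    ^self subclassResponsibility

AbstractPoint>>r
    ^self subclassResponsibility

AbstractPoint subclass: #CartesianPoint
    instanceVariableNames: 'x y'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Points'

CartesianPoint>>x
    "Stored directly."
    ^x

CartesianPoint>>r
    "Derived from the stored coordinates."
    ^((x * x) + (y * y)) sqrt

AbstractPoint subclass: #PolarPoint
    instanceVariableNames: 'r theta'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Points'

PolarPoint>>r
    "Stored directly."
    ^r

PolarPoint>>x
    "Derived from the stored coordinates."
    ^r * theta cos
```

Each concrete class keeps the variables it inherently needs, so neither has to defend against its own representation being swapped out.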
5. Beware lazy initialization. It has valid uses, but there are also invalid ones, and traps for the unwary.
The best use of lazy initialization is for "caching" variables. A caching variable holds a value that can be recomputed from other data at any time. If lazy initialization is used to compute then cache the value of a variable, then setting the variable to nil at some random moment should not change the behavior of the program (other than time to execute). If setting the variable to nil at any random moment would be incorrect, then the variable is NOT a caching variable.
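As a minimal sketch of a caching variable, assume a hypothetical Order class with instance variables 'items' and 'total', where each item answers #price (all of these names are invented for the example):

```smalltalk
Order>>total
    "Answer the cached total, computing it on first use."
    total isNil ifTrue: [total := self computeTotal].
    ^total

Order>>computeTotal
    "Private. Recompute from the source data."
    ^items inject: 0 into: [:sum :each | sum + each price]

Order>>items: aCollection
    "Changing the source data invalidates the cache."
    items := aCollection.
    total := nil
```

Setting total to nil at any moment costs only a recomputation on the next send of #total, which is the test for a caching variable given above.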
Another case where lazy initialization is a good thing is class variables, but only because of the likelihood that a class will be filed in without the #initialize message being sent to it (it's amazing how often this seems to happen).
Finally, lazy initialization is a necessary technique for solving certain transitive closure and resource usage problems (where fully initializing everything either never ends, or does way more work and/or resource allocation than is actually necessary).
One case where one would not use lazy initialization is for setting the value of "identity variables," which serve to specify the identity of a domain object in its domain. "Identity" variables are usually set once (usually when the object is first instantiated), and then never changed. An identity variable must be explicitly set, usually to a value specified by some source external to the object (e.g., the primary key of a domain object). By their very nature, they should not be lazily initialized. Using lazy initialization to set the social security number of aPerson just makes no sense, so this mistake is not common. In fact, one can use the fact that there is no good default value for a variable to spot an identity variable.
State variables present a more difficult case. A "state" variable is one that holds some changeable state of an object, such as the mailing address of a Person. Lazy initialization of "state" variables seems to work fine, until you need to return an object to its initial and/or default state. One would like to do this by sending the object the message #initialize, but it often turns out that a) there is no such message, because lazy initialization was used exclusively, or b) there is such a message, but for a variety of reasons it does not put the object into the desired state (either because it was never fully implemented, or because it was not kept in sync with the lazy initialization system).
Another problem is debugging. When state variables are lazily initialized, then one can easily come to false conclusions about object state during a debugging session, when "nil" doesn't really mean "nil".
But the worst problem arises when lazy initialization causes a transitive closure and/or resource usage problem, instead of helping to solve one. This is classic: You religiously send accessor messages to fetch state. The accessors use lazy initialization, and gleefully create unneeded objects. You asked for the object, there wasn't one there, so it got created and returned to you by the oh-so-helpful accessor (not!). My favorite is #release code that reinitializes an instance variable just so it can send #release to its value (e.g., "self controller release"). Don't go there. (And this is one case where it is probably better to access the instance variable directly.)
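The #release trap can be sketched like this. MyView, its 'controller' variable, and #defaultControllerClass are assumed names for the example; the two versions of #release are alternatives, not meant to coexist.

```smalltalk
MyView>>controller
    "Lazy accessor: creates a controller if none exists yet."
    controller isNil ifTrue: [controller := self defaultControllerClass new].
    ^controller

"The trap: if no controller was ever needed, this creates one
 just so that it can be released."
MyView>>release
    self controller release.
    super release

"Alternative: look at the variable directly, so nothing gets created."
MyView>>release
    controller notNil ifTrue: [controller release].
    controller := nil.
    super release
```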
For these reasons, it is better to have an #initialize method that sets all state variables to their default values, sets all caching variables to nil, and leaves all identity variables unchanged. The #new method of the root class in the hierarchy should then answer "self basicNew initialize" (#basicNew instead of "super new," because of the possibility that the superclass may someday be doing the same thing!).
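Put together, the scheme might look roughly like this for a hypothetical Person class with instance variables 'ssn' (identity), 'address' (state), and 'displayString' (cache). The selectors #ssn:, #setSsn:, and #defaultAddress are invented for the sketch.

```smalltalk
Person class>>new
    "basicNew rather than 'super new', so #initialize is sent only once
     even if a superclass later adopts the same idiom."
    ^self basicNew initialize

Person class>>ssn: aString
    "Identity comes from outside, at creation time."
    ^self new setSsn: aString

Person>>initialize
    "Put the receiver into its default state.
     The identity variable ssn is deliberately left alone."
    address := self defaultAddress.
    displayString := nil

Person>>defaultAddress
    "Private. Placeholder default for the sketch."
    ^''

Person>>setSsn: aString
    "Private. Set once, when the object is created."
    ssn := aString
```

Because #initialize never touches ssn, sending it later returns an existing Person to its default state without disturbing its identity.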
6. I like the way that Self deals with this issue. Very elegant. Kudos to David Ungar, et al. But I must caution that Smalltalk is not Self, and that always using accessors and mutators to access state does not change Smalltalk into Self, and does not provide the same benefits. Until we change the language, we have to deal with what we have, not with what we wish we had.
7. Whether Andres was wrong or right about direct instance variable access, it was not cool to come down on him like a ton of bricks. Andres, on the other hand, should have simply questioned why the instance variables were not being accessed directly, instead of boldly asserting they should be.
We need to treat each other with respect and diplomacy. We are ambassadors for Smalltalk, and should act accordingly when communicating in an open forum, one of whose purposes is spreading the Smalltalk gospel.
Andres: you stepped on a hornet's nest with this issue. Don't let it bother you. Score it as a learning experience, and be assured it's not the only subject that may cause a religious war to erupt among Smalltalkers.
8. Given the above, I would avoid being dogmatic on this issue. I think the fact that so many good Smalltalk programmers still access instance variables directly, and that code that commits this "sin" has lasted unchanged for so long, should cause one to at least question whether or not it's all that sinful.
I think this is a good issue for the language designers to deal with. Those just trying to code in the Smalltalk of today should consider the pros and cons, formulate a consistent strategy, and then stick with it.
9. I really hadn't intended to write such a long sermon, but the issue is complex and non-trivial.
Have a good weekend!
--Alan