Why we should remove {} from Squeak

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue Oct 2 02:11:50 UTC 2001


ducasse stephane <ducasse at iam.unibe.ch> wrote:
	I developed a lot of stuff in VW and never found the need to have {}.
	
Hey, I developed a lot of stuff in C and never found the need to have	
objects.  So?  In Lisp, I never knew how much I needed backquote until
it was there.  I knew about continuations, but never thought of *using*
them until I was given a language (Scheme) that supported them.

	But I wish I could send you the kind of code you can write with
	{}; believe me, it is really ugly.

This is beyond dispute.  But you can also write "really ugly" code without
{}.  The problem is not {}, it is the willingness to write really ugly code.

	So some people will say that this is the case with
	everything and the discussion will end.
	
Not with good will on both sides, surely?

"It is possible to write really ugly code with construct X"
is uninformative for every value of X.

What _is_ informative is
    "Here are some examples of really ugly code written using construct
     X.  Let us write some tutorial material showing how the same effect
     can be obtained more beautifully (whether by using construct X in
     a different way or by other means entirely)."

The reason for the construct is that SOME CODE WOULD BE REALLY UGLY
WITHOUT IT.

To be honest, all _my_ uses of {} are in 
	anObject caseOf: {[...] -> [...] ...} otherwise: [...]
and all except one are used for classifying characters.

I do regard these as loud alarm bells warning me to look for a more
OOPY way of doing it.  I once had a caseOf: checking symbols, but
replaced that code fairly quickly.

When I have a small set of characters to discriminate,
I don't know any clearer way to do it.  Characters aren't classes, so they
can't be subclassed.  And while I accept the OO credo of
basing dispatching on object type rather than codes, when you are parsing
a data stream, you have to deal with characters.
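To make the comparison concrete, here is a small sketch of the kind of character classification meant here, first with #caseOf:otherwise: and then with the nested #ifTrue:ifFalse: that would be needed without {}. The selectors #classify: and #classifyWithoutBraces: and the result symbols are hypothetical; both methods are assumed to be compiled into some scratch class.

```smalltalk
classify: aCharacter
	"Hypothetical example: name a punctuation character read from a stream."
	^aCharacter
		caseOf: {
			[$(] -> [#leftParen].
			[$)] -> [#rightParen].
			[$,] -> [#comma].
			[$;] -> [#semicolon]}
		otherwise: [#other]

classifyWithoutBraces: aCharacter
	"The same dispatch without {}: every case adds a level of nesting,
	 and aCharacter has to be repeated in every test."
	^aCharacter = $(
		ifTrue: [#leftParen]
		ifFalse: [aCharacter = $)
			ifTrue: [#rightParen]
			ifFalse: [aCharacter = $,
				ifTrue: [#comma]
				ifFalse: [aCharacter = $;
					ifTrue: [#semicolon]
					ifFalse: [#other]]]]
```

Both methods answer #comma for $, and #other for $x; only the shape of the code differs.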

Eliminating _ caseOf: {...} otherwise: [_] would
 - double the size of some of my methods (I have tried it).
 - make them much harder to read by giving them inappropriate indentation.
 - make them much harder to read by hiding the fact that it is the same
   value being tested in each case.
In short, eliminating {} would result in REALLY UGLY code for me.

Not very much of it!  But then, just how much "really ugly" code that
does use "{}" are we talking about?

	I thought that Smalltalk was looking for simplicity.

If that were true we wouldn't have Morphic.

It is always wise to remember that there are at least four different groups
of people to consider when we talk about "simplicity" in programming:

  (A) The people who write the compiler.
  (B) The people who write the first drafts of code.
  (C) The people who maintain code other people wrote.
  (D) The people who try to use the code to perform useful work.

C is a textbook example of a language where the designers chose simplicity
for the (A) group at the expense of everyone else.  Ada is a textbook
example of a language where the designers chose simplicity for the (C)
group at the expense of everyone else.  APL is a textbook example of a
language where the designers chose simplicity for the (B) group at the
expense of everyone else.

The Smalltalk environment is such that *some* degree of simplicity for
the (A) group pays off very well in the form of better tools for the
(B) and (C) groups.

I'll admit this much:  it's rather startling that after loading a change
set that I _knew_ contained quite a few invocations of #caseOf:otherwise:
asking for 'senders of #caseOf:otherwise:' found only things in the system
that _weren't_ senders of #caseOf:otherwise:.

	So either we should have a way to describe macro-expansion or we should remove them!
	
I count the anomalous behaviour of "find senders of #caseOf:otherwise:" as
a strong argument against macro expansion in Smalltalk.  Because "find
senders" is system-wide, I am willing to tolerate a small fixed handful of
control structures being invisible to "find senders".  With Modules, we
should have a version of "find senders" that only examines a selected module
or set of modules, so that finding control structures would actually be
useful.

On the other hand, when I write #(a b) I don't care _how_ it is built;
I'm deliberately using a notation where no message sends appear.  There
are many ways it _could_ be done, and I don't want to assume responsibility
for any of them.  In the same way, {a. b} is a useful notation not only
because it is compact and readable, but because it _hides_ something I
don't care to know about, which is the exact message sends used to build
the object.
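A small workspace sketch of the difference between the two notations: #(a b) is a literal holding the Symbols #a and #b, while {a. b} evaluates a and b each time it runs. The cascade near the end is only one plausible way the brace could be expanded by hand; it is an assumption for illustration, not necessarily what the Squeak compiler emits.

```smalltalk
| a b literal braced byHand |
a := 3 + 4.
b := 'hello'.
literal := #(a b).    "a literal Array of the Symbols #a and #b"
braced := {a. b}.      "a new Array holding 7 and 'hello', built at run time"
"One possible hand expansion of the brace; the notation leaves the choice open."
byHand := (Array new: 2) at: 1 put: a; at: 2 put: b; yourself.
Transcript show: literal printString; cr;
	show: braced printString; cr;
	show: (braced = byHand) printString; cr.
```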

	***The next paragraph is provocative so read before replying ;)***
	
	Having private methods and attributes is useful too.  Sometimes
	I would like to do real design in Smalltalk.

Er, Squeak *has* private methods.
If "attribute = instance variable", Smalltalk instance variables are
'protected' rather than 'private'.  Budd's "Little Smalltalk" made them
strictly private.

Having strictly private instance variables would make the compiler more
complicated, but it sounds quite doable.  There'd need to be a new set of
messages for creating classes, along the lines of

	SomeClass subclass: #NewClass
	    instanceVariableNames: 'a b c'
	    privateVariableNames: 'x y z' "<------ new bit"
	    classVariableNames: ''
	    poolDictionaries: ''
	    category: 'Some-category'!

Private variables would be allocated just like other instance variables,
but not made visible to subclasses.  The inspector and debugger would need
to be changed too, I imagine.
	    
For experimentation, it should be possible simply to adopt a convention
that an instance variable "privateXyz" may only be mentioned in the class
where it is declared.  That would mean a change to the compiler only.
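Short of changing the compiler, the convention could also be checked after the fact. The following is a hedged sketch of such a lint, which is a different technique from the compiler change proposed above: the selector #privateConventionViolationsIn: is hypothetical, and the code assumes Squeak's Behavior>>instVarNames, #allSubclasses and #whichSelectorsAccess: (the last searching only the receiver's own methods). It answers subclass -> variable-name pairs for every subclass whose methods touch an inherited 'private...' variable.

```smalltalk
privateConventionViolationsIn: aClass
	"Answer an OrderedCollection of (subclass -> variableName) associations,
	 one for each subclass of aClass that has methods reading or writing an
	 instance variable which aClass declares and whose name begins with 'private'."
	| offenders |
	offenders := OrderedCollection new.
	aClass instVarNames do: [:name |
		(name beginsWith: 'private') ifTrue: [
			aClass allSubclasses do: [:sub |
				(sub whichSelectorsAccess: name) isEmpty
					ifFalse: [offenders add: sub -> name]]]].
	^offenders
```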

	Having operators is useful too.  I would like to have operators
	too, because I prefer to type
	    2 + 3 * 6 rather than 2 + (3 * 6)
	and because this is MATH and MATH is right!
	
Well no.  The mathematical symbol for multiplication is × or "."
or empty, not "*".
As it happens, I agree that operators are useful:  Algol 68, Prolog,
Haskell, and Clean support user-definable operators.
But of course there _isn't_ any single consistent mathematical notation.
There's a reason why any mathematics text of any size has a substantial
"notation" section.  It simply isn't possible for any programming language
to be consistent with _all_ of mathematical usage.

However, C with its >30 operators (counting unary and binary - as different,
counting prefix and postfix ++ as different and so on) has definitely
proven difficult for many programmers, and Smalltalk allows a LOT more
binary messages than C has operators.
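For the record, here is what Smalltalk does with the expression quoted above: binary messages are sent strictly left to right, with no precedence among them, so only the parentheses make the two forms differ.

```smalltalk
| leftToRight grouped |
leftToRight := 2 + 3 * 6.    "parsed as (2 + 3) * 6, so 30"
grouped := 2 + (3 * 6).      "2 + 18, so 20"
Transcript show: leftToRight printString; cr;
	show: grouped printString; cr.
```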

	I hope you see my point.
	
No.  By mentioning private variables (which I have argued we could add to
Smalltalk without any changes to the syntax at all) and mathematical
operators, are you offering a reductio ad absurdum?

	I like the simplicity of Smalltalk and I'm ready to pay the price to
	use convention for expressing private, dealing with operators by hand....
	I like simplicity but I do not like this #{} because this is
	against the simplicity and ad-hoc!!!
	
Against THE simplicity?  But there are at least four of them.
It would never have been added to the language unless someone thought
it would IMPROVE simplicity for group (B) or group (C).  It is genuinely
hard for me to see an array notation "list the expressions that yield
elements in the order in which the elements are supposed to be, put
"." separators between them just like the "." separators in a block,
and wrap {} braces around them" as complicated.

Indeed, as for "ad hoc", for someone familiar with Lisp, ML, Haskell,
Mercury, Clean, &c, the curly brace notation will be the most comfortingly
familiar part of the language.  {e1. ... . en} (Smalltalk),
[e1, ..., en] (ML, Haskell, Clean, Mercury), [e1; ...; en] (CAML).
The notation, while syntactically similar to Smalltalk block notation,
is structurally identical to ordered sequence notation in many other
languages.  So I think "ad hoc" is at best unfair.

	When I checked my code, I found that I have many more ordered
	collections than arrays, yet there is no construct for them.
	So I would really like to have one too!!!!!!!!

Nothing stops the compiler recognising

	{e1. ... . en} asOrderedCollection

as a special case, or for that matter

	{e1. ... . en} asSet
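What such a special case might expand into, sketched by hand: today the brace builds an Array that #asOrderedCollection then copies, whereas a compiler that recognised the pattern could send #add: directly. The expansion shown is an assumption about one way to do it, not existing compiler behaviour.

```smalltalk
| x viaArray direct |
x := 5.
"What the expression does today: build an Array, then copy it."
viaArray := {x. x * 2. x * 3} asOrderedCollection.
"A possible special-cased expansion that never allocates the Array."
direct := (OrderedCollection new: 3) add: x; add: x * 2; add: x * 3; yourself.
Transcript show: (viaArray = direct) printString; cr.
```

The same idea would carry over to #asSet, with Set new in place of OrderedCollection new: 3.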

	So the real question is should simplicity be conversed or not.
	Is it worth having caseOf: in Object and all these BraceNodes in the
	compiler?  Why do we need that?
	
I take it that "conversed" should be "conserved".

No, the real question is "when is it OK to break other people's working
programs, given that they knew in advance that Squeak is not cast in
concrete".

	I was wondering if { could not be like @
	
	we could write Array { 1@1; { 1@2
	I'm not sure that this is uglier than
	#{ 1@1 . 1@2}
	
MUCH MUCH uglier.  Unbelievably ugly.  (Apart from the fact that it
won't work.  The second "{" will be sent to Array, so it can only
create a one element array.)  It would parse as
    (let ((a Array))
      (send (send a '|{| 1) '@ 1)
      (send (send a '|{| 1) '@ 2))
which doesn't create any Points that I could see.  It would have to
be
	Array { (1@1); { (1@2)
which would parse as
    (let ((a Array))
       (send a '|{| (send 1 '@ 1))
       (send a '|{| (send 1 '@ 2)))
which still has the problem that if the first send of #{ creates a new
array, so must the second.


There _is_ an argument against the present implementation.
It adds methods that mustn't be tampered with, when it need not add
any methods at all, and the code it generates is slow.

I have just done some benchmarks comparing brace notation such as
    {e1. ... . en}
with calls to primitives
    ((Array basicNew: n) basicAt: 1 put: e1; ... basicAt: n put: en; yourself)

For n = 1 or 2, the latter version is about 10% faster.
(Including loop overhead.)
For n = 10, the latter version is twice as fast (including loop overhead).
(PowerMac G3, Squeak 3.0 patches to #3552); for large values of n 
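A rough workspace harness for this kind of comparison, for n = 2, might look like the following. The iteration count and variable names are arbitrary choices, it assumes Time class>>millisecondsToRun:, and the timings it prints depend on the machine and image; no particular figures are implied.

```smalltalk
| a b iterations braceMs primMs |
a := 1.
b := 2.
iterations := 1000000.
"Brace notation: let the compiler choose how the Array is built."
braceMs := Time millisecondsToRun: [
	iterations timesRepeat: [{a. b}]].
"Explicit primitives: allocate and fill the Array directly."
primMs := Time millisecondsToRun: [
	iterations timesRepeat: [
		(Array basicNew: 2) basicAt: 1 put: a; basicAt: 2 put: b; yourself]].
Transcript show: 'braces: ', braceMs printString, ' ms'; cr;
	show: 'basicNew:/basicAt:put: ', primMs printString, ' ms'; cr.
```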

Implementing {} calls for _some_ chumminess between the compiler and the
class library; using #basicNew: and #basicAt:put: is at least as legitimate
as using the "brace support" methods in Array.



