Design Principles Behind Smalltalk, Revisited

Tue Dec 26 13:52:26 UTC 2006

J J wrote:
> One doesn't have to look far for this.  C became popular for no other 
> reason then it was "close enough" and we could use it "right now" to 
> build systems.  The more advanced languages used resources when there 
> were no resources to use.

Paul Graham has an essay on why languages become popular:
   http://www.paulgraham.com/popular.html
In the case of both C and C++, one should not discount the weight of AT&T, 
one of the largest and most widespread and visible companies of the time 
(as it ran a telephone monopoly). Similarly, without the backing of both 
Sun and IBM, Java might well never have taken off. Clearly Smalltalk was 
much better than Java in many ways when it was released:
   http://www.oreillynet.com/ruby/blog/2006/01/bambi_meets_godzilla.html
And, before Java, people were actively converting to Smalltalk as a "COBOL 
for the 1990s"; C++ and Smalltalk were both so different from COBOL that 
there was no huge difference in ease of understanding either syntax for 
COBOL programmers; in fact, Smalltalk was closer to COBOL's use of 
complete words without arbitrary abbreviations, if anything.

> "The rise of worse is better" largely misses the point here.  The point 
> is: getting 80% today is infinitely better then 100% "some day".  And 
> all the rest is just the incredible weight of "backward compatibility".
> 
> And the concept of backward compatibility isn't just software and 
> hardware.  It extends to workers as well.  The average programmer is 
> just not very good (and I don't speak about the worth of the people as 
> human beings.  There are just so many in the field with no interest in 
> it other then money, which is totally ok.  It just doesn't make for good 
> programmers).  The cost of moving people who are barely keeping up from 
> C++ to Java isn't so bad.  It actually makes things simpler:  just the 
> same syntax again with much of what they didn't understand taken out.  
> But moving these folks away from a C based syntax is out of the 
> question.  And getting rid of them in favor of more talented programmers 
> would be just as out of the question.

Well, it is also true that one big issue is that an Algol-like syntax with 
operator precedence (times over plus) is taught in K-12 school. It is a 
big advantage for a computer language to build on that, even though that 
precedence is arbitrary and Smalltalk is more consistent. And you are 
right on how Java seemed an easy move for C++ programmers. Of course, now 
Ruby seems an easy move for Java programmers (and much of Ruby is based on 
Smalltalk ideas), so in a matter of time, we may see Ruby developers 
making the leap to a more self-documenting and flexible syntax. :-)
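
To make the precedence point concrete, here is a minimal Python sketch (my 
own illustration, not code from any Smalltalk). It evaluates a flat 
expression strictly left to right, which is how Smalltalk treats binary 
messages, and compares that with the school rule Python itself follows. 
The name eval_left_to_right is hypothetical, and the sketch assumes only 
non-negative integers and the four basic operators.

```python
import operator
import re

# Binary operators supported by this sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_left_to_right(expr):
    """Evaluate a flat arithmetic expression strictly left to right,
    the way Smalltalk treats binary messages (no operator precedence)."""
    tokens = re.findall(r"\d+|[-+*/]", expr)
    result = int(tokens[0])
    # The remaining tokens come in (operator, operand) pairs.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(operand))
    return result

smalltalk_style = eval_left_to_right("2 + 3 * 4")  # read as (2 + 3) * 4
school_style = 2 + 3 * 4                           # read as 2 + (3 * 4)
print(smalltalk_style, school_style)               # prints: 20 14
```

The left-to-right rule is the simpler one to state, but it gives an answer 
that anyone trained on school arithmetic will not expect.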

Still, Smalltalk syntax was supposedly designed to be easy for kids to 
learn. It is not that hard to learn the syntax. I've helped people in 
business learn it. It takes at most a week to become proficient in it (and 
often just a day). What is hard is to learn all the libraries. But, with 
more and more programmers learning things like Java or Python or Ruby, all 
systems with rich libraries, Ruby's being almost exactly Smalltalk's in 
many ways, making the leap to a new syntax would be a minor investment 
(and one worth taking because Smalltalk syntax is more extensible and 
self-documenting than any of those other languages').

People are changing languages all the time. People have moved to Python; 
people are moving to Ruby; people have even moved to languages like Perl, 
which has a much more tortured syntax, or PHP, which has much more limited 
libraries. People learned HTML out of the blue because they wanted to do 
web sites, and HTML is a much harder syntax to work in than Smalltalk's in 
many ways (though you can edit in vi and then see immediate results in 
your local web browser). So, why not people moving to Smalltalk (Squeak 
especially)?  People in Python or Perl or PHP or Ruby camps are not 
bemoaning "backward compatibility" as the reason for limited success and 
adoption. While everything you say is true, it is not true enough IMHO to 
be the main reason. What are the others and how can they be addressed to 
produce a popular free Smalltalk?

>> Again, to contrast with Python, Squeak wants to run the show, but 
>> Python plays nice with all the other free tools of the GNU/Linux 
>> ecosystem.
> 
> I keep on seeing this, but it appears largely overstated.  Java has it's 
> own VM, threads etc. as well.  And it is easier to connect to the 
> outside world in Squeak then Java, because in java you are in "your on 
> your own!" land.  In Squeak you always were so there is no need to be 
> afraid of this step if you need it.  In at least Squeak and Dolphin 
> smalltalk you can call "extern C" style functions directly from 
> smalltalk (thought in squeak you need to load FFI first).  That is at 
> least as good as any of the other languages.

True. Though there can still be a difference in "culture" of the 
communities surrounding a language. Clearly Smalltalk's (or Squeak's) 
culture is very different than Python's. I wrote something on that here, 
in terms of how the cultures of the communities relate to their histories:
   http://mail.python.org/pipermail/edu-sig/2006-December/007476.html

> And if you mean more to address the tools, well yes you *can* edit Java 
> code in vi if you really want to.  But no one really wants to.  And if 
> your interface to the language is through some program anyway, then the 
> "barrier" of the code not being on the file system disappears.

Well, there is a bigger difference here between Python (which I mentioned) 
and Java (which you mentioned). Python plays nicer with UNIX-y systems 
than Java in many ways, mostly because Python is smaller, historically had 
a faster startup time, and earlier had more comprehensive libraries for 
interfacing with UNIX-y libraries. My point was more for Python, which is 
being billed as a "glue" language -- something to glue together your C 
libraries.

Java is different, as you point out. However, Java is so different, and 
received so much attention, and incorporated so many Smalltalk-pioneered 
ideas in the JVM design and class libraries (Swing) that ten years after 
it was introduced, it finally mostly works right as a self-contained 
environment. Not quite VisualWorks, but darn close in many ways by now, 
and it is free as in beer and is becoming free as in freedom (GPL). :-)

But for both Java and Python, being able to be easily edited in vi (or 
emacs) or being able to use a conventional text oriented version control 
system were indeed big wins, as they reduced the learning curve and 
initial commitment to new ideas. Being able to use the familiar file 
manager to look at code was also of value. And going beyond vi, the fact 
that Java IDEs started to look like C++ IDEs was another big win on 
familiarity. And seeing each class in a separate file in the good old 
reliable file system was also comforting -- at least you knew where your 
source was, and could use grep or other tools to search and manipulate it 
and back it up in a familiar fashion. Talks2 shows this is possible -- 
having a directory of Smalltalk class files. It is possible to generate 
text files from an image -- any Smalltalk can typically export such 
classes. And it isn't that hard to export instances as text either (I made 
something in Python that does it for instances in that language; any 
Smalltalk could do much the same) which gives you an image defined by 
textual program code to rebuild a world of objects.
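
As a rough illustration of exporting instances as text, here is a minimal 
Python sketch. It is not the tool mentioned above, just one way the idea 
could look. The classes Point and Line and the function to_source are 
hypothetical, and the sketch assumes each constructor takes keyword 
arguments named after the instance attributes and that the object graph 
has no cycles or shared references.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Line:
    def __init__(self, start, end):
        self.start = start
        self.end = end

def to_source(obj):
    """Return a Python expression that rebuilds obj.

    Assumes constructor keywords match attribute names and that the
    object graph is a tree (no cycles, no shared objects)."""
    if isinstance(obj, (int, float, str, bool, type(None))):
        return repr(obj)
    if isinstance(obj, list):
        return "[" + ", ".join(to_source(item) for item in obj) + "]"
    args = ", ".join(f"{name}={to_source(value)}"
                     for name, value in vars(obj).items())
    return f"{type(obj).__name__}({args})"

world = [Line(Point(0, 0), Point(3, 4)), Point(1, 2)]
text = to_source(world)   # plain text: works with grep, diff, version control
# Evaluating the text with the class names in scope rebuilds the objects.
rebuilt = eval(text, {"Point": Point, "Line": Line})
print(text)
```

A real exporter would also need to handle shared and circular references, 
for example by naming each object and emitting assignments in two passes.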

>> Forcing everyone to work in Smalltalk using Smalltalk tools, as good 
>> as they are, means that other innovations developed in other languages 
>> with other tools, for example, Java, are lost to the Squeak community.
> 
> 
> Um... What innovations in Java?

Extensive, tested, and debugged libraries on a variety of topics.

>> Many humans become fluent in multiple human languages and their 
>> accompanying cultures, which is typically a harder thing than learning 
>> new computer languages. If one needs to switch mental gears 
>> conceptually to work on the VM, then is it *really* so bad if the VM 
>> is written directly in C like GNU Smalltalk does?
> 
> Typically?  It is harder in every case, no matter how badly designed the 
> programming language.

Well, Spanish to Portuguese might be easier than COBOL to OCaml? But COBOL 
to OCaml is hard for different reasons than syntax. :-)

> And what do you want to gain here?  If the squeak community came out 
> today and said "Ok!  You can write the squeak VM in anything you want, 
> we don't care", they wouldn't suddenly get volunteers knocking the doors 
> down to work on squeak.  They would only lose people who can work on the 
> VM today (not because these people *can't* do it, but because they 
> wouldn't want to anymore).
>
> While I agree that squeak is not required to be written in a subset of
> smalltalk, it *is* and changing it wont gain anything.  Getting squeak
> to run on strong talk might, but I haven't seen anyone forbidding that.

My point here wasn't that Squeak should change; it was just an example of 
how being different and staying entirely in Smalltalk might not have been 
a big win, compared to just having a VM written in, say, C. There remains 
the "conceptual" barrier of the VM domain, even as the "technical" one of 
syntax is removed.

I am not against translating a VM from an abstract representation, in, say 
Smalltalk. I think it is a clever idea, especially since it already has 
been done. And with some more work, it might even gain the elegance of say 
ANTLR's plugin for Eclipse, or ANTLRWorks, where you can step through the 
abstraction in an IDE without seeing the underlying code (Java in ANTLR's 
case). (Maybe Squeak can already do this by now?)

Still, having said that, a Smalltalk VM is quite simple. Consider this 47K 
Public Domain one that does most of the work (from the Java version of A 
Little Smalltalk, now called SmallWorld):
  http://budd.eecs.oregonstate.edu/~budd/SmallWorld/Source/SmallObject.java
So how hard is it to maintain a Smalltalk VM in the original Java?

Translating primitives into C or Java, like for sound manipulation, seems 
like a bigger win. But even then, you have to be writing that code (or 
rewriting that code) in such a non-Smalltalk way semantically that it is 
still not clear to me if there is a lot of value in it. Especially when 
the alternative might be to just call an existing sound synthesis library 
written in Java or C. We now have Java as a good cross-platform language 
with performance equivalent to C++; it would have been a harder choice 
ten years ago as to what cross-platform language to use if not C 
with all its quirks (Free Pascal?).

>> Right now, I think Squeak on the JVM, like Talks2 is a step towards, 
>> could be a really big win for the Squeak community, and translating 
>> the VM from an abstract representation (in Smalltalk) to a specific 
>> language is a big win there. Still, the VM could have been in any 
>> translatable abstraction (XML, Lisp, ANTLR parse tree from a 
>> VM-specific language, Parrot, etc.) and generating Java would still be 
>> easily doable (though of course Smalltalk encoding is preferable for 
>> Smalltalkers).
> 
> Java isn't the end-all/be-all here.  Microsoft is moving to a more 
> dynamic VM already, and because of this Java will be forced to as well.  
> Java has always been behind pre-existing technologies and this area is 
> no different.  If you want to move into the future it is best not to 
> follow a group that is always behind.

The value of Squeak on Java is a separate issue. The value is mostly to be 
able to reduce deployment overhead, especially for systems that mix 
Smalltalk and faster native-y code written in Java or another JVM 
language; Talks2 already did a lot of this work.

But here again is an issue of culture. Who cares if Sun is "behind"; or if 
Squeak runs 30% slower without some extra dynamic dispatch opcode in the 
JVM? Speed is not Squeak's main problem. Being able to leverage Sun's JVM 
and the fact that you can call AWT classes in the same way for any 
platform Java runs on is a big win for Squeak IMHO, as it would reduce the 
maintenance burden of it in terms of complexity of the common code base, 
and would also make it easy to install one common package for any platform 
Java runs on. Ten years ago, or even five, I myself would have laughed at 
the value of this idea (as Java was so buggy and unstable and slow). But 
most of the bugs have been fixed, the 1.5 JVM shares memory across JVMs 
and does dynamic translation for speed, so Java finally, now that it is 
going free under the GPL, has the potential to be a great cross-platform 
tool where you get both a common base GUI window system as well as the 
ability to deliver fast primitives written in Java, as well as access to a 
lot of libraries someone else has already written and debugged for you.

The Squeak community could admit that it would be a big win to leverage 
that "pink plane" success, even if it is "behind" and decide to move 
forward on top of it, but in other "blue plane" directions. Or it can 
continue to spend a lot of time dealing with time consuming basic issues 
relating to packaging and testing C code for lots of platforms (which 
essentially just duplicates the work the Java community is doing, but not 
as well because of more limited people power).

dot net is a non-starter because it is proprietary (and may be covered by 
patents). And I would not make this suggestion without basing it on Sun's 
move to the GPL for Java. There are several JVM Smalltalks already, of course:
   http://www.robert-tolksdorf.de/vmlanguages.html
But none have the power of Squeak. And, building on Squeak's strengths, it 
could be an opportune time to also shake off licensing problems, say by 
carefully comparing with and using GNU Smalltalk code when possible, or by 
using an approach like Bistro to leverage Java libraries temporarily until 
replacement versions in Smalltalk could be written in a true "clean room" 
fashion.

But the bigger point, along the lines of this main "revisited" thread, is 
that building on others' work in a comprehensive way, like having a Squeak 
on top of Java, even though it has been done somewhat with the excellent 
Talks2, is something that goes against the grain of the community (and 
quite possibly to its disadvantage).

Python, by contrast, runs on the JVM, using Jython, and has great 
integration with Java. It has issues, and lags the main release, but 
overall it is production quality (at least in earlier releases); and since 
Java is such a difficult language to develop in because it is so verbose 
with braces and passing through exceptions and types and such, Jython may 
well be the big thing that makes Java continue to succeed. :-)
From:
   http://www.jython.org/Project/index.html
" Jython, lest you do not know of it, is the most compelling weapon the 
Java platform has for its survival into the 21st century:-)  —Sean 
McGrath, CTO, Propylon"

Why not have Squeak in that role too? But the deeper question is, why is 
it not there already, and why has, say, Talks2 not gotten more effort 
behind it? And I think that has more to do with community issues and 
licensing issues than with technology issues. (I myself would build on 
Talks2 right now, except it is stuck in the same licensing ambiguity 
Squeak is; I'm hoping when Squeak gets that cleared up for itself, that 
Talks2 might follow.)

>> == objects are an illusions, but useful ones ===
> [snip]
>
> To me this was the most insightful point in the whole essay.  Though, 
> honestly I thought this was pretty well understood.  Object Orientation 
> is simply a way of organizing code in a way that makes sense from the 
> perspective of the problem domain it is related to.  But since 
> programming is a task of managing complexity, correct organization is a 
> critical piece of the puzzle.

When one thinks deeply about this, perhaps your point about organization 
is the big missing piece of the puzzle. Yes, you are right, people build 
models of systems with objects, and should admit those models are 
imperfect. But there is no formal support for this process in the 
environment, or between people, other than using basic Smalltalk tools 
(Browser, Debugger, maybe Refactoring tools). Well, I guess you could use 
one of the formal OO modelling approaches, like CRC cards, but even that 
is oriented to getting one model -- not to managing a variety of possible 
representations to be used simultaneously as appropriate. Perhaps a next 
generation of OO systems needs to explicitly support this process somehow. 
How, I do not know. I just have the question here, not the solution. I do 
think having objects point to a context or world is perhaps a start, and I 
did that in a couple frameworks I have made in either Python or Smalltalk.

> But this observation is the reason OO databases haven't really taken 
> off:  An OO database will tend to model things how *your* application 
> wants to see them.  A traditional relational DBA will model things in 
> the most generic way he can so that *all* the applications can build the 
> view they need easily.  Relational DBA's tend to be of the view point: 
> The data will exist for the life of the company, while the applications 
> that access it come and go like the tide.  And one only needs to look at 
> the huge Java rewrites going on to know they are right.

Good point.

>> Consider Dan's statement of "A computer language should support the 
>> concept of "object" and provide a uniform means for referring to the 
>> objects in its universe." That appears to me to have made a classical 
>> mistake of thinking the universe has only one parsing into one object 
>> hierarchy and that the objects exist in some sort of Platonic ideal.
> 
> 
> Actually I think this applies more to C++ derived OO languages (e.g. 
> Java).  It is those languages that have huge hierarchies of things that 
> are not that related due to the brain-dead typing systems.  In smalltalk 
> the only hierarchy that has to be is inheriting from Object.  And you 
> don't even have to do that.
> 
> But I think this works just fine:  We are choosing to code something, so 
> we have to model it in the point of view appropriate to how we are going 
> to solve the problem.  And this implies some organization technique.  
> And among organization techniques, (correct) OO has had the most success 
> in my opinion.

All true, but if you look at how people teach OO, and how people talk 
about it, especially in Smalltalk circles, I think the community and its 
culture are somehow at odds with a greater flexibility. It's hard for me to 
detail this precisely; it more has to do with tone. Certainly I like the 
Smalltalk approach; it is just not enough.

>> <chair part snipped> There is no neat "class" to put it in.
> 
> I wouldn't expect it to be in a class.  I would expect classes to know 
> how to stick to each other. :)
> 
>> Clearly you, the reader, can think about this entity so the human 
>> brain supports this fluidity in changing our definitions of objects 
>> and not requiring a one-to-one mapping to ideal classes to think about 
>> them, but a computer language like Smalltalk would have many problems 
>> representing this. William Kent, in the book _Data & Reality_ 
>> discusses these sorts of problems at length.
> 
> I disagree with where the focus is placed here.  An entity typically 
> does have just one name and would make sense to be called one thing in 
> the system.  What you are describing sounds more like interface 
> protocols.  This might be an area that could use more research, but 
> honestly I would want to know what is bought by formalizing this 
> existing practice more (e.g. making protocols first class objects 
> themselves or something).
> 
> For an example of what I mean, in case it isn't that clear, we could 
> think about Lists.  They have a collection protocol: a series of 
> messages that conform to what other collections can do.  But they could 
> also have a "stack" protocol: a series of messages for treating the list 
> as though it were a stack.
> 
> This could be seen as what we do in real life.  Due to necessity I may 
> find myself driving a nail into a piece of wood with a screw driver.  
> But I would never call what is in my hand a hammer.  I would simply be 
> using it's "blunt object" interface momentarily.

That is one of the reasons I have been attracted to Prototypes, which 
attempt to address issues like:
   http://en.wikipedia.org/wiki/Self_programming_language
"Experience with early OO languages like Smalltalk showed that this sort 
of issue came up again and again. Systems would tend to grow to a point 
and then become very rigid, as the basic classes deep below the 
programmer's code grew to be simply "wrong". Without some way to easily 
change the original class, serious problems could arise"

But prototypes have other problems, the biggest being that they are hard 
to make self-documenting the way classes are. As someone put it to me, "if 
you want to share something, you probably have to name it".

Formalizing protocols is one possible idea. Dan mentions it in his paper.
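
For what it is worth, the list example quoted above can be sketched in 
Python, where a protocol can be given a name without becoming a 
superclass. This is only an illustration of the idea using Python's 
standard typing module, not anything Smalltalk offers in this form; the 
names StackProtocol and CollectionProtocol are hypothetical.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class StackProtocol(Protocol):
    """Messages needed to treat an object as a stack."""
    def append(self, item): ...
    def pop(self): ...

@runtime_checkable
class CollectionProtocol(Protocol):
    """Messages shared by collections in general."""
    def __len__(self): ...
    def __iter__(self): ...
    def __contains__(self, item): ...

items = [1, 2, 3]
pair = (1, 2)

# The same list answers to both protocols without inheriting from either.
list_is_stack = isinstance(items, StackProtocol)
list_is_collection = isinstance(items, CollectionProtocol)

# A tuple is a collection but lacks the stack messages.
tuple_is_stack = isinstance(pair, StackProtocol)
tuple_is_collection = isinstance(pair, CollectionProtocol)

print(list_is_stack, list_is_collection, tuple_is_stack, tuple_is_collection)
```

Note that the runtime check only looks for the presence of the named 
methods. It says nothing about whether they behave like a stack, which is 
part of why formal protocols alone may not go deep enough.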

I think the solutions to this issue lie in deeper directions. Classes or 
instances or prototypes could be building blocks, perhaps, but we could 
use other abstractions and better tools somehow. What these are, I do not 
know for sure. Still, like Bill Kent, I think these may lie in the 
direction of being able to model "relations" somehow.

--Paul Fernhout