[ANN]VerySmallTalk PhD position @ Douai

Jecel Assumpcao Jr jecel at merlintec.com
Thu Sep 15 20:28:23 UTC 2005


Noury Bouraqadi wrote on Wed, 14 Sep 2005 15:43:17 +0200
> Le 13 sept. 05, à 21:17, Jecel Assumpcao Jr a écrit :
> > And when thinking small it is always a good idea to take a look at
> > Forth, especially anything that Chuck Moore himself is working on like
> > ColorForth.
> 
> I'm not familiar with Forth. Have you any pointers to start with?

Other people have sent you a lot of good references already. Given that
the Smalltalk, Lisp and Forth communities have much the same "flavor" it
isn't surprising that many people belong to more than one. For your
particular needs, downloading an implementation and playing with it
might not be as useful as a good book, like "Thinking Forth" by Leo
Brodie (now available online - http://thinking-forth.sourceforge.net/).
But I think I can quickly describe what makes Forth so small:

The most important aspect is extreme factoring. Smalltalk code already
tends to be far more factored (smaller methods) than code in other
languages, which makes things smaller but slower, and Forth takes this
even further. Here is a small example defining two functions ("words" in
Forth jargon) and using them -

: cm 10 * ;
: sq dup * ;
3 cm sq 22 sq + .
\ prints 1384: the total area, in square millimeters, of a square 3 cm
\ on each side and a square 22 mm on each side

Notice how lightweight the notation for defining a word and for invoking
it is. Making things easy is a good way to have them used more. Another
feature is that, unlike most other languages, Forth uses an unframed stack.
There are separate stacks for data and for subroutine return addresses,
so we don't have to copy an argument from the "cm" frame into the "*"
one like we would in C -

int cm(int d) { return d * 10; }

Here I am supposing "*" is a subroutine (like in old 68000 compilers).
To pass "d" as an argument to it requires copying it because there is a
return address between the data for "cm" and the data for "*". With its
separate return stack Forth avoids such complications.
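
To make the shared data stack concrete, here is a minimal C sketch of the
"cm" and "sq" example. All names (push, pop, w_cm, total_area_mm2 and so
on) are made up for illustration, and the C call stack stands in for
Forth's return stack, so this only shows the idea and not how any real
Forth is built. The point is that "cm" never copies its argument: the
value is already sitting where "*" expects to find it.

```c
#include <assert.h>

/* Hypothetical names throughout; an illustration, not a real Forth. */
enum { STACK_DEPTH = 32 };

static int data_stack[STACK_DEPTH]; /* operands only, no return addresses */
static int sp = 0;                  /* number of items on the data stack */

static void push(int v) { assert(sp < STACK_DEPTH); data_stack[sp++] = v; }
static int  pop(void)   { assert(sp > 0); return data_stack[--sp]; }

/* Primitive words work directly on the shared data stack. */
static void w_dup(void) { int a = pop(); push(a); push(a); }
static void w_mul(void) { int b = pop(); int a = pop(); push(a * b); }
static void w_add(void) { int b = pop(); int a = pop(); push(a + b); }

/* : cm 10 * ;   the argument is already where "*" will look for it */
static void w_cm(void) { push(10); w_mul(); }

/* : sq dup * ; */
static void w_sq(void) { w_dup(); w_mul(); }

/* 3 cm sq 22 sq +   (the result is popped and returned instead of printed) */
int total_area_mm2(void)
{
    sp = 0;
    push(3);  w_cm(); w_sq();
    push(22); w_sq();
    w_add();
    return pop();
}
```

Calling total_area_mm2() gives the same 900 + 484 square millimeters as
the Forth line above.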

These are the "carrot" aspects which encourage extreme factoring, but
there is a huge "stick" factor as well: the reverse polish notation is
nice for small expressions but it simply doesn't scale well. So if you
give me some code with a 20 line function it might take me up to half an
hour to understand it, and I would probably have to draw some stack
diagrams to figure things out. And this would be true even if I had been
the one who originally wrote that code a while ago! Like the old joke
says: "Doctor, it hurts when I do this." "Then stop doing that!" So an
extremely factored Forth program is the only kind of readable Forth
program and it tends to be very, very small.

The meta programming system is also very neat and helps keep the size
down, but I won't go into that except to mention that it offers some of
the good stuff of object-oriented programming but is very simple. Full OO has been
done many times in Forth, but at the cost of losing what makes Forth so
nice, in my opinion.

When I wrote suggesting Forth, however, I wasn't thinking so much about
the language itself but about the various implementation technologies
that have been developed for it:

http://www.zetetics.com/bj/papers/moving1.htm
http://www.complang.tuwien.ac.at/forth/threaded-code.html
http://en.wikipedia.org/wiki/Threaded_code_compiler
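
As a rough illustration of what such an implementation looks like, here
is a C sketch of a tiny threaded interpreter with separate data and
return stacks, running the same "cm" and "sq" example. It is an
assumption-laden toy: the Cell layout, the opcode names and run_thread
are all invented here, and it dispatches through a switch statement,
which is simpler (and slower) than the direct or indirect threading that
real Forth systems normally use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell format: one opcode plus an optional operand. */
typedef enum { OP_LIT, OP_DUP, OP_MUL, OP_ADD, OP_CALL, OP_EXIT } Op;

typedef struct Cell Cell;
struct Cell {
    Op op;
    int literal;        /* used by OP_LIT */
    const Cell *target; /* used by OP_CALL: the thread of the called word */
};

/* : cm 10 * ; */
static const Cell word_cm[] = {
    { OP_LIT, 10, NULL }, { OP_MUL, 0, NULL }, { OP_EXIT, 0, NULL }
};

/* : sq dup * ; */
static const Cell word_sq[] = {
    { OP_DUP, 0, NULL }, { OP_MUL, 0, NULL }, { OP_EXIT, 0, NULL }
};

/* 3 cm sq 22 sq + */
static const Cell program[] = {
    { OP_LIT, 3, NULL },  { OP_CALL, 0, word_cm }, { OP_CALL, 0, word_sq },
    { OP_LIT, 22, NULL }, { OP_CALL, 0, word_sq }, { OP_ADD, 0, NULL },
    { OP_EXIT, 0, NULL }
};

/* Runs a thread and returns the value left on top of the data stack. */
int run_thread(const Cell *ip)
{
    int data[32];        int dsp = 0; /* data stack: operands only */
    const Cell *ret[32]; int rsp = 0; /* return stack: addresses only */

    for (;;) {
        const Cell *c = ip++;
        switch (c->op) {
        case OP_LIT:
            assert(dsp < 32);
            data[dsp++] = c->literal;
            break;
        case OP_DUP:
            assert(dsp > 0 && dsp < 32);
            data[dsp] = data[dsp - 1];
            dsp++;
            break;
        case OP_MUL:
            assert(dsp >= 2);
            data[dsp - 2] *= data[dsp - 1];
            dsp--;
            break;
        case OP_ADD:
            assert(dsp >= 2);
            data[dsp - 2] += data[dsp - 1];
            dsp--;
            break;
        case OP_CALL: /* a call touches only the return stack */
            assert(rsp < 32);
            ret[rsp++] = ip;
            ip = c->target;
            break;
        case OP_EXIT:
            if (rsp == 0) { /* leaving the outermost thread */
                assert(dsp > 0);
                return data[dsp - 1];
            }
            ip = ret[--rsp];
            break;
        }
    }
}

int run_demo(void) { return run_thread(program); }
```

Note that OP_CALL and OP_EXIT never move any data: calling a word costs
one push and one pop of an address, which is part of why heavy factoring
is affordable.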

Speaking of design alternatives, you have three choices for an embedded
Smalltalk:

A) cross development (Pocket Smalltalk, PIC Smalltalk) - development is
done on a full PC and the result might be simulated. The final code is
downloaded to the target hardware and any problems require going back to
the PC and repeating the cycle.

B) tethered development (OOVM, Spoon) - the target hardware has a
communication channel to the full PC and the development environment is
split between the two machines. The compiler, debugger and other tools
run on the PC but act on code running on the target instead of on the
PC.

C) native development (Little Smalltalk, Squeak, etc) - the target
hardware is a stand alone Smalltalk computer.

The last option normally isn't very reasonable because most target
hardware doesn't have a good display or input device. And it also makes the
target image larger by including tools that the regular user won't need.
But it is the alternative I prefer whenever possible. Your project
description doesn't make it clear, but my impression is that you are
thinking of option B, right?

An interesting thing about "small" is that it depends a lot on the
context. For a project where I am doing a 16 bit version of Neo
Smalltalk the most cost effective memories were 512KB of Flash (only
300KB free, however) and 8MB of SDRAM. A smaller RAM would have to be
*much* smaller since even 128KB of static RAM would have been more
expensive. So an obvious solution is to store the image as compressed as
possible in the Flash and then to copy and expand it to the large SDRAM
before use. That still leaves 7/8 of the main memory entirely wasted.
We can add a frame buffer and a $0.40 color TV output so I can do native
development like I prefer. That still leaves 6MB unused. So we can build
huge class x selector tables and then message sends will be faster and
more uniform (important for real time applications) than in normal
implementations. So a system with 300 classes and 4000 symbols, which is
very large for an embedded application even with a native development
environment, would need 1.2 million table entries. We can afford to make
them 4 bytes each and that will allow us to have the actual start of the
method inside the entry instead of just a pointer. Many methods will
actually fit there (the factoring thing) and those that don't can have a
jump to the rest of the code. That is no worse than doing an indirection
through the table and many methods wouldn't need it.
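
Here is a small C sketch of such a class x selector table, sized for the
300 classes and 4000 symbols mentioned above. The names (dispatch,
send_message, the class and selector numbers) are all invented for the
example. Since portable C can't put machine code inside a table entry,
each 4 byte entry holds a method index as a stand-in for the inlined
method start; what it does show is that a send is one table access with
no searching, and that the table takes 1.2 million x 4 bytes = 4.8MB,
which fits in the 6MB left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_CLASSES = 300, NUM_SELECTORS = 4000 };

/* One 4-byte entry per (class, selector) pair. Here it is a method index,
   standing in for the first instructions of the method itself. */
typedef uint32_t Entry;

static Entry dispatch[NUM_CLASSES][NUM_SELECTORS]; /* zero-filled at start */

typedef int (*Method)(int receiver_value);

/* Index 0 is what every empty entry points at. */
static int does_not_understand(int v) { (void)v; return -1; }
static int small_int_squared(int v)   { return v * v; }

static const Method methods[] = { does_not_understand, small_int_squared };

/* Hypothetical class and selector numbering. */
enum { CLASS_SMALL_INT = 7, SEL_SQUARED = 42 };

void install_demo_methods(void)
{
    dispatch[CLASS_SMALL_INT][SEL_SQUARED] = 1; /* index of small_int_squared */
}

/* A message send is a single indexed load followed by a call: the cost is
   the same for every class and selector, with no lookup loop or cache. */
int send_message(unsigned class_id, unsigned selector_id, int receiver_value)
{
    assert(class_id < NUM_CLASSES && selector_id < NUM_SELECTORS);
    return methods[dispatch[class_id][selector_id]](receiver_value);
}

/* Size of the whole table in bytes. */
size_t table_bytes(void) { return sizeof dispatch; }
```

After install_demo_methods(), send_message(CLASS_SMALL_INT, SEL_SQUARED, 12)
runs the squared method, while the same selector sent to any other class
falls through to does_not_understand.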

Now if you show me a machine with 1MB of Flash but only 128KB of RAM,
"small" would have a rather different meaning. It would be best to have
the image in Flash in a directly usable format and have some scheme
where only objects that are actually written to get moved to the RAM.
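
One possible scheme, sketched in C under a lot of assumptions: every
object is reached through an object table, the table initially points
into the read-only image, and the first write to an object copies it to
RAM and redirects its table entry. The names (flash_image, ram_pool,
write_slot and so on) and the fixed-size objects are invented for the
example, and a const array stands in for the Flash.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NUM_OBJECTS = 3, SLOTS = 4, RAM_OBJECTS = 2 };

typedef struct { int slots[SLOTS]; } Object;

/* Stand-in for the image stored in Flash: directly usable, read-only. */
static const Object flash_image[NUM_OBJECTS] = {
    { { 1, 2, 3, 4 } }, { { 10, 20, 30, 40 } }, { { 100, 200, 300, 400 } }
};

/* The scarce RAM: room for only some of the objects. */
static Object ram_pool[RAM_OBJECTS];
static int ram_used = 0;

typedef struct {
    const Object *current; /* where reads go: Flash or RAM */
    Object *writable;      /* NULL while the object still lives in Flash */
} TableEntry;

static TableEntry object_table[NUM_OBJECTS] = {
    { &flash_image[0], NULL },
    { &flash_image[1], NULL },
    { &flash_image[2], NULL }
};

int read_slot(int id, int slot)
{
    assert(id >= 0 && id < NUM_OBJECTS && slot >= 0 && slot < SLOTS);
    return object_table[id].current->slots[slot];
}

/* Returns false if the object had to move to RAM and no RAM was left. */
bool write_slot(int id, int slot, int value)
{
    assert(id >= 0 && id < NUM_OBJECTS && slot >= 0 && slot < SLOTS);
    TableEntry *e = &object_table[id];
    if (e->writable == NULL) {           /* first write: move to RAM */
        if (ram_used == RAM_OBJECTS)
            return false;
        Object *copy = &ram_pool[ram_used++];
        memcpy(copy, e->current, sizeof *copy);
        e->writable = copy;
        e->current = copy;
    }
    e->writable->slots[slot] = value;
    return true;
}

int objects_in_ram(void) { return ram_used; }
```

Objects that are only ever read never leave the Flash, so the RAM needed
grows with the number of objects written to and not with the image size.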

My point is that while it would be very nice to have a single embedded
Smalltalk that could easily adapt to different requirements, this is not
how the Forth world evolved. Neither PIC Smalltalk, Pocket Smalltalk nor
even OOVM/Resilient can be used in hardware that isn't at most a tiny
variation of what they were originally developed for. Microlingua
(http://www.microlingua.com/) seems flexible enough, but that is easier
when you are still in the early development stage. We will have to see
what it is like when it is done.

-- Jecel


