Programming in Bytecode?

Jecel Assumpcao Jr jecel at merlintec.com
Sat Aug 3 00:13:57 UTC 2002


The first real application I ever wrote in Smalltalk was an assembler 
for the Transputer. It has a nice bytecode-like machine language, and 
the variable-size problem was even worse than in Smalltalk, since you 
have to add a prefix instruction for every 4 bits of operand beyond the 
first.
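To make the prefix problem concrete, here is a minimal Python sketch of the standard Transputer encoding: each byte holds a 4-bit function code and 4 bits of operand, and the prefix functions pfix (2) and nfix (6) build larger or negative operands in the operand register. The `encode`/`decode` names are illustrative, and Python's unbounded integers stand in for the 32-bit operand register, so word wraparound is not modelled.

```python
PFIX, NFIX = 0x2, 0x6   # Transputer prefix function codes
J, LDC = 0x0, 0x4       # two ordinary function codes (jump, load constant)

def encode(fn, operand):
    """Return the bytes for instruction `fn` with a signed `operand`."""
    if 0 <= operand < 16:
        return [(fn << 4) | operand]
    if operand >= 16:
        # positive: feed the high bits in first through pfix
        return encode(PFIX, operand >> 4) + [(fn << 4) | (operand & 0xF)]
    # negative: nfix complements the operand register before shifting
    return encode(NFIX, (~operand) >> 4) + [(fn << 4) | (operand & 0xF)]

def decode(code):
    """Rebuild (fn, operand) the way the operand register would."""
    oreg = 0
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        oreg |= data
        if fn == PFIX:
            oreg <<= 4
        elif fn == NFIX:
            oreg = (~oreg) << 4
        else:
            return fn, oreg
    raise ValueError("ran out of bytes inside a prefix sequence")
```

For example, `encode(LDC, 0x123)` needs two pfix bytes before the `ldc` itself, while `encode(LDC, 5)` is a single byte; this is exactly why inserting code can change the length of unrelated instructions.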

It so happened that I had gone back to the university and the lab was 
really happy to have me since I was the one who had gotten them into 
Unix many years before. Now I wanted them to forget Unix and go with 
Smalltalk instead, so I needed to impress them.

An assembler like this is trivial to write (even in C), so I made it an 
"instant assembler". You typed your code on the right half of what 
looked like an output listing. In the left half you saw the hex machine 
code, and it was updated as you typed! Take that, you nasty C. This 
meant that when you added an instruction, any number of jumps might 
suddenly have offsets too large for their current encoding. So they 
would be extended with prefix instructions, and that might push other 
jumps out of range. Deleting a line might allow some jumps to become 
shorter.
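The size/offset circularity described above is the classic span-dependent jump problem. One way to solve it, sketched here under stated assumptions and not taken from the original Smalltalk/V code, is to start every jump at one byte and keep growing the ones whose offsets no longer fit until nothing changes. It assumes Transputer jump offsets are relative to the following instruction, and uses the fact that a leading `pfix 0` leaves the operand register at zero, so it can pad a jump that ended up longer than it finally needs. The item format and the `assemble` function are hypothetical.

```python
PFIX, J, LDC = 0x2, 0x0, 0x4
NFIX = 0x6

def encode(fn, operand):
    if 0 <= operand < 16:
        return [(fn << 4) | operand]
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [(fn << 4) | (operand & 0xF)]
    return encode(NFIX, (~operand) >> 4) + [(fn << 4) | (operand & 0xF)]

def assemble(items):
    """items: ("label", name) | ("j", target_label) | ("ins", fn, operand)."""
    jump_size = {i: 1 for i, it in enumerate(items) if it[0] == "j"}
    while True:
        # lay out addresses using the current jump-size guesses
        addr, labels, ends = 0, {}, {}
        for i, it in enumerate(items):
            if it[0] == "label":
                labels[it[1]] = addr
            elif it[0] == "j":
                addr += jump_size[i]
                ends[i] = addr          # offsets count from the next instruction
            else:
                addr += len(encode(it[1], it[2]))
        # grow any jump whose offset no longer fits; sizes never shrink
        changed = False
        for i in jump_size:
            need = len(encode(J, labels[items[i][1]] - ends[i]))
            if need > jump_size[i]:
                jump_size[i] = need
                changed = True
        if not changed:
            break
    out = []
    for i, it in enumerate(items):
        if it[0] == "j":
            code = encode(J, labels[it[1]] - ends[i])
            # pad with harmless "pfix 0" bytes up to the size used in layout
            out += [PFIX << 4] * (jump_size[i] - len(code)) + code
        elif it[0] == "ins":
            out += encode(it[1], it[2])
    return out
```

Re-running `assemble` from scratch after every edit is what lets deleting a line shrink jumps again, since each run starts from one-byte jumps.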

We are talking about recalculating all offsets in a fraction of a second 
on a 4.77 MHz 8086 machine (Smalltalk V). And the Transputer code was 
far larger than any Smalltalk method should ever be. Sad to say, that 
lab is still a Unix shop.

On Thursday 01 August 2002 22:49, Ian Piumarta wrote:
> On Thu, 1 Aug 2002, Swan, Dean wrote:
> >   and potentially simpler instruction decoding.
>
> Absolutely.
>
> FWIW, all of the Jitters did a pre-pass over the bytecode to convert
> everything into a "normal" form (a total of 20 or so "abstract insns",
> all of which were the same size and had the "opcode" in the same
> place) and to eliminate the convoluted "conditional jumps over
> unconditional jumps".
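The "conditional jump over an unconditional jump" cleanup Ian mentions can be sketched as a small peephole pass over such a normal form. This is a hypothetical illustration, not the Jitter code: every abstract instruction is assumed to be an `(opcode, arg)` pair with the opcode in the same place, and the opcode names are made up. The pattern `jump_if_false L1; jump L2; L1:` becomes `jump_if_true L2`, and the label is kept because other jumps may still target it.

```python
# Hypothetical normal form: every abstract instruction is an (opcode, arg) pair.
INVERT = {"jump_if_false": "jump_if_true", "jump_if_true": "jump_if_false"}

def fold_jump_over_jump(insns):
    out, i = [], 0
    while i < len(insns):
        op, arg = insns[i]
        if (op in INVERT and i + 2 < len(insns)
                and insns[i + 1][0] == "jump"
                and insns[i + 2] == ("label", arg)):
            # "if false, skip the jump to L2" is just "if true, go to L2"
            out.append((INVERT[op], insns[i + 1][1]))
            i += 2   # drop the unconditional jump but keep the label
        else:
            out.append((op, arg))
            i += 1
    return out
```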

It is interesting how close your SAM instructions are to the current 
bytecodes in Self (which are different from the ones described in the 
various papers).

> Like you say, this simplified compilation immensely.  Whether or not
> (or on which architectures and under which conditions) it would
> increase interpreted performance would make for a fascinating
> experiment.

Some people might want to look at the instruction set I am using on a 
16-bit machine (unfortunately too small for Squeak, though it would 
handle something like Little Smalltalk just fine):

   http://www.merlintec.com:8080/hardware/Oliver/

The send instructions are hard to understand unless you know that I use 
full Selector Table Indexing for message dispatch. This was never 
supposed to be used in practice since the table is only 5% filled, but 
I have 8MB of RAM (couldn't find a smaller chip to buy) on a 16-bit 
machine, so I could afford it, and it did make the send instructions 
take only one clock cycle. Returns take two; I could do better but ran 
out of space on the FPGA.
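Selector Table Indexing is the textbook dispatch scheme in which every class and every selector get a number, and one global table indexed by (class, selector) holds the method to run, with inheritance already flattened in. A send is then one multiply-add and one load, which is what makes a single-cycle hardware send plausible, and most slots stay empty because each class understands only a small fraction of all selectors. A minimal Python sketch, assuming a made-up three-class hierarchy and method names (not the layout used on the Oliver hardware):

```python
def build_sti(classes):
    """classes: name -> (superclass or None, {selector: method})."""
    class_ids = {name: i for i, name in enumerate(classes)}
    selectors = sorted({sel for _, methods in classes.values() for sel in methods})
    sel_ids = {sel: i for i, sel in enumerate(selectors)}
    nsel = len(selectors)
    table = [None] * (len(classes) * nsel)   # flat row-major table
    for name in classes:
        chain, c = [], name
        while c is not None:                 # collect the superclass chain
            chain.append(c)
            c = classes[c][0]
        for c in reversed(chain):            # root first, so subclasses override
            for sel, meth in classes[c][1].items():
                table[class_ids[name] * nsel + sel_ids[sel]] = meth
    return table, class_ids, sel_ids

def send(sti, cls, selector):
    table, class_ids, sel_ids = sti
    meth = table[class_ids[cls] * len(sel_ids) + sel_ids[selector]]  # one load
    if meth is None:
        raise LookupError(f"{cls} does not understand #{selector}")
    return meth

def fill_ratio(sti):
    table = sti[0]
    return sum(m is not None for m in table) / len(table)

CLASSES = {
    "Object": (None, {"printOn:": "Object>>printOn:", "=": "Object>>="}),
    "Number": ("Object", {"+": "Number>>+"}),
    "String": ("Object", {",": "String>>,", "=": "String>>="}),
}
```

With a realistic image the table has thousands of classes times thousands of selectors, which is where a fill ratio like the 5% above comes from.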

> Markus has been talking about making parse trees be the "portable
> form" of compiled methods and then using a runtime compiler to
> convert them into bytecodes (for interpretation) or native code. 
> Experimenting with different formats of bytecode (or "wordcode")
> would be really easy (and lots of fun) in such a context.
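As a rough illustration of that idea, here is a hypothetical sketch in which one parse tree is flattened once and then handed to two interchangeable back ends, a two-byte bytecode and a 32-bit wordcode. The tree node shapes, opcode numbers and function names are all made up for this example; operands are assumed to be non-negative.

```python
# Hypothetical tree nodes: ("lit", n), ("arg", i), ("send", selector, receiver, [args])
OPCODES = {"push_lit": 1, "push_arg": 2, "send": 3, "return": 4}

def compile_method(tree):
    """Flatten a parse tree into (opcode, operand) pairs plus a selector list."""
    code, selectors = [], []

    def emit(node):
        kind = node[0]
        if kind == "lit":
            code.append(("push_lit", node[1]))
        elif kind == "arg":
            code.append(("push_arg", node[1]))
        elif kind == "send":
            _, selector, receiver, args = node
            emit(receiver)                 # receiver first, then arguments
            for arg in args:
                emit(arg)
            if selector not in selectors:
                selectors.append(selector)
            code.append(("send", selectors.index(selector)))
        else:
            raise ValueError(f"unknown node {kind!r}")

    emit(tree)
    code.append(("return", 0))             # return the top of the stack
    return code, selectors

def to_bytecode(code):
    """Back end 1: two bytes per instruction (opcode, operand)."""
    out = []
    for op, arg in code:
        if not 0 <= arg < 256:
            raise ValueError(f"operand {arg} does not fit in a byte")
        out += [OPCODES[op], arg]
    return bytes(out)

def to_wordcode(code):
    """Back end 2: one 32-bit word, opcode in bits 24-31, operand below."""
    words = []
    for op, arg in code:
        if not 0 <= arg < (1 << 24):
            raise ValueError(f"operand {arg} does not fit in 24 bits")
        words.append((OPCODES[op] << 24) | arg)
    return words
```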

How about my 1984 design for a Smalltalk VM?

   http://www.lsi.usp.br/~jecel/st84.txt

-- Jecel



More information about the Squeak-dev mailing list