CPU running smalltalk bytecode

David Simmons David.Simmons at smallscript.com
Sun Feb 10 22:31:04 UTC 2002


> -----Original Message-----
> From: squeak-dev-admin at lists.squeakfoundation.org [mailto:squeak-dev-
> admin at lists.squeakfoundation.org] On Behalf Of Scott A Crosby
> Sent: Sunday, February 10, 2002 9:06 AM
> To: Bergel Alexandre
> Cc: squeak-dev at lists.squeakfoundation.org
> Subject: Re: CPU running smalltalk bytecode
> 
> On Fri, 8 Feb 2002, Bergel Alexandre wrote:
> 
> > I have not work at a such low level, but I have some questions. VMs
> > for Java, Smalltalk, OCaml seems to be almost the same. Each one
> > contain features for :
> > 	-stack based
> > 	-Garbage Collector
> > 	-lookup method
> >
> > Also, I have heard Lex talking about a CPU able to run Smalltalk
> > platform.
> > Is it possible to have more information about it ? Perhaps next
> > generation of CPU (Motorola, Intel, AMD, ...) could include such
> > features...
> 
> Unlikely. Smalltalk, and the others, are based around a stack machine.

That is not entirely correct. There is at least one register-based
Smalltalk implementation. Versions 1-3 of the AOS [Agents Object System]
VM platform were stack-based with corresponding opcodes. Version 4,
which hosts SmallScript, is register-based with corresponding opcodes.

It is actually quite a bit more efficient to be register-based, and it
is easier to map from register-based to stack-based than vice versa.

For example, the pluggable JIT architecture in the v4-AOS VM includes a
jitter for cross-jitting from v3-AOS to v4-AOS, which was used to
bootstrap the compiler. In other words, it jits v3's stack-based opcodes
to the v4 register-based opcode set. The two opcode sets are completely
different. The term "opcode" refers to what classic Smalltalk would call
a bytecode; given my EE background and the AOS VM design, opcode is the
correct term.
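
To make the stack-to-register mapping concrete, here is a minimal C++
sketch. It is not the AOS jitter: the opcode sets, the names (StackOp,
RegOp, translate, run) and the register layout are all hypothetical and
chosen only for illustration. The idea shown is the usual one of
simulating the operand stack at translation time, so that a stack slot
at depth d becomes a fixed temporary register.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stack-machine opcodes (NOT the real v3-AOS opcode set).
enum class StackOp { PushLocal, PushConst, Add, Mul, StoreLocal };

struct StackInsn {
    StackOp op;
    int operand;  // local index or constant value; ignored by Add/Mul
};

// Hypothetical three-address register opcodes (NOT the real v4-AOS set).
enum class RegOp { LoadConst, Move, Add, Mul };

struct RegInsn {
    RegOp op;
    int dst;  // destination register
    int a;    // first source register, or the constant for LoadConst
    int b;    // second source register; ignored by LoadConst/Move
};

// Translate stack code to register code by tracking the operand stack depth
// while translating. Locals occupy registers [0, numLocals); the stack slot
// at depth d is assigned the temporary register numLocals + d.
std::vector<RegInsn> translate(const std::vector<StackInsn>& code, int numLocals) {
    std::vector<RegInsn> out;
    int depth = 0;  // simulated operand stack depth
    for (const StackInsn& insn : code) {
        switch (insn.op) {
        case StackOp::PushLocal:
            assert(insn.operand >= 0 && insn.operand < numLocals);
            out.push_back({RegOp::Move, numLocals + depth, insn.operand, 0});
            ++depth;
            break;
        case StackOp::PushConst:
            out.push_back({RegOp::LoadConst, numLocals + depth, insn.operand, 0});
            ++depth;
            break;
        case StackOp::Add:
        case StackOp::Mul: {
            assert(depth >= 2);
            int rhs = numLocals + depth - 1;  // top of stack
            int lhs = numLocals + depth - 2;  // next on stack
            RegOp op = (insn.op == StackOp::Add) ? RegOp::Add : RegOp::Mul;
            out.push_back({op, lhs, lhs, rhs});  // result replaces both operands
            --depth;
            break;
        }
        case StackOp::StoreLocal:
            assert(depth >= 1);
            assert(insn.operand >= 0 && insn.operand < numLocals);
            out.push_back({RegOp::Move, insn.operand, numLocals + depth - 1, 0});
            --depth;
            break;
        }
    }
    return out;
}

// Tiny interpreter for the register code, used only to check the translation.
// The caller must size `regs` to hold all locals and temporaries.
void run(const std::vector<RegInsn>& code, std::vector<long>& regs) {
    for (const RegInsn& i : code) {
        switch (i.op) {
        case RegOp::LoadConst: regs[i.dst] = i.a; break;
        case RegOp::Move:      regs[i.dst] = regs[i.a]; break;
        case RegOp::Add:       regs[i.dst] = regs[i.a] + regs[i.b]; break;
        case RegOp::Mul:       regs[i.dst] = regs[i.a] * regs[i.b]; break;
        }
    }
}
```

For x := (x + 3) * y with x and y as locals 0 and 1, the six stack
opcodes (push x, push 3, add, push y, mul, store x) become six register
instructions that use only two temporaries. A real jitter would also
remove the redundant moves; this sketch does not attempt that.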

There are many reasons for being register-based, only some of which are
about performance. Not least, it allows the method calling conventions
to conform to the standard rules for a given platform, which makes it
easy to write methods in C/C++/Asm etc. That is how the SmallScript
engine allows you to write C/C++/Asm directly inline within a method
body [including a simple macro mechanism to write method invocations in
a C++ style, and declarative access to any Smalltalk literal form]. As a
result, SmallScript has *no* primitives.

> This is fundamentally different from being based around a
> register-register machine like conventional CPU's. Register-register
> CPU's are simpler to optimize and support superscalar operations and
> multiple dispatch, instruction reorder, and pipelining. (I've also seen
> some research claiming that many of these same optimizations are
> possible on a stack machine hardware.)
> 
> Custom CPU's are sorta not in vogue anymore; usually by the time a
> year/two go by during the design, the advantages of custom hardware are
> more than overcome by Moore's law doubling what conventional CPU's can
> do.

I've had a few discussions with Intel's processor design group quite
recently :). They are, in fact, waiting on me to provide them with the
SmallScript VM and participate in simulation testing. Custom CPUs are
not the way to go. But getting specific features added to standard CPUs
is another thing entirely. Scripting languages and dynamic languages
share the same requirements for dynamic typing functionality.

With just a few instructions that require very little silicon, quite a
few things can be done to enhance the performance of tag-bit pointers
and dynamic dispatch mechanisms. [My original EE background was VLSI
design before I came into the Smalltalk/OO space.]
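
For readers unfamiliar with tag-bit pointers, the following C++ sketch
shows the kind of software check that such instructions could shorten.
It assumes one common layout (low bit 1 = small integer, low bit 0 =
aligned object pointer); the AOS VM's actual tagging scheme is not
described here, and the names Oop, fromInt, toInt and tryAdd are
illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// One machine word holding either a tagged small integer or an object pointer.
using Oop = std::uintptr_t;

constexpr Oop kTagMask = 1;  // low bit carries the tag (assumed layout)
constexpr Oop kIntTag  = 1;  // 1 = small integer, 0 = aligned pointer

inline bool isSmallInt(Oop v) { return (v & kTagMask) == kIntTag; }

// Encode n as a tagged integer. n must fit in one bit less than a machine word.
inline Oop fromInt(std::intptr_t n) {
    return (static_cast<Oop>(n) << 1) | kIntTag;
}

// Decode a tagged integer. Relies on >> being an arithmetic shift for negative
// values (guaranteed since C++20, and the behavior of mainstream compilers
// before that).
inline std::intptr_t toInt(Oop v) {
    assert(isSmallInt(v));
    return static_cast<std::intptr_t>(v) >> 1;
}

// Fast path for a + b. Returns true and stores the tagged sum when both
// operands are small integers and the result still fits. Returns false when
// the caller has to fall back to a full message send.
inline bool tryAdd(Oop a, Oop b, Oop& result) {
    if (!(isSmallInt(a) && isSmallInt(b))) return false;  // tag check
    Oop x   = a - kIntTag;  // 2*valueOf(a), tag cleared
    Oop sum = x + b;        // 2*(valueOf(a) + valueOf(b)) + 1, wraps if too big
    // Signed overflow happened iff both inputs have the same sign and the
    // sum's sign differs from it.
    constexpr Oop kSignBit = Oop(1) << (sizeof(Oop) * 8 - 1);
    if (((x ^ sum) & (b ^ sum)) & kSignBit) return false;
    result = sum;
    return true;
}
```

Every arithmetic send pays for the tag test, the untag/retag and the
overflow test in ordinary instructions. That is the sort of sequence a
small amount of dedicated hardware support could collapse.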

As to GC, there is no real value in a custom CPU [caching infrastructure
is by far the dominant factor]. The Intel architecture for virtual
memory page control via page-faulting works very efficiently for
ephemeral GC services given 4 KB page sizes.

The design of the OS level handlers for page faults is therefore a
bigger factor. On some CPU families the virtual memory page control
systems do very poorly for a GC architecture, and those could be
improved.

There are some specific things that can be done to improve behavioral
characteristics of some CPU mechanisms with regard to cache
architectures [and their impact on JITs].

Note: The AOS VM used for SmallScript is written in C++ and supports a
pluggable architecture for providing object header forms as C++
classes/structs, and thus unifies the two languages internally within
the VM.

Its GC mechanism supports multi-threading, arbitrary heaps, ephemeral
GC based on page-faulting, explicit and automatic pinning, and
extensible per-object-type memory policies based on the pluggable object
form's overloading of AOS VM constructors and destructors for a given
type. It is a complex, hybrid collector with virtual spaces, and it does
not use copying [so all memory is always available for allocation].

W.r.t. GC, reading or writing OOPs (slots) does not require special code
for techniques like remembered set tracking, and therefore such
pointers/slots can be processed just as they would be in C/C++. The
mechanism for handling something similar to remembered sets, for the
ephemeral collector, is provided through the use of virtual memory
services. The JIT does make special code-emitting arrangements
[on-the-fly] for read/write barrier delegation on a per-object and
per-class basis to support the AOP and managed object services.
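
The general technique of letting page protection stand in for a software
write barrier can be sketched in C++ as follows. This is not the AOS
implementation: it assumes Linux and POSIX mmap/mprotect/sigaction,
where a store to a read-only page raises SIGSEGV, and the vmbarrier
namespace and its function names are made up for this example. It also
calls mprotect from a signal handler, which works on Linux but is not on
the POSIX list of async-signal-safe functions.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace vmbarrier {  // hypothetical name, for illustration only

constexpr std::size_t kMaxPages = 64;

static char*       g_heap = nullptr;  // start of the protected "old space"
static std::size_t g_pageSize = 0;
static std::size_t g_pageCount = 0;
// One flag per page: plays the role of a page-granular remembered set.
static volatile sig_atomic_t g_dirty[kMaxPages];

// SIGSEGV handler: a store hit a read-only page. Mark the page dirty and make
// it writable; on return the faulting store is retried and succeeds.
static void onFault(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (addr < g_heap || addr >= g_heap + g_pageCount * g_pageSize) {
        signal(SIGSEGV, SIG_DFL);  // not our region: let the fault crash normally
        return;
    }
    std::size_t page = static_cast<std::size_t>(addr - g_heap) / g_pageSize;
    g_dirty[page] = 1;
    mprotect(g_heap + page * g_pageSize, g_pageSize, PROT_READ | PROT_WRITE);
}

// Map `pages` pages of anonymous memory and install the fault handler.
bool init(std::size_t pages) {
    if (pages == 0 || pages > kMaxPages) return false;
    g_pageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, pages * g_pageSize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    g_heap = static_cast<char*>(p);
    g_pageCount = pages;

    struct sigaction sa = {};
    sa.sa_sigaction = onFault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Called after a collection: forget old dirty flags and write-protect every
// page so that the next store to each page is noticed.
bool protectAll() {
    for (std::size_t i = 0; i < g_pageCount; ++i) g_dirty[i] = 0;
    return mprotect(g_heap, g_pageCount * g_pageSize, PROT_READ) == 0;
}

bool isDirty(std::size_t page) { return g_dirty[page] != 0; }
char* heap() { return g_heap; }
std::size_t pageSize() { return g_pageSize; }

}  // namespace vmbarrier
```

The mutator stores through plain pointers with no extra instructions.
Only the first store to each protected page takes a fault, and an
ephemeral collection then needs to scan only the pages whose dirty flag
is set.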

-- Dave S. [www.smallscript.org]

> 
> But, it may be possible to make a much faster (C/slang-based)
> stack-machine interpreter for squeak. I'm researching this.
> 
> Scott