multithreading support
sqrmax at cvtci.com.ar
Thu Jan 29 05:10:47 UTC 1998
Hi.
>I am starting prototyping effort of add native multi-threading support
>to Squeak. Any suggestions, hints, advice would be appreciated.
Just a few days ago I started thinking about the same thing. For me, the
main reason behind this is that I have a 2 processor computer. And if I had
access to an n processor machine, and if I was able to run Squeak on it, I would
like to take advantage of parallel processing.
First of all, I targeted the virtual machine, to make it able to run on
multiple processors at once. I thought of the virtual machine as just another
cpu like the ones I already knew, such as the i80386++, the 68xxx and so on. So,
as I knew all those, I concluded that I could use the same tricks the guys at
Intel and Motorola came up with to improve performance on their processors, but
now on the virtual machine. I thought it would be nice to implement the
following things:
vm1) Split the virtual machine into synchronized core parts. The first
two parts I thought of were fetch and execution. If one process was
dedicated to looking ahead for byte codes while another one actually executed
them, things would go faster (up to twice as fast, say). This is much like what
the Intel guys did on the 80486 and then on all the Pentiums. They gave it a
different name, like Branch Prediction Buffer or so. They guess what a jump
instruction will do, and tell the prefetch queue to fill itself as if the jump
instruction had been executed in the predicted way. Here the picture becomes
clearer, because usually the time consuming instructions are message sends. So
just looking up the methods that could be executed, before the virtual machine
asks for them, would improve performance.
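A minimal sketch of the vm1 idea, assuming a toy byte code format of
(opcode, argument) pairs and a made-up class chain (none of this is Squeak's
real format): one thread looks ahead and resolves message sends into a cache
while another thread executes. In CPython the two threads do not really run in
parallel, so this only shows the structure, not a speedup.

```python
import threading

# Hypothetical toy class hierarchy: lookup walks the chain in order.
CLASS_CHAIN = [
    {"double": lambda x: x * 2},
    {"inc": lambda x: x + 1},
]

def slow_lookup(selector):
    """Walk the (toy) superclass chain to find the method for a selector."""
    for method_dict in CLASS_CHAIN:
        if selector in method_dict:
            return method_dict[selector]
    raise LookupError(selector)

def prefetch(bytecodes, cache, lock):
    """Look ahead for sends and resolve them before the executor asks."""
    for op, arg in bytecodes:
        if op == "send":
            method = slow_lookup(arg)
            with lock:
                cache.setdefault(arg, method)

def execute(bytecodes, cache, lock):
    """Run the byte codes, using prefetched methods when they are ready."""
    stack = []
    for op, arg in bytecodes:
        if op == "push":
            stack.append(arg)
        elif op == "send":
            with lock:
                method = cache.get(arg)
            if method is None:  # the prefetcher has not got this far yet
                method = slow_lookup(arg)
            stack.append(method(stack.pop()))
        elif op == "return":
            return stack.pop()

def run(bytecodes):
    cache, lock = {}, threading.Lock()
    fetcher = threading.Thread(target=prefetch, args=(bytecodes, cache, lock))
    fetcher.start()
    result = execute(bytecodes, cache, lock)
    fetcher.join()
    return result

program = [("push", 20), ("send", "inc"), ("send", "double"), ("return", None)]
print(run(program))
```

The executor never waits for the prefetcher: a cache miss just falls back to
the normal lookup, so the result is the same whichever thread gets there first.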
vm2) Optimize the instruction set. This idea is still rather undeveloped.
The fetcher could as well translate the byte codes into some other
representation that allows faster execution (like actual native code, for
instance). I don't know much about byte codes, but from what I've seen in
compiled methods, pushes, pops and returns tend to occur in almost every method.
We could gain extra performance if we made the virtual machine superscalar as
well. To allow for that, we could look at whether consecutive byte codes are
independent and, if they are, execute them both at the same time through
instruction pipelines (real machine threads). This is done on all Pentiums and
also on a lot of other chips. It would be great to find a way of deciding
whether two consecutive message sends are independent of each other or not.
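One way to picture the independence check in vm2, as a sketch only: assume
each instruction can be described by the set of names it reads and the set it
writes (the Instr type and the example program are hypothetical, and real stack
byte codes would first need their stack effects turned into such sets). Two
neighbours can then be issued together when neither one writes something the
other reads or writes.

```python
from typing import FrozenSet, List, NamedTuple

class Instr(NamedTuple):
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]

def instr(name, reads="", writes=""):
    """Helper: build an Instr from space-separated variable names."""
    return Instr(name, frozenset(reads.split()), frozenset(writes.split()))

def independent(a: Instr, b: Instr) -> bool:
    """True when b neither depends on a's results nor clobbers a's inputs."""
    if a.writes & (b.reads | b.writes):   # b reads or overwrites a's output
        return False
    if b.writes & a.reads:                # b overwrites one of a's inputs
        return False
    return True

def pair_up(instrs: List[Instr]) -> List[List[str]]:
    """Greedily group consecutive independent instructions, two at most."""
    groups, i = [], 0
    while i < len(instrs):
        if i + 1 < len(instrs) and independent(instrs[i], instrs[i + 1]):
            groups.append([instrs[i].name, instrs[i + 1].name])
            i += 2
        else:
            groups.append([instrs[i].name])
            i += 1
    return groups

program = [
    instr("t1 := a + b", "a b", "t1"),
    instr("t2 := t1 * c", "t1 c", "t2"),
    instr("t3 := d + e", "d e", "t3"),
    instr("t4 := f + g", "f g", "t4"),
]

for group in pair_up(program):
    print(group)
```

For message sends the hard part is exactly the one raised above: a send can
touch any object it can reach, so its read and write sets are not known in
advance and would have to be guessed or proven some other way.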
vm3) Split the virtual machine's responsibilities into a kind of virtual
chipset, consisting of a virtual cpu, a virtual bitblter, a virtual port
manager and so on. For instance, we could decouple screen generation from what
the virtual cpu is actually doing. The same goes for sound, communication and
so on. Some kind of separate memory image spaces and a kind of bus could be the
thing to do. This was done on the Amiga. For all of you who used one, do you
remember what the XT was like with a similarly fast cpu? The difference was in
the custom chips that freed the cpu from doing heavy stuff such as bitblitting,
sound mixing and screen updating. This matters especially now, when screen
updating can be so time consuming. To allow this, the Amiga used the 68000 cpu
family from Motorola. Those cpus are very interesting because, besides twin
data-instruction buses, an orthogonal instruction set and a lot of registers,
they allowed asynchronous port management. That is, the cpu instructs the
serial port to send a byte and continues executing instructions regardless of
what the serial port is doing in the meantime. This doesn't happen on all chips.
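The decoupling in vm3 could look roughly like this. It is a sketch under
assumptions: the VirtualBlitter class, its fill command and the queue standing
in for the bus are all invented for illustration. The virtual cpu posts a
command and carries on, the same way the 68000 hands a byte to the serial port
and keeps going.

```python
import queue
import threading

class VirtualBlitter(threading.Thread):
    """A hypothetical display device that owns its own screen memory."""

    def __init__(self, width, height):
        super().__init__(daemon=True)
        self.bus = queue.Queue()   # commands travel from cpu to device here
        self.screen = [[0] * width for _ in range(height)]

    def run(self):
        while True:
            command = self.bus.get()
            if command is None:    # shutdown marker
                break
            x, y, w, h, colour = command
            for row in range(y, y + h):
                for col in range(x, x + w):
                    self.screen[row][col] = colour

    def fill(self, x, y, w, h, colour):
        """Called by the virtual cpu: queue the work and return at once."""
        self.bus.put((x, y, w, h, colour))

    def shutdown(self):
        """Let queued commands finish, then stop the device thread."""
        self.bus.put(None)
        self.join()

def demo():
    blitter = VirtualBlitter(4, 2)
    blitter.start()
    blitter.fill(0, 0, 4, 2, 1)    # paint the whole screen with colour 1
    blitter.fill(1, 0, 2, 1, 7)    # then a small rectangle with colour 7
    total = sum(range(10))         # the "cpu" keeps computing meanwhile
    blitter.shutdown()
    return total, blitter.screen

print(demo())
```

Because only the device thread touches the screen memory, the cpu side needs
no locking; the queue is the only shared thing.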
Hint: this allows building a real Smalltalk virtual machine, putting each
virtual cpu on silicon.
Crazy? 1) How could several virtual cpus execute byte codes at the same
time? Through something similar to Visual Smalltalk's SLLs? Maybe it's quite
close to what an SLL is. Unmovable blocks of ram would be advisable for all
this. For instance, for screen generation we could copy the AGP idea on new
Intel motherboards. Their idea was to remove the video memory from the video
card and use some kind of fast/chip ram like what the Amiga does (and did since
who knows when). That is, memory for programs and data, and memory for screen,
sound samples and so on. Considerable speedups could come from this, because we
would be splitting the tasks of the virtual machine into virtual cpus that
execute specific byte codes designed just to do their task. This allows for
independence of implementation, and also puts primitives where they belong:
primitives for video are implemented only in the video generating virtual cpu,
primitives for communications only in the communication virtual cpu, and so on.
This also allows flexible optimization for the hardware that is running the
virtual cpus.
Furthermore, each of those virtual cpus would be able to execute in
parallel with all the others (except perhaps multiple "general purpose" virtual
machines). What about a multiprocessor version of the GC? Execution speed can be
improved if we let n virtual processors replace object pointers
simultaneously in the image, provided that there is more than 1 real cpu
working. But I think even this work can be avoided. If a virtual cpu keeps track
of which objects point to the new objects in the new object pool (this can be
done by tracking the other VMs, or by making the VMs tell this special cpu when
they make something point to a new object), then when the GC trigger comes, a
list of just what needs to be changed could already be prepared.
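The "list of just what needs to be changed" is close to what generational
collectors call a remembered set, kept up to date by a write barrier. A
minimal sketch, assuming invented Obj and Heap classes and a collector that
only fixes the recorded slots (it does not trace references from one new
object to another):

```python
class Obj:
    def __init__(self, name, is_new=True):
        self.name = name
        self.is_new = is_new      # True while the object lives in the new pool
        self.fields = {}
        self.forward = None       # set once the object has been moved

class Heap:
    def __init__(self):
        # Slots in old objects that were made to point at new objects.
        self.pending = []

    def store(self, holder, field, value):
        """Every pointer store goes through here (the 'tell the special cpu' step)."""
        holder.fields[field] = value
        if not holder.is_new and isinstance(value, Obj) and value.is_new:
            self.pending.append((holder, field))

    def collect(self):
        """Move referenced new objects to old space, fixing only recorded slots."""
        updated = 0
        for holder, field in self.pending:
            target = holder.fields.get(field)
            if not isinstance(target, Obj) or not target.is_new:
                continue          # slot was overwritten since it was recorded
            if target.forward is None:
                moved = Obj(target.name, is_new=False)
                moved.fields = dict(target.fields)
                target.forward = moved
            holder.fields[field] = target.forward
            updated += 1
        self.pending.clear()
        return updated

def demo():
    heap = Heap()
    root = Obj("root", is_new=False)
    point = Obj("point")
    heap.store(point, "x", 3)     # new object holding a plain value: not recorded
    heap.store(root, "a", point)  # old -> new: recorded
    heap.store(root, "b", point)  # old -> new: recorded
    updated = heap.collect()
    return root, updated

root, updated = demo()
print(updated, root.fields["a"].name, root.fields["a"].is_new)
```

When the GC trigger comes, the collector walks the pending list instead of
scanning the whole image for pointers into the new object pool.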
This is more or less what I had thought about. I just love it when
manufacturers tell you just a few words on how they improved speed and paint it
as complicated as hell, when in fact those lines let you rethink the problem and
arrive at the same conclusions on your own. This happens all the time with
Intel, for instance. It's also quite astonishing to see that there are not a
lot of those ideas, and that they are not used all at once because, if they
were, the manufacturers would not make as much money as they do now...
Andres.
More information about the Squeak-dev
mailing list