Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)

Igor Stasenko siguctua at gmail.com
Wed Oct 31 14:25:14 UTC 2007


On 31/10/2007, Andreas Raab <andreas.raab at gmx.de> wrote:
> Igor Stasenko wrote:
> > If you have any ideas how such a VM would look, I'm glad to hear them.
>
> Okay, so Josh convinced me to write up the ideas. The main problem as I
> see it with a *practical* solution to the problem is that all of the
> solutions so far require huge leaps and can't be implemented
> step-by-step (which almost certainly dooms them to failure).
>
> So what do we know and what do we actually all pretty much agree on?
> It's that we need to be able to utilize multiple cores and that we need
> a practical way to get there (if you disagree with the latter this
> message is not meant for you ;-) Running multiple processes is one
> option but it is not always sufficient. For example, some OSes would
> have trouble firing off a couple of thousand processes whereas the same
> OS may have no problem at all with a couple of thousand threads in one
> process. To give an example, starting a thread on Windows costs somewhere
> in the range of a millisecond which is admittedly slow, but still orders
> of magnitude faster than creating a new process. Then there are issues
> with resource sharing (like file handles) which are practically
> guaranteed not to work across process boundaries etc. So while there are
> perfectly good reasons to run multiple processes, there are reasons just
> as good to wanting to run multiple threads in one process.
>
> The question then is, can we find an easy way to extend the Squeak VM to
> run multiple threads and if so how? Given the simplistic nature of the
> Squeak interpreter, there is actually very little global state that is
> not encapsulated in objects on the Squeak heap - basically all the
> variables in class Interpreter. So if we put them into state that
> is local to each thread, we could trivially run multiple instances of
> the byte code interpreter in the same VM. This gets us to the two major
> questions:
>
> * How do we encapsulate the interpreter state?
> * How do we deal with primitives and plugins?
>
> Let's start with the first one. Obviously, the answer is "make it an
> object". The way I would go about it is by modifying the CCodeGenerator
> such that it generates all functions with an argument of type "struct
> VM", prefixes variable accesses accordingly, and makes all function
> calls pass the extra argument along. In short, what used to be
> translated as:
>
> sqInt primitiveAdd(void) {
>    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
>    /* etc. */
> }
>
> will then become something like this:
>
> sqInt primitiveAdd(struct VM *vm) {
>    integerResult = stackIntegerValue(vm, 1) + stackIntegerValue(vm, 0);
>    /* etc. */
> }
>
> This is a *purely* mechanical step that can be done independent of
> anything else. It should be possible to generate code that is entirely
> equivalent to today's code and with a bit of tweaking it should be
> possible to make that code roughly as fast as we have today (not that I
> think it matters but understanding the speed difference between this and
> the default interpreter is important for judging relative speed
> improvements later).
>
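
To make the transformation above concrete, here is a minimal sketch of
interpreter state moved into a struct and passed explicitly. It is not
the code the CCodeGenerator produces: the fields of struct VM, the
fixed-size stack and the push() helper are assumptions made up for
illustration. Only primitiveAdd and stackIntegerValue are names taken
from the example above.

```c
#include <assert.h>

typedef long sqInt;   /* stand-in for the VM's integer type (assumption) */

/* Hypothetical layout: each former interpreter global becomes a field. */
struct VM {
    sqInt stack[16];
    int   sp;         /* number of values currently on the stack */
};

/* Hypothetical helper, not part of the original example. */
void push(struct VM *vm, sqInt value) {
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = value;
}

/* Offset 0 is the top of the stack, offset 1 the value below it. */
sqInt stackIntegerValue(struct VM *vm, int offset) {
    assert(offset >= 0 && offset < vm->sp);
    return vm->stack[vm->sp - 1 - offset];
}

/* Same shape as the translated primitive: all state is reached via vm. */
sqInt primitiveAdd(struct VM *vm) {
    sqInt integerResult = stackIntegerValue(vm, 1) + stackIntegerValue(vm, 0);
    vm->sp -= 2;                 /* pop both arguments */
    push(vm, integerResult);     /* push the sum */
    return integerResult;
}
```

Because nothing is global any more, two struct VM values can live in
the same process without touching each other's stack, which is the
property needed before each one can be given its own thread.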

Some steps have already been taken in this direction. The sources for
RISC architectures generate a foo struct, which holds all interpreter
globals.
Also, I made some changes in Exupery to create a single struct of all
VM globals (not only variables, but functions too).
This was done to make it easier to get the address of any global symbol
that Exupery needs.
I also experimented with replacing all direct function calls with
indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This
caused about a 1% slowdown in tinyBenchmarks :)
Moving further in this direction also makes the InterpreterProxy struct
unnecessary, because we can just pass plugins the address of our 'foo'
struct, which already contains everything a plugin can reach.
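
A rough sketch of what such a struct could look like, with one variable
and one function pointer. This is not the actual Exupery or VM code:
the successFlag field, primAddImpl, makeFoo and the plugin function
pluginSumOfThree are hypothetical names invented for the example. Only
'foo' and primAdd come from the text above.

```c
#include <assert.h>

typedef long sqInt;   /* stand-in for the VM's integer type (assumption) */

/* One struct holding VM globals: variables and functions alike. */
struct foo {
    sqInt successFlag;                     /* example of a former global */
    sqInt (*primAdd)(sqInt x, sqInt y);    /* VM function, called indirectly */
};

/* The real implementation behind the function pointer. */
sqInt primAddImpl(sqInt x, sqInt y) {
    return x + y;
}

/* Fill in the struct once at VM startup. */
struct foo makeFoo(void) {
    struct foo f;
    f.successFlag = 1;
    f.primAdd = primAddImpl;
    return f;
}

/* Hypothetical plugin code: it receives only the struct address and
   reaches every VM service through it, so no separate proxy is needed. */
sqInt pluginSumOfThree(struct foo *foo, sqInt a, sqInt b, sqInt c) {
    assert(foo->primAdd != 0);
    return foo->primAdd(foo->primAdd(a, b), c);
}
```

The indirect call costs one extra pointer load per call, which fits
with the small slowdown mentioned above, and the address of any symbol
is simply the address of a struct field.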

> The above takes care about the interpreter but there are still
> primitives and plugins that need to be dealt with. What I would do here
> is define operations like ioLock(struct VM) and ioUnlock(struct VM) that
> are the effective equivalent of Python's GIL (global interpreter lock)
> and allow exclusive access to primitives that have not been converted to
> multi-threading yet. How exactly this conversion should happen is
> deliberately left open here; maybe changing the VM's major proxy version
> is the right thing to do to indicate the changed semantics. In any case,
> the GIL allows us to readily reuse all existing plugins without having
> to worry about conversion early on.
>
Or, as I proposed in earlier posts, the other way could be to schedule
all calls to primitives that don't yet support multi-threading onto a
single 'main' thread.
Then we don't need the GIL.
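
A minimal sketch of that idea using POSIX threads, under several
assumptions: a single request slot instead of a real queue, and made-up
names throughout (MainThreadCall, callOnMainThread, serveOne,
legacyDouble). Interpreter threads post a call and block; only the main
thread ever executes the unconverted primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

typedef long sqInt;
typedef sqInt (*LegacyPrim)(sqInt arg);

/* Single request slot; state: 0 = free, 1 = posted, 2 = result ready. */
struct MainThreadCall {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    LegacyPrim      fn;
    sqInt           arg, result;
    int             state;
};

/* Called from any interpreter thread; blocks until main has run fn. */
sqInt callOnMainThread(struct MainThreadCall *m, LegacyPrim fn, sqInt arg) {
    sqInt result;
    pthread_mutex_lock(&m->lock);
    while (m->state != 0) pthread_cond_wait(&m->changed, &m->lock);
    m->fn = fn; m->arg = arg; m->state = 1;
    pthread_cond_broadcast(&m->changed);
    while (m->state != 2) pthread_cond_wait(&m->changed, &m->lock);
    result = m->result;
    m->state = 0;                       /* free the slot for the next caller */
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return result;
}

/* Called only from the main thread: run exactly one posted request.
   The lock is held during the call; callers are blocked anyway. */
void serveOne(struct MainThreadCall *m) {
    pthread_mutex_lock(&m->lock);
    while (m->state != 1) pthread_cond_wait(&m->changed, &m->lock);
    m->result = m->fn(m->arg);
    m->state = 2;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

int legacyCalls = 0;    /* unprotected global, as in an unconverted plugin */
sqInt legacyDouble(sqInt x) { legacyCalls++; return 2 * x; }

struct Worker { struct MainThreadCall *m; sqInt arg, result; };

void *workerMain(void *p) {
    struct Worker *w = p;
    w->result = callOnMainThread(w->m, legacyDouble, w->arg);
    return NULL;
}

/* Two worker threads, each issuing one legacy call served by the caller. */
sqInt runDemo(void) {
    struct MainThreadCall m;
    struct Worker w[2];
    pthread_t t[2];
    int i;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.changed, NULL);
    m.state = 0;
    for (i = 0; i < 2; i++) {
        w[i].m = &m; w[i].arg = 10 + i; w[i].result = 0;
        pthread_create(&t[i], NULL, workerMain, &w[i]);
    }
    for (i = 0; i < 2; i++) serveOne(&m);
    for (i = 0; i < 2; i++) pthread_join(t[i], NULL);
    pthread_cond_destroy(&m.changed);
    pthread_mutex_destroy(&m.lock);
    return w[0].result + w[1].result;   /* 2*10 + 2*11 */
}
```

The legacy primitive touches its global without any lock, yet there is
no race, because it only ever runs on the thread that calls serveOne.
The price is that every such call blocks its interpreter thread for a
round trip to the main thread.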

> So now we've taken care of the two major parts of Squeak: We have the
> ability to run new interpreters and we have the ability to use
> primitives. This is when the fun begins, because at this point we have
> options:
>
> For example, if you are into shared-state concurrency, you might
> implement a primitive that forks a new instance of the interpreter
> running in the same object memory that your previous interpreter is
> running in.
>
> Or, and that would be the path that I would take, implement a primitive
> that loads an image into a new object memory (I can explain in more
> detail how memory allocation needs to work for that; it is a fairly
> straightforward scheme but a little too long for this message) and run
> that interpreter.
>
> And at this point, the *real* fun begins because we can now start to
> define the communication patterns we'd like to use (initially sockets,
> later shared memory or event queues or whatever else). We can have tiny
> worker images that only do minimal stuff but we can also do a Spoon-like
> thing where we have a "master image" that contains all the code possibly
> needed and fire off micro-images that (via imprinting) swap in just the
> code they need to run.
>
> [Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
>
> Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-)
> Anyway, the main thing I'm trying to say in the above is that for a
> *practical* solution to the problem there are some steps that are pretty
> much required whichever way you look at it. And I think that regardless
> of your interest in shared state or message passing concurrency we may
> be able to define a road that leads to interesting experiments without
> sacrificing the practical artifact. A VM built as described above would
> be strictly a superset of the current VM, so it would be able to run any
> current images and leave room for further experiments.
>
> Cheers,
>    - Andreas
>
>
>


-- 
Best regards,
Igor Stasenko AKA sig.


