[Vm-dev] An event driven Squeak VM

Wed Nov 11 17:29:25 UTC 2009

On Tue, Nov 10, 2009 at 9:59 PM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> 2009/11/11 Eliot Miranda <eliot.miranda at gmail.com>:
> >
> >
> >
> > On Tue, Nov 10, 2009 at 6:45 PM, John M McIntosh <
> johnmci at smalltalkconsulting.com> wrote:
> >>
> >> On 2009-11-10, at 6:17 PM, Eliot Miranda wrote:
> >>
> >>> With the threaded Squeak VM I'm working on one can go one better and
> have a number of image-level processes that block in the FFI and a number of
> worker threads in the VM that block on OS semaphores waiting for the VM to
> give them something to do.
> >>
> >> Obviously now you have to give a bit more details on this. Is it like
> the hydra VM? Or entirely different?
> >
> > Orthogonal, in that it might work well with Hydra.  The basic scheme is
> to have a natively multi-threaded VM that is not concurrent.  Multiple
> native threads share the VM such that there is only one thread running VM
> code at any one time.  Thus the VM can make non-blocking calls to the
> outside world but neither the VM nor the image need to be modified to handle
> true concurrency.  This is the same basic architecture as in the Strongtalk
> and V8 VMs and notably in David Simmons' various Smalltalk VMs.
> > The cool thing about the system is David's design.  He's been extremely
> generous in explaining to me his scheme, which is extremely efficient.  I've
> merely implemented this scheme in the context of the Cog VM.  The idea is to
> arrange that a threaded callout is so cheap that any and all callouts can be
> threaded.  This is done by arranging that a callout does not switch to
> another thread, instead the thread merely "disowns" the VM.  It is the job
> of a background heartbeat thread to detect that a callout is long-running
> and that the VM has effectively blocked.  The heartbeat then activates a new
> thread to run the VM and the new thread attempts to take ownership and will
> run Smalltalk code if it succeeds.
> > On return from a callout a thread must attempt to take ownership of the
> VM, and if it fails, add itself to a queue of threads waiting to take back
> the VM and then wait on an OS semaphore until the thread owning the VM
> decides to give up ownership to it.
> > Every VM thread has a unique index.  The vmOwner variable holds the index
> of the owning thread or 0 if the VM is unowned.  To disown the VM all a
> thread has to do is zero vmOwner, while remembering the value of vmOwner in
> a temporary.  To take ownership a thread must use a low-level lock to gain
> exclusive access to vmOwner, and if vmOwner is zero, set it back to the
> thread's index, and release the lock.  If it finds vmOwner is non-zero it
> releases the lock and enters the wanting ownership queue.
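
A minimal sketch of the ownership protocol described above, in C11.  Only
vmOwner is a name from the description; vmOwnerLock, disownVM and tryOwnVM
are made up for illustration, and a spinning atomic_flag merely stands in
for whatever low-level lock the VM really uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; only vmOwner comes from the description. */
static atomic_flag vmOwnerLock = ATOMIC_FLAG_INIT;  /* stand-in low-level lock */
static atomic_int  vmOwner = 0;                     /* 0 means the VM is unowned */

/* Disown the VM: zero vmOwner and hand back the previous value so the
   caller can remember it in a temporary. */
static int disownVM(void)
{
    return atomic_exchange(&vmOwner, 0);
}

/* Try to take ownership for the thread with the given index.  Returns
   true on success; on failure the caller would enter the
   wanting-ownership queue (not modelled here). */
static bool tryOwnVM(int index)
{
    bool owned = false;

    assert(index > 0);  /* 0 is reserved for "unowned" */
    while (atomic_flag_test_and_set_explicit(&vmOwnerLock, memory_order_acquire))
        ;  /* spin until we hold the lock */
    if (atomic_load(&vmOwner) == 0) {
        atomic_store(&vmOwner, index);
        owned = true;
    }
    atomic_flag_clear_explicit(&vmOwnerLock, memory_order_release);
    return owned;
}
```

A callout would then look roughly like: owner = disownVM(), make the
external call, then tryOwnVM(owner), queueing and waiting if that fails.
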
> > In the Cog VM the heartbeat beats at 1 kHz, so any call that takes less
> than 0.5ms is likely to complete without the heartbeat detecting that the VM
> is blocked.  So any and all callouts can be threaded.  Quite brilliant.  All
> the work of changing the active process when switching between threads is
> deferred from callout time to when a different thread takes ownership of the
> VM, saving the VM state for the process that surrendered the VM and
> installing its own.
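
One way the heartbeat's check could work, sketched in C.  How the detection
is actually done is not spelled out above, so this is an assumption: the VM
counts as blocked when the heartbeat sees it unowned by the same disown on
two consecutive beats.  The Heartbeat struct and all function names are
hypothetical, and the sketch is single-threaded so the logic can be followed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for the heartbeat; not the VM's own. */
typedef struct {
    int      vmOwner;      /* 0 means unowned (disowned for a callout) */
    unsigned disownCount;  /* bumped on every disown */
    unsigned seenCount;    /* disownCount observed at the previous beat */
    bool     seenUnowned;  /* was the VM unowned at the previous beat? */
} Heartbeat;

static void disown(Heartbeat *hb)
{
    hb->vmOwner = 0;
    hb->disownCount++;
}

static void own(Heartbeat *hb, int index)
{
    assert(index > 0);
    hb->vmOwner = index;
}

/* One beat.  Returns true when the same callout has kept the VM unowned
   across two consecutive beats, i.e. a new thread should be activated. */
static bool heartbeatTick(Heartbeat *hb)
{
    bool unowned = (hb->vmOwner == 0);
    bool blocked = unowned && hb->seenUnowned
                   && hb->seenCount == hb->disownCount;

    hb->seenUnowned = unowned;
    hb->seenCount = hb->disownCount;
    return blocked;
}
```

Under this assumption a callout that starts and finishes between two beats
is never noticed, so it costs no more than the disown and the re-own.
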
> > The major wrinkle in this is that in David's VM he has a pinning garbage
> collector which arranges that any arguments passed out through the FFI are
> implicitly pinned.  We don't yet have a pinning garbage collector.  I do
> plan to do one.  But in the interim one quick hack, a neat idea of Andreas',
> is to fail calls that attempt to pass objects in new space, allowing only
> old objects to be passed, and to prevent the full garbage collector from
> running while any threaded calls are in progress.
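
A rough sketch of that interim hack in C.  The heap layout is an assumption
(old space below a single boundary address, new space at or above it), and
Heap, beginThreadedCall, endThreadedCall and fullGCAllowed are hypothetical
names.  The reasoning, as I read it, is that old objects only move during a
full collection, so holding that off keeps the passed arguments in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap description: old objects live below newSpaceStart. */
typedef struct {
    uintptr_t newSpaceStart;
    int       threadedCallsInProgress;
} Heap;

static bool isOld(const Heap *h, uintptr_t oop)
{
    return oop < h->newSpaceStart;
}

/* Start a threaded callout.  Fails (returns false) if any argument is in
   new space; otherwise records that a threaded call is in progress. */
static bool beginThreadedCall(Heap *h, const uintptr_t *args, size_t nargs)
{
    for (size_t i = 0; i < nargs; i++)
        if (!isOld(h, args[i]))
            return false;
    h->threadedCallsInProgress++;
    return true;
}

static void endThreadedCall(Heap *h)
{
    assert(h->threadedCallsInProgress > 0);
    h->threadedCallsInProgress--;
}

/* The full garbage collector may only run when no threaded call is out. */
static bool fullGCAllowed(const Heap *h)
{
    return h->threadedCallsInProgress == 0;
}
```
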
> > Having cheap non-blocking calls allows e.g.
> > - the Hydra inter-VM channels to be implemented in Smalltalk code above
> the threaded FFI
> > - socket calls to be blocking calls in the image
> > - Smalltalk code to call select/poll/WaitForMultipleEvents
> > There are still plenty of sticky issues to do with e.g. identifying
> threads that can do specific functions, such as the UI thread, and issuing
> OpenGL calls from the right thread, etc, etc.  But these are all doable, if
> potentially tricky to get right.  If this kind of code does migrate from the
> VM innards up to the image I think that's a really good thing (tm) but one
> will really have to know what one is doing to get it right.
> > HTH
> > eliot
>
> I used a mutex in Hydra (each interpreter has its own mutex), so any
> operation that requires synchronization should be performed
> only after obtaining ownership of the mutex.
> And sure, if crafted carefully, one could release the mutex before
> doing an external call, and try to get it back again after the call
> completes.
> If you use mutexes provided by the OS, then you don't need a heartbeat
> process, obviously, because you can simply wait on the mutex. So I
> suppose you are introducing the heartbeat to minimize the overhead of
> using synchronization primitives provided by the OS, using low-level
> assembly code instead.
>
> Just one minor thing - you mentioned the table of threads. What if
> some routine creates a new thread that goes unnoticed by the VM, so it is
> not registered in the VM 'threads' table, but then that thread
> somehow attempts to obtain ownership of the interpreter?
>

This can only happen on a callback or other well-defined entry-point.  At
these well-defined entry-points the VM checks whether there is a tag in
thread-local storage (the thread's VM index).  If it is not set the VM
allocates the necessary per-thread storage, assigns an index and allows the
thread to continue.  On return from the entry-point the VM deallocates the
storage, clears the thread-local storage and returns.
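
Roughly, in C11 (a sketch only: VMThread, currentVMThread, enterVM and
leaveVM are hypothetical names, and I am assuming that only a thread
registered at this entry is torn down again on return):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int index; } VMThread;   /* hypothetical per-thread record */

/* The tag in thread-local storage; NULL means the VM has not seen this thread. */
static _Thread_local VMThread *currentVMThread = NULL;
static int nextThreadIndex = 1;  /* a real VM would guard and recycle indices */

/* Called at a callback or other well-defined entry-point.  Returns true if
   the thread was unknown and had to be registered here. */
static bool enterVM(void)
{
    if (currentVMThread != NULL)
        return false;               /* already known to the VM */
    VMThread *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->index = nextThreadIndex++;
    currentVMThread = t;
    return true;
}

/* Called on return from the entry-point. */
static void leaveVM(bool registeredOnEntry)
{
    assert(currentVMThread != NULL);
    if (registeredOnEntry) {
        free(currentVMThread);      /* deallocate the per-thread storage */
        currentVMThread = NULL;     /* clear the thread-local tag */
    }
}
```
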

> About inter-image communication in Hydra. The main problem is that you
> need to pass a buffer between heads, so you need to get a lock on the
> recipient while still keeping a lock on the sender interpreter. But this
> could lead to deadlock if the recipient in its turn attempts to do the
> same.
> So the solution, unfortunately, is to copy the buffer to the C heap (using
> malloc().. yeah :( ), and pass an event with a pointer to that buffer,
> which can then be handled by the recipient as soon as it is ready to do so,
> in its event handling routine.
>

But you could connect the two with a pair of pipes, right?  Then all that
locking and buffer allocation is in the VM.  Or rather, once you have a
non-blocking FFI you can just use an OS's native stream-based inter-process
communications facilities.

> One more thing:
>   socket calls to be blocking calls in the image
>
> Assuming that the VM uses blocking sockets, the call will block the thread
> and some image-side process.
> Then the heartbeat thread at some point sees that the VM has no owning
> thread and so allows another thread, waiting in the queue, to take
> ownership of the VM.
> But what if there is no such thread? There is a choice: allocate a new
> native thread and let it continue running the VM, or just ignore it and
> skip over to the next heartbeat.
> I'd like to hear which you chose. Because depending on the direction
> taken, a server image that simultaneously serves, say, 100
> connections may end up either with 100 + 1 native threads, or with a
> smaller (fixed) number of them but with the risk of being unable to run
> any VM code until some of the blocking calls complete.
>

There is a simple policy: a cap on the total number of threads the
VM will allocate.  Below this a new thread is allocated.  At the limit the
VM will block.  But note that the pool starts at 1 and only grows as
necessary up to the cap.
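
As a sketch of that policy in C (ThreadPool, PoolAction and threadToRunVM
are names invented for illustration; reusing an idle worker before
allocating a new thread is my reading of "grows as necessary"):

```c
#include <assert.h>

/* What to do when the VM needs a thread to run it. */
typedef enum { ACTIVATE_IDLE, SPAWN_NEW, VM_BLOCKS } PoolAction;

/* Hypothetical pool bookkeeping. */
typedef struct {
    int total;  /* threads allocated so far; the pool starts at 1 */
    int idle;   /* allocated threads waiting for something to do */
    int cap;    /* maximum number of threads the VM will allocate */
} ThreadPool;

static PoolAction threadToRunVM(ThreadPool *p)
{
    assert(p->cap >= 1 && p->total <= p->cap);
    if (p->idle > 0) {          /* reuse a waiting thread first */
        p->idle--;
        return ACTIVATE_IDLE;
    }
    if (p->total < p->cap) {    /* below the cap: allocate a new thread */
        p->total++;
        return SPAWN_NEW;
    }
    return VM_BLOCKS;           /* at the cap: wait for a callout to return */
}
```
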

> I'd like to note that both of the above alternatives have quite bad
> scalability potential.
> I'd prefer to have a pool of threads, each of them serving N
> connections. The size of the thread pool should be 2x-3x the number of
> processor cores on the host, because making it larger than that will not
> make any real difference, since a single core can run only a single native
> thread at a time while the others will just consume memory resources, like
> address space etc.
>
>

That's very similar to my numbers too.  My current default is at least two
threads and no more than 32, and 2 x num processors/cores in between.  But
these numbers should be configurable.  This is just to get started.
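
In C the default works out to something like the following (the function
name is hypothetical, and the constants would become settings once this is
configurable):

```c
#include <assert.h>

/* Default cap on VM threads: 2 x the number of cores, but at least 2 and
   no more than 32.  A core count of 0 or less (unknown) yields the minimum. */
static int defaultMaxThreads(int numCores)
{
    int n = 2 * numCores;

    if (n < 2)
        n = 2;
    if (n > 32)
        n = 32;
    assert(n >= 2 && n <= 32);
    return n;
}
```

So a 4-core machine gets a cap of 8 threads, and anything with 16 or more
cores is held at 32.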

> >>
> >>
> >> --
> >>
> ===========================================================================
> >> John M. McIntosh <johnmci at smalltalkconsulting.com>   Twitter:
>  squeaker68882
> >> Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
> >>
> ===========================================================================
> >>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>