Hi Ben,
On Thu, Jan 7, 2016 at 4:40 PM, Ben Coman btc@openinworld.com wrote:
On Fri, Jan 8, 2016 at 2:51 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ben,
On Thu, Jan 7, 2016 at 10:39 AM, Ben Coman btc@openinworld.com wrote:
On Fri, Jan 8, 2016 at 1:20 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
and here's a version with a better class comment
On Thu, Jan 7, 2016 at 9:12 AM, Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi Denis, Hi Clément, Hi Frank,
On Thu, Jan 7, 2016 at 5:34 AM, Clément Bera <bera.clement@gmail.com> wrote:
Hello,
Eliot, please, you told me you had the code and Denis is interested.
It uses 3 primitives for performance.
Forgive the delay. I thought it proper to ask permission, since the code was written while I was at Qwaq. I'm attaching the code in a fairly raw state. The code is MIT-licensed, but copyright 3DICC.
It is a plug-in replacement for Squeak's Mutex, and with a little ingenuity could be a replacement for Squeak's Monitor. It is quicker because it uses three new primitives: one to enter a critical section and set the owner, one to exit the critical section and release the owner, and one to test whether a critical section is owned, entering it if it is unowned. The use of the primitives means fewer block activations and ensure: blocks in entering and exiting the critical section, and that's the actual cause of the speed-up.
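For concreteness, here's a sketch of how critical: can be built on the enter/exit pair. The selector names (primitiveEnterCriticalSection, primitiveExitCriticalSection) are my guesses at the attached code's naming; check the changeset for the real ones:

    Mutex >> critical: aBlock
        "Evaluate aBlock protected by the receiver.
         The enter primitive answers whether the current process
         already owned the receiver; only when it did not do we
         need an ensure: block to release ownership on the way out."
        ^self primitiveEnterCriticalSection
            ifTrue: [aBlock value]
            ifFalse: [aBlock ensure: [self primitiveExitCriticalSection]]

Note that on the recursive fast path (already owned) no ensure: block is created at all, which is where the block-activation savings come from.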
You can benchmark the code as is. Here are some results on 32-bit Spur, on my 2.2GHz Core i7:
    {Mutex new. Monitor new. CriticalSection new} collect:
        [:cs| | n |
         n := 0.
         cs class name, ' -> ',
            [cs critical: [n := n + 1]. cs critical: [n := n + 1].
             cs critical: [n := n + 1]. cs critical: [n := n + 1].
             cs critical: [n := n + 1]. cs critical: [n := n - 1].
             cs critical: [n := n - 1]. cs critical: [n := n - 1].
             cs critical: [n := n - 1]. cs critical: [n := n - 1].
             n] bench]
#( 'Mutex -> 440,000 per second. 2.27 microseconds per run.' 'Monitor -> 688,000 per second. 1.45 microseconds per run.' 'CriticalSection -> 1,110,000 per second. 900 nanoseconds per run.')
This is great, Eliot. Thank you and 3DICC. After loading the changeset into Pharo-50515 (32-bit Spur) I get the following results on my laptop's i5-2520M @ 2.50GHz:
#('Mutex -> 254,047 per second' 'Monitor -> 450,442 per second' 'CriticalSection -> 683,393 per second')
In a fresh Image "Mutex allInstances basicInspect" lists just two mutexes...
- NetNameResolver-->ResolverMutex
- ThreadSafeTranscript-->accessSemaphore
I hate myself for getting distracted but I'm finding this fun. One can migrate to the new representation using normal Monticello loads:
- In the first version, redefine Mutex and Monitor to subclass LinkedList, with their owner/ownerProcess inst var first (actually third, after firstLink & lastLink), and add the primitives.
- In the next version, check that all Mutex and Monitor instances are unowned and then redefine to discard the excess inst vars.
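A sketch of the stage-one redefinition and the pre-stage-two check (the category and the isOwned selector are illustrative; the actual names in Pharo/the changeset may differ):

    "Stage 1: rebase on LinkedList, keeping owner as the first own inst var"
    LinkedList subclass: #Mutex
        instanceVariableNames: 'owner'
        classVariableNames: ''
        category: 'Kernel-Processes'

    "Before loading stage 2, verify every instance is unowned"
    Mutex allInstances allSatisfy: [:m| m isOwned not]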
Let me test this before committing, and see that all tests are ok.
Should Mutex and Monitor both directly subclass LinkedList and duplicate the primitives in each?
Or should they both subclass CriticalSection which subclasses LinkedList so the primitives are only defined once?
That's a good idea. Feel free to change the code, but test that the Monticello load handles this case properly first :-). Actually, given that the default state of all the Mutex and Monitor instances in the image is unowned (owner process is nil), it'll just work anyway. If we do that, we must make sure to include the 3DICC copyright in CriticalSection's class comment, and we can eliminate it from the primitives.
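Under that scheme the hierarchy would look something like the following (definitions abbreviated; Monitor's extra inst vars are from memory and may not match the real class exactly):

    LinkedList subclass: #CriticalSection
        instanceVariableNames: 'owner'     "holds the owning process; the three primitives live here"

    CriticalSection subclass: #Mutex
        instanceVariableNames: ''          "no extra state needed"

    CriticalSection subclass: #Monitor
        instanceVariableNames: 'queueDict' "plus whatever wait/signal state Monitor keeps"

so critical: and the primitive methods are defined once in CriticalSection and simply inherited.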
What effect would using the primitives from the superclass have on performance? If any, I'd vote to optimise for duplication rather than "nice" design, but our comments should document this.
Likely in the noise. The inline caching machinery in the VM is far cheaper than the real overheads here, which are in block creation, process switches, and interpreter primitive invocation.
cheers -ben