[Vm-dev] Why isn't signalSemaphoreWithIndex() thread-safe?

Igor Stasenko siguctua at gmail.com
Mon Sep 21 00:47:27 UTC 2009


2009/9/21 Eliot Miranda <eliot.miranda at gmail.com>:
>
>
>
> On Sun, Sep 20, 2009 at 3:35 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>> 2009/9/21 Joshua Gargus <schwa at fastmail.us>:
>> >
>> > Igor Stasenko wrote:
>> >>
>> >> 2009/9/21 Joshua Gargus <schwa at fastmail.us>:
>> >>
>> >>> Eliot Miranda wrote:
>> >>>
>> >>>> On Sun, Sep 20, 2009 at 1:00 PM, Eliot Miranda
>> >>>> <eliot.miranda at gmail.com <mailto:eliot.miranda at gmail.com>> wrote:
>> >>>>
>> >>>>      Instead we can use three variables per index to implement an
>> >>>>     exclusion-free solution, thusly:
>> >>>>
>> >>>>     typedef struct {
>> >>>>         int lock;  // accessed via e.g. XCHG to protect signalRequest
>> >>>>         int requests;
>> >>>>         int responses;
>> >>>>     } ExternalSignalRequest;
>> >>>>
>> >>>>     ExternalSignalRequest *externalSignalRequests;
>> >>>>     int checkExternalSignalRequests;
>> >>>>
>> >>>>     void
>> >>>>     requestSignal(int i)
>> >>>>     {
>> >>>>            while (!lock(&externalSignalRequests[i].lock))
>> >>>>                  usleep(1);
>> >>>>
>> >>>>            ++externalSignalRequests[i].requests;
>> >>>>
>> >>>>             unlock(&externalSignalRequests[i].lock);
>> >>>>
>> >>>>             checkExternalSignalRequests = 1;
>> >>>>             forceInterruptCheck();  // set a flag to cause the VM to
>> >>>>     check for interrupts; in the stack VM this is stackLimit :=
>> >>>>     (usqInt)-1;
>> >>>>     }
>> >>>>
>> >>>>
>> >>>>     The VM responds in checkEvents:
>> >>>>
>> >>>>         if (checkExternalSignalRequests)
>> >>>>             for (i = 0; i < numExternalSemaphores; i++)
>> >>>>                 while (externalSignalRequests[i].responses
>> >>>>                        != externalSignalRequests[i].requests) {
>> >>>>                     signalSemaphoreWithIndex(i);
>> >>>>                     ++externalSignalRequests[i].responses;
>> >>>>                 }
>> >>>>
>> >>>>
>> >>> Am I correct that ExternalSignalRequest.responses is never reset to zero,
>> >>> instead eventually wrapping around?  That worried me for a minute, but I
>> >>> see that it doesn't matter.  Neeto.
>> >>>
>> >>>
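(An aside: Josh's observation holds as long as the counters are unsigned. The pending-work test `responses != requests`, or equivalently `requests - responses`, stays correct across wraparound because unsigned arithmetic is modular; signed overflow would be undefined. A quick sketch, with names of my own invention:)

```c
#include <limits.h>

/* Counters shaped like ExternalSignalRequest, but unsigned so that
   wraparound is well defined in C. */
typedef struct {
    unsigned requests;
    unsigned responses;
} Counters;

/* Number of signals still pending; correct even across wraparound,
   because unsigned subtraction is modulo UINT_MAX + 1. */
static unsigned pending(const Counters *c)
{
    return c->requests - c->responses;
}
```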
>> >>
>> >> Nope, it does reset the counters.. see the signalExternalSemaphores method.
>> >>
>> >
>> > I was responding to Eliot's proposal, not the existing code.  If I
>> > understand properly, he's proposing to maintain a per-semaphore count of
>> > signal requests/responses, so there would be no
>> > "semaphoresToSignalCountA/B".
>> >
>> Ah.. sorry for not reading carefully..
>> There is one thing I don't like in the proposed scenario: checkForInterrupts
>> will spend time linear in the total number of semaphores to check which ones
>> need to be signaled, instead of time linear in the number of semaphores that
>> actually need signaling:
>>
>>     for (i = 0; i < numExternalSemaphores; i++)
>>
>> means that if you have 1000 external semaphores and only 1 of them is
>> currently signaled, you still have to loop through all of them,
>> inevitably :(
>
> Yes, that caused a twinge.  But with a separate lock we can maintain high and low tides, e.g.
> typedef struct {
>     int lock;  // accessed via e.g. XCHG to protect signalRequest
>     int requests;
>     int responses;
> } ExternalSignalRequest;
> ExternalSignalRequest *externalSignalRequests;
> int checkExternalSignalRequests;
> int tideLock = 0;
> int useTideA = 1;
> int lowTideA = (unsigned)-1 >> 1, highTideA = 0;
> int lowTideB = (unsigned)-1 >> 1, highTideB = 0;
> void
> requestSignal(int i)
> {
>        while (!lock(& tideLock))
>              usleep(1);
>        if (useTideA) {
>             if (lowTideA > i) lowTideA = i;
>             if (highTideA < i) highTideA = i;
>        }
>        else {
>             if (lowTideB > i) lowTideB = i;
>             if (highTideB < i) highTideB = i;
>        }
>
>        while (!lock(&externalSignalRequests[i].lock))
>              usleep(1);
>        ++externalSignalRequests[i].requests;
>         unlock(&externalSignalRequests[i].lock);
>         checkExternalSignalRequests = 1;
>         forceInterruptCheck();  // set a flag to cause the VM to check for interrupts; in the stack VM this is stackLimit := (usqInt)-1;
> }
>
> The VM responds in checkEvents:
>     if (checkExternalSignalRequests) {

i think you missed something like the following here:
        while (!lock(& tideLock))
              usleep(1);

.. and a corresponding unlock() somewhere further down, where it's safe.
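For completeness, the lock()/unlock() pair both versions rely on could be something along these lines. This is just my sketch using GCC's __sync builtins as a stand-in for a raw XCHG, not what Eliot actually has; only the names lock/unlock come from his pseudocode:

```c
/* Sketch of the spin-lock primitives assumed by requestSignal(). */

static int lock(volatile int *l)
{
    /* Atomically exchange 1 into *l; we own the lock iff the old
       value was 0.  __sync_lock_test_and_set returns the old value. */
    return __sync_lock_test_and_set(l, 1) == 0;
}

static void unlock(volatile int *l)
{
    __sync_lock_release(l);  /* stores 0 with release semantics */
}
```

On x86 the test-and-set compiles down to a locked XCHG, which is what the comment in the struct alludes to.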

>       checkExternalSignalRequests = 0; // forgot this in my first email
>       if (useTideA) {
>         useTideA = 0;
>         for (i = lowTideA; i <= highTideA; i++)
>             while (externalSignalRequests[i].responses != externalSignalRequests[i].requests) {
>                 signalSemaphoreWithIndex(i);
>                 ++externalSignalRequests[i].responses;
>             }
>         lowTideA = (unsigned)-1 >> 1, highTideA = 0;
>       }
>       else  {
>         useTideA = 1;
>         for (i = lowTideB; i <= highTideB; i++)
>             while (externalSignalRequests[i].responses != externalSignalRequests[i].requests) {
>                 signalSemaphoreWithIndex(i);
>                 ++externalSignalRequests[i].responses;
>             }
>         lowTideB = (unsigned)-1 >> 1, highTideB = 0;
>       }
>
> and we should make lowTideA = (unsigned)-1 >> 1, highTideA = 0; a macro, to avoid the duplication.
> Most of the time the low and high tides will get set to the same thing because very likely most of the time only one semaphore will get signalled.
>
> Note that I did the moral equivalent of this with the processor scheduler's queue in both VW and in the Cog Squeak VMs and it speeds up wakeHighestPriority a lot.  The idea is to maintain in a variable the highest runnable priority, which is increased whenever a process is added to the runnable queue and is of higher priority than the high tide, and reduced in wakeHighestPriority to that of the first process found.  This avoids wakeHighestPriority testing 50 odd empty lists on the typical pass through runnable processes from 100 down to user priority.

This can be defeated by the following code (a wild guess):

| proc time |
proc := [ [ Processor yield. ] repeat ] newProcess.
proc priority: Processor highestPriority.
proc resume.
time := [ 100000000 timesRepeat: [ "***" Processor yield. ] ] timeToRun.
proc terminate.
time

The resulting run times should be very close for both VMs, the one using
the new trick and the one not.
I may be wrong; maybe at "***" there should just be some plain code, like
1+1, rather than Processor yield, to make sure the loop gets interrupted
but does not yield execution by itself :)
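For reference, the high-tide bookkeeping you describe might look roughly like this. This is my guess at the shape, not the actual VW/Cog code, and the priority count is made up:

```c
#define NUM_PRIORITIES 80  /* assumed; Squeak images use 80 levels */

static int runnableCount[NUM_PRIORITIES + 1]; /* processes queued per priority */
static int highestRunnablePriority = 0;       /* the "high tide" */

/* Raise the tide whenever a process becomes runnable at a priority
   above the current maximum. */
static void addToRunnableQueue(int priority)
{
    runnableCount[priority]++;
    if (priority > highestRunnablePriority)
        highestRunnablePriority = priority;
}

/* wakeHighestPriority scans down from the tide instead of from
   NUM_PRIORITIES, and lowers the tide to the first non-empty list,
   so the dozens of empty lists above it are never visited. */
static int wakeHighestPriority(void)
{
    int p;
    for (p = highestRunnablePriority; p > 0; p--)
        if (runnableCount[p] > 0) {
            runnableCount[p]--;
            highestRunnablePriority = p;
            return p;
        }
    highestRunnablePriority = 0;
    return 0;  /* no runnable process */
}
```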

Regarding scheduling, have you taken a look at my scheduling
refactorings, which move the scheduling logic to the image side,
freeing the VM from any knowledge of what Semaphores are?
I think it could be a good move. The only concern is
speed. But that could be improved by adding a few primitives
to reduce the number of sends needed to handle scheduling
(the most consuming part is adding/removing to the list(s)), while the
rest of the logic should belong to the image side.
And in the presence of a JIT this will no longer be an issue.

>
> We're getting close to the end of a release cycle so I expect we'll release the Stack Cog VM soon thereafter.  Sorry for the wait so far :/
>>
>> > Cheers,
>> > Josh
>> >
>> >
>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>


-- 
Best regards,
Igor Stasenko AKA sig.

