[Vm-dev] Re: ProcessTest>>testSchedulingIsFirstComeFirstServed

Sun Jan 31 10:57:12 UTC 2016

On Sun, Jan 31, 2016 at 6:16 PM, Ben Coman <btc at openinworld.com> wrote:
> My general understanding of process scheduling is that it is FIFO
> within same priority.  Indeed Pharo has a test for this
> ProcessTest>>testSchedulingIsFirstComeFirstServed, which however I
> thought was a bit fragile and I was looking to revise.
>
> But now I'm curious whether same-priority-process FIFO scheduling is
> an expected guarantee?
> On both Pharo 5 build 50560 & Squeak 5 build 15113 the following test fails...
>
>   [  1 to: 100000 do: [  :n |
>       | ranFirst ranSecond |
>       [    [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond
> := 1] ] forkAt: 78.
>            [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond
> := 2] ] forkAt: 78.
>       ] forkAt: 79.
>      self assert: ranFirst=1.
>      self assert: ranSecond=2.     ]
>   ] fork.
>
> ... but only after 20k iterations, which seems an "almost" guarantee.
> Interestingly, it doesn't fail when run without the outer fork.
> Also if I take an alternate approach with less message calls...
>
>   faults := OrderedCollection new.
>   done := Semaphore new.
>   [ 1 to: 10000000 do: [  :n | | ranFirst ranSecond |
>        [   [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond
> := 1] ] forkAt: 78.
>            [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond
> := 2] ] forkAt: 78.
>        ] forkAt: 79.
>        (ranFirst=1 and: ranSecond=2) ifFalse: [  faults add: { n .
> ranFirst . ranSecond } ].
>      ].
>     done signal.
>   ] fork.
>   done wait.
>   faults inspect.
>
> I only get seven failures in 10 million iterations.
> "an OrderedCollection(
>    #(  258535 2 1)
>    #(1148605 2 1)
>    #(3502820 2 1)
>    #(4010935 2 1)
>    #(4533713 2 1)
>    #(6301878 2 1)
>    #(8497001 2 1))"
>
> So what I'm wondering is if there is a bug to hunt down, or if FIFO
> scheduling is simply not an expected guarantee?
>
> cheers -ben

A while ago in Pharo I added DelayNullScheduler, which stops the
timingPriority(=80) process that schedules delays.  After selecting it
via World > System > Settings > System > Delay Scheduler  I cannot
reproduce any of the faults(?) above.   (Aside: I just found
performance a bit better if DelayNullScheduler>>schedule: first does a
"Processor yield")

With the usual delay scheduler operating, the fault(?) can be bypassed
by signalling the timingSemaphore...

   timingSemaphore := Delay testCaseSupportTimingSemaphore.
   [  1 to: 100000 do: [  :n |
       | ranFirst ranSecond |
       [    [ ranFirst ifNil: [ ranFirst := 1 ]
                             ifNotNil: [ ranSecond := 1] ] forkAt: 78.
            [ ranFirst ifNil: [ ranFirst := 2 ]
                            ifNotNil: [ ranSecond := 2] ] forkAt: 78.
       ] forkAt: 79.
      self assert: ranFirst=1.
      self assert: ranSecond=2.     ]
      timingSemaphore signal.
  ] fork.

The fault(?) can also be bypassed by commenting the call to #intercyclePause:
out of WorldState>>doOneCycleFor:

I'd be glad for any insights.

cheers -ben