[Q] Has SharedQueue a very subtle bug?

Wed Apr 17 15:28:01 UTC 2002

On Mon, 15 Apr 2002 goran.hultgren at bluefish.se wrote:

> Ok, a few conclusions:
>
> 2. Scott's addition of #critical: to SharedQueue seems useful in some
> cases. Should have proper comments to explain how it can be used.

Actually, it may not be a good idea....

> ...and then we come to #isEmpty and #size.

You can't invoke #isEmpty or #size and get a reliable result unless the
lock is held. (I covered this in an earlier mail.)

So the options are either this idiom:

    sq critical: [ ..... sq isEmpty ....]
    sq critical: [ ..... sq size ....]

Or this idiom

    sq ifEmpty: [ ........ ]
    sq withSize: [ :size |  ...... ]

In either case, it is not permitted to do queue mutation during those
blocks. (Until/unless squeak gets reentrant locks.)

Personally, I am tempted for the first idiom, because you can run #isEmpty
or #size outside of the #critical:, as long as you can accept that the
answer is only advisory and not accurate for any fixed duration. (And if
they are run within the #critical:, then the answer is known valid for the
duration of the #critical:

>
> Scott A Crosby <crosby at qwes.math.cmu.edu> wrote:
> [SNIP]
> > And you can't get an *accurate* size outside the lock on the semaphore.
> > #peek below just means that we run a risk of an inaccurate and possibly
> > corrupt size. If #size also got #accessProtect, we'd still have this
> > inaccuracy...
> >
> > For example, what should the value of 'a' be after this runs?
> >
> > ``
> > q Sharedqueue new.
> > a Array new: 2.
> >
> > [[ q nextPut: 1 ; next ] repeatForever] fork.
> > [[ a at: q size put: (a at: q size)+1] repeatForever] fork.
> >
> > "Now run both of them for a minute."
> > ''
>
> This is obviously pseudo code but rewritten as:
>
> | q a p1 p2 |
> q := SharedQueue new.
> a := (Array new: 3) at: 1 put: 0; at: 2 put: 0; yourself
> p1 _ [[ q nextPut: 1; next] repeat] fork.
> p2 _ [[ a at: q size + 1 put: (a at: q size + 1) + 1] repeat] fork.
> (Delay forSeconds: 60) wait.
> p1 terminate. p2 terminate.
> a inspect
>
> a at: 1 should give us "how many times size answered 1" etc.
>
> I am not sure what the code was meant to show, Scott? Could you explain?

There isn't per-se a correct answer that #size should return...

It was a not-entirely-thought-out example.. Nevermind.

> And it does seem "buggy" to me since you call #size twice in each loop
> which could give you different answers... Or was that your point?

Nope.. That was a bug. :)

>
> Anyway, if we would protect #size and #isEmpty then those methods could
> not be used inside Scott's #critical: (since they would block) and they
> still would not be "safe" when used from the outside without #critical:
> because there can still be Processes queued up at the readSemaphore or
> "just released" from it and just about to get into the accessProtect
> semaphore, or already queued up at the accessProtect semaphore.
> Translated to english this means that the queue can have readers "queued
> up" so naive code like:
>
> q isEmpty not ifTrue:[q next]
>
> ...could block because it might be empty after all - the other readers
> beat us to it. In short - #size and #isEmpty should be heavily commented
> so that people don't get fooled by them.

Exactly!

Fixes are 1: Allow locks to be held recursively. (Probably
create a seperate Mutex class).. Then you can do

  q critical: [q isEmpty not ifTrue:[^q next] ifFalse: [^nil]]

Where since the lock is already held by *this* thread from within
#critical: , the access to accessProtect in 'next' also succeeds.

--

Or, create a seperate method called #nextNoLock that is only to be
invoked from within #critical: and assumes that the lock has been grabbed.

  q critcal: [q isEmpty not ifTrue:[^q nextNoLock]]

--

BTW, it is safe to do

  q size > HighWaterMark   ifFalse: [ q nextPut: 2 ]

as long as going a *little* over the high water mark isn't bad.. Because
if there are a max of $k$ concurrent threads, the worst case is all $k$ of
them run 'q size' and get the size equal to HighWaterMark. Then all k of
them add one more item.. So in this case, we can bound the used size at:

  HighWaterMark + k

Which shoudn't be that severe of a problem..

>
> Actaully, when I come to think of it #nextOrNil is probably pretty buggy
> too. Since it doesn't wait on the readSemaphore it just might go
> straight by one or more other readers and steal the next object from
> them. This could actually make the queue empty and then we would get a
> 'Error in SharedQueue synchronization' in #next.
>
> Perhaps we should remove #nextOrNil: altogether and rely on #peek
> instead...

I never looked at this function, but that sounds bad.

>
> Hrmph, this class is full of fleas! :-)

Nope.. IMHO, Its more an artifact that multithreading programming is a
bitch. :)

BTW, sorry... Missed this new message on the list.

Scott

--
No DVD movie will ever enter the public domain, nor will any CD. The last CD
and the last DVD will have moldered away decades before they leave copyright.
This is not encouraging the creation of knowledge in the public domain.