[Q] Has SharedQueue a very subtle bug?

goran.hultgren at bluefish.se
Thu Apr 18 09:27:35 UTC 2002


Scott A Crosby <crosby at qwes.math.cmu.edu> wrote:
> On Mon, 15 Apr 2002 goran.hultgren at bluefish.se wrote:
> 
> > Ok, a few conclusions:
> >
> > 2. Scott's addition of #critical: to SharedQueue seems useful in some
> > cases. Should have proper comments to explain how it can be used.
> 
> Actually, it may not be a good idea....

Ok, why then? You proposed it. :-)

> > ...and then we come to #isEmpty and #size.
> 
> You can't invoke #isEmpty or #size and get a reliable result unless the
> lock is held. (I covered this in an earlier mail.)

I would argue even more strongly (well, you already saw that below) -
even using your #critical: you must take into account that there can be
multiple readers "queued up" that you will never be able to "beat to
it". So you need to be aware of what you can actually use the
information given by #size/#isEmpty for.

Anyway - we are probably completely in agreement.

> So the options are either this idiom:
> 
>     sq critical: [ ..... sq isEmpty ....]
>     sq critical: [ ..... sq size ....]
> 
> Or this idiom
> 
>     sq ifEmpty: [ ........ ]
>     sq withSize: [ :size |  ...... ]
> 
> In either case, it is not permitted to do queue mutation during those
> blocks. (Until/unless Squeak gets reentrant locks.)

Well, that would only fix half of the problem, I think. You must also
change the current situation where other readers can be queued up. But
sure.

> Personally, I am tempted by the first idiom, because you can run #isEmpty
> or #size outside of the #critical:, as long as you can accept that the
> answer is only advisory and not accurate for any fixed duration. (And if
> they are run within the #critical:, then the answer is known valid for
> the duration of the #critical:.)

Eh... Then we need two different versions of #size/#isEmpty, because if
you want to run them within #critical: they must not use accessProtect,
and if they are run outside of #critical: they can give bogus results if
they do NOT use accessProtect. At least as long as methods like #peek
are implemented the way they are today.
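
Just to illustrate, a rough sketch of the two variants, assuming the
current readPosition/writePosition instance variables (#sizeNoLock is a
made-up selector):

    size
        "Advisory only: grabs the lock, but the answer may already be
         stale when the caller acts on it. Must NOT be used inside
         #critical: - waiting on accessProtect again would deadlock."
        ^accessProtect critical: [writePosition - readPosition]

    sizeNoLock
        "For use ONLY inside #critical:, where the calling process
         already holds accessProtect."
        ^writePosition - readPosition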

> > Scott A Crosby <crosby at qwes.math.cmu.edu> wrote:
> > [SNIP]
> > > And you can't get an *accurate* size outside the lock on the semaphore.
> > > #peek below just means that we run a risk of an inaccurate and possibly
> > > corrupt size. If #size also got #accessProtect, we'd still have this
> > > inaccuracy...

When you say inaccuracy here I assume you are thinking of "queued up
readers", right?

[SNIP]
> > Anyway, if we would protect #size and #isEmpty then those methods could
> > not be used inside Scott's #critical: (since they would block) and they
> > still would not be "safe" when used from the outside without #critical:
> > because there can still be Processes queued up at the readSemaphore or
> > "just released" from it and just about to get into the accessProtect
> > semaphore, or already queued up at the accessProtect semaphore.
> > Translated to English this means that the queue can have readers "queued
> > up" so naive code like:
> >
> > q isEmpty not ifTrue:[q next]
> >
> > ...could block because it might be empty after all - the other readers
> > beat us to it. In short - #size and #isEmpty should be heavily commented
> > so that people don't get fooled by them.
> 
> Exactly!
> 
> Fixes are 1: Allow locks to be held recursively. (Probably
> create a separate Mutex class.) Then you can do
> 
>   q critical: [q isEmpty not ifTrue:[^q next] ifFalse: [^nil]]
> 
> Where, since the lock is already held by *this* thread from within
> #critical:, the access to accessProtect in 'next' also succeeds.

Ok, that is a larger rewrite of SharedQueue. But perhaps we should do
it.
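
For the record, here is roughly what such a Mutex could look like (a
sketch only, untested; the class name, category and the class-side
constructor are all just placeholders):

    Object subclass: #Mutex
        instanceVariableNames: 'semaphore owner'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Processes'

    Mutex class>>new
        ^super new setSemaphore

    Mutex>>setSemaphore
        semaphore := Semaphore forMutualExclusion.
        owner := nil

    Mutex>>critical: aBlock
        "Re-entrant critical section: if the active process already
         holds the lock, just run the block; otherwise acquire, run,
         release. Only a process ever writes itself into owner, so the
         unsynchronized read is safe for the == test against ourselves."
        | active |
        active := Processor activeProcess.
        owner == active ifTrue: [^aBlock value].
        ^semaphore critical: [
            owner := active.
            [aBlock value] ensure: [owner := nil]]

SharedQueue would then hold one of these in place of the plain
accessProtect semaphore, so that #next can be called from inside
#critical:.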

> --
> Or, create a separate method called #nextNoLock that is only to be
> invoked from within #critical: and assumes that the lock has been grabbed.
> 
>   q critical: [q isEmpty not ifTrue:[^q nextNoLock]]

Oooh. That would (currently) mess things up. It would mean that you
could "sneak by" the reader queue (Processes sitting on the
readSemaphore AND on the accessProtect AND in between) and steal
objects, causing the 'Error in SharedQueue synchronization' in #next.

But sure - we could change the logic in SharedQueue to not rely on the
signal count and instead use the Semaphore as a "data available" flag,
as we did in SharedBufferStream. You would then have to take into
account that when #next finally gets into the #critical: part the queue
could be empty, and you would have to go back and wait again (at the
end of the line). This is how SharedBufferStream works, because we
cannot match readers against writers (it is a stream, not a queue).
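
Something along these lines, perhaps (a sketch only, using the current
instance variable names and ignoring array growth/compaction;
readSemaphore is still signalled on every #nextPut: but treated purely
as a "data may be available" hint):

    next
        "Wait until data MAY be available, then try to take it under
         the lock. Another reader can empty the queue before we get
         in; in that case go back and wait again, at the end of the
         line."
        [
            readSemaphore wait.
            accessProtect critical: [
                readPosition < writePosition ifTrue: [
                    | value |
                    value := contentsArray at: readPosition.
                    contentsArray at: readPosition put: nil.
                    readPosition := readPosition + 1.
                    ^value]]
        ] repeat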

> --
> 
> BTW, it is safe to do
> 
>   q size > HighWaterMark   ifFalse: [ q nextPut: 2 ]
> 
> as long as going a *little* over the high water mark isn't bad.. Because
> if there are a max of $k$ concurrent threads, the worst case is all $k$ of
> them run 'q size' and get the size equal to HighWaterMark. Then all k of
> them add one more item.. So in this case, we can bound the used size at:
> 
>   HighWaterMark + k
> 
> Which shouldn't be that severe of a problem.

I agree.
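
Concretely: with HighWaterMark = 100 and k = 5 producers all passing
the #size test at the same moment, the queue can briefly reach 105
items. A producer in that style could look like this (a sketch; the
back-off delay is arbitrary):

    "The test is advisory, so the true size can overshoot by at most
     one item per concurrent producer - harmless if HighWaterMark is
     soft."
    [q size > HighWaterMark]
        whileTrue: [(Delay forMilliseconds: 10) wait].
    q nextPut: item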

> > Actually, when I come to think of it #nextOrNil is probably pretty buggy
> > too. Since it doesn't wait on the readSemaphore it just might go
> > straight by one or more other readers and steal the next object from
> > them. This could actually make the queue empty and then we would get a
> > 'Error in SharedQueue synchronization' in #next.
> >
> > Perhaps we should remove #nextOrNil altogether and rely on #peek
> > instead...
> 
> I never looked at this function, but that sounds bad.

Yep, should be fixed.

> > Hrmph, this class is full of fleas! :-)
> 
> Nope. IMHO, it's more an artifact of multithreaded programming being a
> bitch. :)

Yes, I agree! We have been scratching our heads once or twice when
concocting SharedBufferStream and friends. But now they seem
solid... (famous last words)

> BTW, sorry... Missed this new message on the list.

No prob.

> Scott

regards, Göran


