[Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Mar 10 17:41:25 UTC 2017


2017-03-10 15:14 GMT+01:00 David T. Lewis <lewis at mail.msen.com>:

>
> On Fri, Mar 10, 2017 at 09:45:07AM +0100, Nicolas Cellier wrote:
> >
> > 2017-03-10 3:47 GMT+01:00 David T. Lewis <lewis at mail.msen.com>:
> >
> > >
> > > On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
> > > >
> > > > Hi David,
> > > >
> > > > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <lewis at mail.msen.com>
> > > wrote:
> > > >
> > > > >
> > > > > In the VM, the millisecond clock wraps within the 32 bit integer
> range:
> > > > >
> > > > >   #define MillisecondClockMask 0x1FFFFFFF
> > > > >
> > > > > In the Cuis image, Delay class>>handleTimerEvent does this:
> > > > >
> > > > >   nextTick := nextTick min: SmallInteger maxVal.
> > > > >
> > > > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF,
> but
> > > on
> > > > > a 32-bit image it is 16r3FFFFFFF.
> > > > >
> > > > > Could that be it?
> > > > >
> > > > > I don't really know how to test in Squeak. As you say, Squeak is
> now
> > > > > using the microsecond clock in #handleTimerEvent. I do not see
> anything
> > > > > in primitiveSignalAtMilliseconds that would behave any differently
> on
> > > > > a 64 bit versus 32 bit image or VM, but I do not know how to test
> to
> > > > > be sure.
> > > > >
> > > >
> > > > I bet that the following code from primitiveSignalAtMilliseconds
> ends up
> > > > wrapping when given
> > > >     Delay primSignal: s atMilliseconds: Time primMillisecondClock -
> 10.
> > > >
> > > > deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
> > > > deltaMsecs < 0 ifTrue:
> > > > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
> > > > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
> > > >
> > > > and I bet you'll find that the VM will wake up in about 6 days and 5
> > > hours
> > > > ;-)
> > > >
> > >
> > > No. The failure is specific to the 64 bit VM. Source code for the
> primitive
> > > is the same in either case.
> > >
> > >
> > >
> > > > I suppose we could fix this, but I'm *much* happier to simply not use
> > > > primitiveSignalAtMilliseconds and stay with the simpler and
> > > wrapping-immune
> > > > primitiveSignalAtUTCMicroseconds
> > >
> > > Fair enough, but given that there is a demonstrated bug that affects
> only
> > > the 64-bit VM, and given that it expresses itself intermittently and in
> > > ways
> > > that affect only someone who is attempting to migrate their existing V3
> > > image
> > > to Spur, then I would say that it makes very good sense to take the
> time
> > > to fix it if we are able to do so. After all, there may be other
> people who
> > > will want to migrate V3 images to Spur, and there is no point in
> making the
> > > process needlessly difficult.
> > >
> > > I do not have the solution, but maybe someone else can help. So I am
> asking
> > > for help here. Can someone with a working 64-bit build environment
> please
> > > check and see if what I said in an earlier email might make a
> difference:
> > >
> > >   I see that ioMSecs() is declared as signed long (32 bits), but it is
> used
> > >   in expression with a 64 bit usqInt. So maybe it needs a cast, or
> maybe
> > >   the variables like msecs and deltaMsecs in
> primitiveSignalAtMilliseconds
> > >   should be declared as 32 bit long and unsigned long to match the
> actual
> > > usage.
> > >
> > > Thanks,
> > >
> > > Dave
> > >
> > >
> >
> > Hi David
> > I just reviewed the generated code for primitiveSignalAtMilliseconds
> > andhere is what i found:
> >
> > The primitive will simply fail if fed with a negative integer, thanks to:
> >
> >         msecs = positive32BitValueOf(msecsObj);
> >
> > msecs is allways positive, since declared as
> >     usqInt msecs;
> >
> > So far, so good.
> >
> > Then
> >                         deltaMsecs = msecs - ((ioMSecs()) &
> > MillisecondClockMask);
> >                         if (deltaMsecs < 0) {
> >                                 deltaMsecs = (deltaMsecs +
> > MillisecondClockMask) + 1;
> >                         }
> >                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
> > (deltaMsecs * 1000);
> >
> > Due to the 29 bits Mask, the bitAnd: (&) operation will result into a
> > positive int.
> > 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
> >
> > However, when we do the Roll-over thing:
> >                                 deltaMsecs = (deltaMsecs +
> > MillisecondClockMask) + 1;
> > Then we create a giant delay (the 6 days mentionned by Eliot) which is
> > HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
> >
> > The difference between 32 and 64 bits is in the next line:
> >                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
> > (deltaMsecs * 1000);
> > Because
> >     sqInt deltaMsecs;
> >
> > is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs *
> > 1000) will do two different things:
> > - in 32 bits it will overflow leading to a negative delay and a
> > nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
> > - in 64 bits it won't overflow leading to he 6 days delay
> >
> > Does that explain?
> >
> > if (deltaMsecs < 0) I would either fail the primitive or set the
> deltaMsecs
> > to zero.
> > What do you think?
>
> Hi Nicolas,
>
> Thanks very much for looking into this and for the explanation. I am away
> right now so I cannot look at it now, but it does seem likely to be a sign
> extension issue at least in part.
>
> If you have a chance to try recompiling with changes to the generated code,
> Juan's test condition is just this:
>
>   s := Semaphore new.
>   Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
>   s wait.
>
> This locks the image on 64-bits, and not on 32-bits.
>
> Thanks a lot!
> Dave
>
>
Hi Dave,
No, it's not a sign extension problem.

The code is broken.
The rollover protection is undue because we inject (clock+delay) and then
subtract next_clock,
In case of clock rollover, next_clock falls to zero and we will have a big
delay, not a negative one.
The rollover protection also transform a small negative delay into big
delay.

Things seems to work in 32 bits just because we have an integer overflow
when we convert msecs to usecs.
big_delay_in_msec * 1000 => overflow => negative_delay_in_usecs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170310/fcd8124b/attachment.html>


More information about the Vm-dev mailing list