[Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon May 15 12:27:24 UTC 2017


Sorry to resurrect such an old thread but now that we declare
 usqLong deltaMsecs;

the compiler barks (and will eliminate dead code):

../../spurstacksrc/vm/gcc3x-interp.c:67892:19: warning: comparison of
unsigned expression < 0 is always false [-Wtautological-compare]
                        if (deltaMsecs < 0) {

So we might have to come back to this one.

2017-03-13 2:47 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:

>
> Thanks David!
>
> On Sun, Mar 12, 2017 at 3:52 PM, David T. Lewis <lewis at mail.msen.com>
> wrote:
>
>>
>> On Fri, Mar 10, 2017 at 11:49:43PM -0500, David T. Lewis wrote:
>> >
>> > On Fri, Mar 10, 2017 at 10:19:42AM +0100, Nicolas Cellier wrote:
>> > >
>> > > 2017-03-10 9:45 GMT+01:00 Nicolas Cellier <
>> > > nicolas.cellier.aka.nice at gmail.com>:
>> > >
>> > > >
>> > > >
>> > > > 2017-03-10 3:47 GMT+01:00 David T. Lewis <lewis at mail.msen.com>:
>> > > >
>> > > >>
>> > > >> On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
>> > > >> >
>> > > >> > Hi David,
>> > > >> >
>> > > >> > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <
>> lewis at mail.msen.com>
>> > > >> wrote:
>> > > >> >
>> > > >> > >
>> > > >> > > In the VM, the millisecond clock wraps within the 32 bit
>> integer
>> > > >> range:
>> > > >> > >
>> > > >> > >   #define MillisecondClockMask 0x1FFFFFFF
>> > > >> > >
>> > > >> > > In the Cuis image, Delay class>>handleTimerEvent does this:
>> > > >> > >
>> > > >> > >   nextTick := nextTick min: SmallInteger maxVal.
>> > > >> > >
>> > > >> > > On a 64-bit Spur image, SmallInteger maxVal is
>> 16rFFFFFFFFFFFFFFF,
>> > > >> but on
>> > > >> > > a 32-bit image it is 16r3FFFFFFF.
>> > > >> > >
>> > > >> > > Could that be it?
>> > > >> > >
>> > > >> > > I don't really know how to test in Squeak. As you say, Squeak
>> is now
>> > > >> > > using the microsecond clock in #handleTimerEvent. I do not see
>> > > >> anything
>> > > >> > > in primitiveSignalAtMilliseconds that would behave any
>> differently on
>> > > >> > > a 64 bit versus 32 bit image or VM, but I do not know how to
>> test to
>> > > >> > > be sure.
>> > > >> > >
>> > > >> >
>> > > >> > I bet that the following code from primitiveSignalAtMilliseconds
>> ends up
>> > > >> > wrapping when given
>> > > >> >     Delay primSignal: s atMilliseconds: Time
>> primMillisecondClock - 10.
>> > > >> >
>> > > >> > deltaMsecs := msecs - (self ioMSecs bitAnd:
>> MillisecondClockMask).
>> > > >> > deltaMsecs < 0 ifTrue:
>> > > >> > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
>> > > >> > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>> > > >> >
>> > > >> > and I bet you'll find that the VM will wake up in about 6 days
>> and 5
>> > > >> hours
>> > > >> > ;-)
>> > > >> >
>> > > >>
>> > > >> No. The failure is specific to the 64 bit VM. Source code for the
>> > > >> primitive
>> > > >> is the same in either case.
>> > > >>
>> > > >>
>> > > > >
>> > > >> > I suppose we could fix this, but I'm *much* happier to simply
>> not use
>> > > >> > primitiveSignalAtMilliseconds and stay with the simpler and
>> > > >> wrapping-immune
>> > > >> > primitiveSignalAtUTCMicroseconds
>> > > >>
>> > > >> Fair enough, but given that there is a demonstrated bug that
>> affects only
>> > > >> the 64-bit VM, and given that it expresses itself intermittently
>> and in
>> > > >> ways
>> > > >> that affect only someone who is attempting to migrate their
>> existing V3
>> > > >> image
>> > > >> to Spur, then I would say that it makes very good sense to take
>> the time
>> > > >> to fix it if we are able to do so. After all, there may be other
>> people
>> > > >> who
>> > > >> will want to migrate V3 images to Spur, and there is no point in
>> making
>> > > >> the
>> > > >> process needlessly difficult.
>> > > >>
>> > > >> I do not have the solution, but maybe someone else can help. So I
>> am
>> > > >> asking
>> > > >> for help here. Can someone with a working 64-bit build environment
>> please
>> > > >> check and see if what I said in an earlier email might make a
>> difference:
>> > > >>
>> > > >>   I see that ioMSecs() is declared as signed long (32 bits), but
>> it is
>> > > >> used
>> > > >>   in expression with a 64 bit usqInt. So maybe it needs a cast, or
>> maybe
>> > > >>   the variables like msecs and deltaMsecs in
>> primitiveSignalAtMilliseconds
>> > > >>   should be declared as 32 bit long and unsigned long to match the
>> actual
>> > > >> usage.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Dave
>> > > >>
>> > > >>
>> > > >
>> > > > Hi David
>> > > > I just reviewed the generated code for primitiveSignalAtMilliseconds
>> > > > andhere is what i found:
>> > > >
>> > > > The primitive will simply fail if fed with a negative integer,
>> thanks to:
>> > > >
>> > > >         msecs = positive32BitValueOf(msecsObj);
>> > > >
>> > > > msecs is allways positive, since declared as
>> > > >     usqInt msecs;
>> > > >
>> > > > So far, so good.
>> > > >
>> > > > Then
>> > > >                         deltaMsecs = msecs - ((ioMSecs()) &
>> > > > MillisecondClockMask);
>> > > >                         if (deltaMsecs < 0) {
>> > > >                                 deltaMsecs = (deltaMsecs +
>> > > > MillisecondClockMask) + 1;
>> > > >                         }
>> > > >                         GIV(nextWakeupUsecs) =
>> (ioUTCMicroseconds()) +
>> > > > (deltaMsecs * 1000);
>> > > >
>> > > > Due to the 29 bits Mask, the bitAnd: (&) operation will result into
>> a
>> > > > positive int.
>> > > > 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
>> > > >
>> > > > However, when we do the Roll-over thing:
>> > > >                                 deltaMsecs = (deltaMsecs +
>> > > > MillisecondClockMask) + 1;
>> > > > Then we create a giant delay (the 6 days mentionned by Eliot) which
>> is
>> > > > HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
>> > > >
>> > > > The difference between 32 and 64 bits is in the next line:
>> > > >                         GIV(nextWakeupUsecs) =
>> (ioUTCMicroseconds()) +
>> > > > (deltaMsecs * 1000);
>> > > > Because
>> > > >     sqInt deltaMsecs;
>> > > >
>> > > > is 32 bits for a 32bits spur, 64bits for 64bits spur, then
>> (deltaMsecs *
>> > > > 1000) will do two different things:
>> > > > - in 32 bits it will overflow leading to a negative delay and a
>> > > > nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
>> > > > - in 64 bits it won't overflow leading to he 6 days delay
>> > > >
>> > > > Does that explain?
>> > > >
>> > > > if (deltaMsecs < 0) I would either fail the primitive or set the
>> > > > deltaMsecs to zero.
>> > > > What do you think?
>> > > >
>> > > >  Nicolas
>> > > >
>> > > >
>> > >
>> > > Anyway, the RollOver protection sounds strange: if clock was high,
>> and if
>> > > we don't apply the MillisecondClockMask to (clock+delay),
>> > > then (clock+delay) won't overflow (thanks to largeIntegers).
>> Eventually,
>> > > for delay > 2^32-2^29 msecs, the primitive will fail thru
>> > > (positive32BitValueOf).
>> > > That's something like 43 days delay max, more than enough...
>> > >
>> > > Either the code should rather be something like this with roll-over
>> > > protection
>> > >                         deltaMsecs = (msecs - ioMSecs()) &
>> > > MillisecondClockMask;
>> > > or we should leave deltaMsecs as is and reject negative delays
>> > >
>> > > Also note this in 32 bits:
>> > > Since *1000 consumes 10bits, then delays above 21 bits msecs will
>> overflow
>> > > and be signalled immediately.
>> > > Tha's about 2e6 msecs, 2e3 secs. Thus the maximum delay supported by
>> this
>> > > primitive is a bit less than 36 minutes (16r80000000/1000/1000/60).
>> > > So maybe we should also reject large positive "unreasonable" delays,
>> that
>> > > isanthing above a few minutes and document it.
>> >
>> > I think I understand what you are saying now. Let me see if I can
>> restate
>> > the problem:
>> >
>> > Before the change to microsecond clock, the interpreter VM did the
>> > rollover check only in #checkForInterrupts, and the arithmetic did not
>> > require a 64 bit microsecond range. Now we have rollover logic in
>> > #primitiveSignalAtMilliseconds, which leads to two issues, one of which
>> > masks the other on 32 bits.
>> >
>> > Issue #1: If we pass to the primitive a millisecond count that is
>> slightly
>> > in the past (possibly due to clock skew or other timing issues), then
>> the
>> > primitive will test for deltaMsecs less than zero, and will incorrectly
>> > assume that a millisecond clock rollover has happened.
>> >
>> > Issue #2: When adjusting for clock rollover on 32 bits, deltaMsecs is
>> > declared as 32 bit sqInt, which results in overflow when converting to
>> > microseconds. For 64 bit sqInt, there is no overflow. This error masks
>> > the first problem of Issue #1, which makes the primiitive appear to work
>> > correctly on 32 bits, even though the clock rollover logic is wrong.
>> >
>> > I do not think that we can simply fail the primitive, or set
>> deltaMSeconds
>> > to zero, in the case of deltaMSeconds being less than zero. We still
>> need
>> > to be able to schedule delays over a time period when the millisecond
>> > clock rolls over, so that needs to work.
>> >
>> > We do need to fix the arithmetic overflow condition, which probably just
>> > requires declaring deltaMsecs as usqLong. This matches the declaration
>> > for nextWakeupUsecs and should prevent arithmetic overflow. But fixing
>> > this for issue #2 will unmask issue #1.
>> >
>> > For issue #1, the solution probably requires a heuristic. If deltaMsecs
>> > is a big negative value, then the clock has rolled over. If it is a
>> small
>> > negative value (maybe less than a second, just to pick a number), then
>> > it is probably clock skew and the MillisecondClockMask + 1 adjustment
>> > should not be applied.
>> >
>> > Does this sound right?
>> >
>>
>> I now have a working Cog/Spur build on my Ubuntu box, so I was able to
>> try a few things. It turns out that the arithmetic overflow that Nicolas
>> spotted seems to fully account for the problem that Juan originally
>> reported on the 64 but Cuis image. The overflow happened here:
>>
>>   nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>>
>> Declaring deltaMsecs as a 64 bit unsigned to match the declaration of
>> nextWakeupUsecs resolved the image lockup problem that Juan identified.
>> I committed the fix in VMMaker.oscog-dtl.2147.
>>
>> Dave
>>
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170515/c88c23d7/attachment.html>


More information about the Vm-dev mailing list