[Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity

Ben Coman btc at openinworld.com
Sat Mar 11 13:43:13 UTC 2017


As an aside, in the mists of time there was a question
about whether the activedelay should be handled in the VM
as a delay rather than an absolute time.

cheers -ben

On Sat, Mar 11, 2017 at 12:49 PM, David T. Lewis <lewis at mail.msen.com> wrote:
>
> On Fri, Mar 10, 2017 at 10:19:42AM +0100, Nicolas Cellier wrote:
>>
>> 2017-03-10 9:45 GMT+01:00 Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com>:
>>
>> >
>> >
>> > 2017-03-10 3:47 GMT+01:00 David T. Lewis <lewis at mail.msen.com>:
>> >
>> >>
>> >> On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
>> >> >
>> >> > Hi David,
>> >> >
>> >> > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <lewis at mail.msen.com>
>> >> wrote:
>> >> >
>> >> > >
>> >> > > In the VM, the millisecond clock wraps within the 32 bit integer
>> >> range:
>> >> > >
>> >> > >   #define MillisecondClockMask 0x1FFFFFFF
>> >> > >
>> >> > > In the Cuis image, Delay class>>handleTimerEvent does this:
>> >> > >
>> >> > >   nextTick := nextTick min: SmallInteger maxVal.
>> >> > >
>> >> > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF,
>> >> but on
>> >> > > a 32-bit image it is 16r3FFFFFFF.
>> >> > >
>> >> > > Could that be it?
>> >> > >
>> >> > > I don't really know how to test in Squeak. As you say, Squeak is now
>> >> > > using the microsecond clock in #handleTimerEvent. I do not see
>> >> anything
>> >> > > in primitiveSignalAtMilliseconds that would behave any differently on
>> >> > > a 64 bit versus 32 bit image or VM, but I do not know how to test to
>> >> > > be sure.
>> >> > >
>> >> >
>> >> > I bet that the following code from primitiveSignalAtMilliseconds ends up
>> >> > wrapping when given
>> >> >     Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
>> >> >
>> >> > deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
>> >> > deltaMsecs < 0 ifTrue:
>> >> > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
>> >> > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>> >> >
>> >> > and I bet you'll find that the VM will wake up in about 6 days and 5
>> >> hours
>> >> > ;-)
>> >> >
>> >>
>> >> No. The failure is specific to the 64 bit VM. Source code for the
>> >> primitive
>> >> is the same in either case.
>> >>
>> >>
>> > >
>> >> > I suppose we could fix this, but I'm *much* happier to simply not use
>> >> > primitiveSignalAtMilliseconds and stay with the simpler and
>> >> wrapping-immune
>> >> > primitiveSignalAtUTCMicroseconds
>> >>
>> >> Fair enough, but given that there is a demonstrated bug that affects only
>> >> the 64-bit VM, and given that it expresses itself intermittently and in
>> >> ways
>> >> that affect only someone who is attempting to migrate their existing V3
>> >> image
>> >> to Spur, then I would say that it makes very good sense to take the time
>> >> to fix it if we are able to do so. After all, there may be other people
>> >> who
>> >> will want to migrate V3 images to Spur, and there is no point in making
>> >> the
>> >> process needlessly difficult.
>> >>
>> >> I do not have the solution, but maybe someone else can help. So I am
>> >> asking
>> >> for help here. Can someone with a working 64-bit build environment please
>> >> check and see if what I said in an earlier email might make a difference:
>> >>
>> >>   I see that ioMSecs() is declared as signed long (32 bits), but it is
>> >> used
>> >>   in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe
>> >>   the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds
>> >>   should be declared as 32 bit long and unsigned long to match the actual
>> >> usage.
>> >>
>> >> Thanks,
>> >>
>> >> Dave
>> >>
>> >>
>> >
>> > Hi David
>> > I just reviewed the generated code for primitiveSignalAtMilliseconds
>> > andhere is what i found:
>> >
>> > The primitive will simply fail if fed with a negative integer, thanks to:
>> >
>> >         msecs = positive32BitValueOf(msecsObj);
>> >
>> > msecs is allways positive, since declared as
>> >     usqInt msecs;
>> >
>> > So far, so good.
>> >
>> > Then
>> >                         deltaMsecs = msecs - ((ioMSecs()) &
>> > MillisecondClockMask);
>> >                         if (deltaMsecs < 0) {
>> >                                 deltaMsecs = (deltaMsecs +
>> > MillisecondClockMask) + 1;
>> >                         }
>> >                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
>> > (deltaMsecs * 1000);
>> >
>> > Due to the 29 bits Mask, the bitAnd: (&) operation will result into a
>> > positive int.
>> > 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
>> >
>> > However, when we do the Roll-over thing:
>> >                                 deltaMsecs = (deltaMsecs +
>> > MillisecondClockMask) + 1;
>> > Then we create a giant delay (the 6 days mentionned by Eliot) which is
>> > HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
>> >
>> > The difference between 32 and 64 bits is in the next line:
>> >                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
>> > (deltaMsecs * 1000);
>> > Because
>> >     sqInt deltaMsecs;
>> >
>> > is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs *
>> > 1000) will do two different things:
>> > - in 32 bits it will overflow leading to a negative delay and a
>> > nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
>> > - in 64 bits it won't overflow leading to he 6 days delay
>> >
>> > Does that explain?
>> >
>> > if (deltaMsecs < 0) I would either fail the primitive or set the
>> > deltaMsecs to zero.
>> > What do you think?
>> >
>> >  Nicolas
>> >
>> >
>>
>> Anyway, the RollOver protection sounds strange: if clock was high, and if
>> we don't apply the MillisecondClockMask to (clock+delay),
>> then (clock+delay) won't overflow (thanks to largeIntegers). Eventually,
>> for delay > 2^32-2^29 msecs, the primitive will fail thru
>> (positive32BitValueOf).
>> That's something like 43 days delay max, more than enough...
>>
>> Either the code should rather be something like this with roll-over
>> protection
>>                         deltaMsecs = (msecs - ioMSecs()) &
>> MillisecondClockMask;
>> or we should leave deltaMsecs as is and reject negative delays
>>
>> Also note this in 32 bits:
>> Since *1000 consumes 10bits, then delays above 21 bits msecs will overflow
>> and be signalled immediately.
>> Tha's about 2e6 msecs, 2e3 secs. Thus the maximum delay supported by this
>> primitive is a bit less than 36 minutes (16r80000000/1000/1000/60).
>> So maybe we should also reject large positive "unreasonable" delays, that
>> isanthing above a few minutes and document it.
>
> I think I understand what you are saying now. Let me see if I can restate
> the problem:
>
> Before the change to microsecond clock, the interpreter VM did the
> rollover check only in #checkForInterrupts, and the arithmetic did not
> require a 64 bit microsecond range. Now we have rollover logic in
> #primitiveSignalAtMilliseconds, which leads to two issues, one of which
> masks the other on 32 bits.
>
> Issue #1: If we pass to the primitive a millisecond count that is slightly
> in the past (possibly due to clock skew or other timing issues), then the
> primitive will test for deltaMsecs less than zero, and will incorrectly
> assume that a millisecond clock rollover has happened.
>
> Issue #2: When adjusting for clock rollover on 32 bits, deltaMsecs is
> declared as 32 bit sqInt, which results in overflow when converting to
> microseconds. For 64 bit sqInt, there is no overflow. This error masks
> the first problem of Issue #1, which makes the primiitive appear to work
> correctly on 32 bits, even though the clock rollover logic is wrong.
>
> I do not think that we can simply fail the primitive, or set deltaMSeconds
> to zero, in the case of deltaMSeconds being less than zero. We still need
> to be able to schedule delays over a time period when the millisecond
> clock rolls over, so that needs to work.
>
> We do need to fix the arithmetic overflow condition, which probably just
> requires declaring deltaMsecs as usqLong. This matches the declaration
> for nextWakeupUsecs and should prevent arithmetic overflow. But fixing
> this for issue #2 will unmask issue #1.
>
> For issue #1, the solution probably requires a heuristic. If deltaMsecs
> is a big negative value, then the clock has rolled over. If it is a small
> negative value (maybe less than a second, just to pick a number), then
> it is probably clock skew and the MillisecondClockMask + 1 adjustment
> should not be applied.
>
> Does this sound right?
>
> Dave
>


More information about the Vm-dev mailing list