[Vm-dev] We need help from VM experts. Re: Freeze after Morph Activity

David T. Lewis lewis at mail.msen.com
Wed Mar 8 02:58:02 UTC 2017


Hi Juan,

I confirm your Cuis test results on Squeak using several image/VM combinations,
details below.

On Tue, Mar 07, 2017 at 09:49:53AM -0300, Juan Vuletich wrote:
> Hi Dave,
> 
> Thanks for answering. Inline.
> 
> On 3/6/2017 10:50 PM, David T. Lewis wrote:
> >
> >In the VM, the millisecond clock wraps within the 32 bit integer range:
> >
> >   #define MillisecondClockMask 0x1FFFFFFF
> >
> >In the Cuis image, Delay class>>handleTimerEvent does this:
> >
> >   nextTick := nextTick min: SmallInteger maxVal.
> >
> >On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> >a 32-bit image it is 16r3FFFFFFF.
> >
> >Could that be it?
> 
> I wasn't aware of that, and had assumed that millisecond timer would use 
> the whole SmallInteger range. This might introduce a bug, that would 
> only appear at timer rollover, i.e. about 6 days after image startup. 
> I'll fix this. Thanks.
> 
> But this is a completely separate issue. The problem we saw, the 
> semaphore never being signaled if deadline is in the past, happens 
> immediately after image startup.
> 
> >I don't really know how to test in Squeak. As you say, Squeak is now
> >using the microsecond clock in #handleTimerEvent. I do not see anything
> >in primitiveSignalAtMilliseconds that would behave any differently on
> >a 64 bit versus 32 bit image or VM, but I do not know how to test to
> >be sure.
> >
> >Dave
> 
> Well, what follows is a way to test VM behavior. Tested in Cuis, but 
> should be trivial to reproduce in Squeak, as it is a VM issue. In Cuis, 
> add (copied from Squeak):

I tried this with 4 different Squeak image/VM combinations:

- Squeak 3.8 with interpreter VM (an older image that uses millisecond
  clock for Delay)

- Squeak trunk V3 image with interpreter VM (latest version image, but
  non-Spur, updated via www.squeaksource.com/TrunkUpdateStreamV3)

- Squeak trunk 32 bit Spur

- Squeak trunk 64 bit Spur

> 
> !Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'!
> utcMicrosecondClock
>     "Answer the UTC microseconds since the Smalltalk epoch (January 1st 
> 1901, the start of the 20th century).
>      The value is derived from the Posix epoch with a constant offset 
> corresponding to elapsed microseconds
>      between the two epochs according to RFC 868."
> <primitive: 240>
>     ^0! !
> 
> !Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'!
> primSignal: aSemaphore atUTCMicroseconds: anInteger
>     "Signal the semaphore when the UTC microsecond clock reaches the 
> value of the second argument.
>      Fail if the first argument is neither a Semaphore nor nil, or if 
> the second argument is not an integer.
>      Essential. See Object documentation whatIsAPrimitive."
> <primitive: 242>
>     ^self primitiveFailed! !

I tried adding these to my Squeak 3.8 image for the test. They do not
work properly there because the primitive table was different back then.
The interpreter VM adjusts for this automatically, so primitive 240 does
not reach #primitiveUTCMicrosecondClock (it actually calls the old
#primitiveSerialPortWrite, which #primitiveUTCMicrosecondClock later replaced).

Nevertheless, primSignal:atMilliseconds: works in that image, and there is no
problem with a -10 parameter, so the Squeak 3.8 results are included, with a
note where needed, in the results below.

I also note that I locked up the Squeak 3.8 image a couple of times while
running various tests with bad input parameters. It is not reproducible,
but there may be something bad about calling #primSignal:atMilliseconds:
in an image that is also using it for the Delay mechanism.

I also locked up a Spur 32 image when calling primSignal:atUTCMicroseconds:,
so this may be the same problem: it may not be safe to call this method while
the same method is being used for Delay handling.

> 
> Then, in a Workspace, try the following 4 doits:
> 
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
> s wait.
> 'Ok' print.
> 
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10.
> s wait.
> 'Ok' print.
> 
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
> s wait.
> 'Ok' print.
> 
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10.
> s wait.
> 'Not OK at all' print.
> 
> On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish 
> immediately, but the fourth freezes the image. The difference in 
> behavior between Spur32 and Spur64 (on Linux) is indeed there.
> 
> 
> Ok. Also tried Squeak (note that instead of #millisecondClockValue in 
> Squeak it is #primMillisecondClock) :

Test results for my four Squeak image/VM combinations are added below.


> 
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
> s wait.
> 'Ok'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

> 
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10.
> s wait.
> 'Ok'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

> 
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
> s wait.
> 'Ok'.

Squeak 3.8 => primitive failed (but see note above for Squeak 3.8 using different primitive table)
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

> 
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
> s wait.
> 'Not OK at all'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => Not OK at all, hangs image

> 
> Exactly the same behavior.

Confirmed.

> 
> I just took a look at
> static void primitiveSignalAtMilliseconds(void)
> in 
> https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c
> The only thing I see is that msecs is an usqInt and deltaMsecs is an 
> sqInt. But I'm not good enough at gcc subtleties to say if this matters 
> at all. I mean, it looks as if  'if (deltaMsecs < 0) {' was true on 
> Spur64 and false on Spur32... Or maybe the difference is in the handling 
> of nextWakeupUsecs ...

I see that ioMSecs() is declared as signed long (32 bits), but it is used in
an expression with a 64 bit usqInt. So maybe it needs a cast, or maybe the
variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be
declared as 32 bit long and unsigned long to match the actual usage.
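
To illustrate the kind of width mismatch suspected here, the following is a
minimal sketch and not the VM's code. The function names are made up for this
example, and fixed-width types stand in for whatever sqInt, usqInt and long
turn out to be on a given platform. It only shows that the same "deadline
minus now" subtraction can pass or fail a "< 0" test depending on where the
value is widened to 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   functions or variable types used in cointerp.c. */

/* The difference is formed and sign-interpreted entirely in 32 bits.
   A deadline 10 ms in the past comes out as -10.
   (Converting an out-of-range unsigned value to int32_t is
   implementation-defined in C; gcc and clang wrap modulo 2^32.) */
int64_t delta_all_32bit(uint32_t deadline, uint32_t now)
{
    return (int32_t)(deadline - now);
}

/* The 32-bit unsigned difference is zero-extended to 64 bits before
   it is treated as signed. The same past deadline now looks like a
   large positive delay, so a "delta < 0" check never fires. */
int64_t delta_widened(uint32_t deadline, uint32_t now)
{
    uint64_t wide = (uint32_t)(deadline - now); /* zero-extended */
    return (int64_t)wide;
}
```

With deadline = 990 and now = 1000, the first function answers -10 and the
second answers 4294967286. For a deadline in the future, both agree. Whether
anything like this actually happens in primitiveSignalAtMilliseconds would
still need to be checked against the real declarations.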

Unfortunately I cannot easily recompile to verify (build problems on my Ubuntu
for Cog/Spur, sorry), but maybe someone else can take a look at this?

> 
> In any case, it looks like deadlines in the past are not supported (as 
> code assumes they are because of rollover...)

I agree this looks like a bug in the 64 bit VMs. But I do not yet see the
reason for it.
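
The rollover ambiguity Juan mentions can be sketched as follows. This is a
hypothetical helper written for this message, not code taken from any VM, and
it does not claim to reproduce what the tested VMs do. It assumes a clock
masked with the MillisecondClockMask value quoted earlier, and it assumes that
any deadline that appears to be in the past is taken to mean a time after the
next rollover.

```c
#include <stdint.h>

#define MillisecondClockMask 0x1FFFFFFF /* value quoted earlier in this thread */

/* Hypothetical helper: milliseconds to wait until `deadline`, under the
   assumption that a negative difference means the deadline lies beyond
   the next clock rollover rather than in the past. */
int64_t wrapped_delta_msecs(int64_t deadline, int64_t now)
{
    int64_t delta = deadline - now;
    if (delta < 0)
        delta += (int64_t)MillisecondClockMask + 1; /* one full clock period */
    return delta;
}
```

Under that assumption, a deadline 10 ms before "now" (for example 990 versus
1000) yields a wait of 536870902 ms, which is a little over 6 days and would
look like a hang. A deadline of 5 seen at clock value 0x1FFFFFFF - 4, just
before rollover, correctly yields 10 ms. The two cases cannot be told apart
from the masked values alone.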

Dave


