Hi Juan,
I can confirm your Cuis test results on Squeak using several image/VM combinations; details below.
On Tue, Mar 07, 2017 at 09:49:53AM -0300, Juan Vuletich wrote:
Hi Dave,
Thanks for answering. Inline.
On 3/6/2017 10:50 PM, David T. Lewis wrote:
In the VM, the millisecond clock wraps within the 32 bit integer range:
#define MillisecondClockMask 0x1FFFFFFF
In the Cuis image, Delay class>>handleTimerEvent does this:
nextTick := nextTick min: SmallInteger maxVal.
On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on a 32-bit image it is 16r3FFFFFFF.
Could that be it?
I wasn't aware of that, and had assumed that the millisecond timer would use the whole SmallInteger range. This might introduce a bug that would only appear at timer rollover, i.e. about 6 days after image startup. I'll fix this. Thanks.
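To make the rollover arithmetic concrete, here is a small C sketch. It only uses the two constants quoted above (MillisecondClockMask and the 32-bit SmallInteger maxVal); the function names are hypothetical, and clamp_to_smallint is just a model of the `nextTick min: SmallInteger maxVal` expression, not code from the VM or from Cuis.

```c
#include <stdint.h>

/* From the VM source quoted above. */
#define MillisecondClockMask 0x1FFFFFFF

/* SmallInteger maxVal on a 32-bit image, as quoted above. */
#define SMALLINT_MAX_32 0x3FFFFFFF

/* Number of distinct millisecond values before the clock wraps to 0. */
int64_t clock_period_ms(void)
{
    return (int64_t)MillisecondClockMask + 1;
}

/* The same period in whole days (truncated). */
int64_t clock_period_days(void)
{
    return clock_period_ms() / (1000LL * 60 * 60 * 24);
}

/* Hypothetical model of "nextTick min: SmallInteger maxVal" on a 32-bit image. */
int64_t clamp_to_smallint(int64_t next_tick)
{
    return next_tick < SMALLINT_MAX_32 ? next_tick : SMALLINT_MAX_32;
}

/* Nonzero if the wrapping clock can never show this tick value. */
int is_unreachable(int64_t tick)
{
    return tick > MillisecondClockMask;
}
```

The period works out to 536870912 ms, a little over 6 days. Because the 32-bit SmallInteger maxVal is larger than the mask, clamping to it still lets through tick values such as 16r20000000 that the clock never reaches.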
But this is a completely separate issue. The problem we saw, the semaphore never being signaled if the deadline is in the past, happens immediately after image startup.
I don't really know how to test in Squeak. As you say, Squeak is now using the microsecond clock in #handleTimerEvent. I do not see anything in primitiveSignalAtMilliseconds that would behave any differently on a 64 bit versus 32 bit image or VM, but I do not know how to test to be sure.
Dave
Well, what follows is a way to test VM behavior. Tested in Cuis, but it should be trivial to reproduce in Squeak, as it is a VM issue. In Cuis, add (copied from Squeak):
I tried this with 4 different Squeak image/VM combinations:
- Squeak 3.8 with interpreter VM (an older image that uses millisecond clock for Delay)
- Squeak trunk V3 image with interpreter VM (latest version image, but non-Spur, updated via www.squeaksource.com/TrunkUpdateStreamV3)
- Squeak trunk 32 bit Spur
- Squeak trunk 64 bit Spur
!Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'!
utcMicrosecondClock
    "Answer the UTC microseconds since the Smalltalk epoch (January 1st 1901, the start of the 20th century).
    The value is derived from the Posix epoch with a constant offset corresponding to elapsed microseconds
    between the two epochs according to RFC 868."
    <primitive: 240>
    ^0! !

!Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'!
primSignal: aSemaphore atUTCMicroseconds: anInteger
    "Signal the semaphore when the UTC microsecond clock reaches the value of the second argument.
    Fail if the first argument is neither a Semaphore nor nil, or if the second argument is not an integer.
    Essential. See Object documentation whatIsAPrimitive."
    <primitive: 242>
    ^self primitiveFailed! !
I tried adding these to my Squeak 3.8 image for the test. It does not work properly because the primitive table was different back then. The interpreter VM automatically adjusts for this, so primitive 240 is not dispatched to #primitiveUTCMicrosecondClock (it actually calls the old #primitiveSerialPortWrite, which #primitiveUTCMicrosecondClock later replaced).
Nevertheless, primSignal:atMilliseconds: works, and there is no problem with a -10 parameter, so the Squeak 3.8 results are included in the results below.
I also note that I locked up the Squeak 3.8 image a couple of times while running various tests with bad input parameters. It is not reproducible, but there may be something bad about calling #primSignal:atMilliseconds: in an image that is also using it for the Delay mechanism.
I also locked up a Spur 32 image when calling primSignal:atUTCMicroseconds:, so this may be the same problem. It may not be safe to call this when the same method is being used for Delay handling.
Then, in a Workspace, try the following 4 doits:
s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. s wait. 'Ok' print.
s _ Semaphore new. Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10. s wait. 'Ok' print.
s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. s wait. 'Ok' print.
s _ Semaphore new. Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10. s wait. 'Not OK at all' print.
On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish immediately, but the fourth freezes the image. The difference in behavior between Spur32 and Spur64 (on Linux) is indeed there.
Ok. Also tried Squeak (note that instead of #millisecondClockValue, in Squeak it is #primMillisecondClock):
Test results for my four Squeak image/VM combinations are added below.
s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. s wait. 'Ok'.
Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK
s _ Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10. s wait. 'Ok'.
Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK
s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. s wait. 'Ok'.
Squeak 3.8 => primitive failed (but see note above for Squeak 3.8 using different primitive table)
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK
s _ Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. s wait. 'Not OK at all'.
Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => Not OK at all, hangs image
Exactly the same behavior.
Confirmed.
I just took a look at static void primitiveSignalAtMilliseconds(void) in https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/... The only thing I see is that msecs is an usqInt and deltaMsecs is an sqInt. But I'm not good enough at gcc subtleties to say if this matters at all. I mean, it looks as if 'if (deltaMsecs < 0) {' was true on Spur64 and false on Spur32... Or maybe the difference is in the handling of nextWakeupUsecs ...
I see that ioMSecs() is declared as signed long (32 bits), but it is used in an expression with a 64 bit usqInt. So maybe it needs a cast, or maybe variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be declared as 32 bit long and unsigned long to match the actual usage.
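As a generic illustration of how a width or signedness mismatch of this kind could matter, here is a minimal C sketch. It is not taken from the VM source and is not a confirmed diagnosis; the function names are hypothetical, and fixed-width types stand in for whatever sqInt, usqInt and long actually are on a given platform. The only thing borrowed from the VM is the shape of the `if (deltaMsecs < 0)` check mentioned above.

```c
#include <stdint.h>

/* Subtraction done and interpreted entirely in 32 bits:
   a deadline in the past yields a negative delta.
   (Converting an out-of-range value to int32_t is implementation-defined
   in C; mainstream compilers wrap in two's complement.) */
int64_t delta_narrow(uint32_t deadline_ms, uint32_t now_ms)
{
    return (int32_t)(deadline_ms - now_ms);
}

/* Same subtraction, but the unsigned 32-bit result is zero-extended
   into 64 bits: the sign is lost and the delta becomes a huge positive value. */
int64_t delta_widened(uint32_t deadline_ms, uint32_t now_ms)
{
    uint32_t diff = deadline_ms - now_ms;
    return (int64_t)diff;
}

/* Model of a "treat negative deltas as zero" check. */
int64_t clamp_negative(int64_t delta_ms)
{
    return delta_ms < 0 ? 0 : delta_ms;
}
```

With a deadline 10 ms in the past, delta_narrow gives -10 and the clamp turns it into an immediate wakeup, while delta_widened gives 4294967286 ms (roughly 49.7 days), which the clamp leaves alone. Whether anything like this happens in primitiveSignalAtMilliseconds would need checking against the generated code.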
Unfortunately I cannot easily recompile to verify (build problems on my Ubuntu for Cog/Spur, sorry), but maybe someone else can take a look at this?
In any case, it looks like deadlines in the past are not supported (the code seems to assume that any such deadline is the result of clock rollover...)
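The rollover ambiguity can be sketched in C as follows. This assumes, hypothetically, that the delta is computed on wrapped clock values using the MillisecondClockMask quoted earlier in the thread; wrapped_delta_ms is an illustrative name, not a function in the VM.

```c
#include <stdint.h>

/* From the VM source quoted earlier in the thread. */
#define MillisecondClockMask 0x1FFFFFFF

/* Milliseconds from now until the deadline, where both are wrapped clock
   values and the deadline is assumed to lie in the future. */
uint32_t wrapped_delta_ms(uint32_t deadline_ms, uint32_t now_ms)
{
    return (deadline_ms - now_ms) & MillisecondClockMask;
}
```

A deadline just after a rollover is handled as intended: wrapped_delta_ms(5, 0x1FFFFFFB) is 10. A deadline 10 ms in the past looks exactly like one almost a full clock period ahead: wrapped_delta_ms(990, 1000) is 536870902, a bit over 6 days.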
I agree this looks like a bug in the 64 bit VMs. But I do not yet see the reason for it.
Dave