Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz)


Bert Freudenberg

15 Feb 2016

7:18 p.m.

...

On 15.02.2016, at 15:58, commits@source.squeak.org wrote:

Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones did also show this bug.

What?!

Time millisecondClockValue is supposed to be continuous. I’ll have to admit I didn’t follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.

- Bert -


marcel.taeumel

15 Feb

7:17 p.m.

New subject: Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz)

Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

If you had used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting.) So this fix does not directly address the discussion about syncing #millisecondClockValue to the wall clock.

Best, Marcel

-- View this message in context: http://forum.world.st/Time-millisecondClockValue-was-The-Trunk-Morphic-mt-10... Sent from the Squeak - Dev mailing list archive at Nabble.com.

Bert Freudenberg

8:07 p.m.

...

On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

...

If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

- Bert -

marcel.taeumel

8:01 p.m.


Hi Bert,

yes, we lost the check on Jan 18, 2015, when we added the reusable interCycleDelay instead of creating new Delay instances again and again and again. :-)

Best, Marcel


Levente Uzonyi

16 Feb

12:42 a.m.

On Mon, 15 Feb 2016, marcel.taeumel wrote:

...

Hi Bert,

yes, we lost the check on Jan 18, 2015, when we added the reusable interCycleDelay instead of creating new Delay instances again and again and again. :-)

The check is still superfluous provided the clock is monotonic. Changing it to the UTC clock should partially fix that, but one would still be affected by changes of the system time. Since we're talking about subsecond delays here, I would even consider using the previous clock implementation.

Levente


marcel.taeumel

12:28 p.m.


Hi Levente,

if we use the wall clock, there is the time zone issue. If we use OS startup time, then there will be a reset after image snapshot and OS reboot. Either way, the check is really important. If there were a place to reset "lastCycleTime" after Squeak restart, it would not be needed. Yet, there is no place to notify existing projects about shutdown/resume. Maybe projects should add themselves to the AutoStart list?
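A minimal sketch of the kind of check under discussion, written in Python rather than taken from Squeak's Morphic code. The wait before the next UI cycle is derived from the last cycle time, and a guard discards any value that a clock jump has pushed outside the valid range. The names `wait_before_next_cycle`, `MIN_CYCLE_MS` and `last_cycle_ms` are hypothetical, and the 20 ms cycle length is an assumption.

```python
MIN_CYCLE_MS = 20  # assumed minimum cycle length; not taken from the thread


def wait_before_next_cycle(now_ms, last_cycle_ms, min_cycle_ms=MIN_CYCLE_MS):
    """Return how many milliseconds to wait before starting the next cycle."""
    wait_ms = last_cycle_ms + min_cycle_ms - now_ms
    # Guard: after a clock jump (time zone change, image resumed on another
    # machine, rollover) wait_ms can be huge or very negative. Never wait
    # longer than one cycle, and never wait a negative amount.
    if wait_ms < 0 or wait_ms > min_cycle_ms:
        return 0
    return wait_ms
```

Without the guard, a clock that jumps back by one hour would yield a wait of roughly one hour, which is the symptom described in the commit message.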

Best, Marcel


Levente Uzonyi

19 Feb

9:48 p.m.

Hi Marcel,

Using the wall clock in UTC is free of time zone issues, but the changes of the OS clock could still have a negative impact, so the extra check is necessary. I don't know the answer to your Project-related question. :)

Levente


Levente Uzonyi

16 Feb

12:39 a.m.

On Mon, 15 Feb 2016, Bert Freudenberg wrote:

...

...
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

...
If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently, probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.

Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.

Levente

...

Bert -

Eliot Miranda

12:49 a.m.

Hi Levente, Hi Bert, Hi All,

On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:

...

On Mon, 15 Feb 2016, Bert Freudenberg wrote:

...
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

...
Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use to UTC clock instead of the local one in Time class >> #millisecondClockValue.

I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.

On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:

...

On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Hi Ryan,

...

...
...
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:

On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Further back Ryan wrote:

Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.

(newUtcMicrosecondClock >= utcMicrosecondClock 124)

It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?

Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).

Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.

...
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.

...
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.

I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.

That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.

In Dart, there are three uses of time:

Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.

Timer schedules a future notification (Delay wait). It uses the monotonic clock.

DateTime gets a timestamp (DateAndTime now). It uses the wall clock.

...

Makes sense, at the cost of having two clocks.
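For readers who want to see the two-clock split concretely, here is a minimal sketch in Python, not Dart or Squeak code. It maps the three uses listed above onto Python's standard `time.monotonic()` and wall-clock calls. The function names `time_to_run`, `wait_until_elapsed` and `timestamp_now` are made up for this illustration.

```python
import time
from datetime import datetime, timezone


def time_to_run(fn):
    """Stopwatch use: measure a duration with the monotonic clock."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def wait_until_elapsed(seconds):
    """Timer use: schedule against the monotonic clock, not the wall clock."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)


def timestamp_now():
    """Timestamp use: read the wall clock (here as an aware UTC datetime)."""
    return datetime.now(timezone.utc)
```

Because the first two functions never consult the wall clock, an NTP correction or a manual change of the system time during the call should not affect them.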

...

Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.

Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
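The distinction between deadlines and durations can be sketched as follows. This is an illustrative Python model under stated assumptions, not how Squeak's Delay is implemented; the classes `DeadlineDelay` and `DurationDelay` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeadlineDelay:
    """Fires when the clock reaches an absolute time."""
    deadline_ms: int

    def is_due(self, now_ms):
        return now_ms >= self.deadline_ms


@dataclass
class DurationDelay:
    """Fires once a given amount of running time has been consumed."""
    remaining_ms: int

    def advance(self, elapsed_ms):
        # Only time observed while the image is running counts.
        self.remaining_ms = max(0, self.remaining_ms - elapsed_ms)

    def is_due(self):
        return self.remaining_ms == 0
```

With delays spaced one minute apart, a clock that jumps forward by a day makes every `DeadlineDelay` due at once, while the `DurationDelay` instances stay evenly spaced because no running time was consumed during the jump.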

_,,,^..^,,,_ best, Eliot

...

Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.

This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.

...

Levente

...

Bert -

-- _,,,^..^,,,_ best, Eliot

Tim Felgentreff

11:19 a.m.

Hi Eliot,

am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?

Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.

cheers, Tim


Eliot Miranda

6:13 p.m.

Hi Tim,

On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:

...

Hi Eliot,

am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?

First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan, nothing changes to times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.

As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.
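A small Python sketch of why the UTC count is independent of time zone while the local count is not. It assumes the epoch is 1901-01-01T00:00:00Z, and the helper names `utc_microseconds` and `local_microseconds` are invented for this example; they are not the VM's primitives.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch: start of 1901 in UTC.
SQUEAK_EPOCH = datetime(1901, 1, 1, tzinfo=timezone.utc)


def utc_microseconds(instant):
    """Microseconds from the epoch to an aware datetime, zone-independent."""
    return (instant - SQUEAK_EPOCH) // timedelta(microseconds=1)


def local_microseconds(instant, utc_offset):
    """Same count shifted by a zone offset; suitable for display only."""
    return utc_microseconds(instant) + utc_offset // timedelta(microseconds=1)
```

The same instant expressed in Tokyo or Berlin time gives one UTC count, whereas the two local counts differ by the zone offset, which is why only the UTC count is suitable for scheduling.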

Second, yes, the idea is to have the VM drift time towards wall time. First some background.

By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-.... One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OSs such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".

The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through ntpd adjusts local time, then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines. Further, I think that providing a more ideal clock is straightforward. Here's what I propose.

The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.

Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat, it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon", since, because monotonic time is reasonably accurate, a rate of change of 1% soon overcomes a drift of at most 0.0277777%.
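The rates in the paragraph above can be checked with a few lines of arithmetic. This Python sketch uses the 1 ms heartbeat and the 5/10 usec limits from the text; the helper `ticks_to_absorb` is a hypothetical name, and it ignores the small ongoing hardware drift while the error is being absorbed.

```python
TICK_US = 1_000          # 1 ms heartbeat, as assumed in the text
SLEW_SAME_US = 5         # max offset change per tick in the same direction
SLEW_REVERSE_US = 10     # max offset change per tick when changing direction

hardware_drift = 1 / 3600                 # 1 second per hour
slew_same = SLEW_SAME_US / TICK_US        # 0.005, i.e. 0.5 %
slew_reverse = SLEW_REVERSE_US / TICK_US  # 0.01, i.e. 1 %


def ticks_to_absorb(error_us, slew_us=SLEW_SAME_US):
    """Heartbeats needed to remove a step error at a fixed slew per tick."""
    return -(-abs(error_us) // slew_us)  # ceiling division
```

Both slew rates exceed the assumed hardware drift by more than an order of magnitude, so the offset can always catch up. A step error of about 30 ms would take roughly 6000 heartbeats, or about six seconds, to absorb at 5 usecs per tick.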

Let's express this as a Smalltalk class. Units are microseconds (usecs).

    Object subclass: VMClock
        instanceVariableNames: 'offset now'

    initialize
        now := OperatingSystem gettimeofday.
        offset := now - OperatingSystem clock_gettime

    run
        "Compute now every 1ms such that now approaches localTime."
        [(Delay forMilliseconds: 1) wait.
         self
            computeNowGivenMonotonic: OperatingSystem clock_gettime
            andLocal: OperatingSystem gettimeofday.
         true] whileTrue

    computeNowGivenMonotonic: monotonicTime andLocal: localTime
        "Compute now such that now approaches localTime."
        | difference putativeNow delta |
        "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
         delta is the amount to change offset by in this tick."
        localTime = monotonicTime
            ifTrue: "the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
                [delta := offset >= 0
                            ifTrue: [(offset min: 5) negated]
                            ifFalse: [(offset max: -5) negated]]
            ifFalse:
                [putativeNow := monotonicTime + offset.
                 difference := localTime - putativeNow.
                 delta := 0.
                 difference < 0 ifTrue: "localTime is behind; make the offset more negative by no more than 5 usecs (10 usecs while the offset is still positive)"
                    [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
                 difference > 0 ifTrue: "localTime is ahead; make the offset more positive by no more than 5 usecs (10 usecs while the offset is still negative)"
                    [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
        offset := offset + delta.
        now := monotonicTime + offset.
        ^now

I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:

    run
        "VMClockTester new run"
        "VMClockTester new run last: 20"
        "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
        | times |
        times := (Array new: 1000) writeStream.
        1 to: 1000000 do: "1 million ticks = 1000 seconds"
            [:i|
             monotonicClock := monotonicClock + 1000.
             localClock := localClock + 1000
                            + (i \\ 3600 = 0
                                ifTrue:
                                    [| drift |
                                     [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                     drift]
                                ifFalse: [0]).
             self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
             i \\ 100 = 0 ifTrue:
                [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
        ^times contents

Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.

    #(10486812 -13188 10500000 10486812 13188 0)
    #(10586812 -13188 10600000 10586812 13188 0)
    #(10686812 -13188 10700000 10686812 13188 0)
    #(10786807 -13193 10800000 10755947 44053 30860)
    #(10886307 -13693 10900000 10855947 44053 30360)
    #(10985807 -14193 11000000 10955947 44053 29860)
    #(11085307 -14693 11100000 11055947 44053 29360)
    #(11184807 -15193 11200000 11155947 44053 28860)
    #(11284307 -15693 11300000 11255947 44053 28360)
    #(11383807 -16193 11400000 11355947 44053 27860)
    #(11483307 -16693 11500000 11455947 44053 27360)
    #(11582807 -17193 11600000 11555947 44053 26860)
    #(11682307 -17693 11700000 11655947 44053 26360)
    #(11781807 -18193 11800000 11755947 44053 25860)
    #(11881307 -18693 11900000 11855947 44053 25360)
    #(11980807 -19193 12000000 11955947 44053 24860)
    #(12080307 -19693 12100000 12055947 44053 24360)
    #(12179807 -20193 12200000 12155947 44053 23860)
    #(12279307 -20693 12300000 12255947 44053 23360)
    #(12378807 -21193 12400000 12355947 44053 22860)
    #(12478307 -21693 12500000 12455947 44053 22360)
    #(12577807 -22193 12600000 12555947 44053 21860)
    #(12677307 -22693 12700000 12655947 44053 21360)
    #(12776807 -23193 12800000 12755947 44053 20860)
    #(12876307 -23693 12900000 12855947 44053 20360)
    #(12975807 -24193 13000000 12955947 44053 19860)
    #(13075307 -24693 13100000 13055947 44053 19360)
    #(13174807 -25193 13200000 13155947 44053 18860)
    #(13274307 -25693 13300000 13255947 44053 18360)
    #(13373807 -26193 13400000 13355947 44053 17860)
    #(13473307 -26693 13500000 13455947 44053 17360)
    #(13572807 -27193 13600000 13555947 44053 16860)
    #(13672307 -27693 13700000 13655947 44053 16360)
    #(13771807 -28193 13800000 13755947 44053 15860)
    #(13871307 -28693 13900000 13855947 44053 15360)
    #(13970807 -29193 14000000 13955947 44053 14860)
    #(14070307 -29693 14100000 14055947 44053 14360)
    #(14169807 -30193 14200000 14155947 44053 13860)
    #(14269307 -30693 14300000 14255947 44053 13360)

Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.

    #(998321391 221391 998100000 998297171 -197171 24220)
    #(998420391 220391 998200000 998397171 -197171 23220)
    #(998519391 219391 998300000 998497171 -197171 22220)
    #(998618391 218391 998400000 998597171 -197171 21220)
    #(998717391 217391 998500000 998697171 -197171 20220)
    #(998816391 216391 998600000 998797171 -197171 19220)
    #(998915391 215391 998700000 998897171 -197171 18220)
    #(999014391 214391 998800000 998997171 -197171 17220)
    #(999113391 213391 998900000 999097171 -197171 16220)
    #(999212391 212391 999000000 999197171 -197171 15220)
    #(999311391 211391 999100000 999297171 -197171 14220)
    #(999410391 210391 999200000 999397171 -197171 13220)
    #(999509391 209391 999300000 999497171 -197171 12220)
    #(999608391 208391 999400000 999597171 -197171 11220)
    #(999707391 207391 999500000 999697171 -197171 10220)
    #(999806391 206391 999600000 999797171 -197171 9220)
    #(999905391 205391 999700000 999897171 -197171 8220)
    #(1000004391 204391 999800000 999997171 -197171 7220)
    #(1000103391 203391 999900000 1000097171 -197171 6220)
    #(1000202391 202391 1000000000 1000197171 -197171 5220)
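For anyone wanting to replay these figures outside Squeak, the following is a Python port of the computeNowGivenMonotonic:andLocal: logic above. It is a sketch for experimentation, not VM code; the class layout, the method name `compute_now` and the helper `replay` are choices made for this example. All values are in microseconds.

```python
class VMClock:
    """Python port of the thread's VMClock sketch; all values in microseconds."""

    def __init__(self, monotonic_us, local_us):
        self.now = local_us
        self.offset = local_us - monotonic_us

    def compute_now(self, monotonic_us, local_us):
        if local_us == monotonic_us:
            # Clocks agree: shrink any offset towards zero by at most 5 us.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local_us - (monotonic_us + self.offset)
            delta = 0
            if difference < 0:
                # Local time is behind the VM clock: lower the offset.
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:
                # Local time is ahead of the VM clock: raise the offset.
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic_us + self.offset
        return self.now


def replay(clock, monotonic_us, local_us, ticks, tick_us=1_000):
    """Advance both clocks in lockstep for `ticks` heartbeats."""
    for _ in range(ticks):
        monotonic_us += tick_us
        local_us += tick_us
        clock.compute_now(monotonic_us, local_us)
    return clock.now, clock.offset
```

Starting from the state in the third row of the first listing (monotonic 10700000, local 10686812) and feeding in the perturbed local time 10755947 at monotonic 10800000 gives now = 10786807 and offset = -13193, matching the fourth row. A further 100 heartbeats give 10886307 and -13693, matching the fifth.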

Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.

I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.

Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.

No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to better than 1% and that ntpd adjusts the local notion of wall time based on a reputable clock.

Review appreciated.

cheers,

...

Tim

On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Levente, Hi Bert, Hi All,

On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:

...
On Mon, 15 Feb 2016, Bert Freudenberg wrote:

...
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

...
Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

If you would have used VM or OS startup time, this would still be

...
problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use to UTC clock instead of the local one in Time class >> #millisecondClockValue.

I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.

On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:

...
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Hi Ryan,

...
...
...
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:

On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Further back Ryan wrote:

Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.

(newUtcMicrosecondClock >= utcMicrosecondClock 124)

It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?

Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).

Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.

...
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.

...
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.

I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.

That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time, is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.

In Dart, there are three uses of time:

Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.

Timer schedules a future notification (Delay wait). It uses the monotonic clock.

DateTime gets a timestamp (DateAndTime now). It uses the wall clock.

...
Makes sense, at the cost of having two clocks.
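As an illustration of the three uses Ryan lists, here is a minimal Python sketch (not Dart or Squeak code). The function names `time_to_run`, `wait_for` and `timestamp_now` are made up for this example; only `time.monotonic`, `time.sleep` and `datetime.now` are real library calls.

```python
import time
from datetime import datetime, timezone


def time_to_run(block):
    """Stopwatch use: measure a duration with the monotonic clock."""
    start = time.monotonic()
    block()
    return time.monotonic() - start


def wait_for(seconds):
    """Timer use: wait for a monotonic deadline, so wall-clock jumps do not matter."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)


def timestamp_now():
    """Timestamp use: read the wall clock as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)
```

The cost Eliot points out is visible here: the caller has to know which of the two clocks each function is built on.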

...
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.

Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
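The deadline/duration distinction can be sketched with a simulated clock. This is a hypothetical Python model, not the Squeak Delay implementation; the class names and the `suspend`/`resume` hooks are assumptions made for the illustration.

```python
class DeadlineDelay:
    """Fires once the clock reaches an absolute time."""

    def __init__(self, clock_now, duration):
        self.deadline = clock_now + duration

    def expired(self, clock_now):
        return clock_now >= self.deadline


class DurationDelay:
    """Fires once the requested amount of running time has been observed."""

    def __init__(self, clock_now, duration):
        self.remaining = duration
        self.last_seen = clock_now

    def suspend(self, clock_now):
        # At snapshot: bank the time that has elapsed so far.
        self.remaining -= clock_now - self.last_seen

    def resume(self, clock_now):
        # At start-up: restart counting from the new clock value.
        self.last_seen = clock_now

    def expired(self, clock_now):
        return clock_now - self.last_seen >= self.remaining


# Three delays of 1, 2 and 3 minutes; snapshot after 30 s, resume a day later.
deadlines = [DeadlineDelay(0, d) for d in (60, 120, 180)]
durations = [DurationDelay(0, d) for d in (60, 120, 180)]
for delay in durations:
    delay.suspend(30)
resumed_at = 30 + 86400
for delay in durations:
    delay.resume(resumed_at)
```

In this model every deadline has passed at resumption, while the durations still have 30, 90 and 150 seconds left and stay evenly spaced.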

_,,,^..^,,,_ best, Eliot

Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.

This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.

Levente

...
...

Bert -

_,,,^..^,,,_

best, Eliot


Ben Coman

18 Jul 2016

8:02 p.m.

Hi All,

Did anything further come of this discussion...

On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:

...

Hi Tim,

On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:

...
Hi Eliot,

am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?

First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan, nothing changes about the times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.
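To make the time basis concrete, here is a small Python sketch of converting between Unix time and UTC microseconds since 1901. It assumes the epoch is midnight UTC on 1 January 1901 (the message only says "since 1901"); the function names are hypothetical and this is not the VM's code.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch: 1901-01-01 00:00:00 UTC.
SQUEAK_EPOCH = datetime(1901, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 69 years including 17 leap days = 25202 days = 2177452800 seconds.
EPOCH_OFFSET_USECS = int((UNIX_EPOCH - SQUEAK_EPOCH).total_seconds()) * 1_000_000


def unix_to_utc_microseconds(unix_seconds):
    """Convert seconds since 1970 (UTC) to microseconds since 1901 (UTC)."""
    return round(unix_seconds * 1_000_000) + EPOCH_OFFSET_USECS


def utc_microseconds_to_datetime(usecs):
    """Convert microseconds since 1901 (UTC) to an aware UTC datetime."""
    return SQUEAK_EPOCH + timedelta(microseconds=usecs)
```

Because the value is in UTC, the same instant maps to the same number regardless of the time zone of the machine running the image.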

As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.

Second, yes, the idea is to have the VM drift time towards wall time. First some background.

By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-.... One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OSes such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".

The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through something the ntpd adjusts local time then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines.

The flip side of this is it restricts the tools available to a programmer at the Image level, and adds complexity to the VM that Image programmers can't see if ThingsGoWrong(TM). If the Image had direct access to the several clocks provided by clock_gettime(), it would be possible for an Image level programmer to explore and *understand* the difference in them. I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.

Indeed for Delay scheduler, it might be good to choose between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and observe what happens when I set the system time back a couple of days. Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
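The kind of exploration described here can be tried from a scripting language on platforms that expose clock_gettime. The following is a Python sketch, not a Squeak primitive; `available_clocks` and `sample` are names invented for the example, and which clock constants exist depends on the platform (CLOCK_MONOTONIC_RAW, for instance, is not available everywhere).

```python
import time

CLOCK_NAMES = [
    "CLOCK_REALTIME",
    "CLOCK_MONOTONIC",
    "CLOCK_MONOTONIC_RAW",
    "CLOCK_PROCESS_CPUTIME_ID",
]


def available_clocks():
    """Map clock name to clock id for the clocks readable on this platform."""
    clocks = {}
    if not hasattr(time, "clock_gettime"):
        return clocks  # e.g. Windows: no clock_gettime in the time module
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue
        try:
            time.clock_gettime(clock_id)
        except OSError:
            continue
        clocks[name] = clock_id
    return clocks


def sample(clocks):
    """Read every available clock once; values are seconds as floats."""
    return {name: time.clock_gettime(cid) for name, cid in clocks.items()}


if __name__ == "__main__":
    for name, value in sample(available_clocks()).items():
        print(f"{name:28} {value:.6f}")
```

Sampling repeatedly while changing the system time would show which of these clocks follow the change and which do not.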

...

Further I think that providing a more ideal clock is straight-forward. Here's what I propose.

But why do our own slewing to match local "real" time, when it seems NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].

There seem to be a couple of choices to making clock_gettime() cross platform... a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb b. https://github.com/ThomasHabets/monotonic_clock

[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html

...

The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.

Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat, it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon": because monotonic time is reasonably accurate, a rate of change of up to 1% soon overcomes a drift of at most 0.0277777%.
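The arithmetic in this paragraph can be checked with a few lines of Python. The constants come from the message (1 ms heartbeat, 5 and 10 usec limits, 1 second per hour drift); the helper `seconds_to_absorb` is a hypothetical name and ignores any drift that continues while the offset is being slewed.

```python
from fractions import Fraction

HEARTBEAT_USECS = 1000          # the imagined 1 kHz heartbeat
MAX_DELTA_SAME_DIRECTION = 5    # usecs per heartbeat
MAX_DELTA_REVERSING = 10        # usecs per heartbeat

worst_case_drift = Fraction(1, 3600)                              # 1 s per hour
slew_rate = Fraction(MAX_DELTA_SAME_DIRECTION, HEARTBEAT_USECS)   # 0.5%
max_rate = Fraction(MAX_DELTA_REVERSING, HEARTBEAT_USECS)         # 1%


def seconds_to_absorb(error_usecs, delta=MAX_DELTA_SAME_DIRECTION):
    """Seconds needed to slew away a step of error_usecs at delta usecs per heartbeat."""
    heartbeats = -(-error_usecs // delta)  # ceiling division
    return heartbeats * HEARTBEAT_USECS / 1_000_000


if __name__ == "__main__":
    print(f"worst-case drift: {float(worst_case_drift) * 100:.7f}%")
    print(f"slew rate:        {float(slew_rate) * 100:.1f}%")
    print(f"30 ms step absorbed in {seconds_to_absorb(30_000)} s")
```

So a 30 ms step in local time would be absorbed in about 6 seconds, and the 0.5% slew rate is 18 times the assumed worst-case drift.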

Let's express this as a Smalltalk class. Units are microseconds (usecs).

Object subclass: VMClock
    instanceVariableNames: 'offset now'

initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime

run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self
        computeNowGivenMonotonic: OperatingSystem clock_gettime
        andLocal: OperatingSystem gettimeofday.
     true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue:
            ["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
             delta := offset >= 0
                        ifTrue: [(offset min: 5) negated]
                        ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue:
                "localTime is behind; make the offset more negative by no more than 5 usecs"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue:
                "localTime is ahead; make the offset more positive by no more than 5 usecs"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now
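For readers who want to experiment outside an image, here is a Python port of the method above together with a tiny deterministic simulation. It is a sketch that follows the posted logic branch by branch; the `simulate` helper and its parameters are assumptions added for the example and are not the attached VMClockTester.

```python
class VMClock:
    """Python port of the slewing clock sketched above; all times are microseconds."""

    def __init__(self, monotonic_time, local_time):
        self.now = local_time
        self.offset = local_time - monotonic_time

    def compute_now(self, monotonic_time, local_time):
        """Move `now` towards local_time by adjusting the offset a few usecs per call."""
        if local_time == monotonic_time:
            # The clocks agree: shrink any offset in effect by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local_time - (monotonic_time + self.offset)
            delta = 0
            if difference < 0:
                # Local time is behind: lower the offset by at most 5 usecs,
                # or 10 when the offset is currently non-negative.
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:
                # Local time is ahead: raise the offset by at most 5 usecs,
                # or 10 when the offset is currently negative.
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic_time + self.offset
        return self.now


def simulate(ticks, jump_at, jump_usecs):
    """Run 1 ms heartbeats; local time steps by jump_usecs at tick jump_at."""
    monotonic = local = 0
    clock = VMClock(monotonic, local)
    readings = []
    for tick in range(1, ticks + 1):
        monotonic += 1000
        local += 1000 + (jump_usecs if tick == jump_at else 0)
        readings.append((clock.compute_now(monotonic, local), local))
    return readings
```

With a 30 ms step in either direction, each heartbeat in this simulation advances `now` by between 990 and 1010 usecs, and `now` equals local time again after roughly 6000 heartbeats.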

I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:

run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000
                        + (i \\ 3600 = 0
                            ifTrue:
                                [| drift |
                                 [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                 drift]
                            ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents

Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.

[snip]

...

Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.

[snip]

...

Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.

I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.

...
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.

No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to better than 1% and that ntpd adjusts the local notion of wall time based on a reputable clock.

Review appreciated.

...
cheers, Tim

On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Levente, Hi Bert, Hi All,

On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:

...
On Mon, 15 Feb 2016, Bert Freudenberg wrote:

...
...
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

...
If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.

I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.

On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:

...
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Ryan,

On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:

...
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Further back Ryan wrote:

5) Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.

(newUtcMicrosecondClock >= utcMicrosecondClock 124)

It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?

Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).

Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.

One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.

Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.

I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.

...

...
...
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.

...
In Dart, there are three uses of time

Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.

Timer schedules a future notification (Delay wait). It uses the monotonic clock.

DateTime gets a timestamp (DateAndTime now). It uses the wall clock.

Makes sense, at the cost of having two clocks.

...
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced.

This is because of #save/restoreResumptionTimes on image shutdown/startup.

...

...
...
...
If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.

Yes, since Delays are currently using gettimeofday() they expire when the system clock jumps. But also, with the move of Delay to a microsecond clock and removal of the clock-wrap checks, perhaps the algorithm is more susceptible to jitter or NTP moving the clock backwards. I would need to think more about that, but anyway clock_gettime(CLOCK_MONOTONIC), which is slewed towards wall time, seems a much better choice. I'd be interested in doing some work getting this into the VM since I'd like it for Pharo also.

cheers -ben

...

...
...
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.

Chris Cunningham

8:09 p.m.

On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:

...

Hi All,

<snip>

...

I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.

<snip>

That is soooo true.

...

-cbc

Eliot Miranda

19 Jul 2016

12:27 a.m.

On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:

...

Hi All,

Did anything further come of this discussion...

On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Tim,

On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:

...
Hi Eliot,

am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?

First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan, nothing changes about the times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.

As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.

Second, yes, the idea is to have the VM drift time towards wall time. First some background.

By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-... . One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OSes such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".

The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through something the ntpd adjusts local time then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines.

The flip side of this is it restricts the tools available to a programmer at the Image level, and adds complexity to the VM that Image programmers can't see if ThingsGoWrong(TM). If the Image had direct access to the several clocks provided by clock_gettime(), it would be possible for an Image level programmer to explore and *understand* the difference in them. I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.

Indeed for Delay scheduler, it might be good to choose between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and observe what happens when I set the system time back a couple of days. Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.

...
Further I think that providing a more ideal clock is straight-forward. Here's what I propose.

But why do our own slewing to match local "real" time, when it seems NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].

What do we do on platforms that don't have adjtime? The skewing algorithm is simple and adding it to the VM means we're not dependent on adjtime. Isn't that a good thing?

...

There seem to be a couple of choices to making clock_gettime() cross platform... a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb b. https://github.com/ThomasHabets/monotonic_clock

[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html

Sure. It's implementable. Why not add the simple algorithm to our VM and then we're done?

...

The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.

Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat, it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon": because monotonic time is reasonably accurate, a rate of change of up to 1% soon overcomes a drift of at most 0.0277777%.

Let's express this as a Smalltalk class. Units are microseconds (usecs).

Object subclass: VMClock
    instanceVariableNames: 'offset now'

initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime

run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self
        computeNowGivenMonotonic: OperatingSystem clock_gettime
        andLocal: OperatingSystem gettimeofday.
     true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue:
            ["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
             delta := offset >= 0
                        ifTrue: [(offset min: 5) negated]
                        ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue:
                "localTime is behind; make the offset more negative by no more than 5 usecs"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue:
                "localTime is ahead; make the offset more positive by no more than 5 usecs"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now

I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:

run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000
                        + (i \\ 3600 = 0
                            ifTrue:
                                [| drift |
                                 [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                 drift]
                            ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents

Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.

[snip]

Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.

[snip]

...
Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.

I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.

...
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.

No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to better than 1% and that ntpd adjusts the local notion of wall time based on a reputable clock.

Review appreciated.

...
cheers, Tim

On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:

...
Hi Levente, Hi Bert, Hi All,

On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:

On Mon, 15 Feb 2016, Bert Freudenberg wrote:

On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.

I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.

On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:

On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Hi Ryan,

On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:

On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Further back Ryan wrote:

5) Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.

(newUtcMicrosecondClock >= utcMicrosecondClock 124)

It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?

Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).

Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.

One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.

Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.

I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.

That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.

...
In Dart, there are three uses of time

Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.

Timer schedules a future notification (Delay wait). It uses the monotonic clock.

DateTime gets a timestamp (DateAndTime now). It uses the wall clock.

Makes sense, at the cost of having two clocks.

...
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced.

This is because of #save/restoreResumptionTimes on image shutdown/startup.

If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.

Yes, since Delays are currently using gettimeofday() they expire when the system clock jumps. But also, with the move of Delay to a microsecond clock and removal of the clock-wrap checks, perhaps the algorithm is more susceptible to jitter or ntp moving the clock backwards. I would need to think more about that, but anyway clock_gettime(MONOTONIC) which slews wall-time seems a much better choice. I'd be interested in doing some work getting this into the VM since I'd like it for Pharo also.

cheers -ben

...
...
...
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.

-- _,,,^..^,,,_ best, Eliot

Nicolai Hess

16 Feb 16 Feb

11:29 a.m.

2016-02-16 0:49 GMT+01:00 Eliot Miranda eliot.miranda@gmail.com:

...

Hi Levente, Hi Bert, Hi All,

On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:

...
On Mon, 15 Feb 2016, Bert Freudenberg wrote:

...
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:

...
Hi Bert,

this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.

Ah, so we lost the check at some point?

If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.

I still think it should answer milliseconds since startup. Why would we change that?

Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.

I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.

Ah, does this mean there is a solution for this bug?

11324 https://pharo.fogbugz.com/f/cases/11324/Image-freeze-when-changing-system-time Image freeze when changing system time

I am sure there was a Mantis bug entry as well, but I cannot find it yet.

...

On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:

...
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Hi Ryan,

...
...
...
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:

On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:

Further back Ryan wrote:

Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.

(newUtcMicrosecondClock >= utcMicrosecondClock 124)

It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?

Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).

Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.

...
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
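To make the idea above concrete, here is a minimal Python sketch of one possible reading of it. It is not Eliot's actual algorithm nor the VM's code: the class name `SlewingClock`, the integer microsecond units and the 1% bound expressed as `slew_divisor=100` are all assumptions for illustration. On each update the clock advances by the elapsed monotonic time, then nudges itself toward the wall clock by at most 1% of that elapsed time, so it never runs backwards.

```python
class SlewingClock:
    """Hypothetical VM clock: never runs backwards, hunts toward wall time."""

    def __init__(self, monotonic_us, wall_us, slew_divisor=100):
        # slew_divisor=100 caps the correction at 1% of elapsed time (assumed).
        self.slew_divisor = slew_divisor
        self.last_monotonic_us = monotonic_us
        self.vm_time_us = wall_us  # start in agreement with the wall clock

    def update(self, monotonic_us, wall_us):
        """Advance using a monotonic reading and a wall-clock reading."""
        elapsed = monotonic_us - self.last_monotonic_us  # never negative
        self.last_monotonic_us = monotonic_us
        advanced = self.vm_time_us + elapsed       # VM time with no correction
        error = wall_us - advanced                 # > 0 means VM time is behind
        limit = elapsed // self.slew_divisor       # largest allowed correction
        correction = max(-limit, min(limit, error))
        self.vm_time_us = advanced + correction
        return self.vm_time_us
```

With one update per second, a wall clock that is 50ms ahead is caught up after five updates, and a wall clock that jumps back an hour only slows this clock by 10ms per second rather than stepping it backwards.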

...
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.

I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.

That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.

In Dart, there are three uses of time

...
Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.

Timer schedules a future notification (Delay wait). It uses the monotonic clock.

DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
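The same three-way split can be sketched in Python, whose standard library also exposes both kinds of clock. This is only an illustration of the division of labour described above, not Dart or Squeak code; the helper names are made up, and `time.monotonic()` stands in for the monotonic clock.

```python
import time
from datetime import datetime, timezone


def time_to_run(block):
    """Stopwatch: measure a duration with the monotonic clock."""
    start = time.monotonic()
    block()
    return time.monotonic() - start


def deadline_after(seconds):
    """Timer: schedule a future notification against the monotonic clock."""
    return time.monotonic() + seconds


def has_expired(deadline):
    """True once the monotonic clock has reached the deadline."""
    return time.monotonic() >= deadline


def timestamp_now():
    """DateTime: read the wall clock (here as an aware UTC datetime)."""
    return datetime.now(timezone.utc)
```

Only `timestamp_now` is affected when the system time is changed; the other two keep counting steadily.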

...
Makes sense, at the cost of having two clocks.

...
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.

Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
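A small Python sketch of the distinction, under stated assumptions: the `Delay` class and its method names are hypothetical and only loosely modelled on the save/restore of resumption times mentioned elsewhere in this thread; this is not the Squeak implementation. A deadline keeps its absolute firing time across a clock discontinuity, while a duration remembers how long is left and is re-anchored to the new clock reading.

```python
class Delay:
    """Hypothetical delay that can act as a deadline or as a duration."""

    def __init__(self, now, seconds):
        self.resumption_time = now + seconds  # absolute time at which to fire
        self.remaining = None

    def save_remaining(self, now):
        """Before a snapshot or clock change: remember the time left to wait."""
        self.remaining = self.resumption_time - now

    def restore_remaining(self, now):
        """Afterwards: re-anchor the firing time to the new clock reading."""
        self.resumption_time = now + self.remaining

    def expired(self, now):
        return now >= self.resumption_time


def resume(delays, before, after, as_durations):
    """Simulate a clock discontinuity from `before` to `after`."""
    if as_durations:
        for d in delays:
            d.save_remaining(before)
        for d in delays:
            d.restore_remaining(after)
    return [d.expired(after) for d in delays]
```

Treated as deadlines, delays of one, two and three minutes have all expired after the clock moves forward a day; treated as durations, none has, and they stay a minute apart.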

_,,,^..^,,,_ best, Eliot

...
Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.

This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.

...
Levente

...

Bert -

-- _,,,^..^,,,_ best, Eliot

tim Rowledge

15 Feb 15 Feb

7:41 p.m.

...

On 15-02-2016, at 10:18 AM, Bert Freudenberg bert@freudenbergs.de wrote:

...
On 15.02.2016, at 15:58, commits@source.squeak.org wrote:

Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones did also show this bug.

What?!

Time millisecondClockValue is supposed to be continuous. I’ll have to admit I didn’t follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.

The only way I can see that bug occurring is if the primitive is fetching a time value that is ‘post-TZ’. And - not that I’m expert at all in the arcana of Windows - the code in platforms/win32/vm/sqWin32Time.c looks a bit suspicious somehow.

tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim A flash of light, a cloud of dust, and... What was the question?

Louis LaBrunda

7:55 p.m.

In VA Smalltalk "Time millisecondClockValue" answers an integer number of milliseconds from the time the OS was booted. I know this is the case under Windows and I think it is the same under Linux.

It should be continuous and ever increasing. It is a four byte integer and as such will wrap around to 0 after about 49 days. I have read where it may be adjusted for a change in daylight savings time but I don't think that is the case. It also doesn't seem to be affected by a change in the time of day clock.
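The 49-day figure follows from the arithmetic, assuming the four-byte value is treated as unsigned. The sketch below is in Python with illustrative helper names; it also shows the usual modular subtraction that keeps an elapsed-time calculation correct across a single wrap. Whether any particular Smalltalk code does this is not stated above.

```python
WRAP = 2 ** 32  # number of distinct values of a four-byte unsigned counter

MS_PER_DAY = 1000 * 60 * 60 * 24


def days_until_wrap():
    """How long a millisecond counter runs before wrapping back to 0."""
    return WRAP / MS_PER_DAY


def elapsed_ms(earlier, later):
    """Elapsed milliseconds between two readings, tolerating one wrap."""
    return (later - earlier) % WRAP
```

For example, `elapsed_ms(WRAP - 100, 50)` gives 150 even though the second reading is numerically smaller than the first.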

Lou

On Mon, 15 Feb 2016 10:18:01 -0800, Bert Freudenberg bert@freudenbergs.de wrote:

...

...
On 15.02.2016, at 15:58, commits@source.squeak.org wrote:

Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones did also show this bug.

What?!

Time millisecondClockValue is supposed to be continuous. I'll have to admit I didn't follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.

Bert -

-- Louis LaBrunda Keystone Software Corp. SkypeMe callto://PhotonDemon

squeak-dev@lists.squeakfoundation.org

16 comments

participants (10)

Ben Coman
Bert Freudenberg
Chris Cunningham
Eliot Miranda
Levente Uzonyi
Louis LaBrunda
marcel.taeumel
Nicolai Hess
Tim Felgentreff
tim Rowledge