On 15.02.2016, at 15:58, commits@source.squeak.org wrote:
Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones also showed this bug.
What?!
Time millisecondClockValue is supposed to be continuous. I’ll have to admit I didn’t follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.
- Bert -
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
If you had used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about syncing #millisecondClockValue to wall clock.
Best, Marcel
-- View this message in context: http://forum.world.st/Time-millisecondClockValue-was-The-Trunk-Morphic-mt-10... Sent from the Squeak - Dev mailing list archive at Nabble.com.
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you had used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about syncing #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
- Bert -
Hi Bert,
yes, we lost the check on Jan 18, 2015, when we added the reusable interCycleDelay instead of creating new Delay instances again and again and again. :-)
Best, Marcel
On Mon, 15 Feb 2016, marcel.taeumel wrote:
Hi Bert,
yes, we lost the check on Jan 18, 2015, when we added the reusable interCycleDelay instead of creating new Delay instances again and again and again. :-)
The check is still superfluous, provided the clock is monotonic. Changing it to the UTC clock should partially fix that, but one would still be affected by changes of the system time. Since we're talking about subsecond delays here, I would even consider using the previous clock implementation here.
Levente
Hi Levente,
if we use wall clock, there is the time zone issue. If we use OS startup time, then there will be a reset after image snapshot and OS reboot. Either way, the check is really important. If there was a place to reset "lastCycleTime" after Squeak restart, it would not be needed. Yet, there is no place to notify existing projects about shutdown/resume. Maybe projects should add themselves to the AutoStart list?
Best, Marcel
Hi Marcel,
Using the wall clock in UTC is free of time zone issues, but the changes of the OS clock could still have a negative impact, so the extra check is necessary. I don't know the answer to your Project-related question. :)
Levente
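The check discussed above can be illustrated with a minimal Python sketch. It is not the actual Morphic code: the function name and parameters are hypothetical, and it only assumes that the pause before the next UI cycle is derived from a remembered "lastCycleTime" and a minimum cycle length. A computed wait outside the expected range means the clock and the remembered value no longer belong together (snapshot, reboot, time zone or system time change), so the pause is skipped.

```python
def inter_cycle_wait_ms(last_cycle_ms, now_ms, min_cycle_ms):
    """Return how long to pause (in ms) before the next UI cycle, or 0 to skip.

    Hypothetical helper: all three arguments are millisecond clock readings
    or lengths supplied by the caller.
    """
    wait = last_cycle_ms + min_cycle_ms - now_ms
    # A legitimate wait lies in (0, min_cycle_ms]. A larger value means the
    # clock went backwards relative to last_cycle_ms; a value <= 0 means the
    # cycle already took long enough (or the clock jumped forwards).
    if 0 < wait <= min_cycle_ms:
        return wait
    return 0
```

Without the upper bound, a clock that steps back by one hour would yield a wait of roughly one hour, which matches the symptom described in the commit message.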
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you had used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about syncing #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.
Levente
- Bert -
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you had used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about syncing #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Further back Ryan wrote:
- Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.
(newUtcMicrosecondClock >= utcMicrosecondClock 124)
It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?
Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).
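To make the failure mode concrete, here is a small Python sketch with two simulated clocks. The FakeClocks class, its starting value, and the size of the correction are all made up for illustration; the point is only that a duration computed from a clock that can be stepped may come out wrong or even negative, while the same measurement on a monotonic clock does not.

```python
class FakeClocks:
    """Hypothetical pair of clocks in microseconds, for illustration only."""

    def __init__(self):
        self.monotonic_us = 0
        self.wall_us = 1_455_548_280_000_000  # arbitrary starting wall time

    def tick(self, us):
        # Real time passes: both clocks advance together.
        self.monotonic_us += us
        self.wall_us += us

    def ntp_step(self, us):
        # A correction (NTP sync, user change) moves only the wall clock.
        self.wall_us += us


clocks = FakeClocks()
wall_start, mono_start = clocks.wall_us, clocks.monotonic_us

clocks.tick(2_000)         # 2 ms of work
clocks.ntp_step(-50_000)   # wall clock is stepped back by 50 ms
clocks.tick(3_000)         # 3 ms more work

wall_elapsed = clocks.wall_us - wall_start        # negative "duration"
mono_elapsed = clocks.monotonic_us - mono_start   # the 5 ms that passed
```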
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine that, on every occasion the VM updates its notion of the time, it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime, multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time:
- Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
- Timer schedules a future notification (Delay wait). It uses the monotonic clock.
- DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
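The same three-way split can be sketched in Python, whose standard library exposes both kinds of clock (time.monotonic and the wall clock behind datetime.now). The function names below are hypothetical and chosen to echo the Smalltalk selectors; this is an analogy, not Dart's or Squeak's implementation.

```python
import time
from datetime import datetime, timezone


def time_to_run(block):
    """Stopwatch: elapsed seconds, measured on the monotonic clock."""
    start = time.monotonic()
    block()
    return time.monotonic() - start


def wait_for(seconds):
    """Timer: sleep until a deadline expressed on the monotonic clock."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)


def timestamp_now():
    """DateTime: a wall-clock timestamp, here in UTC."""
    return datetime.now(timezone.utc)
```

Only timestamp_now is affected when the system time is changed; the other two keep working across NTP corrections.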
Makes sense, at the cost of having two clocks.
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
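The difference between the two readings of a Delay can be sketched as follows. This is a hypothetical Python model, not the Squeak Delay implementation: DurationDelay and deadline_expired are invented names, and the clock readings are plain millisecond integers passed in by the caller.

```python
def deadline_expired(deadline_ms, now_ms):
    """Deadline semantics: expired as soon as the clock passes the deadline."""
    return now_ms >= deadline_ms


class DurationDelay:
    """Duration semantics: remember how much waiting is left, not when it ends."""

    def __init__(self, duration_ms, now_ms):
        self.remaining_ms = duration_ms
        self.started_at_ms = now_ms

    def snapshot(self, now_ms):
        # On image save, convert the running delay into "time still to wait".
        waited = now_ms - self.started_at_ms
        self.remaining_ms = max(0, self.remaining_ms - waited)
        self.started_at_ms = None

    def resume(self, now_ms):
        # On image start, rebase onto whatever the clock says now.
        self.started_at_ms = now_ms

    def is_expired(self, now_ms):
        return now_ms - self.started_at_ms >= self.remaining_ms
```

With deadlines, resuming a day later makes every pending delay fire at once; with durations, each delay still waits for the part it had not yet waited.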
_,,,^..^,,,_ best, Eliot
Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.
This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.
Levente
- Bert -
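A rough calculation shows why integer size matters here. The sketch below assumes that SmallInteger holds values up to 2^30 - 1 in a 32-bit image and up to 2^60 - 1 in a 64-bit image; those limits are an assumption stated for illustration and are not taken from this thread.

```python
SMALL_INT_MAX_32 = 2**30 - 1   # assumed 32-bit SmallInteger maximum
SMALL_INT_MAX_64 = 2**60 - 1   # assumed 64-bit SmallInteger maximum

# A millisecond counter that must stay a SmallInteger rolls over after:
ms_rollover_days = SMALL_INT_MAX_32 / 1_000 / 86_400

# A microsecond counter outgrows the 32-bit range after:
us_overflow_minutes = SMALL_INT_MAX_32 / 1_000_000 / 60

# The same microsecond counter fits the 64-bit range for:
us_range_years = SMALL_INT_MAX_64 / 1_000_000 / 86_400 / 365.25
```

Under these assumptions a millisecond clock rolls over after roughly twelve days, an absolute microsecond clock needs LargeIntegers almost immediately on 32 bits, and neither problem arises on 64 bits.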
Hi Eliot,
am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.
cheers, Tim
Hi Tim,
On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:
Hi Eliot,
am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?
First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan nothing changes to times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.
As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.
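The relationship between the two values can be sketched in Python. The helper names are hypothetical, and the sketch assumes the epoch is midnight UTC on 1 January 1901 and that the local clock is simply the UTC clock shifted by the zone's current offset.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch: 1901-01-01T00:00:00Z.
EPOCH_1901 = datetime(1901, 1, 1, tzinfo=timezone.utc)


def utc_microseconds(moment):
    """Microseconds since the 1901 epoch; independent of the moment's zone."""
    delta = moment.astimezone(timezone.utc) - EPOCH_1901
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def local_microseconds(moment):
    """UTC microseconds shifted by the moment's UTC offset (for display only)."""
    return utc_microseconds(moment) + moment.utcoffset() // timedelta(microseconds=1)


tokyo = timezone(timedelta(hours=9))
berlin = timezone(timedelta(hours=1))
instant = datetime(2016, 2, 16, 10, 0, tzinfo=timezone.utc)

same_utc = utc_microseconds(instant.astimezone(tokyo)) == utc_microseconds(instant.astimezone(berlin))
local_gap = local_microseconds(instant.astimezone(tokyo)) - local_microseconds(instant.astimezone(berlin))
```

The UTC value is identical in Tokyo and Berlin for the same instant, while the local values differ by the eight-hour offset between the zones, which is why only the UTC value is suitable for scheduling.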
Second, yes, the idea is to have the VM drift time towards wall time. First some background.
By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-.... One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OS's such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time which is a local clock (the hardware's underlying clock) which is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".
The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through something the ntpd adjusts local time then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines. Further I think that providing a more ideal clock is straight-forward. Here's what I propose.
The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.
Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers the offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon", because monotonic time is reasonably accurate, so a rate of change of up to 1% soon overcomes a drift of at most 0.0277777%.
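The rates in this paragraph can be checked with a few lines of Python. The sketch assumes the 1ms heartbeat used for the discussion; ticks_to_absorb is a hypothetical helper that ignores any drift that accumulates while the correction is in progress.

```python
from fractions import Fraction

TICK_US = 1_000                          # 1 ms heartbeat, as assumed in the text

worst_drift = Fraction(1, 3600)          # 1 second per hour
slew_same_direction = Fraction(5, TICK_US)   # 5 usecs per tick
slew_reversing = Fraction(10, TICK_US)       # 10 usecs per tick


def ticks_to_absorb(error_us, step_us):
    """Heartbeats needed to remove error_us at step_us per heartbeat."""
    return -(-error_us // step_us)       # ceiling division


drift_percent = float(worst_drift * 100)
ticks_for_30ms = ticks_to_absorb(30_000, 5)
```

At 5 usecs per tick the clock runs at most 0.5% fast or slow, and at 10 usecs per tick at most 1%; both are well above the assumed worst-case drift, so a 30ms error is absorbed in about 6000 heartbeats, i.e. about 6 seconds.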
Let's express this as a Smalltalk class. Units are microseconds (usecs).
Object subclass: VMClock
    instanceVariableNames: 'offset now'

initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime

run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self
        computeNowGivenMonotonic: OperatingSystem clock_gettime
        andLocal: OperatingSystem gettimeofday.
     true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue: "the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
            [delta := offset >= 0
                ifTrue: [(offset min: 5) negated]
                ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue: "localTime is behind; make the offset more negative by no more than 5 usecs (10 while the offset is still positive)"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue: "localTime is ahead; make the offset more positive by no more than 5 usecs (10 while the offset is still negative)"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now
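For readers who want to experiment outside an image, here is a Python transcription of computeNowGivenMonotonic:andLocal: as written above. It is a sketch, not VM code: the function name is hypothetical and the offset is passed in and returned rather than held in an instance variable. All values are microseconds.

```python
def compute_now(offset, monotonic_us, local_us):
    """One heartbeat of the proposed clock. Returns (now, new_offset)."""
    if local_us == monotonic_us:
        # The clocks agree: shrink any offset towards zero by at most 5 usecs.
        delta = -min(offset, 5) if offset >= 0 else -max(offset, -5)
    else:
        difference = local_us - (monotonic_us + offset)
        delta = 0
        if difference < 0:
            # Local time is behind: lower the offset, faster if it is positive.
            delta = max(difference, -10 if offset >= 0 else -5)
        elif difference > 0:
            # Local time is ahead: raise the offset, faster if it is negative.
            delta = min(difference, 5 if offset >= 0 else 10)
    offset += delta
    return monotonic_us + offset, offset
```

Starting from an offset of -13188 with monotonic time 10800000 and local time 10755947, one call yields now = 10786807 and offset = -13193, which agrees with the trace shown further down in this message.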
I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:
run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000
                        + (i \\ 3600 = 0
                            ifTrue: [| drift |
                                     [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                     drift]
                            ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents
Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.
#(10486812 -13188 10500000 10486812 13188 0)
#(10586812 -13188 10600000 10586812 13188 0)
#(10686812 -13188 10700000 10686812 13188 0)
#(10786807 -13193 10800000 10755947 44053 30860)
#(10886307 -13693 10900000 10855947 44053 30360)
#(10985807 -14193 11000000 10955947 44053 29860)
#(11085307 -14693 11100000 11055947 44053 29360)
#(11184807 -15193 11200000 11155947 44053 28860)
#(11284307 -15693 11300000 11255947 44053 28360)
#(11383807 -16193 11400000 11355947 44053 27860)
#(11483307 -16693 11500000 11455947 44053 27360)
#(11582807 -17193 11600000 11555947 44053 26860)
#(11682307 -17693 11700000 11655947 44053 26360)
#(11781807 -18193 11800000 11755947 44053 25860)
#(11881307 -18693 11900000 11855947 44053 25360)
#(11980807 -19193 12000000 11955947 44053 24860)
#(12080307 -19693 12100000 12055947 44053 24360)
#(12179807 -20193 12200000 12155947 44053 23860)
#(12279307 -20693 12300000 12255947 44053 23360)
#(12378807 -21193 12400000 12355947 44053 22860)
#(12478307 -21693 12500000 12455947 44053 22360)
#(12577807 -22193 12600000 12555947 44053 21860)
#(12677307 -22693 12700000 12655947 44053 21360)
#(12776807 -23193 12800000 12755947 44053 20860)
#(12876307 -23693 12900000 12855947 44053 20360)
#(12975807 -24193 13000000 12955947 44053 19860)
#(13075307 -24693 13100000 13055947 44053 19360)
#(13174807 -25193 13200000 13155947 44053 18860)
#(13274307 -25693 13300000 13255947 44053 18360)
#(13373807 -26193 13400000 13355947 44053 17860)
#(13473307 -26693 13500000 13455947 44053 17360)
#(13572807 -27193 13600000 13555947 44053 16860)
#(13672307 -27693 13700000 13655947 44053 16360)
#(13771807 -28193 13800000 13755947 44053 15860)
#(13871307 -28693 13900000 13855947 44053 15360)
#(13970807 -29193 14000000 13955947 44053 14860)
#(14070307 -29693 14100000 14055947 44053 14360)
#(14169807 -30193 14200000 14155947 44053 13860)
#(14269307 -30693 14300000 14255947 44053 13360)
Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.
#(998321391 221391 998100000 998297171 -197171 24220)
#(998420391 220391 998200000 998397171 -197171 23220)
#(998519391 219391 998300000 998497171 -197171 22220)
#(998618391 218391 998400000 998597171 -197171 21220)
#(998717391 217391 998500000 998697171 -197171 20220)
#(998816391 216391 998600000 998797171 -197171 19220)
#(998915391 215391 998700000 998897171 -197171 18220)
#(999014391 214391 998800000 998997171 -197171 17220)
#(999113391 213391 998900000 999097171 -197171 16220)
#(999212391 212391 999000000 999197171 -197171 15220)
#(999311391 211391 999100000 999297171 -197171 14220)
#(999410391 210391 999200000 999397171 -197171 13220)
#(999509391 209391 999300000 999497171 -197171 12220)
#(999608391 208391 999400000 999597171 -197171 11220)
#(999707391 207391 999500000 999697171 -197171 10220)
#(999806391 206391 999600000 999797171 -197171 9220)
#(999905391 205391 999700000 999897171 -197171 8220)
#(1000004391 204391 999800000 999997171 -197171 7220)
#(1000103391 203391 999900000 1000097171 -197171 6220)
#(1000202391 202391 1000000000 1000197171 -197171 5220)
Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.
I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.
No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to better than 1% and that ntpd adjusts the local notion of wall time based on a reputable clock.
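The 1% claim can be probed with a small randomized check in Python. This is a sketch under stated assumptions: compute_now is a hypothetical transcription of the proposed per-tick update, the heartbeat is 1ms, and local time is perturbed by up to +/-36ms every 3600 ticks, similar to the simulation earlier in this message. It checks one seeded run and proves nothing in general.

```python
import random


def compute_now(offset, monotonic_us, local_us):
    """One heartbeat of the proposed clock. Returns (now, new_offset)."""
    if local_us == monotonic_us:
        delta = -min(offset, 5) if offset >= 0 else -max(offset, -5)
    else:
        difference = local_us - (monotonic_us + offset)
        delta = 0
        if difference < 0:
            delta = max(difference, -10 if offset >= 0 else -5)
        elif difference > 0:
            delta = min(difference, 5 if offset >= 0 else 10)
    offset += delta
    return monotonic_us + offset, offset


def simulate(ticks, seed=42):
    """Return the per-tick increments of 'now' over a perturbed run."""
    rng = random.Random(seed)
    monotonic = local = offset = 0
    previous_now = None
    steps = []
    for i in range(1, ticks + 1):
        monotonic += 1000
        local += 1000
        if i % 3600 == 0:
            local += rng.randint(-36_000, 36_000)   # simulated ntpd correction
        now, offset = compute_now(offset, monotonic, local)
        if previous_now is not None:
            steps.append(now - previous_now)
        previous_now = now
    return steps


steps = simulate(20_000)
```

In this run every increment of the clock stays between 990 and 1010 usecs per 1000 usecs of monotonic time, so the clock never runs backwards and a subtraction of two readings is off by at most 1%.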
Review appreciated.
cheers,
Tim
On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you would have used VM or OS startup time, this would still be
problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use to UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda <eliot.miranda@gmail.com
wrote:
Further back Ryan wrote:
- Travis found an assertion failure. Unfortunately the assertions fail
> to include paths with the line numbers. >
> (newUtcMicrosecondClock >= utcMicrosecondClock 124) >
It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?
Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time
Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
Timer schedules a future notification (Delay wait). It uses the monotonic clock.
DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
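As a rough illustration of that three-way split, here is a minimal Python sketch. Python's time and datetime modules stand in for the Dart classes, and the helper names time_to_run, wait_until_elapsed and timestamp are made up for this example; none of this is taken from the Dart or Squeak sources.

```python
import time
from datetime import datetime, timezone


def time_to_run(block):
    """Stopwatch use: measure a duration on the monotonic clock (microseconds)."""
    start = time.monotonic_ns()
    block()
    return (time.monotonic_ns() - start) // 1000


def wait_until_elapsed(milliseconds):
    """Timer use: the deadline lives on the monotonic clock, so a
    wall-clock adjustment cannot make it fire early or late."""
    deadline = time.monotonic() + milliseconds / 1000.0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        time.sleep(remaining)


def timestamp():
    """DateTime use: a wall-clock reading, suitable for logs and display."""
    return datetime.now(timezone.utc)
```

Only timestamp() depends on the wall clock; the other two keep working unchanged if the system time is reset while they run.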
Makes sense, at the cost of having two clocks.
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
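The "thousand Delays" scenario can be reproduced with plain numbers. The following Python sketch is only a simulation under assumed values: the clock readings are invented, and expired is a hypothetical helper, not a VM function.

```python
def expired(deadlines, clock_now):
    """Return the deadlines that are due at clock_now (same units as the deadlines)."""
    return [d for d in deadlines if d <= clock_now]


MINUTE = 60
DAY = 24 * 60 * MINUTE

# Simulated readings, in seconds, at the moment the Delays are created.
wall_now = 1_455_000_000   # arbitrary wall-clock reading
mono_now = 5_000           # arbitrary monotonic reading

# A thousand Delays spaced one minute apart, scheduled against each clock.
wall_deadlines = [wall_now + i * MINUTE for i in range(1, 1001)]
mono_deadlines = [mono_now + i * MINUTE for i in range(1, 1001)]

# One second passes, then the user sets the system time forward a day.
wall_now += 1 + DAY
mono_now += 1

fired_on_wall = len(expired(wall_deadlines, wall_now))   # every Delay is due at once
fired_on_mono = len(expired(mono_deadlines, mono_now))   # none is due yet
```

A thousand minutes is less than a day, so every wall-clock deadline is overtaken by the jump, while the monotonic deadlines are untouched.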
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
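One way to picture a Delay that behaves as a duration across a snapshot: store the time still to run when the image is saved, and rebuild the deadline against whatever the clock reads on resume. The Python class below is a hypothetical sketch of that idea only (DurationDelay and its method names are invented here); it is not Squeak's Delay implementation.

```python
class DurationDelay:
    """A delay that can be carried across a clock discontinuity."""

    def __init__(self, duration_us, clock_us):
        self.deadline_us = clock_us + duration_us
        self.remaining_us = None

    def save_remaining(self, clock_us):
        """On snapshot: convert the deadline into a remaining duration."""
        self.remaining_us = max(0, self.deadline_us - clock_us)

    def restore_deadline(self, clock_us):
        """On resume: rebuild the deadline against the new clock reading."""
        self.deadline_us = clock_us + self.remaining_us
        self.remaining_us = None

    def is_expired(self, clock_us):
        return clock_us >= self.deadline_us


# A 60 s delay, snapshotted after 20 s, resumed on an unrelated clock.
delay = DurationDelay(duration_us=60_000_000, clock_us=1_000)
delay.save_remaining(clock_us=20_001_000)
delay.restore_deadline(clock_us=900_000_000_000)
```

After the resume the delay still has 40 s to run, regardless of how far the clock moved while the image was not running.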
_,,,^..^,,,_ best, Eliot
Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.
This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.
Levente
- Bert -
_,,,^..^,,,_
best, Eliot
Hi All,
Did anything further come of this discussion...
On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Tim,
On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:
Hi Eliot,
am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?
First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan nothing changes to times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.
As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.
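To make the two time bases concrete, here is a small Python sketch of the arithmetic. It assumes "since 1901" means midnight UTC on 1 January 1901; the function names are invented for the example and are not the VM's primitives.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the epoch is 1901-01-01 00:00:00 UTC.
SQUEAK_EPOCH = datetime(1901, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Microseconds between the two epochs (25202 days).
EPOCH_OFFSET_US = (UNIX_EPOCH - SQUEAK_EPOCH) // timedelta(microseconds=1)


def utc_microseconds_since_1901(unix_seconds):
    """Convert whole Unix seconds to UTC microseconds since 1901."""
    return unix_seconds * 1_000_000 + EPOCH_OFFSET_US


def local_microseconds_since_1901(unix_seconds, utc_offset_seconds):
    """The same instant expressed in local time, given the zone's offset from UTC."""
    return utc_microseconds_since_1901(unix_seconds) + utc_offset_seconds * 1_000_000
```

The UTC value is the same for an image in Japan and one in Germany at the same instant; only the local value differs, by the zone offset (for example 9 * 3600 seconds for UTC+9).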
Second, yes, the idea is to have the VM drift time towards wall time. First some background.
By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-.... One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OSes such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".
The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through something the ntpd adjusts local time then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines.
The flip side of this is it restricts the tools available to a programmer at the Image level, and adds complexity to the VM that Image programmers can't see if ThingsGoWrong(TM). If the Image had direct access to the several clocks provided by clock_gettime(), it would be possible for an Image level programmer to explore and *understand* the difference in them. I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.
Indeed for Delay scheduler, it might be good to choose between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and observe what happens when I set the system time back a couple of days. Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
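The kind of exploration described here can at least be tried from a scripting language today. The Python sketch below reads the clocks named above through time.clock_gettime; read_clocks is a made-up helper, and the lookups are defensive because not every platform defines every clock id.

```python
import time

# Clock ids named in the discussion; some are missing on some platforms,
# so look them up defensively rather than assuming they exist.
CLOCK_NAMES = [
    "CLOCK_REALTIME",
    "CLOCK_MONOTONIC",
    "CLOCK_MONOTONIC_RAW",
    "CLOCK_PROCESS_CPUTIME_ID",
]


def read_clocks():
    """Return {name: seconds} for every listed clock this platform offers."""
    readings = {}
    if not hasattr(time, "clock_gettime"):
        return readings  # platform has no clock_gettime at all
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            pass  # clock id is defined but not supported by this kernel
    return readings
```

Calling read_clocks() before and after changing the system time shows which readings jump and which do not.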
Further I think that providing a more ideal clock is straight-forward. Here's what I propose.
But why do our own slewing to match local "real" time, when it seems NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].
There seem to be a couple of choices to making clock_gettime() cross platform... a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb b. https://github.com/ThomasHabets/monotonic_clock
[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html
The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.
Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon", since, because monotonic time is reasonably accurate, a rate of change of 1% soon overcomes a drift of at most 0.0277777%.
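The numbers in that paragraph can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic under the stated 1 ms heartbeat; the names are invented for the example, and ticks_to_absorb ignores any drift that accumulates while the clock is catching up.

```python
TICK_US = 1000          # heartbeat period assumed in the text: 1 ms
STEP_US = 5             # per-tick limit when continuing in the same direction
TURN_STEP_US = 10       # per-tick limit when changing direction

slew_rate = STEP_US / TICK_US            # 0.005, i.e. 0.5%
turn_slew_rate = TURN_STEP_US / TICK_US  # 0.01, i.e. 1%
drift_rate = 1 / 3600                    # 1 second an hour, about 0.0278%


def ticks_to_absorb(jump_us, step_us=STEP_US):
    """Heartbeats needed to slew away a one-off jump of jump_us microseconds."""
    return -(-abs(jump_us) // step_us)  # ceiling division
```

So a 31 ms correction by ntpd would take 6200 heartbeats (6.2 seconds) to absorb at 5 usecs per tick, and either limit is well above the assumed worst-case drift.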
Let's express this as a Smalltalk class. Units are microseconds (usecs).
Object subclass: VMClock
    instanceVariableNames: 'offset now'

initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime

run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self
        computeNowGivenMonotonic: OperatingSystem clock_gettime
        andLocal: OperatingSystem gettimeofday.
     true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue:
            ["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
             delta := offset >= 0
                        ifTrue: [(offset min: 5) negated]
                        ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue: "localTime is behind; make the offset more negative by no more than 5 usecs"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue: "localTime is ahead; make the offset more positive by no more than 5 usecs"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now
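For readers who want to step through the algorithm outside a Smalltalk image, here is a rough Python transliteration of computeNowGivenMonotonic:andLocal:. The Python names (VMClock.compute_now and its arguments) are mine, and the clock readings are passed in rather than read from the OS, so the behaviour can be checked with made-up numbers.

```python
class VMClock:
    """Transliteration of the VMClock sketch; all values are microseconds."""

    def __init__(self, monotonic_us, local_us):
        self.now = local_us
        self.offset = local_us - monotonic_us

    def compute_now(self, monotonic_us, local_us):
        """Advance now so that it approaches local_us without ever going back."""
        if local_us == monotonic_us:
            # The two clocks agree; shrink any offset by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local_us - (monotonic_us + self.offset)
            delta = 0
            if difference < 0:
                # Local time is behind; make the offset more negative.
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:
                # Local time is ahead; make the offset more positive.
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic_us + self.offset
        return self.now


# Local time has just been set back by 31 ms; run 4000 heartbeats of 1 ms.
clock = VMClock(monotonic_us=0, local_us=1_000_000)
mono, local = 0, 1_000_000 - 31_000
readings = []
for _ in range(4000):
    mono += 1000
    local += 1000
    readings.append(clock.compute_now(mono, local))
```

In this run each reading is between 990 and 1000 usecs after the previous one, so the clock never goes backwards, and it has caught up with local time well before the 4000 heartbeats are over.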
I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:
run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000
                        + (i \\ 3600 = 0
                            ifTrue:
                                [| drift |
                                 [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                 drift]
                            ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents
Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.
[snip]
Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.
[snip]
Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.
I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.
No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to more than 1% and that the ntpd daemon adjusts the local notion of wall time based on a reputable clock.
Review appreciated.
cheers, Tim
On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Further back Ryan wrote:
5) Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.
(newUtcMicrosecondClock >= utcMicrosecondClock 124)
It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?
Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time
Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
Timer schedules a future notification (Delay wait). It uses the monotonic clock.
DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
Makes sense, at the cost of having two clocks.
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced.
This is because of #save/restoreResumptionTimes on image shutdown/startup.
If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
Yes, since Delays are currently using gettimeofday() they expire when the system clock jumps. But also, with the move of Delay to a microsecond clock and removal of the clock-wrap checks, perhaps the algorithm is more susceptible to jitter or ntp moving the clock backwards. I would need to think more about that, but anyway clock_gettime(MONOTONIC) which slews wall-time seems a much better choice. I'd be interested in doing some work getting this into the VM since I'd like it for Pharo also.
cheers -ben
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:
Hi All,
<snip>
I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.
<snip>
That is soooo true.
-cbc
On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:
Hi All,
Did anything further come of this discussion...
On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Tim,
On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:
Hi Eliot,
am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?
First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan nothing changes to times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere) and your machine and the machine on which the image was saved in Japan.
As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays, etc, etc.
Second, yes, the idea is to have the VM drift time towards wall time. First some background.
By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-... . One can expect such hardware to drift relative to wall time by less than a minute a day, but still drift by amounts noticeable to humans over the course of hours. OSes such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and periodically is corrected by ntpd. The second notion of time is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".
The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through something the ntpd adjusts local time then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine which protects the programmer from the vagaries and dangers of real machines.
The flip side of this is it restricts the tools available to a programmer at the Image level, and adds complexity to the VM that Image programmers can't see if ThingsGoWrong(TM). If the Image had direct access to the several clocks provided by clock_gettime(), it would be possible for an Image level programmer to explore and *understand* the difference in them. I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.
Indeed for Delay scheduler, it might be good to choose between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and observe what happens when I set the system time back a couple of days. Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
Further I think that providing a more ideal clock is straight-forward. Here's what I propose.
But why do our own slewing to match local "real" time, when it seems NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].
What do we do on platforms that don't have adjtime? The skewing algorithm is simple and adding it to the VM means we're not dependent on adjtime. Isn't that a good thing?
There seem to be a couple of choices to making clock_gettime() cross platform... a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb b. https://github.com/ThomasHabets/monotonic_clock
[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html
Sure. It's implementable. Why not add the simple algorithm to our VM and then we're done?
The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.
Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon", since, because monotonic time is reasonably accurate, a rate of change of 1% soon overcomes a drift of at most 0.0277777%.
Let's express this as a Smalltalk class. Units are microseconds (usecs).
Object subclass: VMClock
    instanceVariableNames: 'offset now'

initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime

run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self
        computeNowGivenMonotonic: OperatingSystem clock_gettime
        andLocal: OperatingSystem gettimeofday.
     true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue:
            ["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
             delta := offset >= 0
                        ifTrue: [(offset min: 5) negated]
                        ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue: "localTime is behind; make the offset more negative by no more than 5 usecs"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue: "localTime is ahead; make the offset more positive by no more than 5 usecs"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now
I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:
run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000
                        + (i \\ 3600 = 0
                            ifTrue:
                                [| drift |
                                 [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                 drift]
                            ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents
Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset increases (negatively) and the error is reduced from 30ms to 13ms.
[snip]
Here's the last 20 entries. localTime is now 197ms ahead of monotonic time, and the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.
[snip]
Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.
I'm sure the above is some standard piece of control theory but apart from one open university program I'll never forget which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. Anyone who knows what the above algorithm's known as please let me know.
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.
No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock for both measurement and determining wall time. The only assumptions are that computer clocks are accurate to more than 1% and that the ntpd daemon adjusts the local notion of wall time based on a reputable clock.
Review appreciated.
cheers, Tim
On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Further back Ryan wrote:
5) Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.
(newUtcMicrosecondClock >= utcMicrosecondClock 124)
It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?
Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time
Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
Timer schedules a future notification (Delay wait). It uses the monotonic clock.
DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
Makes sense, at the cost of having two clocks.
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced.
This is because of #save/restoreResumptionTimes on image shutdown/startup.
If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
Yes, since Delays are currently using gettimeofday() they expire when the system clock jumps. But also, with the move of Delay to a microsecond clock and removal of the clock-wrap checks, perhaps the algorithm is more susceptible to jitter or ntp moving the clock backwards. I would need to think more about that, but anyway clock_gettime(MONOTONIC) which slews wall-time seems a much better choice. I'd be interested in doing some work getting this into the VM since I'd like it for Pharo also.
cheers -ben
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
2016-02-16 0:49 GMT+01:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
Hi Bert,
this was just a regression. There has always been this check in the past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
If you would have used VM or OS startup time, this would still be problematic after an overflow. (Hence the comment about snapshotting). So, this fix does not directly address the discussion about synching #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently. Probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read, it's interesting.
Ah, does this mean there is a solution for this bug?
11324 https://pharo.fogbugz.com/f/cases/11324/Image-freeze-when-changing-system-time Image freeze when changing system time
I am sure there was a Mantis bug entry as well, but I cannot find it yet.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Further back Ryan wrote:
- Travis found an assertion failure. Unfortunately the assertions fail to include paths with the line numbers.
(newUtcMicrosecondClock >= utcMicrosecondClock 124)
It's easy to track down. Just grep for the string. You'll find it in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to understand it. What OS are you seeing this on?
Linux. Looking at the comment above this assert, I see Cog is using the wrong clock. One should not rely on the realtime clock (gettimeofday) to move steadily forward. It can jump around due to NTP syncs, the machine sleeping or the user changing the time settings. Programs running at startup on the Raspberry Pi in particular can see very large jumps because it has no hardware clock (battery too expensive) so the first NTP sync will be a very large correction. We fixed this in the Dart VM a few months ago. Timers need to be scheduled using the monotonic clock (Linux clock_gettime, Mac mach_absolute_time).
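To make Ryan's distinction concrete, here is a minimal sketch of reading both clocks on a POSIX system. It is not the Cog or Dart VM code: the function names monotonic_usecs, wall_usecs and timespec_to_usecs are made up for illustration, and it assumes clock_gettime with CLOCK_MONOTONIC and CLOCK_REALTIME is available (on older Macs mach_absolute_time would be the monotonic source instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole microseconds. */
static int64_t timespec_to_usecs(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Monotonic clock: unaffected by NTP steps or the user setting the time.
   Suitable for scheduling timers and measuring durations. */
static int64_t monotonic_usecs(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return timespec_to_usecs(ts);
}

/* Realtime (wall) clock: can jump forwards or backwards.
   Suitable only for timestamps. */
static int64_t wall_usecs(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_REALTIME, &ts);
    assert(rc == 0);
    (void)rc;
    return timespec_to_usecs(ts);
}
```

A timer would compute its wake-up point from monotonic_usecs(), while something like a timestamp would come from wall_usecs().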
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine on every occasion that the VM updates its notion of the time it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
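One possible reading of the idea above, as a sketch only: advance the VM time by the monotonic delta, then nudge it towards wall time by at most a fixed fraction of that delta. The SlewClock type, the function names and the SLEW_DIVISOR knob are all hypothetical, and this is neither an existing VM algorithm nor what Dart does. Because the correction is never larger than a tenth of the elapsed time, the result cannot go backwards.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t vm_usecs;        /* the time the VM reports to the image */
    int64_t last_mono_usecs; /* monotonic reading at the previous update */
} SlewClock;

/* Hypothetical tuning knob: correct by at most 1/SLEW_DIVISOR of elapsed time. */
enum { SLEW_DIVISOR = 10 };

static void slew_clock_init(SlewClock *c, int64_t mono_now, int64_t wall_now)
{
    c->vm_usecs = wall_now;
    c->last_mono_usecs = mono_now;
}

static int64_t slew_clock_update(SlewClock *c, int64_t mono_now, int64_t wall_now)
{
    int64_t elapsed = mono_now - c->last_mono_usecs;
    assert(elapsed >= 0); /* the monotonic source must not go backwards */

    int64_t max_adjust = elapsed / SLEW_DIVISOR;
    /* How far we would be from wall time after a plain monotonic step. */
    int64_t adjust = wall_now - (c->vm_usecs + elapsed);
    if (adjust > max_adjust) adjust = max_adjust;
    if (adjust < -max_adjust) adjust = -max_adjust;

    /* elapsed + adjust >= 0 because |adjust| <= elapsed / SLEW_DIVISOR. */
    c->vm_usecs += elapsed + adjust;
    c->last_mono_usecs = mono_now;
    return c->vm_usecs;
}
```

With these numbers a one-hour jump in wall time is absorbed at 10% of real time, so the VM clock keeps moving steadily and catches up gradually rather than all at once.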
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a clock that is flexible, that will deviate by no more than a specified percentage from clock_gettime in approaching wall time is simpler for the user albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time
Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
Timer schedules a future notification (Delay wait). It uses the monotonic clock.
DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
Makes sense, at the cost of having two clocks.
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced. If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
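The deadline/duration distinction can be sketched as follows. This is an illustration under assumed names (DeadlineDelay, DurationDelay, duration_rebase and so on are hypothetical), not how Squeak's Delay is implemented; the rebase step is only loosely analogous to the save/restore of resumption times mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Deadline style: fire once the clock passes an absolute time. */
typedef struct { int64_t deadline_usecs; } DeadlineDelay;

static bool deadline_expired(const DeadlineDelay *d, int64_t now_usecs)
{
    return now_usecs >= d->deadline_usecs;
}

/* Duration style: count down the time that remains. */
typedef struct {
    int64_t remaining_usecs;
    int64_t last_seen_usecs;
} DurationDelay;

static void duration_start(DurationDelay *d, int64_t duration_usecs, int64_t now_usecs)
{
    assert(duration_usecs >= 0);
    d->remaining_usecs = duration_usecs;
    d->last_seen_usecs = now_usecs;
}

/* Ordinary tick: consume the time observed since the last tick.
   A backwards step in the clock consumes nothing. */
static bool duration_tick(DurationDelay *d, int64_t now_usecs)
{
    int64_t elapsed = now_usecs - d->last_seen_usecs;
    if (elapsed > 0) d->remaining_usecs -= elapsed;
    d->last_seen_usecs = now_usecs;
    return d->remaining_usecs <= 0;
}

/* After a known discontinuity (e.g. resuming a snapshot): re-anchor to the
   new clock value without consuming any of the remaining time. */
static void duration_rebase(DurationDelay *d, int64_t now_usecs)
{
    d->last_seen_usecs = now_usecs;
}
```

A one-minute DeadlineDelay expires immediately if the clock jumps a day ahead, whereas a DurationDelay that is rebased after the jump still waits out its remaining time.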
_,,,^..^,,,_ best, Eliot
Currently this change also affects performance (down to 8-10% of the previous implementation), because of the creation of multiple LargeIntegers.
This is no longer an issue in 64-bits ;-). But even if answering large integers is slower it doesn't impact real applications since they spend little of their time in the delay & timing part of the code. But I'm sure that Nicolas & I can do something about large integer performance.
Levente
- Bert -
-- _,,,^..^,,,_ best, Eliot
On 15-02-2016, at 10:18 AM, Bert Freudenberg bert@freudenbergs.de wrote:
On 15.02.2016, at 15:58, commits@source.squeak.org wrote:
Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones did also show this bug.
What?!
Time millisecondClockValue is supposed to be continuous. I’ll have to admit I didn’t follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.
The only way I can see that bug occurring is if the primitive is fetching a time value that is ‘post-TZ’. And - not that I’m expert at all in the arcana of Windows - the code in platforms/win32/vm/sqWin32Time.c looks a bit suspicious somehow.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim A flash of light, a cloud of dust, and... What was the question?
In VA Smalltalk "Time millisecondClockValue" answers an integer number of milliseconds from the time the OS was booted. I know this is the case under Windows and I think it is the same under Linux.
It should be continuous and ever increasing. It is a four byte integer and as such will wrap around to 0 after about 49 days. I have read where it may be adjusted for a change in daylight savings time but I don't think that is the case. It also doesn't seem to be affected by a change in the time of day clock.
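The 49-day figure follows from 2^32 ms / 86,400,000 ms per day, which is about 49.7 days. As a small sketch (the function names are invented here, and this is not VA Smalltalk's or Squeak's code), intervals on such a wrapping counter can still be computed safely with unsigned subtraction, as long as the two readings are less than one wrap period apart:

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time check of the claim: a 32-bit millisecond counter wraps
   after 49 whole days (about 49.7 days). */
static_assert(UINT32_MAX / 86400000u == 49,
              "a 32-bit ms counter wraps after about 49 days");

/* Milliseconds between two readings of a wrapping 32-bit counter.
   Unsigned arithmetic is modulo 2^32, so this is correct across one wrap. */
static uint32_t elapsed_ms(uint32_t earlier, uint32_t later)
{
    return (uint32_t)(later - earlier);
}

/* True once `now` has reached `deadline`. Valid while the two values are
   less than 2^31 ms (about 24.8 days) apart; relies on the usual
   two's-complement conversion to int32_t. */
static int deadline_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

For example, a reading of 5 taken shortly after a reading of 4294967290 is 11 ms later, not roughly 49 days earlier.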
Lou
On Mon, 15 Feb 2016 10:18:01 -0800, Bert Freudenberg bert@freudenbergs.de wrote:
On 15.02.2016, at 15:58, commits@source.squeak.org wrote:
Fixes a regression in Morphic's inter-cycle delay. Hacking during the switch from DST to normal time forced the user to wait one hour. Opening images from different time zones did also show this bug.
What?!
Time millisecondClockValue is supposed to be continuous. I’ll have to admit I didn’t follow the previous discussion too closely, but syncing millisecondClockValue to the wall clock seems like a very bad idea.
- Bert -
squeak-dev@lists.squeakfoundation.org