[squeak-dev] Re: Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz)

Ben Coman btc at openinworld.com
Mon Jul 18 18:02:09 UTC 2016


Hi All,

Did anything further come of this discussion...

On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Hi Tim,
>
> On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff <timfelgentreff at gmail.com>
> wrote:
>>
>> Hi Eliot,
>>
>> am I understanding correctly that (one of) your ideas is to have just one
>> clock that "drifts" towards wall clock time? How fast would it drift? I
>> agree this would be neat for a local image, but what if someone from Japan
>> sends me their image (happened to me just last week) and wants me to check
>> something in it? Will the in-image clock run slow for a few hours until my
>> timezone has caught up with Japan?
>
>
> First, the timezone issue is completely separate, and is only an image
> issue.  The VM's time basis is UTC microseconds since 1901 (see Time
> class>>utcMicrosecondClock).  This should answer the same value at the same
> time, within the bounds of clock drift, anywhere on the globe.  So if you
> are sent an image from Japan, nothing changes about the times in that
> image when you start it locally, except for the inaccuracies between wall
> time (some atomic clock somewhere) and your machine and the machine on
> which the image was saved in Japan.
>
> As a convenience the VM also offers microseconds since 1901 in local time
> (see Time class>>localMicrosecondClock).  This is suitable for deriving UI
> times to display to the user, or times to write to a log, etc.  But it is
> /not/ to be used to schedule delays, etc, etc.
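
For example, the difference between the two clocks is exactly the host's
timezone offset.  A minimal check, using the two selectors named above:

    | utc local |
    utc := Time utcMicrosecondClock.
    local := Time localMicrosecondClock.
    (local - utc) / 1000000 / 3600.0
        "hours east of UTC, e.g. 9.0 for a host in Japan; fine for
         display or logging, but never for scheduling delays"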
>
> Second, yes, the idea is to have the VM drift time towards wall time.  First
> some background.
>
> By wall time I mean the UTC time as provided by some atomic clock, i.e. an
> extremely accurate absolute time as is accessed by a network time protocol
> daemon (ntpd) to keep one's machine's clock accurate.
> Current hardware provides inaccurate clocks.  See e.g.
> http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-the-time.html.
> One can expect such hardware to drift relative to wall time by less than
> a minute a day, but still by amounts noticeable to humans over the course
> of hours.
> OSes such as Mac OS and Linux provide several clocks, two of which are of
> interest.  One is the computer's notion of wall time, typically provided by
> gettimeofday.  I'm going to call this "local time".  This clock can run fast
> or slow and can jump backwards.  It does this because the underlying clock
> drifts relative to wall time (and I guess may drift faster or slower
> depending on temperature) and periodically is corrected by ntpd.
> The second notion of time is monotonic time which is a local clock (the
> hardware's underlying clock) which is guaranteed to advance monotonically
> but is not guaranteed to agree with local time, let alone wall time.  On
> Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time
> (see Ryan's message below).   I'm going to call this "monotonic time".
>
> The issue is that because local time jumps around it can't be used to
> measure durations reliably.  If you're profiling something using local
> time and half-way through ntpd adjusts local time, then your measurements
> will be off.  One solution to this is to use monotonic time.
> The problem is that this complicates the programmer model.  It is now up to
> the programmer to understand the difference between local and monotonic
> times and use the "right clock" in the "right circumstances".  This seems to
> me to be against the Smalltalk approach, which is to provide a safe virtual
> machine which protects the programmer from the vagaries and dangers of real
> machines.
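
To make the hazard concrete, here is the naive measurement Eliot
describes, using Time millisecondClockValue (the method in this thread's
subject) as the local-time-derived clock:

    | start elapsed |
    start := Time millisecondClockValue.
    1000 factorial.                               "the work being profiled"
    elapsed := Time millisecondClockValue - start.
        "if ntpd steps the local clock backwards mid-run,
         elapsed is wrong and can even be negative"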

The flip side of this is that it restricts the tools available to a
programmer at the Image level, and adds complexity to the VM that
Image programmers can't inspect if ThingsGoWrong(TM).  If the Image had
direct access to the several clocks provided by clock_gettime(), an
Image-level programmer could explore and *understand* the differences
between them.  I know I could write a routine in C to explore this, but
I'd prefer to do it in Smalltalk.

Indeed, for the Delay scheduler it might be good to be able to choose
between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM,
and observe what happens when I set the system time back a couple of days.
Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
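
A minimal sketch of what that image-level access might look like (the
selector, primitive name, and plugin here are hypothetical, not existing
API):

    OperatingSystem class >> clockGettime: clockId
        "Answer nanoseconds from the named POSIX clock.
         clockId is e.g. #CLOCK_MONOTONIC or #CLOCK_MONOTONIC_RAW.
         Hypothetical named primitive."
        <primitive: 'primitiveClockGettime' module: 'UnixClockPlugin'>
        ^self primitiveFailed

    "With that one primitive, exploration needs no further VM changes,
     e.g. how far NTP's frequency adjustment has pulled the two
     monotonic clocks apart:"
    (OperatingSystem clockGettime: #CLOCK_MONOTONIC)
        - (OperatingSystem clockGettime: #CLOCK_MONOTONIC_RAW)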

> Further I think that providing a more ideal clock is
> straight-forward.  Here's what I propose.


But why do our own slewing to match local "real" time, when it seems
NTP already slews the rate of CLOCK_MONOTONIC using adjtime [1]?

There seem to be a couple of choices for making clock_gettime() cross-platform...
a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb
b. https://github.com/ThomasHabets/monotonic_clock

[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html


>
>
> The Cog VMs use a heartbeat to control the rate at which the VM checks for
> external input, delay expiry, etc.  This is typically a 500Hz/2ms heartbeat,
> but for the purposes of this discussion let's imagine it's a 1kHz/1ms
> heartbeat.  One thing that happens every heartbeat is that the VM updates
> its notion of local time.  Currently the VM's notion of local time can jump
> forwards or backwards because of ntpd activity.
>
> Current computer clocks are accurate to a few seconds a day, but let's
> assume a pessimal accuracy of 1 second an hour.  That's an accuracy of
> 1/3600, or about 0.028%.  If on every heartbeat the VM samples both local
> time (via gettimeofday) and monotonic time (via clock_gettime), it can
> compute a delta from the difference between the two.  On each heartbeat
> it sums the delta into an offset and applies this offset to monotonic
> time.  The VM then answers offset monotonic time as the value of "wall
> time".  If we restrict the delta on each heartbeat we can keep the VM's
> clock monotonic and have it drift towards local time, which itself is
> periodically corrected to wall time by ntpd.
> If, for example, the delta is restricted to +/-5 usecs when moving in one
> direction and +/-10 usecs when changing direction, the VM's notion of
> monotonic time advances within 1% of actual monotonic time and approaches
> local time "soon": since monotonic time is reasonably accurate, a slew
> rate of up to 1% quickly overcomes a drift of at most ~0.028%.
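
To put numbers on that convergence (a back-of-the-envelope check using
the figures above):

    "maximum slew: 5 usecs per 1 ms tick, i.e. 0.5%
     (10 usecs, i.e. 1%, when the offset changes direction)"
    5 / 1000.0.          "0.005"
    "so a 100 ms ntpd step is fully absorbed in about"
    100000 // 5          "20000 ticks = 20 seconds at 1 kHz, while the
                          hardware clock itself drifts at most ~0.028%,
                          far slower than the slew"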
>
> Let's express this as a Smalltalk class.  Units are microseconds (usecs).
>
> Object subclass: #VMClock
>     instanceVariableNames: 'offset now'
>
> initialize
>     now := OperatingSystem gettimeofday.
>     offset := now - OperatingSystem clock_gettime
>
> run
>     "Compute now every 1ms such that now approaches localTime."
>     [(Delay forMilliseconds: 1) wait.
>      self computeNowGivenMonotonic: OperatingSystem clock_gettime
>          andLocal: OperatingSystem gettimeofday.
>      true] whileTrue
>
> computeNowGivenMonotonic: monotonicTime andLocal: localTime
>     "Compute now such that now approaches localTime."
>     | difference putativeNow delta |
>     "Note now = (monotonicTime + offset) and we want
>      (monotonicTime + offset) = localTime.
>      delta is the amount to change offset by in this tick."
>     localTime = monotonicTime
>         ifTrue:
>             ["the two clocks agree; if an offset is in effect,
>               reduce it by no more than 5 usecs"
>              delta := offset >= 0
>                         ifTrue: [(offset min: 5) negated]
>                         ifFalse: [(offset max: -5) negated]]
>         ifFalse:
>             [putativeNow := monotonicTime + offset.
>              difference := localTime - putativeNow.
>              delta := 0.
>              difference < 0 ifTrue:
>                 ["localTime is behind; make the offset more negative,
>                   by no more than 5 usecs (10 if changing direction)"
>                  delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
>              difference > 0 ifTrue:
>                 ["localTime is ahead; make the offset more positive,
>                   by no more than 5 usecs (10 if changing direction)"
>                  delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
>     offset := offset + delta.
>     now := monotonicTime + offset.
>     ^now
>
> I've written and attached a little test program that randomly offsets
> localTime every "3.6 seconds" in a simulation.  Here's the simulation; the
> drift is exaggerated:
>
> run
>     "VMClockTester new run"
>     "VMClockTester new run last: 20"
>     "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
>     | times |
>     times := (Array new: 1000) writeStream.
>     1 to: 1000000 do: "1 million ticks = 1000 seconds"
>         [:i |
>          monotonicClock := monotonicClock + 1000.
>          localClock := localClock + 1000 + (i \\ 3600 = 0
>             ifTrue: [| drift |
>                      [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
>                      drift]
>             ifFalse: [0]).
>          self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
>          i \\ 100 = 0 ifTrue:
>             [times nextPut: {now. offset. monotonicClock. localClock.
>                              monotonicClock - localClock. now - localClock}]].
>     ^times contents
>
> Here's what happens on the third perturbation.  Before the perturbation
> localTime is behind monotonicTime by 13ms, and the offset has stabilised to
> -13ms.  At the perturbation localTime jumps to 44ms behind monotonic time,
> and over successive iterations the offset increases (negatively) and the
> error is reduced from 30ms to 13ms.
[snip]
>
> Here's the last 20 entries.  localTime is now 197ms ahead of monotonic time,
> and the offset reduces from 221ms to 202ms, and the difference between now
> and localTime reduces from 24ms to 5ms.
[snip]
>
> Note that a clock that is accurate to within 1% is more than accurate enough
> for good time measurements.  The VM's GC introduces pauses of around 1ms
> even on fast hardware, and the occasional code zone reclamation introduces
> similar pauses.  So times can be affected by the odd millisecond anyway.
>
> I'm sure the above is some standard piece of control theory, but apart
> from one Open University programme I'll never forget (it began with
> attempts to make a moving model tank target a moving object, and ended
> with a sparrow on a wind-blown branch keeping its head perfectly
> stationary) I've never studied it.  Anyone who knows what the above
> algorithm is known as, please let me know.
>
>> Also, this sounds like a lot of logic to have in the VM and a source of
>> possible confusion for the user - they would have to know to use the right
>> invocations for measuring timespans, and not rely on a naive "let's get the
>> time now and subtract it from the time later to get the difference", because
>> they might end up with a timespan over "slow" or "fast" seconds.
>
>
> No, that's exactly the point.  With the above algorithm in use, (I believe)
> time is never more than 1% inaccurate, but converges on wall time, and hence
> the programmer can use one clock for both measurement and determining wall
> time.  The only assumptions are that computer clocks are accurate to
> better than 1% and that the ntpd daemon adjusts the local notion of wall
> time based
> on a reputable clock.
>
>
> Review appreciated.
>
>> cheers,
>> Tim
>>
>> On 16 February 2016 at 00:49, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>>
>>> Hi Levente, Hi Bert, Hi All,
>>>
>>> On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi <leves at caesar.elte.hu>
>>> wrote:
>>>>
>>>> On Mon, 15 Feb 2016, Bert Freudenberg wrote:
>>>>
>>>>>
>>>>>> On 15.02.2016, at 10:17, marcel.taeumel <Marcel.Taeumel at hpi.de> wrote:
>>>>>>
>>>>>> Hi Bert,
>>>>>>
>>>>>> this was just a regression. There has always been this check in the
>>>>>> past for Morphic projects, and it is still there today for MVC
>>>>>> projects.
>>>>>
>>>>>
>>>>> Ah, so we lost the check at some point?
>>>>>
>>>>>> If you had used VM or OS startup time, this would still be
>>>>>> problematic after an overflow. (Hence the comment about snapshotting).
>>>>>> So,
>>>>>> this fix does not directly address the discussion about synching
>>>>>> #millisecondClockValue to wall clock.
>>>>>
>>>>>
>>>>> I still think it should answer milliseconds since startup. Why would we
>>>>> change that?
>>>>
>>>>
>>>> Eliot changed it recently. Probably to avoid the rollover issues. The
>>>> correct fix would be to use the UTC clock instead of the local one in Time
>>>> class >> #millisecondClockValue.
>>>
>>>
>>> I changed it for simplicity.  Alas it turns out to be a much more complex
>>> issue.  Here's a discussion I'm having with Ryan Macnak, which covers what
>>> his team did with the Dart VM.  Please read, it's interesting.
>>>
>>>
>>>  On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak <rmacnak at gmail.com> wrote:
>>>>
>>>> On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda <eliot.miranda at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Ryan,
>>>>>
>>>>>
>>>>> On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak <rmacnak at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda
>>>>>> <eliot.miranda at gmail.com> wrote:
>>>>>
>>>>>     Further back Ryan wrote:
>>>>>>>>
>>>>>>>> 5) Travis found an assertion failure. Unfortunately the assertions
>>>>>>>> fail to include paths with the line numbers.
>>>>>>>>
>>>>>>>>
>>>>>>>> (newUtcMicrosecondClock >= utcMicrosecondClock 124)
>>>>>>>
>>>>>>>
>>>>>>> It's easy to track down.  Just grep for the string.  You'll find it
>>>>>>> in sqUnixHeartbeat.c.  I've seen this from time to time, and have yet to
>>>>>>> understand it. What OS are you seeing this on?
>>>>>>
>>>>>>
>>>>>> Linux. Looking at the comment above this assert, I see Cog is using
>>>>>> the wrong clock. One should not rely on the realtime clock (gettimeofday) to
>>>>>> move steadily forward. It can jump around due to NTP syncs, the machine
>>>>>> sleeping or the user changing the time settings. Programs running at startup
>>>>>> on the Raspberry Pi in particular can see very large jumps because it has no
>>>>>> hardware clock (battery too expensive) so the first NTP sync will be a very
>>>>>> large correction. We fixed this in the Dart VM a few months ago. Timers need
>>>>>> to be scheduled using the monotonic clock (Linux clock_gettime, Mac
>>>>>> mach_absolute_time).
>>>>>
>>>>>
>>>>> Yes, this isn't satisfactory either.  One needs the VM to answer
>>>>> something that is close to wall time, not drift over time.  I think there
>>>>> needs to be some clever averaging algorithm that has the property of always
>>>>> advancing the clock but trying to converge on wall time.
>>>>>
>>>>>
>>>>> One can imagine that on every occasion the VM updates its notion of the
>>>>> time, it accesses both clock_gettime and gettimeofday and computes an offset
>>>>> that is some fraction of the delta between the current clock_gettime and the
>>>>> previous clock_gettime multiplied by the difference between the two clocks.
>>>>> So the VM time is always monotonic, but hunts towards wall time as answered
>>>>> by gettimeofday.
>>>>>
>>>>>
>>>>> Thanks. I was unaware of clock_gettime & mach_absolute_time.  Given
>>>>> these two it shouldn't be too hard to concoct something that works.  Or is
>>>>> that the approach you've taken in Dart?  Or are there standard algorithms
>>>>> out there?  I'll take a look.
>>>>
>>>>
>>>> I'm not seeing why it needs to be close to wall time. The VM needs to make
>>>> both a wall clock and a monotonic clock available to the image.
>>>
>>>
>>> That's one way, but it's complex.  I think that a single clock which
>>> deviates by no more than a specified percentage from clock_gettime while
>>> approaching wall time is simpler for the user, albeit more complex for
>>> the VM implementor.  It therefore seems to me to be in the Smalltalk
>>> tradition.
>>>
>>>> In Dart, there are three uses of time:
>>>>
>>>> Stopwatch measures durations (BlockClosure timeToRun). It uses the
>>>> monotonic clock.
>>>>
>>>> Timer schedules a future notification (Delay wait). It uses the
>>>> monotonic clock.
>>>>
>>>> DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
>>>
>>>
>>> Makes sense, at the cost of having two clocks.
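
If Squeak adopted the same split, image-level use might look like this
(Time monotonicMicrosecondClock is a hypothetical selector standing in
for a monotonic-clock primitive; the rest exists today):

    "Stopwatch (BlockClosure>>timeToRun): duration on the monotonic clock"
    | start elapsed |
    start := Time monotonicMicrosecondClock.
    1000 factorial.
    elapsed := Time monotonicMicrosecondClock - start.

    "Timer (Delay>>wait): deadline computed against the monotonic clock"
    (Delay forMilliseconds: 100) wait.

    "Timestamp (DateAndTime now): the wall clock"
    DateAndTime now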
>>>
>>>>
>>>> Smalltalk has the additional complication of handling in-flight Delays
>>>> or timeToRuns as an image moves across processes. There will be a
>>>> discontinuity in both clocks, and both of them can move backwards. The logic
>>>> to deal with the discontinuity must already exist for Delays, though I
>>>> suspect no one has bothered for timeToRun. If I create a thousand Delays
>>>> spaced apart by a minute, snapshot, move the system time forward a day, then
>>>> resume, they remain evenly spaced.

This is because of #save/restoreResumptionTimes on image shutdown/startup.

>>>> If I do this while the image is still
>>>> running, they all fire at once and the VM becomes unresponsive, which is
>>>> what using the monotonic clock would fix.

Yes, since Delays currently use gettimeofday() they all expire at once
when the system clock jumps forward.  But also, with the move of Delay to
a microsecond clock and the removal of the clock-wrap checks, perhaps the
algorithm is more susceptible to jitter or to ntp moving the clock
backwards.  I would need to think more about that, but anyway
clock_gettime(CLOCK_MONOTONIC), whose rate NTP slews towards wall-time,
seems a much better choice.  I'd be interested in doing some work getting
this into the VM since I'd like it for Pharo also.

cheers -ben

>>>
>>>
>>> Yes, but there is another way.  Delays can be implemented to function as
>>> durations, not deadlines.  This is orthogonal to clocks.  If Delays are
>>> deadlines then it is correct that on start-up they all fire.  If they are
>>> durations, it is not.
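
To make that distinction concrete (a sketch of the two semantics, with
illustrative variable names, not the current Delay implementation):

    "deadline semantics: Delay>>wait stores an absolute wake-up time"
    resumptionTime := Time utcMicrosecondClock + delayMicroseconds.
        "a forward clock jump puts every resumptionTime in the past,
         so all outstanding Delays fire at once"

    "duration semantics: store remaining time and re-anchor on resume"
    "on snapshot:  remaining := resumptionTime - Time utcMicrosecondClock"
    "on resume:    resumptionTime := Time utcMicrosecondClock + remaining"
        "the spacing between Delays survives any clock jump"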

