[squeak-dev] Re: Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz)

Mon Jul 18 22:27:14 UTC 2016

On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman <btc at openinworld.com> wrote:

> Hi All,
>
> Did anything further come of this discussion...
>
> On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> > Hi Tim,
> >
> > On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff <
> timfelgentreff at gmail.com>
> > wrote:
> >>
> >> Hi Eliot,
> >>
> >> am I understanding correctly that (one of) your ideas is to have just
> one
> >> clock that "drifts" towards wall clock time? How fast would it drift? I
> >> agree this would be neat for a local image, but what if someone from
> Japan
> >> sends me their image (happened to me just last week) and wants me to
> check
> >> something in it? Will the in-image clock run slow for a few hours until
> my
> >> timezone has caught up with Japan?
> >
> >
> > First, the timezone issue is completely separate, and is only an image
> > issue.  The VM's time basis is UTC microseconds since 1901 (see Time
> > class>>utcMicrosecondClock).  This should answer the same value at the
> same
> > time, within the bounds of clock drift, anywhere on the globe.  So if
> you
> > are sent an image from Japan nothing changes to times in that image when
> you
> > start it locally, except for the inaccuracies between wall time (some
> atomic
> > clock somewhere) and your machine and the machine on which the image was
> > saved in Japan.
> >
> > As a convenience the VM also offers microseconds since 1901 in local time
> > (see Time class>>localMicrosecondClock).  This is suitable for deriving
> UI
> > times to display to the user, or times to write to a log, etc.  But it is
> > /not/ to be used to schedule delays, etc, etc.
> >
> > Second, yes, the idea is to have the VM drift time towards wall time.
> First
> > some background.
> >
> > By wall time I mean the UTC time as provided by some atomic clock, i.e.
> an
> > extremely accurate absolute time as is accessed by a network time
> protocol
> > daemon (ntpd) to keep one's machine's clock accurate.
> > Current hardware provides inaccurate clocks.  See e.g.
> >
> http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-the-time.html
> .
> > One can expect such hardware to drift relative to wall time by less than
> a
> > minute a day, but still drift by amounts noticeable to humans over the
> > course of hours.
> > OS's such as Mac OS and linux provide several clocks, two of which are of
> > interest.  One is the computer's notion of wall time, typically provided
> by
> > gettimeofday.  I'm going to call this "local time".  This clock can run
> fast
> > or slow and can jump backwards.  It does this because the underlying
> clock
> > drifts relative to wall time (and I guess may drift faster or slower
> > depending on temperature) and periodically is corrected by ntpd.
> > The second notion of time is monotonic time which is a local clock (the
> > hardware's underlying clock) which is guaranteed to advance monotonically
> > but is not guaranteed to agree with local time, let alone wall time.  On
> > Unix this is provided by clock_gettime and on Mac OS X by
> mach_absolute_time
> > (see Ryan's message below).   I'm going to call this "monotonic time".
> >
> > The issue is that because local time jumps around it can't be used to
> > measure durations reliably.  If you're profiling something using local
> time
> > and half-way through something the ntpd adjusts local time then one's
> > measurements will be off.  One solution to this is to use monotonic time.
> > The problem is that this complicates the programmer model.  It is now up
> to
> > the programmer to understand the difference between local and monotonic
> > times and use the "right clock" in the "right circumstances".  This
> seems to
> > me to be against the Smalltalk approach, which is to provide a safe
> virtual
> > machine which protects the programmer from the vagaries and dangers of
> real
> > machines.
>
> The flip side of this is it restricts the tools available to a
> programmer at the Image level, and adds complexity to the VM that
> Image programmers can't see if ThingsGoWrong(TM).  If the Image had
> direct access to the several clocks provided by clock_gettime(), it
> would be possible for an Image level programmer to explore and
> *understand* the difference in them.  I know I could write a routine
> in C to explore this, but I'd prefer to do it in Smalltalk.
>
> Indeed for Delay scheduler, it might be good to choose between
> CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and
> observe what happens when I set the system time back a couple of days.
> Or maybe explore  CLOCK_PROCESS_CPUTIME_ID for benchmarking.
>
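A minimal sketch of the kind of exploration described above, in Python rather
than Smalltalk since the image has no such primitive today.  It assumes
CPython 3.7 or later; sample_clocks and CLOCK_NAMES are names invented for
this illustration, and which CLOCK_* constants exist depends on the platform,
hence the getattr guards.

```python
import time

# Clock ids that CPython's time module exposes on some platforms only;
# getattr() keeps the sketch runnable where a constant is missing.
CLOCK_NAMES = ("CLOCK_REALTIME", "CLOCK_MONOTONIC",
               "CLOCK_MONOTONIC_RAW", "CLOCK_PROCESS_CPUTIME_ID")


def sample_clocks():
    """Answer {clock name: nanoseconds} for every clock this platform offers."""
    samples = {}
    if hasattr(time, "clock_gettime_ns"):
        for name in CLOCK_NAMES:
            clock_id = getattr(time, name, None)
            if clock_id is None:
                continue
            try:
                samples[name] = time.clock_gettime_ns(clock_id)
            except OSError:
                continue  # the constant exists but the OS rejects it
    # Portable readings that exist wherever CPython 3.7+ runs.
    samples["time.time_ns"] = time.time_ns()
    samples["time.monotonic_ns"] = time.monotonic_ns()
    return samples


first = sample_clocks()
time.sleep(0.01)
second = sample_clocks()
for name in first:
    print(f"{name:28} advanced {second[name] - first[name]:>12} ns")
```

The process CPU-time clock should advance far less than the others across the
sleep, which is the sort of difference one would want to see for oneself.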
> > Further I think that providing a more ideal clock is
> > straight-forward.  Here's what I propose.
>
>
> But why do our own slewing to match local "real" time, when it seems
> NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].
>

What do we do on platforms that don't have adjtime?  The slewing algorithm
is simple and adding it to the VM means we're not dependent on adjtime.
Isn't that a good thing?

>
> There seem to be a couple of choices to making clock_gettime() cross
> platform...
> a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb
> b. https://github.com/ThomasHabets/monotonic_clock
>
> [1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html

Sure.  It's implementable.  Why not add the simple algorithm to our VM and
then we're done?

> > The Cog VMs use a heartbeat to control the rate at which the VM checks for
> > external input, delay expiry, etc.  This is typically a 500Hz/2ms
> heartbeat,
> > but for the purposes of this discussion let's imagine it's a 1kHz/1ms
> > heartbeat.  One thing that happens every heartbeat is that the VM updates
> > its notion of local time.  Currently the VM's notion of local time can
> jump
> > forwards or backwards because of ntpd activity.
> >
> > Current computer clocks are accurate to a few seconds a day, but let's
> > assume a pessimal accuracy of 1 second an hour.  That's an accuracy of
> > 1/3600, or 0.0277777%.  If the heartbeat accesses both local time (via
> > gettimeofday) and monotonic time (via clock_gettime) on every heartbeat
> it
> > can compute a delta which it computes from the difference between local
> and
> > monotonic times.  On each heartbeat it sums the delta to compute an
> offset
> > and applies this offset to monotonic time.  The VM then answers offset
> > monotonic time as the value of "wall time".  If we restrict the
> delta on
> > each heartbeat we can keep the VMs clock monotonic and have it drift
> towards
> > local time, which itself is periodically corrected to wall time by ntpd.
> > If, for example, the delta is restricted to +/-5usecs when moving in one
> > direction and +/-10usecs when changing direction, the VM's notion of
> > monotonic time advances within 1% of actual monotonic time and approaches
> > local time "soon", since, because monotonic time is reasonably accurate a
> > rate of change of 1% soon overcomes a drift of at most 0.0277777%.
> >
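The arithmetic above can be checked with a few lines of Python.  This is only
a sketch under the figures quoted (1ms heartbeat, 5/10 usec limits, drift of
1 second an hour); the constant names and seconds_to_absorb are made up for
the illustration, and the 30ms error is an arbitrary example.

```python
HEARTBEAT_USECS = 1_000   # the 1kHz/1ms heartbeat imagined above
SLEW_USECS = 5            # per-heartbeat limit when moving in one direction
REVERSAL_USECS = 10       # per-heartbeat limit when changing direction

drift_rate = 1 / 3600                           # pessimal: 1 second per hour
slew_rate = SLEW_USECS / HEARTBEAT_USECS        # fraction of a tick
reversal_rate = REVERSAL_USECS / HEARTBEAT_USECS


def seconds_to_absorb(error_usecs):
    """Seconds needed to close an error while drift keeps working against us."""
    net_usecs_per_tick = SLEW_USECS - drift_rate * HEARTBEAT_USECS
    ticks = error_usecs / net_usecs_per_tick
    return ticks * HEARTBEAT_USECS / 1_000_000


print(f"drift rate     {drift_rate:.5%}")
print(f"slew rate      {slew_rate:.2%}")
print(f"reversal rate  {reversal_rate:.2%}")
print(f"30ms error absorbed in about {seconds_to_absorb(30_000):.2f} s")
```

So the per-heartbeat limits correspond to 0.5% and 1% of a tick, both well
above the assumed worst-case drift, which is why the offset can catch up.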
> > Let's express this as a Smalltalk class.  Units are microseconds (usecs).
> >
> > Object subclass: VMClock instanceVariableNames: 'offset now'
> >
> > initialize
> >     now := OperatingSystem gettimeofday.
> >     offset := now - OperatingSystem clock_gettime
> >
> > run
> >     "Compute now every 1ms such that now approaches localTime."
> >     [(Delay forMilliseconds: 1) wait.
> >      self
> >         computeNowGivenMonotonic: OperatingSystem clock_gettime
> >         andLocal: OperatingSystem gettimeofday.
> >      true] whileTrue
> >
> > computeNowGivenMonotonic: monotonicTime andLocal: localTime
> >     "Compute now such that now approaches localTime."
> >     | difference putativeNow delta |
> >     "Note now = (monotonicTime + offset) and we want
> >      (monotonicTime + offset) = localTime.
> >      delta is the amount to change offset by in this tick."
> >     localTime = monotonicTime
> >         ifTrue:
> >             ["the two clocks agree; if an offset is in effect,
> >               reduce it by no more than 5 usecs"
> >              delta := offset >= 0
> >                  ifTrue: [(offset min: 5) negated]
> >                  ifFalse: [(offset max: -5) negated]]
> >         ifFalse:
> >             [putativeNow := monotonicTime + offset.
> >              difference := localTime - putativeNow.
> >              delta := 0.
> >              "localTime is behind; make the offset more negative by no
> >               more than 5 usecs (10 usecs when changing direction)"
> >              difference < 0 ifTrue:
> >                  [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
> >              "localTime is ahead; make the offset more positive by no
> >               more than 5 usecs (10 usecs when changing direction)"
> >              difference > 0 ifTrue:
> >                  [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
> >     offset := offset + delta.
> >     now := monotonicTime + offset.
> >     ^now
> >
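For anyone who wants to poke at the algorithm outside an image, here is a
hypothetical Python port of the method above plus a tiny deterministic driver.
It is a sketch, not VM code: the class mirrors the Smalltalk, while
compute_now, simulate and the jumps argument are names invented here.  Units
are microseconds and each tick is 1ms.

```python
class VMClock:
    """Python port of the VMClock sketch; all values are microseconds."""

    def __init__(self, monotonic, local):
        self.offset = local - monotonic
        self.now = local

    def compute_now(self, monotonic, local):
        """Move offset by a bounded delta so that now approaches local."""
        if local == monotonic:
            # Clocks agree: shrink any offset toward zero by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local - (monotonic + self.offset)
            delta = 0
            if difference < 0:    # local is behind: offset goes more negative
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:  # local is ahead: offset goes more positive
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic + self.offset
        return self.now


def simulate(jumps, ticks):
    """Run `ticks` 1ms heartbeats; `jumps` maps tick number to a local-clock step.

    Answers (smallest step of now, largest step of now, final now - local).
    """
    monotonic = local = 0
    clock = VMClock(monotonic, local)
    previous = clock.now
    steps = []
    for i in range(1, ticks + 1):
        monotonic += 1000
        local += 1000 + jumps.get(i, 0)
        now = clock.compute_now(monotonic, local)
        steps.append(now - previous)
        previous = now
    return min(steps), max(steps), clock.now - local


# Local time steps back 30ms on the first tick, then runs true for 7 seconds.
print(simulate({1: -30_000}, ticks=7_000))
```

Every step of now stays within 10 usecs of the 1000 usec tick, so the port
never runs backwards, and in this scenario the error has been absorbed by the
end of the run.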
> > I've written and attached a little test program that randomly offsets
> > localTime every "3.6 seconds" in a simulation.  Here's the simulation;
> the
> > drift is exaggerated:
> >
> > run
> >     "VMClockTester new run"
> >     "VMClockTester new run last: 20"
> >     "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
> >     | times |
> >     times := (Array new: 1000) writeStream.
> >     1 to: 1000000 do: "1 million ticks = 1000 seconds"
> >         [:i|
> >          monotonicClock := monotonicClock + 1000.
> >          localClock := localClock + 1000 + (i \\ 3600 = 0
> >              ifTrue: [| drift |
> >                  [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
> >                  drift]
> >              ifFalse: [0]).
> >          self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
> >          i \\ 100 = 0 ifTrue:
> >              [times nextPut: {now. offset. monotonicClock. localClock.
> >                  monotonicClock - localClock. now - localClock}]].
> >     ^times contents
> >
> > Here's what happens on the third perturbation.  Before the perturbation
> > localTime is behind monotonicTime by 13ms, and the offset has stabilised
> to
> > -13ms.  At the perturbation localTime jumps to 44ms behind monotonic
> time,
> > and over successive iterations the offset increases (negatively) and the
> > error is reduced from 30ms to 13ms.
> [snip]
> >
> > Here's the last 20 entries.  localTime is now 197ms ahead of monotonic
> time,
> > and the offset reduces from 221ms to 202ms, and the difference between
> now
> > and localTime reduces from 24ms to 5ms.
> [snip]
> >
> > Note that a clock that is accurate to within 1% is more than accurate
> enough
> > for good time measurements.  The VM's GC introduces pauses of around 1ms
> > even on fast hardware, and the occasional code zone reclamation
> introduces
> > similar pauses.  So times can be affected by the odd millisecond anyway.
> >
> > I'm sure the above is some standard piece of control theory but apart
> from
> > one open university program I'll never forget which began with attempts
> to
> > make a moving model tank target a moving object and ended with a sparrow
> on
> > a wind-blown branch keeping its head perfectly stationary, I've never
> > studied it.  Anyone who knows what the above algorithm's known as please
> let
> > me know.
> >
> >> Also, this sounds like a lot of logic to have in the VM and a source of
> >> possible confusion for the user - they would have to know to use the
> right
> >> invocations for measuring timespans, and not rely on a naive "let's get
> the
> >> time now and subtract it from the time later to get the difference",
> because
> >> they might end up with a timespan over "slow" or "fast" seconds.
> >
> >
> > No, that's exactly the point.  With the above algorithm in use, (I
> believe)
> > time is never more than 1% inaccurate, but converges on wall time, and
> hence
> > the programmer can use one clock for both measurement and determining
> wall
> > time.  The only assumptions are that computer clocks are accurate to more
> > than 1% and that the ntpd daemon adjusts the local notion of wall time
> based
> > on a reputable clock.
> >
> >
> > Review appreciated.
> >
> >> cheers,
> >> Tim
> >>
> >> On 16 February 2016 at 00:49, Eliot Miranda <eliot.miranda at gmail.com>
> >> wrote:
> >>>
> >>> Hi Levente, Hi Bert, Hi All,
> >>>
> >>> On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi <leves at caesar.elte.hu>
> >>> wrote:
> >>>>
> >>>> On Mon, 15 Feb 2016, Bert Freudenberg wrote:
> >>>>
> >>>>>
> >>>>>> On 15.02.2016, at 10:17, marcel.taeumel <Marcel.Taeumel at hpi.de>
> wrote:
> >>>>>>
> >>>>>> Hi Bert,
> >>>>>>
> >>>>>> this was just a regression. There has always been this check in the
> >>>>>> past for
> >>>>>> Morphic projects and still today for MVC projects.
> >>>>>
> >>>>>
> >>>>> Ah, so we lost the check at some point?
> >>>>>
> >>>>>> If you would have used VM or OS startup time, this would still be
> >>>>>> problematic after an overflow. (Hence the comment about
> snapshotting).
> >>>>>> So,
> >>>>>> this fix does not directly address the discussion about synching
> >>>>>> #millisecondClockValue to wall clock.
> >>>>>
> >>>>>
> >>>>> I still think it should answer milliseconds since startup. Why would
> we
> >>>>> change that?
> >>>>
> >>>>
> >>>> Eliot changed it recently. Probably to avoid the rollover issues. The
> >>>> correct fix would be to use the UTC clock instead of the local one in
> Time
> >>>> class >> #millisecondClockValue.
> >>>
> >>>
> >>> I changed it for simplicity.  Alas it turns out to be a much more
> complex
> >>> issue.  Here's a discussion I'm having with Ryan Macnak, which covers
> what
> >>> his team did with the Dart VM.  Please read, it's interesting.
> >>>
> >>>
> >>>  On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak <rmacnak at gmail.com>
> wrote:
> >>>>
> >>>> On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda <
> eliot.miranda at gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi Ryan,
> >>>>>
> >>>>>
> >>>>> On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak <rmacnak at gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda
> >>>>>> <eliot.miranda at gmail.com> wrote:
> >>>>>
> >>>>>     Further back Ryan wrote:
> >>>>>>>>
> >>>>>>>> 5) Travis found an assertion failure. Unfortunately the assertions
> >>>>>>>> fail to include paths with the line numbers.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> (newUtcMicrosecondClock >= utcMicrosecondClock 124)
> >>>>>>>
> >>>>>>>
> >>>>>>> It's easy to track down.  Just grep for the string.  You'll find it
> >>>>>>> in sqUnixHeartbeat.c.  I've seen this from time to time, and have
> yet to
> >>>>>>> understand it. What OS are you seeing this on?
> >>>>>>
> >>>>>>
> >>>>>> Linux. Looking at the comment above this assert, I see Cog is using
> >>>>>> the wrong clock. One should not rely on the realtime clock
> (gettimeofday) to
> >>>>>> move steadily forward. It can jump around due to NTP syncs, the
> machine
> >>>>>> sleeping or the user changing the time settings. Programs running
> at startup
> >>>>>> on the Raspberry Pi in particular can see very large jumps because
> it has no
> >>>>>> hardware clock (battery too expensive) so the first NTP sync will
> be a very
> >>>>>> large correction. We fixed this in the Dart VM a few months ago.
> Timers need
> >>>>>> to be scheduled using the monotonic clock (Linux clock_gettime, Mac
> >>>>>> mach_absolute_time).
> >>>>>
> >>>>>
> >>>>> Yes, this isn't satisfactory either.  One needs the VM to answer
> >>>>> something that is close to wall time, not drift over time.  I think
> there
> >>>>> needs to be some clever averaging algorithm that has the property of
> always
> >>>>> advancing the clock but trying to converge on wall time.
> >>>>>
> >>>>>
> >>>>> One can imagine on every occasion that the VM updates its notion of
> the
> >>>>> time it accesses both clock_gettime and gettimeofday and computes an
> offset
> >>>>> that is some fraction of the delta between the current clock_gettime
> and the
> >>>>> previous clock_gettime multiplied by the difference between the two
> clocks.
> >>>>> So the VM time is always monotonic, but hunts towards wall time as
> answered
> >>>>> by gettimeofday.
> >>>>>
> >>>>>
> >>>>> Thanks. I was unaware of clock_gettime & mach_absolute_time.  Given
> >>>>> these two it shouldn't be too hard to concoct something that works.
> Or is
> >>>>> that the approach you've taken in Dart?  Or are there standard
> algorithms
> >>>>> out there?  I'll take a look.
> >>>>
> >>>>
> >>>> I'm not seeing why it needs to be close to wall time. The VM needs
> to make
> >>>> both a wall clock and a monotonic clock available to the image.
>
>
>
> >>>
> >>>
> >>> That's one way, but it's complex.  I think having a clock that is
> >>> flexible, that will deviate by no more than a specified percentage from
> >>> clock_gettime in approaching wall time is simpler for the user albeit
> more
> >>> complex for the VM implementor.  It therefore seems to me to be in the
> >>> Smalltalk tradition.
> >>>
> >>>> In Dart, there are three uses of time
> >>>>
> >>>> Stopwatch measures durations (BlockClosure timeToRun). It uses the
> >>>> monotonic clock.
> >>>>
> >>>> Timer schedules a future notification (Delay wait). It uses the
> >>>> monotonic clock.
> >>>>
> >>>> DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
> >>>
> >>>
> >>> Makes sense, at the cost of having two clocks.
> >>>
> >>>>
> >>>> Smalltalk has the additional complication of handling in-flight Delays
> >>>> or timeToRuns as an image moves across processes. There will be a
> >>>> discontinuity in both clocks, and both of them can move backwards.
> The logic
> >>>> to deal with the discontinuity must already exist for Delays, though I
> >>>> suspect no one has bothered for timeToRun. If I create a thousand
> Delays
> >>>> spaced apart by a minute, snapshot, move the system time forward a
> day, then
> >>>> resume, they remain evenly spaced.
>
> This is because of #save/restoreResumptionTimes on image shutdown/startup.
>
> >>>> If I do this while the image is still
> >>>> running, they all fire at once and the VM becomes unresponsive, which
> is
> >>>> what using the monotonic clock would fix.
>
> Yes, since Delays are currently using gettimeofday() they expire when
> the system clock jumps.  But also, with the move of Delay to a
> microsecond clock and removal of the clock-wrap checks, perhaps the
> algorithm is more susceptible to jitter or ntp moving the clock
> backwards.  I would need to think more about that, but anyway
> clock_gettime(MONOTONIC) which slews wall-time seems a much better
> choice.  I'd be interested in doing some work getting this into the VM
> since I'd like it for Pharo also.
>
> cheers -ben
>
> >>>
> >>>
> >>> Yes, but there is another way.  Delays can be implemented to function
> as
> >>> durations, not deadlines.  This is orthogonal to clocks.  If Delays are
> >>> deadlines then it is correct that on start-up they all fire.  If they
> are
> >>> durations, it is not.
>
>
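To make the deadline/duration distinction concrete, here is a small Python
sketch.  It is loosely modelled on the idea behind the save/restore of
resumption times mentioned above, but it is not the Squeak implementation;
the Delay class, its method names and expired_after_jump are hypothetical,
and times are plain seconds on a made-up clock.

```python
DAY = 24 * 60 * 60  # seconds


class Delay:
    """Hypothetical delay: a deadline while running, a duration while saved."""

    def __init__(self, now, duration):
        self.resumption_time = now + duration  # absolute deadline on the clock
        self.remaining = None                  # set only while snapshotted

    def save(self, now):
        """On snapshot: remember how long was left, not when it was due."""
        self.remaining = self.resumption_time - now

    def restore(self, now):
        """On startup: re-base the deadline on whatever the clock says now."""
        self.resumption_time = now + self.remaining
        self.remaining = None

    def expired(self, now):
        return now >= self.resumption_time


def expired_after_jump(as_durations, count=5, jump=DAY):
    """Create delays a minute apart, jump the clock, count how many expired."""
    delays = [Delay(0, 60 * k) for k in range(1, count + 1)]
    if as_durations:
        for d in delays:
            d.save(0)
        for d in delays:
            d.restore(jump)
    return sum(d.expired(jump) for d in delays)


print(expired_after_jump(as_durations=False))  # deadlines: all have passed
print(expired_after_jump(as_durations=True))   # durations: none has passed
```

Treated as deadlines, all five delays are due the moment the clock jumps a
day; treated as durations, none is, and they stay a minute apart.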

-- 
_,,,^..^,,,_
best, Eliot