Re: [squeak-dev] Re: Time millisecondClockValue (was: The Trunk: Morphic-mt.1080.mcz)

19 Jul 2016


      On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:
Hi All,
Did anything further come of this discussion...
On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Tim,
On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff <timfelgentreff@gmail.com> wrote:
Hi Eliot,
am I understanding correctly that (one of) your ideas is to have just one
clock that "drifts" towards wall clock time? How fast would it drift? I
agree this would be neat for a local image, but what if someone from Japan
sends me their image (happened to me just last week) and wants me to check
something in it? Will the in-image clock run slow for a few hours until my
timezone has caught up with Japan?
First, the timezone issue is completely separate, and is only an image
issue.  The VM's time basis is UTC microseconds since 1901 (see Time
class>>utcMicrosecondClock).  This should answer the same value at the same
time, within the bounds of clock drift, anywhere on the globe.  So if you
are sent an image from Japan, nothing changes in the times in that image
when you start it locally, except for the inaccuracies between wall time
(some atomic clock somewhere) and your machine and the machine on which the
image was saved in Japan.
As a convenience the VM also offers microseconds since 1901 in local time
(see Time class>>localMicrosecondClock).  This is suitable for deriving UI
times to display to the user, or times to write to a log, etc.  But it is
/not/ to be used to schedule delays, etc, etc.
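To make the two time bases concrete, here is a minimal Python sketch (not the VM's code) that rebases a Unix-epoch microsecond count onto a 1901 epoch. It assumes the epoch is 1901-01-01 00:00:00 UTC; the function names are made up for illustration.

```python
from datetime import datetime, timezone

# Assumption: the 1901 epoch is 1901-01-01T00:00:00 UTC.
EPOCH_1901 = datetime(1901, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds between the two epochs (69 years, 17 of them leap years).
EPOCH_OFFSET_SECONDS = int((UNIX_EPOCH - EPOCH_1901).total_seconds())


def utc_microseconds_since_1901(unix_microseconds):
    """Rebase a Unix-epoch UTC microsecond count onto the 1901 epoch."""
    return unix_microseconds + EPOCH_OFFSET_SECONDS * 1_000_000


def local_microseconds_since_1901(unix_microseconds, utc_offset_seconds):
    """Local-time variant: the UTC value shifted by the zone's UTC offset.

    Suitable for display or logging only, not for scheduling.
    """
    return (utc_microseconds_since_1901(unix_microseconds)
            + utc_offset_seconds * 1_000_000)
```

The UTC value is the same for an image in Japan and one in Germany; only the local variant depends on the zone offset handed in.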
Second, yes, the idea is to have the VM drift time towards wall time.
First, some background.
By wall time I mean the UTC time as provided by some atomic clock, i.e. an
extremely accurate absolute time as is accessed by a network time protocol
daemon (ntpd) to keep one's machine's clock accurate.
Current hardware provides inaccurate clocks.  See e.g.
http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-...
One can expect such hardware to drift relative to wall time by less than a
minute a day, but still drift by amounts noticeable to humans over the
course of hours.
OSes such as Mac OS and Linux provide several clocks, two of which are of
interest.  One is the computer's notion of wall time, typically provided by
gettimeofday.  I'm going to call this "local time".  This clock can run fast
or slow and can jump backwards.  It does this because the underlying clock
drifts relative to wall time (and I guess may drift faster or slower
depending on temperature) and periodically is corrected by ntpd.
The second notion of time is monotonic time which is a local clock (the
hardware's underlying clock) which is guaranteed to advance monotonically
but is not guaranteed to agree with local time, let alone wall time.  On
Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time
(see Ryan's message below).  I'm going to call this "monotonic time".
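For readers who want to see the two clocks side by side, here is a small Python sketch. It uses Python's standard time module, where time.time_ns() reads the settable system clock and time.monotonic_ns() reads a clock that cannot go backwards. Which OS call backs each one is platform dependent, so treat the mapping to gettimeofday and clock_gettime as an assumption.

```python
import time


def read_clocks():
    """Return (local_usecs, monotonic_usecs) sampled back to back."""
    local_usecs = time.time_ns() // 1000            # "local time": can be stepped
    monotonic_usecs = time.monotonic_ns() // 1000   # "monotonic time": never steps back
    return local_usecs, monotonic_usecs


def monotonic_never_regresses(samples=1000):
    """Check that successive monotonic readings never decrease."""
    previous = time.monotonic_ns()
    for _ in range(samples):
        current = time.monotonic_ns()
        if current < previous:
            return False
        previous = current
    return True


if __name__ == "__main__":
    local, monotonic = read_clocks()
    print("local usecs    :", local)
    print("monotonic usecs:", monotonic)
    print("offset usecs   :", local - monotonic)
```

The absolute monotonic value is arbitrary (often time since boot); only differences between two readings are meaningful.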
The issue is that because local time jumps around it can't be used to
measure durations reliably.  If you're profiling something using local time
and halfway through ntpd adjusts local time, then one's
measurements will be off.  One solution to this is to use monotonic time.
The problem is that this complicates the programmer model.  It is now up to
the programmer to understand the difference between local and monotonic
times and use the "right clock" in the "right circumstances".  This seems to
me to be against the Smalltalk approach, which is to provide a safe virtual
machine which protects the programmer from the vagaries and dangers of real
machines.
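A tiny worked example of the measurement problem, in Python. The readings are invented numbers, and the 30 ms backwards step stands in for an ntpd correction landing in the middle of a 50 ms piece of work.

```python
def elapsed(start, end):
    """Duration between two readings of the same clock, in microseconds."""
    return end - start


# Hypothetical readings; the work really takes 50 ms.
monotonic_start, monotonic_end = 1_000_000, 1_050_000

local_start = 1_700_000_000_000_000
ntp_step = -30_000  # assumed: local time is stepped back 30 ms mid-run
local_end = local_start + 50_000 + ntp_step

monotonic_elapsed = elapsed(monotonic_start, monotonic_end)
local_elapsed = elapsed(local_start, local_end)

print("measured on monotonic clock:", monotonic_elapsed, "usecs")
print("measured on local clock    :", local_elapsed, "usecs")
```

The monotonic measurement is unaffected, while the local-clock measurement is off by exactly the size of the step.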
The flip side of this is it restricts the tools available to a
programmer at the Image level, and adds complexity to the VM that
Image programmers can't see if ThingsGoWrong(TM).  If the Image had
direct access to the several clocks provided by clock_gettime(), it
would be possible for an Image level programmer to explore and
*understand* the difference in them.  I know I could write a routine
in C to explore this, but I'd prefer to do it in Smalltalk.
Indeed for the Delay scheduler, it might be good to choose between
CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and
observe what happens when I set the system time back a couple of days.
Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
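As an illustration of that kind of exploration outside Smalltalk and C, Python's time module exposes clock_gettime and several clock ids on Unix-like systems. This sketch just samples whichever of the named clocks the platform offers. It says nothing about how a Squeak primitive for this would look.

```python
import time

CLOCK_NAMES = [
    "CLOCK_REALTIME",
    "CLOCK_MONOTONIC",
    "CLOCK_MONOTONIC_RAW",
    "CLOCK_PROCESS_CPUTIME_ID",
]


def sample_clocks():
    """Return {clock name: seconds} for each clock this platform exposes."""
    readings = {}
    if not hasattr(time, "clock_gettime"):
        return readings  # e.g. Windows has no clock_gettime in the time module
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue  # this clock id is not defined on this platform
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            pass  # defined but not supported by the running kernel
    return readings


if __name__ == "__main__":
    for name, seconds in sample_clocks().items():
        print(f"{name:26s} {seconds:.6f}")
```

Running it before and after changing the system time shows which clocks follow the change and which do not.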
Further I think that providing a more ideal clock is
straight-forward.  Here's what I propose.
But why do our own slewing to match local "real" time, when it seems
NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].
What do we do on platforms that don't have adjtime?  The slewing algorithm
is simple and adding it to the VM means we're not dependent on adjtime.
Isn't that a good thing?
There seem to be a couple of choices for making clock_gettime()
cross-platform...
a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb
b. https://github.com/ThomasHabets/monotonic_clock
[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html
Sure.  It's implementable.  Why not add the simple algorithm to our VM and
then we're done?
The Cog VMs use a heartbeat to control the rate at which the VM checks for
external input, delay expiry, etc.  This is typically a 500Hz/2ms heartbeat,
but for the purposes of this discussion let's imagine it's a 1kHz/1ms
heartbeat.  One thing that happens every heartbeat is that the VM updates
its notion of local time.  Currently the VM's notion of local time can jump
forwards or backwards because of ntpd activity.
Current computer clocks are accurate to a few seconds a day, but let's
assume a pessimal accuracy of 1 second an hour.  That's an accuracy of
1/3600, or 0.0277777%.  If the heartbeat accesses both local time (via
gettimeofday) and monotonic time (via clock_gettime) on every heartbeat it
can compute a delta from the difference between local and
monotonic times.  On each heartbeat it sums the delta to compute an offset
and applies this offset to monotonic time.  The VM then answers offset
monotonic time as the value of "wall time".  If we restrict the delta on
each heartbeat we can keep the VM's clock monotonic and have it drift towards
local time, which itself is periodically corrected to wall time by ntpd.
If, for example, the delta is restricted to +/-5usecs when moving in one
direction and +/-10usecs when changing direction, the VM's notion of
monotonic time advances within 1% of actual monotonic time and approaches
local time "soon", since, because monotonic time is reasonably accurate, a
rate of change of 1% soon overcomes a drift of at most 0.0277777%.
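The arithmetic behind those percentages can be checked with a few lines of Python. The 1 ms tick and the 5/10 usec limits are the figures from the paragraph above; the helper name is made up, and the "ticks to absorb" figure ignores any further drift while the correction is in progress.

```python
from fractions import Fraction

TICK_USECS = 1000          # 1 ms heartbeat, as assumed in the text
MAX_DELTA_SAME_DIR = 5     # usecs per tick when continuing in one direction
MAX_DELTA_REVERSING = 10   # usecs per tick when changing direction

hardware_drift = Fraction(1, 3600)                       # 1 second per hour
slew_same_dir = Fraction(MAX_DELTA_SAME_DIR, TICK_USECS)
slew_reversing = Fraction(MAX_DELTA_REVERSING, TICK_USECS)


def ticks_to_absorb(error_usecs, per_tick=MAX_DELTA_SAME_DIR):
    """Ticks needed to slew away a fixed error at per_tick usecs per tick."""
    return -(-abs(error_usecs) // per_tick)  # ceiling division


print(f"hardware drift : {float(hardware_drift) * 100:.7f}%")
print(f"slew, same dir : {float(slew_same_dir) * 100:.1f}%")
print(f"slew, reversing: {float(slew_reversing) * 100:.1f}%")
print(f"30 ms error    : {ticks_to_absorb(30_000)} ticks of 1 ms")
```

So a 30 ms step in local time is absorbed in roughly 6 seconds at the 5 usec limit, while the assumed worst-case hardware drift is about eighteen times slower than the slew rate.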
Let's express this as a Smalltalk class.  Units are microseconds (usecs).
Object subclass: VMClock instanceVariableNames: 'offset now'
initialize
    now := OperatingSystem gettimeofday.
    offset := now - OperatingSystem clock_gettime
run
    "Compute now every 1ms such that now approaches localTime."
    [(Delay forMilliseconds: 1) wait.
     self computeNowGivenMonotonic: OperatingSystem clock_gettime andLocal: OperatingSystem gettimeofday.
     true] whileTrue
computeNowGivenMonotonic: monotonicTime andLocal: localTime
    "Compute now such that now approaches localTime."
    | difference putativeNow delta |
    "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
     delta is the amount to change offset by in this tick."
    localTime = monotonicTime
        ifTrue:
            ["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
             delta := offset >= 0
                        ifTrue: [(offset min: 5) negated]
                        ifFalse: [(offset max: -5) negated]]
        ifFalse:
            [putativeNow := monotonicTime + offset.
             difference := localTime - putativeNow.
             delta := 0.
             difference < 0 ifTrue: "localTime is behind; make the offset more negative by no more than 5 usecs"
                [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
             difference > 0 ifTrue: "localTime is ahead; make the offset more positive by no more than 5 usecs"
                [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
    offset := offset + delta.
    now := monotonicTime + offset.
    ^now
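For anyone who wants to run the idea outside an image, here is a line-by-line Python transcription of computeNowGivenMonotonic:andLocal: as a sketch. The clock readings are passed in rather than read from the OS, so the constructor arguments and the Python method name are assumptions of this sketch, not part of the proposal.

```python
class VMClock:
    """Python transcription of the VMClock sketch; units are microseconds."""

    def __init__(self, monotonic_time, local_time):
        self.now = local_time
        self.offset = local_time - monotonic_time

    def compute_now(self, monotonic_time, local_time):
        """Compute now such that now approaches local_time."""
        if local_time == monotonic_time:
            # The two clocks agree: shrink any offset by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            putative_now = monotonic_time + self.offset
            difference = local_time - putative_now
            delta = 0
            if difference < 0:
                # local_time is behind: make the offset more negative.
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:
                # local_time is ahead: make the offset more positive.
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic_time + self.offset
        return self.now


if __name__ == "__main__":
    clock = VMClock(monotonic_time=0, local_time=0)
    monotonic = 0
    for tick in range(1, 6001):
        monotonic += 1000
        clock.compute_now(monotonic, monotonic + 30_000)  # local stepped +30 ms
    print("offset after 6000 ticks:", clock.offset)
```

With local time held 30 ms ahead, the offset grows by 5 usecs per tick and each tick advances now by 1005 usecs, so the clock stays monotonic while it catches up.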
I've written and attached a little test program that randomly offsets
localTime every "3.6 seconds" in a simulation.  Here's the simulation; the
drift is exaggerated:
run
    "VMClockTester new run"
    "VMClockTester new run last: 20"
    "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
    | times |
    times := (Array new: 1000) writeStream.
    1 to: 1000000 do: "1 million ticks = 1000 seconds"
        [:i|
         monotonicClock := monotonicClock + 1000.
         localClock := localClock + 1000 + (i \\ 3600 = 0
                        ifTrue: [| drift |
                                 [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
                                 drift]
                        ifFalse: [0]).
         self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
         i \\ 100 = 0 ifTrue:
            [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
    ^times contents
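A rough Python equivalent of that simulation harness, for experimenting without the attachment. The starting state (all clocks at zero), the seeded random.Random standing in for drifter, and the compact step function are assumptions of this sketch. The step function restates the offset update from the class above in condensed form.

```python
import random


def step(offset, monotonic, local):
    """One tick of the offset update; returns the new offset (usecs)."""
    if local == monotonic:
        return offset - (min(offset, 5) if offset >= 0 else max(offset, -5))
    difference = local - (monotonic + offset)
    if difference < 0:
        return offset + max(difference, -10 if offset >= 0 else -5)
    if difference > 0:
        return offset + min(difference, 5 if offset >= 0 else 10)
    return offset


def simulate(ticks=1_000_000, seed=1):
    """Perturb local time every 3600 ticks; sample every 100 ticks."""
    drifter = random.Random(seed)    # stand-in for the tester's drifter
    monotonic = local = offset = 0   # assumed starting state
    samples = []
    for i in range(1, ticks + 1):
        monotonic += 1000
        local += 1000
        if i % 3600 == 0:
            drift = 0
            while drift == 0:        # retry until the perturbation is non-zero
                drift = round((drifter.random() - 0.5) * (20 * 3600))
            local += drift
        offset = step(offset, monotonic, local)
        now = monotonic + offset
        if i % 100 == 0:
            samples.append((now, offset, monotonic, local,
                            monotonic - local, now - local))
    return samples


if __name__ == "__main__":
    for row in simulate(ticks=20_000)[-5:]:
        print(row)
```

Each sample row has the same six columns as the Smalltalk version: now, offset, monotonic clock, local clock, monotonic minus local, and now minus local.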
Here's what happens on the third perturbation.  Before the perturbation
localTime is behind monotonicTime by 13ms, and the offset has stabilised to
-13ms.  At the perturbation localTime jumps to 44ms behind monotonic time,
and over successive iterations the offset increases (negatively) and the
error is reduced from 30ms to 13ms.
[snip]
Here's the last 20 entries.  localTime is now 197ms ahead of monotonic time,
and the offset reduces from 221ms to 202ms, and the difference between now
and localTime reduces from 24ms to 5ms.
[snip]
Note that a clock that is accurate to within 1% is more than accurate enough
for good time measurements.  The VM's GC introduces pauses of around 1ms
even on fast hardware, and the occasional code zone reclamation introduces
similar pauses.  So times can be affected by the odd millisecond anyway.
I'm sure the above is some standard piece of control theory but apart from
one Open University program I'll never forget which began with attempts to
make a moving model tank target a moving object and ended with a sparrow on
a wind-blown branch keeping its head perfectly stationary, I've never
studied it.  Anyone who knows what the above algorithm's known as please let
me know.
Also, this sounds like a lot of logic to have in the VM and a source of
possible confusion for the user - they would have to know to use the right
invocations for measuring timespans, and not rely on a naive "let's get the
time now and subtract it from the time later to get the difference", because
they might end up with a timespan over "slow" or "fast" seconds.
No, that's exactly the point.  With the above algorithm in use, (I believe)
time is never more than 1% inaccurate, but converges on wall time, and hence
the programmer can use one clock for both measurement and determining wall
time.  The only assumptions are that computer clocks are accurate to better
than 1% and that the ntpd daemon adjusts the local notion of wall time based
on a reputable clock.
Review appreciated.
cheers,
Tim
On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
> On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
>
> Hi Bert,
>
> this was just a regression. There has always been this check in the past
> for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
> If you would have used VM or OS startup time, this would still be
> problematic after an overflow. (Hence the comment about snapshotting).
> So, this fix does not directly address the discussion about synching
> #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we
change that?
Eliot changed it recently. Probably to avoid the rollover issues. The
correct fix would be to use the UTC clock instead of the local one in Time
class >> #millisecondClockValue.
I changed it for simplicity.  Alas it turns out to be a much more complex
issue.  Here's a discussion I'm having with Ryan Macnak, which covers what
his team did with the Dart VM.  Please read, it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
>
> On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda
> eliot.miranda@gmail.com wrote:
Further back Ryan wrote:

>>>
>>> 5) Travis found an assertion failure. Unfortunately the assertions
>>> fail to include paths with the line numbers.
>>>
>>>
>>> (newUtcMicrosecondClock >= utcMicrosecondClock 124)
>>
>>
>> It's easy to track down.  Just grep for the string.  You'll find it
>> in sqUnixHeartbeat.c.  I've seen this from time to time, and have yet to
>> understand it. What OS are you seeing this on?
>
>
> Linux. Looking at the comment above this assert, I see Cog is using
> the wrong clock. One should not rely on the realtime clock (gettimeofday) to
> move steadily forward. It can jump around due to NTP syncs, the machine
> sleeping or the user changing the time settings. Programs running at startup
> on the Raspberry Pi in particular can see very large jumps because it has no
> hardware clock (battery too expensive) so the first NTP sync will be a very
> large correction. We fixed this in the Dart VM a few months ago. Timers need
> to be scheduled using the monotonic clock (Linux clock_gettime, Mac
> mach_absolute_time).
Yes, this isn't satisfactory either.  One needs the VM to answer
something that is close to wall time, not drift over time.  I think there
needs to be some clever averaging algorithm that has the property of always
advancing the clock but trying to converge on wall time.
One can imagine on every occasion that the VM updates its notion of the
time it accesses both clock_gettime and gettimeofday and computes an offset
that is some fraction of the delta between the current clock_gettime and the
previous clock_gettime multiplied by the difference between the two clocks.
So the VM time is always monotonic, but hunts towards wall time as answered
by gettimeofday.
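One possible reading of that idea, sketched in Python: on each update, move the offset towards the gap between the two clocks, but never by more than a fixed fraction of the monotonic time that has elapsed since the previous update. The function name, the parts-per-million parameter and its default are assumptions for illustration; the paragraph above does not pin down an exact formula.

```python
def slew(offset, prev_monotonic, monotonic, local, max_rate_ppm=10_000):
    """Return a new offset moved towards (local - monotonic).

    The adjustment is capped at max_rate_ppm millionths of the elapsed
    monotonic time, so monotonic + offset keeps advancing as long as
    max_rate_ppm is below 1_000_000. All times are in microseconds.
    """
    elapsed = monotonic - prev_monotonic
    budget = elapsed * max_rate_ppm // 1_000_000   # largest allowed change
    error = (local - monotonic) - offset           # how far VM time is off
    adjustment = max(-budget, min(budget, error))  # clamp error to the budget
    return offset + adjustment


if __name__ == "__main__":
    offset, prev, mono = 0, 0, 0
    for _ in range(5):
        prev, mono = mono, mono + 1000
        offset = slew(offset, prev, mono, local=mono + 30_000)
        print("vm time:", mono + offset, "offset:", offset)
```

Because the cap scales with elapsed time, the same rule works whether the VM updates its clock every millisecond or at irregular intervals.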
Thanks. I was unaware of clock_gettime & mach_absolute_time.  Given
these two it shouldn't be too hard to concoct something that works.  Or is
that the approach you've taken in Dart?  Or are there standard algorithms
out there?  I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make
both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex.  I think having a clock that is
flexible, that will deviate by no more than a specified percentage from
clock_gettime in approaching wall time is simpler for the user albeit more
complex for the VM implementor.  It therefore seems to me to be in the
Smalltalk tradition.
In Dart, there are three uses of time:
- Stopwatch measures durations (BlockClosure timeToRun). It uses the
  monotonic clock.
- Timer schedules a future notification (Delay wait). It uses the
  monotonic clock.
- DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
Makes sense, at the cost of having two clocks.
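The two-clock split Ryan describes can be sketched in a few lines of Python, using time.monotonic() for durations and deadlines and datetime.now() for timestamps. The function names are invented for this example and are not Dart's or Squeak's APIs.

```python
import time
from datetime import datetime, timezone


def time_to_run(block):
    """Stopwatch-style duration in seconds, measured on the monotonic clock."""
    start = time.monotonic()
    block()
    return time.monotonic() - start


def deadline_after(seconds):
    """Timer-style deadline expressed on the monotonic clock."""
    return time.monotonic() + seconds


def has_expired(deadline):
    """True once the monotonic clock has reached the deadline."""
    return time.monotonic() >= deadline


def timestamp():
    """DateTime-style wall-clock timestamp, in UTC."""
    return datetime.now(timezone.utc)


if __name__ == "__main__":
    print("took", time_to_run(lambda: sum(range(100_000))), "seconds")
    print("stamp", timestamp().isoformat())
```

The cost is the one noted above: the programmer has to know that a monotonic reading and a timestamp are not interchangeable.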
Smalltalk has the additional complication of handling in-flight Delays
or timeToRuns as an image moves across processes. There will be a
discontinuity in both clocks, and both of them can move backwards. The logic
to deal with the discontinuity must already exist for Delays, though I
suspect no one has bothered for timeToRun. If I create a thousand Delays
spaced apart by a minute, snapshot, move the system time forward a day, then
resume, they remain evenly spaced.
This is because of #save/restoreResumptionTimes on image shutdown/startup.
If I do this while the image is still
running, they all fire at once and the VM becomes unresponsive, which is
what using the monotonic clock would fix.
Yes, since Delays are currently using gettimeofday() they expire when
the system clock jumps.  But also, with the move of Delay to a
microsecond clock and removal of the clock-wrap checks, perhaps the
algorithm is more susceptible to jitter or ntp moving the clock
backwards.  I would need to think more about that, but anyway
clock_gettime(MONOTONIC) which slews wall-time seems a much better
choice.  I'd be interested in doing some work getting this into the VM
since I'd like it for Pharo also.
cheers -ben
Yes, but there is another way.  Delays can be implemented to function as
durations, not deadlines.  This is orthogonal to clocks.  If Delays are
deadlines then it is correct that on start-up they all fire.  If they are
durations, it is not.
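To illustrate the durations-versus-deadlines distinction, here is a small Python sketch of a delay that is held as a deadline while running and as a remaining duration across a snapshot. It is loosely modelled on the save/restore of resumption times mentioned earlier in the thread; the class and method names are hypothetical and this is not the Squeak implementation.

```python
class Delay:
    """A delay held as a deadline while running and as a remaining
    duration across a snapshot. Units are microseconds."""

    def __init__(self, duration, now):
        self.deadline = now + duration
        self.remaining = None

    def save_resumption_time(self, now):
        # Before snapshot: remember how long was left, drop the deadline.
        self.remaining = max(self.deadline - now, 0)
        self.deadline = None

    def restore_resumption_time(self, now):
        # After startup: rebuild the deadline from the new clock reading.
        self.deadline = now + self.remaining
        self.remaining = None

    def is_expired(self, now):
        return now >= self.deadline


if __name__ == "__main__":
    MINUTE = 60_000_000
    DAY = 24 * 60 * MINUTE
    delays = [Delay(n * MINUTE, now=0) for n in (1, 2, 3)]
    for delay in delays:
        delay.save_resumption_time(now=0)        # snapshot
    for delay in delays:
        delay.restore_resumption_time(now=DAY)   # resume a day later
    print([delay.deadline - DAY for delay in delays])
```

Treated as durations, the three delays are still one, two and three minutes away after the clock has moved a day; treated as raw deadlines they would all have expired already.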
-- 
_,,,^..^,,,_
best, Eliot