On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman btc@openinworld.com wrote:
Hi All,
Did anything further come of this discussion...
On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Tim,
On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff timfelgentreff@gmail.com wrote:
Hi Eliot,
Am I understanding correctly that (one of) your ideas is to have just one clock that "drifts" towards wall clock time? How fast would it drift? I agree this would be neat for a local image, but what if someone from Japan sends me their image (happened to me just last week) and wants me to check something in it? Will the in-image clock run slow for a few hours until my timezone has caught up with Japan?
First, the timezone issue is completely separate, and is only an image issue. The VM's time basis is UTC microseconds since 1901 (see Time class>>utcMicrosecondClock). This should answer the same value at the same time, within the bounds of clock drift, anywhere on the globe. So if you are sent an image from Japan, nothing changes to times in that image when you start it locally, except for the inaccuracies between wall time (some atomic clock somewhere), your machine, and the machine in Japan on which the image was saved.
As a convenience the VM also offers microseconds since 1901 in local time (see Time class>>localMicrosecondClock). This is suitable for deriving UI times to display to the user, or times to write to a log, etc. But it is /not/ to be used to schedule delays and the like.
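To make the two time bases concrete, here is a small Python sketch. Python is used only for illustration, and the helper names are hypothetical, not Squeak or VM API. It takes the 1901 epoch described above and derives the offset from the Unix epoch (1970-01-01 UTC) with the standard datetime module. The UTC value for a given instant is the same everywhere. The local value differs by the timezone offset, which is why only the UTC value is fit for scheduling.

```python
from datetime import datetime, timedelta, timezone

# Microseconds from the 1901 epoch to the Unix epoch (both taken as UTC midnight).
EPOCH_1901 = datetime(1901, 1, 1, tzinfo=timezone.utc)
EPOCH_1970 = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_TO_1901_USECS = int((EPOCH_1970 - EPOCH_1901).total_seconds()) * 1_000_000


def utc_usecs_since_1901(unix_usecs: int) -> int:
    """UTC microseconds since 1901: the same value on every machine at a given instant."""
    return unix_usecs + UNIX_TO_1901_USECS


def local_usecs_since_1901(unix_usecs: int, utc_offset: timedelta) -> int:
    """Local-time microseconds since 1901: shifted by the timezone, so display/log use only."""
    return utc_usecs_since_1901(unix_usecs) + int(utc_offset.total_seconds()) * 1_000_000
```

Consider the same instant on two machines, one in Tokyo (UTC+9) and one in London in winter (UTC+0). Both compute identical UTC values, but their local values differ by exactly nine hours.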
Second, yes, the idea is to have the VM drift time towards wall time. First some background.
By wall time I mean the UTC time as provided by some atomic clock, i.e. an extremely accurate absolute time such as is accessed by a network time protocol daemon (ntpd) to keep one's machine's clock accurate. Current hardware provides inaccurate clocks. See e.g. http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-... . One can expect such hardware to drift relative to wall time by less than a minute a day, but still to drift by amounts noticeable to humans over the course of hours.
OSes such as Mac OS and Linux provide several clocks, two of which are of interest. One is the computer's notion of wall time, typically provided by gettimeofday. I'm going to call this "local time". This clock can run fast or slow and can jump backwards. It does this because the underlying clock drifts relative to wall time (and I guess may drift faster or slower depending on temperature) and is periodically corrected by ntpd. The second is monotonic time, which is a local clock (the hardware's underlying clock) that is guaranteed to advance monotonically but is not guaranteed to agree with local time, let alone wall time. On Unix this is provided by clock_gettime(CLOCK_MONOTONIC) and on Mac OS X by mach_absolute_time (see Ryan's message below). I'm going to call this "monotonic time".
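Here is a minimal Python sketch that reads both clocks side by side. `time.time_ns()` reads the system realtime clock, which is "local time" in the sense above and is subject to ntpd steps. `time.monotonic_ns()` reads the platform's monotonic clock. The function name `sample_clocks` is illustrative. The difference column has no meaning in itself, because the monotonic clock's zero point is unspecified. What matters is that it changes whenever local time is stepped or the two clocks drift apart.

```python
import time


def sample_clocks(n: int = 5):
    """Read local (realtime) and monotonic time side by side, in microseconds."""
    samples = []
    for _ in range(n):
        local_us = time.time_ns() // 1000      # may jump when ntpd or the user steps it
        mono_us = time.monotonic_ns() // 1000  # never goes backwards; arbitrary zero point
        samples.append((local_us, mono_us, local_us - mono_us))
    return samples


if __name__ == "__main__":
    for local_us, mono_us, diff in sample_clocks():
        print(f"local={local_us} monotonic={mono_us} local-monotonic={diff}")
```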
The issue is that because local time jumps around it can't be used to measure durations reliably. If you're profiling something using local time and half-way through ntpd adjusts local time, then one's measurements will be off. One solution to this is to use monotonic time. The problem is that this complicates the programmer model. It is now up to the programmer to understand the difference between local and monotonic times and use the "right clock" in the "right circumstances". This seems to me to be against the Smalltalk approach, which is to provide a safe virtual machine that protects the programmer from the vagaries and dangers of real machines.
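The measurement problem can be shown without touching a real clock. The sketch below uses a hypothetical simulated machine, and its class and function names are illustrative only. An ntpd step of -250ms lands half-way through a 100ms piece of work. Timed with local time, the work appears to take -150ms. Monotonic time reports the true 100ms.

```python
class SimulatedMachine:
    """Hypothetical machine: true time advances only when work runs (usecs)."""

    def __init__(self) -> None:
        self.true_usecs = 0   # time as an ideal observer would see it
        self.local_step = 0   # accumulated ntpd corrections applied to local time

    def local_time(self) -> int:
        return self.true_usecs + self.local_step

    def monotonic_time(self) -> int:
        return self.true_usecs

    def run_for(self, usecs: int, ntpd_step_halfway: int = 0) -> None:
        # Do half the work, let ntpd step local time, then do the rest.
        self.true_usecs += usecs // 2
        self.local_step += ntpd_step_halfway
        self.true_usecs += usecs - usecs // 2


def elapsed(clock, machine: SimulatedMachine, usecs: int, step: int) -> int:
    """Measure a piece of work with the given clock function."""
    start = clock()
    machine.run_for(usecs, step)
    return clock() - start
```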
The flip side of this is that it restricts the tools available to a programmer at the Image level, and adds complexity to the VM that Image programmers can't see if ThingsGoWrong(TM). If the Image had direct access to the several clocks provided by clock_gettime(), an Image-level programmer could explore and *understand* the differences between them. I know I could write a routine in C to explore this, but I'd prefer to do it in Smalltalk.
Indeed, for the Delay scheduler it might be good to choose between CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and observe what happens when I set the system time back a couple of days. Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.
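Python's `time` module happens to expose `clock_gettime_ns()` together with the `CLOCK_*` ids. That gives a rough idea of what Image-level access to several clocks could look like. This sketch is not a proposed VM primitive, and the helper names are illustrative. The ids are looked up by name because availability differs by platform; `CLOCK_MONOTONIC_RAW`, for instance, is not available everywhere. On platforms without `clock_gettime`, the mapping is simply empty.

```python
import time

# The clock ids mentioned above, plus the realtime clock for comparison.
CANDIDATE_CLOCKS = ["CLOCK_REALTIME", "CLOCK_MONOTONIC",
                    "CLOCK_MONOTONIC_RAW", "CLOCK_PROCESS_CPUTIME_ID"]


def available_clocks() -> dict:
    """Map clock name -> clock id for the ids this platform's time module defines."""
    return {name: getattr(time, name) for name in CANDIDATE_CLOCKS if hasattr(time, name)}


def read_all(clocks: dict) -> dict:
    """Read each available clock once, in microseconds."""
    readings = {}
    for name, clock_id in clocks.items():
        try:
            readings[name] = time.clock_gettime_ns(clock_id) // 1000
        except OSError:
            pass  # the id is defined but the running kernel rejects it
    return readings
```

Reading these in a loop while changing the system time by hand is one way to observe the behaviour described above.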
Further, I think that providing a more ideal clock is straightforward. Here's what I propose.
But why do our own slewing to match local "real" time, when NTP already seems to slew CLOCK_MONOTONIC towards wall time using adjtime [1]?
What do we do on platforms that don't have adjtime? The skewing algorithm is simple and adding it to the VM means we're not dependent on adjtime. Isn't that a good thing?
There seem to be a couple of options for making clock_gettime() cross-platform:
a. https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb
b. https://github.com/ThomasHabets/monotonic_clock
[1] http://man7.org/linux/man-pages/man2/clock_gettime.2.html
Sure. It's implementable. Why not add the simple algorithm to our VM and then we're done?
The Cog VMs use a heartbeat to control the rate at which the VM checks for external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat, but for the purposes of this discussion let's imagine it's a 1kHz/1ms heartbeat. One thing that happens every heartbeat is that the VM updates its notion of local time. Currently the VM's notion of local time can jump forwards or backwards because of ntpd activity.
Current computer clocks are accurate to a few seconds a day, but let's assume a pessimal accuracy of 1 second an hour. That's an accuracy of 1/3600, or 0.0278%. If the heartbeat accesses both local time (via gettimeofday) and monotonic time (via clock_gettime) on every heartbeat, it can compute a delta from the difference between local and monotonic times. On each heartbeat it sums the delta to compute an offset and applies this offset to monotonic time. The VM then answers offset monotonic time as the value of "wall time". If we restrict the delta on each heartbeat we can keep the VM's clock monotonic and have it drift towards local time, which itself is periodically corrected to wall time by ntpd. If, for example, the delta is restricted to +/-5usecs per heartbeat when moving in one direction and +/-10usecs when changing direction, the VM's notion of monotonic time advances within 1% of actual monotonic time and approaches local time "soon": because monotonic time is reasonably accurate, a rate of change of 1% soon overcomes a drift of at most 0.0278%.
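Here is a rough worked bound for the figures above. It assumes a 1ms heartbeat, per-tick limits of 5 and 10 usecs, and a relative drift between local and monotonic time of at most 1/3600. Here $\delta_k$ is the change in offset on tick $k$, $r_k$ the drift of local time relative to monotonic time on that tick, and $e_k$ the error `localTime - now` in usecs. This is back-of-the-envelope arithmetic, not a proof.

```latex
\[
\text{now}_{k+1} - \text{now}_k = 1000 + \delta_k, \qquad |\delta_k| \le 10
\quad\Longrightarrow\quad
\left| \frac{\text{now}_{k+1} - \text{now}_k}{1000} - 1 \right| \le 0.01
\]
\[
e_{k+1} = e_k + r_k - \delta_k, \qquad |r_k| \le \frac{1000}{3600} \approx 0.28
\quad\Longrightarrow\quad
|e_{k+1}| \le |e_k| - 4.72 \quad \text{while } |e_k + r_k| \ge 5
\]
\[
t_{\text{converge}} \lesssim \frac{|e_0|}{4.72}\ \text{ticks} = \frac{|e_0|}{4.72}\ \text{ms}
\]
```

For example, a 30ms (30,000 usec) error would close in roughly 6,400 ticks, i.e. about 6.4 seconds.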
Let's express this as a Smalltalk class. Units are microseconds (usecs).
Object subclass: #VMClock
	instanceVariableNames: 'offset now'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'VMClock'

initialize
	now := OperatingSystem gettimeofday.
	offset := now - OperatingSystem clock_gettime

run
	"Compute now every 1ms such that now approaches localTime."
	[(Delay forMilliseconds: 1) wait.
	 self computeNowGivenMonotonic: OperatingSystem clock_gettime
		andLocal: OperatingSystem gettimeofday.
	 true] whileTrue

computeNowGivenMonotonic: monotonicTime andLocal: localTime
	"Compute now such that now approaches localTime."
	| difference putativeNow delta |
	"Note now = (monotonicTime + offset) and we want (monotonicTime + offset) = localTime.
	 delta is the amount to change offset by in this tick."
	localTime = monotonicTime
		ifTrue:
			["the two clocks agree; if an offset is in effect, reduce it by no more than 5 usecs"
			 delta := offset >= 0
						ifTrue: [(offset min: 5) negated]
						ifFalse: [(offset max: -5) negated]]
		ifFalse:
			[putativeNow := monotonicTime + offset.
			 difference := localTime - putativeNow.
			 delta := 0.
			 difference < 0 ifTrue:
				"localTime is behind; make the offset more negative by no more than
				 5 usecs, or 10 usecs while the offset is not yet negative"
				[delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].
			 difference > 0 ifTrue:
				"localTime is ahead; make the offset more positive by no more than
				 5 usecs, or 10 usecs while the offset is still negative"
				[delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].
	offset := offset + delta.
	now := monotonicTime + offset.
	^now
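The method can be run outside an image with a line-by-line Python port of computeNowGivenMonotonic:andLocal:. The names mirror the Smalltalk. The clock values are passed in rather than read from `OperatingSystem`, which is pseudocode in the original. The port keeps the same limits: 5 or 10 usecs per tick, selected by the sign of the offset exactly as in the Smalltalk.

```python
class VMClock:
    """Port of the VMClock sketch; all values are microseconds.

    now = monotonic + offset, and offset is nudged towards (local - monotonic)
    by at most 10 usecs per tick, so now never jumps the way local time can.
    """

    def __init__(self, monotonic: int, local: int) -> None:
        # Start out agreeing with local time, as #initialize does.
        self.offset = local - monotonic
        self.now = local

    def compute_now(self, monotonic: int, local: int) -> int:
        """One heartbeat: move offset a bounded step towards local time."""
        if local == monotonic:
            # The clocks agree; shrink any residual offset by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local - (monotonic + self.offset)
            delta = 0
            if difference < 0:
                # Local time is behind: step by at most 10 if offset >= 0, else 5.
                delta = max(difference, -10 if self.offset >= 0 else -5)
            elif difference > 0:
                # Local time is ahead: step by at most 5 if offset >= 0, else 10.
                delta = min(difference, 5 if self.offset >= 0 else 10)
        self.offset += delta
        self.now = monotonic + self.offset
        return self.now
```

The checks below apply a single +500 usec jump in local time and, separately, a -300 usec jump. The clock absorbs each within a couple of hundred ticks, and every tick of `now` stays within 1% of 1000 usecs.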
I've written and attached a little test program that randomly offsets localTime every "3.6 seconds" in a simulation. Here's the simulation; the drift is exaggerated:
run
	"VMClockTester new run"
	"VMClockTester new run last: 20"
	"VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"
	| times |
	times := (Array new: 1000) writeStream.
	1 to: 1000000 do: "1 million ticks = 1000 seconds"
		[:i|
		monotonicClock := monotonicClock + 1000.
		localClock := localClock + 1000
			+ (i \\ 3600 = 0
				ifTrue: [| drift |
						[(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.
						drift]
				ifFalse: [0]).
		self computeNowGivenMonotonic: monotonicClock andLocal: localClock.
		i \\ 100 = 0 ifTrue:
			[times nextPut: {now. offset. monotonicClock. localClock. monotonicClock - localClock. now - localClock}]].
	^times contents
Here's what happens on the third perturbation. Before the perturbation localTime is behind monotonicTime by 13ms, and the offset has stabilised to -13ms. At the perturbation localTime jumps to 44ms behind monotonic time, and over successive iterations the offset grows more negative and the error is reduced from 30ms to 13ms.
[snip]
Here are the last 20 entries. localTime is now 197ms ahead of monotonic time, the offset reduces from 221ms to 202ms, and the difference between now and localTime reduces from 24ms to 5ms.
[snip]
Note that a clock that is accurate to within 1% is more than accurate enough for good time measurements. The VM's GC introduces pauses of around 1ms even on fast hardware, and the occasional code zone reclamation introduces similar pauses. So times can be affected by the odd millisecond anyway.
I'm sure the above is some standard piece of control theory, but apart from one Open University programme I'll never forget, which began with attempts to make a moving model tank target a moving object and ended with a sparrow on a wind-blown branch keeping its head perfectly stationary, I've never studied it. If anyone knows what the above algorithm is called, please let me know.
Also, this sounds like a lot of logic to have in the VM and a source of possible confusion for the user - they would have to know to use the right invocations for measuring timespans, and not rely on a naive "let's get the time now and subtract it from the time later to get the difference", because they might end up with a timespan over "slow" or "fast" seconds.
No, that's exactly the point. With the above algorithm in use, (I believe) time is never more than 1% inaccurate, but converges on wall time, and hence the programmer can use one clock both for measurement and for determining wall time. The only assumptions are that computer clocks are accurate to better than 1% and that the ntpd daemon adjusts the local notion of wall time based on a reputable clock.
Review appreciated.
cheers, Tim
On 16 February 2016 at 00:49, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Levente, Hi Bert, Hi All,
On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Mon, 15 Feb 2016, Bert Freudenberg wrote:
> On 15.02.2016, at 10:17, marcel.taeumel Marcel.Taeumel@hpi.de wrote:
> Hi Bert,
>
> this was just a regression. There has always been this check in the
> past for Morphic projects and still today for MVC projects.
Ah, so we lost the check at some point?
> If you would have used VM or OS startup time, this would still be
> problematic after an overflow. (Hence the comment about snapshotting.)
> So, this fix does not directly address the discussion about synching
> #millisecondClockValue to wall clock.
I still think it should answer milliseconds since startup. Why would we change that?
Eliot changed it recently, probably to avoid the rollover issues. The correct fix would be to use the UTC clock instead of the local one in Time class >> #millisecondClockValue.
I changed it for simplicity. Alas, it turns out to be a much more complex issue. Here's a discussion I'm having with Ryan Macnak, which covers what his team did with the Dart VM. Please read; it's interesting.
On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak rmacnak@gmail.com wrote:
On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Ryan,
On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak rmacnak@gmail.com wrote:
> On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
Further back Ryan wrote:
>>> 5) Travis found an assertion failure. Unfortunately the assertions
>>> fail to include paths with the line numbers.
>>>
>>> (newUtcMicrosecondClock >= utcMicrosecondClock 124)
>>
>> It's easy to track down. Just grep for the string. You'll find it
>> in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to
>> understand it. What OS are you seeing this on?
>
> Linux. Looking at the comment above this assert, I see Cog is using
> the wrong clock. One should not rely on the realtime clock (gettimeofday) to
> move steadily forward. It can jump around due to NTP syncs, the machine
> sleeping or the user changing the time settings. Programs running at startup
> on the Raspberry Pi in particular can see very large jumps because it has no
> hardware clock (battery too expensive) so the first NTP sync will be a very
> large correction. We fixed this in the Dart VM a few months ago. Timers need
> to be scheduled using the monotonic clock (Linux clock_gettime, Mac
> mach_absolute_time).
Yes, this isn't satisfactory either. One needs the VM to answer something that is close to wall time, not drift over time. I think there needs to be some clever averaging algorithm that has the property of always advancing the clock but trying to converge on wall time.
One can imagine that on every occasion the VM updates its notion of the time, it accesses both clock_gettime and gettimeofday and computes an offset that is some fraction of the delta between the current clock_gettime and the previous clock_gettime, multiplied by the difference between the two clocks. So the VM time is always monotonic, but hunts towards wall time as answered by gettimeofday.
Thanks. I was unaware of clock_gettime & mach_absolute_time. Given these two it shouldn't be too hard to concoct something that works. Or is that the approach you've taken in Dart? Or are there standard algorithms out there? I'll take a look.
I'm not seeing why it needs to be close to wall time. The VM needs to make both a wall clock and a monotonic clock available to the image.
That's one way, but it's complex. I think having a flexible clock, one that deviates by no more than a specified percentage from clock_gettime while approaching wall time, is simpler for the user, albeit more complex for the VM implementor. It therefore seems to me to be in the Smalltalk tradition.
In Dart, there are three uses of time:
- Stopwatch measures durations (BlockClosure timeToRun). It uses the monotonic clock.
- Timer schedules a future notification (Delay wait). It uses the monotonic clock.
- DateTime gets a timestamp (DateAndTime now). It uses the wall clock.
Makes sense, at the cost of having two clocks.
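The Dart-style split can be sketched in Python with the two clocks kept separate. The function names are illustrative and only loosely correspond to the Smalltalk selectors named above. Durations and deadlines use `time.monotonic_ns()`. Timestamps use the wall clock via `datetime.now(timezone.utc)`. The cost of having two clocks shows up directly: each caller has to pick the right function for the job.

```python
import time
from datetime import datetime, timezone


def time_to_run(block) -> int:
    """Stopwatch / timeToRun: a duration, measured on the monotonic clock (usecs)."""
    start = time.monotonic_ns()
    block()
    return (time.monotonic_ns() - start) // 1000


def deadline_after(usecs: int) -> int:
    """Timer / Delay: a future point on the monotonic clock (usecs)."""
    return time.monotonic_ns() // 1000 + usecs


def has_expired(deadline_usecs: int) -> bool:
    """Unaffected by the user or ntpd stepping the system time."""
    return time.monotonic_ns() // 1000 >= deadline_usecs


def timestamp() -> datetime:
    """DateTime / DateAndTime now: a wall-clock reading for display and logs."""
    return datetime.now(timezone.utc)
```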
Smalltalk has the additional complication of handling in-flight Delays or timeToRuns as an image moves across processes. There will be a discontinuity in both clocks, and both of them can move backwards. The logic to deal with the discontinuity must already exist for Delays, though I suspect no one has bothered for timeToRun. If I create a thousand Delays spaced apart by a minute, snapshot, move the system time forward a day, then resume, they remain evenly spaced.
This is because of #save/restoreResumptionTimes on image shutdown/startup.
If I do this while the image is still running, they all fire at once and the VM becomes unresponsive, which is what using the monotonic clock would fix.
Yes, since Delays currently use gettimeofday(), they expire when the system clock jumps. But also, with the move of Delay to a microsecond clock and the removal of the clock-wrap checks, perhaps the algorithm is more susceptible to jitter or to ntp moving the clock backwards. I would need to think more about that, but in any case clock_gettime(CLOCK_MONOTONIC), which ntp slews towards wall time, seems a much better choice. I'd be interested in doing some work getting this into the VM, since I'd like it for Pharo also.
cheers -ben
Yes, but there is another way. Delays can be implemented to function as durations, not deadlines. This is orthogonal to clocks. If Delays are deadlines then it is correct that on start-up they all fire. If they are durations, it is not.
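The deadline/duration distinction can be sketched with plain numbers. This is a hypothetical Python model, not the Squeak Delay implementation, although two of the helper names echo #saveResumptionTimes and #restoreResumptionTimes. A thousand Delays are a minute apart, and the clock moves forward a day between snapshot and resume. Treated as deadlines, they all fire on resume. Treated as durations, they are re-anchored to the new clock and stay a minute apart.

```python
def delays_as_deadlines(deadlines, clock_now):
    """Deadline semantics: a Delay fires once the clock has passed its deadline."""
    return [d for d in deadlines if d <= clock_now]


def save_resumption_times(deadlines, clock_at_snapshot):
    """Duration semantics, step 1: at snapshot keep only the time each Delay has left."""
    return [max(0, d - clock_at_snapshot) for d in deadlines]


def restore_resumption_times(remaining, clock_at_resume):
    """Duration semantics, step 2: at resume re-anchor the remaining time to the new clock."""
    return [clock_at_resume + r for r in remaining]
```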