<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 18, 2016 at 11:02 AM, Ben Coman <span dir="ltr"><<a href="mailto:btc@openinworld.com" target="_blank">btc@openinworld.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi All,<br>
<br>
Did anything further come of this discussion...<br>
<div><div class="h5"><br>
On Wed, Feb 17, 2016 at 1:13 AM, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>> wrote:<br>
> Hi Tim,<br>
><br>
> On Tue, Feb 16, 2016 at 2:19 AM, Tim Felgentreff <<a href="mailto:timfelgentreff@gmail.com">timfelgentreff@gmail.com</a>><br>
> wrote:<br>
>><br>
>> Hi Eliot,<br>
>><br>
>> am I understanding correctly that (one of) your ideas is to have just one<br>
>> clock that "drifts" towards wall clock time? How fast would it drift? I<br>
>> agree this would be neat for a local image, but what if someone from Japan<br>
>> sends me their image (happened to me just last week) and wants me to check<br>
>> something in it? Will the in-image clock run slow for a few hours until my<br>
>> timezone has caught up with Japan?<br>
><br>
><br>
> First, the timezone issue is completely separate, and is only an image<br>
> issue. The VM's time basis is UTC microseconds since 1901 (see Time<br>
> class>>utcMicrosecondClock). This should answer the same value at the same<br>
> time, within the bounds of clock drift, anywhere on the globe. So if you<br>
> are sent an image from Japan nothing changes to times in that image when you<br>
> start it locally, except for the inaccuracies between wall time (some atomic<br>
> clock somewhere) and your machine and the machine on which the image was<br>
> saved in Japan.<br>
><br>
> As a convenience the VM also offers microseconds since 1901 in local time<br>
> (see Time class>>localMicrosecondClock). This is suitable for deriving UI<br>
> times to display to the user, or times to write to a log, etc. But it is<br>
> /not/ to be used to schedule delays, etc, etc.<br>
><br>
> Second, yes, the idea is to have the VM drift time towards wall time. First<br>
> some background.<br>
><br>
> By wall time I mean the UTC time as provided by some atomic clock, i.e. an<br>
> extremely accurate absolute time as is accessed by a network time protocol<br>
> daemon (ntpd) to keep one's machine's clock accurate.<br>
> Current hardware provides inaccurate clocks. See e.g.<br>
> <a href="http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-the-time.html" rel="noreferrer" target="_blank">http://www.pcworld.com/article/2891892/why-computers-still-struggle-to-tell-the-time.html</a>.<br>
> One can expect such hardware to drift relative to wall time by less than a<br>
> minute a day, but still drift by amounts noticeable to humans over the<br>
> course of hours.<br>
> OSes such as Mac OS and Linux provide several clocks, two of which are of<br>
> interest. One is the computer's notion of wall time, typically provided by<br>
> gettimeofday. I'm going to call this "local time". This clock can run fast<br>
> or slow and can jump backwards. It does this because the underlying clock<br>
> drifts relative to wall time (and I guess may drift faster or slower<br>
> depending on temperature) and periodically is corrected by ntpd.<br>
> The second notion of time is monotonic time which is a local clock (the<br>
> hardware's underlying clock) which is guaranteed to advance monotonically<br>
> but is not guaranteed to agree with local time, let alone wall time. On<br>
> Unix this is provided by clock_gettime and on Mac OS X by mach_absolute_time<br>
> (see Ryan's message below). I'm going to call this "monotonic time".<br>
><br>
> The issue is that because local time jumps around it can't be used to<br>
> measure durations reliably. If you're profiling something using local time<br>
> and half-way through something the ntpd adjusts local time then one's<br>
> measurements will be off. One solution to this is to use monotonic time.<br>
> The problem is that this complicates the programmer model. It is now up to<br>
> the programmer to understand the difference between local and monotonic<br>
> times and use the "right clock" in the "right circumstances". This seems to<br>
> me to be against the Smalltalk approach, which is to provide a safe virtual<br>
> machine which protects the programmer from the vagaries and dangers of real<br>
> machines.<br>
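<div>A minimal Python sketch of the distinction drawn above, assuming that Python's time.time() stands in for gettimeofday ("local time") and time.monotonic() for the monotonic clock. The helper names time_to_run and wall_clock_timestamp are illustrative only and are not part of any VM.</div>
<pre>
```python
import time


def time_to_run(block):
    """Return (result, elapsed_seconds) measured on the monotonic clock.

    time.monotonic() cannot go backwards, so the elapsed value is never
    negative even if ntpd steps the wall clock while block runs.
    """
    start = time.monotonic()
    result = block()
    elapsed = time.monotonic() - start
    return result, elapsed


def wall_clock_timestamp():
    """Seconds since the Unix epoch; fine for logs, unsafe for durations."""
    return time.time()


if __name__ == "__main__":
    value, elapsed = time_to_run(lambda: sum(range(100_000)))
    print("result", value, "took", elapsed, "seconds")
    print("logged at", wall_clock_timestamp())
```
</pre>
<div>This is the two-clock programmer model that the message argues against exposing: the caller has to know which of the two functions suits which job.</div>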
<br>
</div></div>The flip side of this is it restricts the tools available to a<br>
programmer at the Image level, and adds complexity to the VM that<br>
Image programmers can't see if ThingsGoWrong(TM). If the Image had<br>
direct access to the several clocks provided by clock_gettime(), it<br>
would be possible for an Image level programmer to explore and<br>
*understand* the difference in them. I know I could write a routine<br>
in C to explore this, but I'd prefer to do it in Smalltalk.<br>
<br>
Indeed for Delay scheduler, it might be good to choose between<br>
CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW without modifying the VM, and<br>
observe what happens when I set the system time back a couple of days.<br>
Or maybe explore CLOCK_PROCESS_CPUTIME_ID for benchmarking.<br>
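<div>As a rough illustration of the kind of exploration meant here, done in Python rather than Smalltalk or C: the sketch below reads whichever of the named clocks the platform exposes through Python's time.clock_gettime. The function name read_available_clocks is made up for this example, and which clocks exist depends on the OS.</div>
<pre>
```python
import time

# Clock names from the Linux clock_gettime(2) man page; not every platform
# exposes all of them, so each one is looked up defensively.
CLOCK_NAMES = (
    "CLOCK_REALTIME",
    "CLOCK_MONOTONIC",
    "CLOCK_MONOTONIC_RAW",
    "CLOCK_PROCESS_CPUTIME_ID",
)


def read_available_clocks():
    """Return {clock name: seconds} for each clock this platform provides."""
    readings = {}
    if not hasattr(time, "clock_gettime"):  # e.g. Windows
        return readings
    for name in CLOCK_NAMES:
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            # The constant exists but the kernel rejects this clock.
            pass
    return readings


if __name__ == "__main__":
    for name, seconds in read_available_clocks().items():
        print(f"{name:28s} {seconds:.6f}")
```
</pre>
<div>Running it before and after setting the system time back should show CLOCK_REALTIME jumping while the monotonic clocks keep advancing.</div>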
<span class=""><br>
> Further I think that providing a more ideal clock is<br>
> straight-forward. Here's what I propose.<br>
<br>
<br>
</span>But why do our own slewing to match local "real" time, when it seems<br>
NTP already slews CLOCK_MONOTONIC to wall-time using adjtime [1].<br></blockquote><div><br></div><div>What do we do on platforms that don't have adjtime? The slewing algorithm is simple and adding it to the VM means we're not dependent on adjtime. Isn't that a good thing?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
There seem to be a couple of choices to making clock_gettime() cross platform...<br>
a. <a href="https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb" rel="noreferrer" target="_blank">https://gist.github.com/alfwatt/3588c5aa1f7a1ef7a3bb</a><br>
b. <a href="https://github.com/ThomasHabets/monotonic_clock" rel="noreferrer" target="_blank">https://github.com/ThomasHabets/monotonic_clock</a><br>
<br>
[1] <a href="http://man7.org/linux/man-pages/man2/clock_gettime.2.html" rel="noreferrer" target="_blank">http://man7.org/linux/man-pages/man2/clock_gettime.2.html</a></blockquote><div><br></div><div>Sure. It's implementable. Why not add the simple algorithm to our VM and then we're done?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
> The Cog VMs use a heartbeat to control the rate at which the VM checks for<br>
> external input, delay expiry, etc. This is typically a 500Hz/2ms heartbeat,<br>
> but for the purposes of this discussion let's imagine it's a 1kHz/1ms<br>
> heartbeat. One thing that happens every heartbeat is that the VM updates<br>
> its notion of local time. Currently the VM's notion of local time can jump<br>
> forwards or backwards because of ntpd activity.<br>
><br>
> Current computer clocks are accurate to a few seconds a day, but let's<br>
> assume a pessimal accuracy of 1 second an hour. That's an accuracy of<br>
> 1/3600, or 0.0277777%. If the heartbeat accesses both local time (via<br>
> gettimeofday) and monotonic time (via clock_gettime) on every heartbeat it<br>
> can compute a delta which it computes from the difference between local and<br>
> monotonic times. On each heartbeat it sums the delta to compute an offset<br>
> and applies this offset to monotonic time. The VM then answers offset<br>
> monotonic time as the value of "wall time". If we restrict the delta on<br>
> each heartbeat we can keep the VMs clock monotonic and have it drift towards<br>
> local time, which itself is periodically corrected to wall time by ntpd.<br>
> If, for example, the delta is restricted to +/-5usecs when moving in one<br>
> direction and +/-10usecs when changing direction, the VM's notion of<br>
> monotonic time advances within 1% of actual monotonic time and approaches<br>
> local time "soon", since, because monotonic time is reasonably accurate, a<br>
> rate of change of 1% soon overcomes a drift of at most 0.0277777%.<br>
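<div>The arithmetic behind these figures can be checked with a short Python sketch. It assumes the 1 ms heartbeat and the 5/10 usec limits given above; the function names are illustrative, and seconds_to_close ignores the small ongoing hardware drift.</div>
<pre>
```python
TICK_USECS = 1_000            # 1 kHz heartbeat, as assumed in the text
MAX_DELTA_SAME_DIR = 5        # usecs per tick when moving in one direction
MAX_DELTA_REVERSING = 10      # usecs per tick when changing direction


def max_rate_fraction():
    """Largest per-tick adjustment as a fraction of the tick length."""
    return MAX_DELTA_REVERSING / TICK_USECS


def pessimal_drift_fraction():
    """Hardware drift of 1 second per hour, as a fraction."""
    return 1 / 3600


def seconds_to_close(gap_usecs, delta_usecs=MAX_DELTA_SAME_DIR):
    """Seconds needed to remove a fixed gap at delta_usecs per tick."""
    ticks = -(-gap_usecs // delta_usecs)   # ceiling division
    return ticks * TICK_USECS / 1_000_000


if __name__ == "__main__":
    print(max_rate_fraction())          # 0.01, i.e. 1%
    print(seconds_to_close(30_000))     # 6.0
```
</pre>
<div>So a 30 ms step in local time is absorbed in about 6 seconds at 5 usecs per tick, while the pessimal hardware drift amounts to under 0.3 usecs per tick.</div>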
><br>
> Let's express this as a Smalltalk class. Units are microseconds (usecs).<br>
><br>
> Object subclass: VMClock instanceVariableNames: 'offset now'<br>
><br>
> initialize<br>
> now := OperatingSystem gettimeofday.<br>
> offset := now - OperatingSystem clock_gettime<br>
><br>
> run<br>
> "Compute now every 1ms such that now approaches localTime."<br>
> [(Delay forMilliseconds: 1) wait.<br>
> self computeNowGivenMonotonic: OperatingSystem clock_gettime andLocal:<br>
> OperatingSystem gettimeofday.<br>
> true] whileTrue<br>
><br>
> computeNowGivenMonotonic: monotonicTime andLocal: localTime<br>
> "Compute now such that now approaches localTime."<br>
> | difference putativeNow delta |<br>
> "Note now = (monotonicTime + offset) and we want (monotonicTime + offset) =<br>
> localTime.<br>
> delta is the amount to change offset by in this tick."<br>
> localTime = monotonicTime<br>
> ifTrue:<br>
> ["the two clocks agree; if an offset is in effect, reduce it by no more than<br>
> 5 usecs"<br>
> delta := offset >= 0<br>
> ifTrue: [(offset min: 5) negated]<br>
> ifFalse: [(offset max: -5) negated]]<br>
> ifFalse:<br>
> [putativeNow := monotonicTime + offset.<br>
> difference := localTime - putativeNow.<br>
> delta := 0.<br>
> difference < 0 ifTrue: "localTime is behind; make the offset more negative<br>
> by no more than 5 usecs (10 usecs when changing direction)"<br>
> [delta := difference max: (offset >= 0 ifTrue: [-10] ifFalse: [-5])].<br>
> difference > 0 ifTrue: "localTime is ahead; make the offset more positive by<br>
> no more than 5 usecs (10 usecs when changing direction)"<br>
> [delta := difference min: (offset >= 0 ifTrue: [5] ifFalse: [10])]].<br>
> offset := offset + delta.<br>
> now := monotonicTime + offset.<br>
> ^now<br>
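<div>For readers who want to run the idea outside an image, here is a Python port of the class above, offered as a sketch rather than as the VM's implementation. The clock readings are passed in as plain integers (microseconds) instead of coming from a real OS, and the simulate helper is an assumption added for illustration.</div>
<pre>
```python
class VMClock:
    """Python port of the quoted VMClock sketch; units are microseconds."""

    def __init__(self, monotonic_time, local_time):
        self.offset = local_time - monotonic_time
        self.now = local_time

    def compute_now(self, monotonic_time, local_time):
        """Move now towards local_time by a bounded step and return it."""
        if local_time == monotonic_time:
            # Clocks agree: shrink any offset by at most 5 usecs.
            if self.offset >= 0:
                delta = -min(self.offset, 5)
            else:
                delta = -max(self.offset, -5)
        else:
            difference = local_time - (monotonic_time + self.offset)
            delta = 0
            if difference > 0:
                # local_time is ahead: raise the offset by at most 5 (or 10).
                delta = min(difference, 5 if self.offset >= 0 else 10)
            elif difference < 0:
                # local_time is behind: lower the offset by at most 5 (or 10).
                delta = max(difference, -10 if self.offset >= 0 else -5)
        self.offset += delta
        self.now = monotonic_time + self.offset
        return self.now


def simulate(jump_usecs, ticks):
    """Step local time by jump_usecs once, then run ticks 1 ms heartbeats."""
    monotonic = 0
    local = 0
    clock = VMClock(monotonic, local)
    local += jump_usecs              # stands in for ntpd stepping the clock
    history = []
    for _ in range(ticks):
        monotonic += 1000
        local += 1000
        history.append(clock.compute_now(monotonic, local))
    return history, local


if __name__ == "__main__":
    history, local = simulate(-20_000, 5000)
    print("converged:", history[-1] == local)
```
</pre>
<div>In this simulation the answered time never goes backwards, even when local time is stepped back by 20 ms, because each 1000 usec tick is adjusted by at most 10 usecs.</div>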
><br>
> I've written and attached a little test program that randomly offsets<br>
> localTime every "3.6 seconds" in a simulation. Here's the simulation; the<br>
> drift is exaggerated:<br>
><br>
> run<br>
> "VMClockTester new run"<br>
> "VMClockTester new run last: 20"<br>
> "VMClockTester new run copyFrom: 36 * 3 - 5 to: 36 * 3 + 50"<br>
> | times |<br>
> times := (Array new: 1000) writeStream.<br>
> 1 to: 1000000 do: "1 million ticks = 1000 seconds"<br>
> [:i|<br>
> monotonicClock := monotonicClock + 1000.<br>
> localClock := localClock + 1000 + (i \\ 3600 = 0<br>
> ifTrue: [| drift |<br>
> [(drift := (drifter next - 0.5 * (20 * 3600)) rounded) = 0] whileTrue.<br>
> drift]<br>
> ifFalse: [0]).<br>
> self computeNowGivenMonotonic: monotonicClock andLocal: localClock.<br>
> i \\ 100 = 0 ifTrue:<br>
> [times nextPut: {now. offset. monotonicClock. localClock. monotonicClock -<br>
> localClock. now - localClock}]].<br>
> ^times contents<br>
><br>
> Here's what happens on the third perturbation. Before the perturbation<br>
> localTime is behind monotonicTime by 13ms, and the offset has stabilised to<br>
> -13ms. At the perturbation localTime jumps to 44ms behind monotonic time,<br>
> and over successive iterations the offset increases (negatively) and the<br>
> error is reduced from 30ms to 13ms.<br>
</div></div>[snip]<br>
<span class="">><br>
> Here's the last 20 entries. localTime is now 197ms ahead of monotonic time,<br>
> and the offset reduces from 221ms to 202ms, and the difference between now<br>
> and localTime reduces from 24ms to 5ms.<br>
</span>[snip]<br>
<div><div class="h5">><br>
> Note that a clock that is accurate to within 1% is more than accurate enough<br>
> for good time measurements. The VM's GC introduces pauses of around 1ms<br>
> even on fast hardware, and the occasional code zone reclamation introduces<br>
> similar pauses. So times can be affected by the odd millisecond anyway.<br>
><br>
> I'm sure the above is some standard piece of control theory but apart from<br>
> one open university program I'll never forget which began with attempts to<br>
> make a moving model tank target a moving object and ended with a sparrow on<br>
> a wind-blown branch keeping its head perfectly stationary, I've never<br>
> studied it. Anyone who knows what the above algorithm's known as please let<br>
> me know.<br>
><br>
>> Also, this sounds like a lot of logic to have in the VM and a source of<br>
>> possible confusion for the user - they would have to know to use the right<br>
>> invocations for measuring timespans, and not rely on a naive "let's get the<br>
>> time now and subtract it from the time later to get the difference", because<br>
>> they might end up with a timespan over "slow" or "fast" seconds.<br>
><br>
><br>
> No, that's exactly the point. With the above algorithm in use, (I believe)<br>
> time is never more than 1% inaccurate, but converges on wall time, and hence<br>
> the programmer can use one clock for both measurement and determining wall<br>
> time. The only assumptions are that computer clocks are accurate to better<br>
> than 1% and that the ntpd daemon adjusts the local notion of wall time based<br>
> on a reputable clock.<br>
><br>
><br>
> Review appreciated.<br>
><br>
>> cheers,<br>
>> Tim<br>
>><br>
>> On 16 February 2016 at 00:49, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>><br>
>> wrote:<br>
>>><br>
>>> Hi Levente, Hi Bert, Hi All,<br>
>>><br>
>>> On Mon, Feb 15, 2016 at 3:39 PM, Levente Uzonyi <<a href="mailto:leves@caesar.elte.hu">leves@caesar.elte.hu</a>><br>
>>> wrote:<br>
>>>><br>
>>>> On Mon, 15 Feb 2016, Bert Freudenberg wrote:<br>
>>>><br>
>>>>><br>
>>>>>> On 15.02.2016, at 10:17, marcel.taeumel <<a href="mailto:Marcel.Taeumel@hpi.de">Marcel.Taeumel@hpi.de</a>> wrote:<br>
>>>>>><br>
>>>>>> Hi Bert,<br>
>>>>>><br>
>>>>>> this was just a regression. There has always been this check in the<br>
>>>>>> past for<br>
>>>>>> Morphic projects and still today for MVC projects.<br>
>>>>><br>
>>>>><br>
>>>>> Ah, so we lost the check at some point?<br>
>>>>><br>
>>>>>> If you would have used VM or OS startup time, this would still be<br>
>>>>>> problematic after an overflow. (Hence the comment about snapshotting).<br>
>>>>>> So,<br>
>>>>>> this fix does not directly address the discussion about synching<br>
>>>>>> #millisecondClockValue to wall clock.<br>
>>>>><br>
>>>>><br>
>>>>> I still think it should answer milliseconds since startup. Why would we<br>
>>>>> change that?<br>
>>>><br>
>>>><br>
>>>> Eliot changed it recently. Probably to avoid the rollover issues. The<br>
>>>> correct fix would be to use the UTC clock instead of the local one in Time<br>
>>>> class >> #millisecondClockValue.<br>
>>><br>
>>><br>
>>> I changed it for simplicity. Alas it turns out to be a much more complex<br>
>>> issue. Here's a discussion I'm having with Ryan Macnak, which covers what<br>
>>> his team did with the Dart VM. Please read, it's interesting.<br>
>>><br>
>>><br>
>>> On Sun, Feb 14, 2016 at 12:08 AM, Ryan Macnak <<a href="mailto:rmacnak@gmail.com">rmacnak@gmail.com</a>> wrote:<br>
>>>><br>
>>>> On Sat, Feb 13, 2016 at 5:32 PM, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>><br>
>>>> wrote:<br>
>>>>><br>
>>>>> Hi Ryan,<br>
>>>>><br>
>>>>><br>
>>>>> On Sat, Feb 13, 2016 at 11:21 AM, Ryan Macnak <<a href="mailto:rmacnak@gmail.com">rmacnak@gmail.com</a>><br>
>>>>> wrote:<br>
>>>>>><br>
>>>>>> On Thu, Feb 11, 2016 at 10:46 PM, Eliot Miranda<br>
>>>>>> <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>> wrote:<br>
>>>>><br>
>>>>> Further back Ryan wrote:<br>
>>>>>>>><br>
>>>>>>>> 5) Travis found an assertion failure. Unfortunately the assertions<br>
>>>>>>>> fail to include paths with the line numbers.<br>
>>>>>>>><br>
>>>>>>>><br>
>>>>>>>> (newUtcMicrosecondClock >= utcMicrosecondClock 124)<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> It's easy to track down. Just grep for the string. You'll find it<br>
>>>>>>> in sqUnixHeartbeat.c. I've seen this from time to time, and have yet to<br>
>>>>>>> understand it. What OS are you seeing this on?<br>
>>>>>><br>
>>>>>><br>
>>>>>> Linux. Looking at the comment above this assert, I see Cog is using<br>
>>>>>> the wrong clock. One should not rely on the realtime clock (gettimeofday) to<br>
>>>>>> move steadily forward. It can jump around due to NTP syncs, the machine<br>
>>>>>> sleeping or the user changing the time settings. Programs running at startup<br>
>>>>>> on the Raspberry Pi in particular can see very large jumps because it has no<br>
>>>>>> hardware clock (battery too expensive) so the first NTP sync will be a very<br>
>>>>>> large correction. We fixed this in the Dart VM a few months ago. Timers need<br>
>>>>>> to be scheduled using the monotonic clock (Linux clock_gettime, Mac<br>
>>>>>> mach_absolute_time).<br>
>>>>><br>
>>>>><br>
>>>>> Yes, this isn't satisfactory either. One needs the VM to answer<br>
>>>>> something that is close to wall time, not drift over time. I think there<br>
>>>>> needs to be some clever averaging algorithm that has the property of always<br>
>>>>> advancing the clock but trying to converge on wall time.<br>
>>>>><br>
>>>>><br>
>>>>> One can imagine on every occasion that the VM updates its notion of the<br>
>>>>> time it accesses both clock_gettime and gettimeofday and computes an offset<br>
>>>>> that is some fraction of the delta between the current clock_gettime and the<br>
>>>>> previous clock_gettime multiplied by the difference between the two clocks.<br>
>>>>> So the VM time is always monotonic, but hunts towards wall time as answered<br>
>>>>> by gettimeofday.<br>
>>>>><br>
>>>>><br>
>>>>> Thanks. I was unaware of clock_gettime & mach_absolute_time. Given<br>
>>>>> these two it shouldn't be too hard to concoct something that works. Or is<br>
>>>>> that the approach you've taken in Dart? Or are there standard algorithms<br>
>>>>> out there? I'll take a look.<br>
>>>><br>
>>>><br>
>>>> I'm not seeing why it needs to be close to wall time. The VM needs to make<br>
>>>> both a wall clock and a monotonic clock available to the image.<br>
<br>
<br>
<br>
>>><br>
>>><br>
>>> That's one way, but it's complex. I think having a clock that is<br>
>>> flexible, that will deviate by no more than a specified percentage from<br>
>>> clock_gettime in approaching wall time is simpler for the user albeit more<br>
>>> complex for the VM implementor. It therefore seems to me to be in the<br>
>>> Smalltalk tradition.<br>
>>><br>
>>>> In Dart, there are three uses of time<br>
>>>><br>
>>>> Stopwatch measures durations (BlockClosure timeToRun). It uses the<br>
>>>> monotonic clock.<br>
>>>><br>
>>>> Timer schedules a future notification (Delay wait). It uses the<br>
>>>> monotonic clock.<br>
>>>><br>
>>>> DateTime gets a timestamp (DateAndTime now). It uses the wall clock.<br>
>>><br>
>>><br>
>>> Makes sense, at the cost of having two clocks.<br>
>>><br>
>>>><br>
>>>> Smalltalk has the additional complication of handling in-flight Delays<br>
>>>> or timeToRuns as an image moves across processes. There will be a<br>
>>>> discontinuity in both clocks, and both of them can move backwards. The logic<br>
>>>> to deal with the discontinuity must already exist for Delays, though I<br>
>>>> suspect no one has bothered for timeToRun. If I create a thousand Delays<br>
>>>> spaced apart by a minute, snapshot, move the system time forward a day, then<br>
>>>> resume, they remain evenly spaced.<br>
<br>
</div></div>This is because of #save/restoreResumptionTimes on image shutdown/startup.<br>
<span class=""><br>
>>>> If I do this while the image is still<br>
>>>> running, they all fire at once and the VM becomes unresponsive, which is<br>
>>>> what using the monotonic clock would fix.<br>
<br>
</span>Yes, since Delays are currently using gettimeofday() they expire when<br>
the system clock jumps. But also, with the move of Delay to a<br>
microsecond clock and removal of the clock-wrap checks, perhaps the<br>
algorithm is more susceptible to jitter or ntp moving the clock<br>
backwards. I would need to think more about that, but anyway<br>
clock_gettime(CLOCK_MONOTONIC), which is slewed towards wall-time, seems a much better<br>
choice. I'd be interested in doing some work getting this into the VM<br>
since I'd like it for Pharo also.<br>
<br>
cheers -ben<br>
<div class="HOEnZb"><div class="h5"><br>
>>><br>
>>><br>
>>> Yes, but there is another way. Delays can be implemented to function as<br>
>>> durations, not deadlines. This is orthogonal to clocks. If Delays are<br>
>>> deadlines then it is correct that on start-up they all fire. If they are<br>
>>> durations, it is not.<br>
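<div>The deadline-versus-duration distinction can be sketched in Python as follows. This is a hypothetical model loosely following the idea behind #save/restoreResumptionTimes mentioned earlier in the thread, not the actual Squeak or Pharo code; the Delay class and its method names are invented for the example, and the clock value is passed in explicitly to keep it deterministic.</div>
<pre>
```python
class Delay:
    """A delay that can be suspended to a remaining duration and resumed."""

    def __init__(self, duration_usecs, now_usecs):
        self.deadline = now_usecs + duration_usecs
        self.remaining = None

    def save(self, now_usecs):
        """Before a snapshot: remember how long is left, not when it ends."""
        self.remaining = max(self.deadline - now_usecs, 0)

    def restore(self, now_usecs):
        """After start-up: re-anchor the deadline on the new clock value."""
        self.deadline = now_usecs + self.remaining
        self.remaining = None

    def is_expired(self, now_usecs):
        return now_usecs >= self.deadline


if __name__ == "__main__":
    MINUTE = 60_000_000
    DAY = 24 * 60 * MINUTE
    delays = [Delay(n * MINUTE, 0) for n in range(1, 4)]
    for delay in delays:
        delay.save(0)          # snapshot taken at time 0
    for delay in delays:
        delay.restore(DAY)     # image resumed a day later
    print([delay.deadline - DAY for delay in delays])
```
</pre>
<div>Treated as durations, the delays stay a minute apart after the clock moves forward a day. Treated as fixed deadlines (no save and restore), all of them would already have expired.</div>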
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>
</div></div>