On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:
Hi Dave, as someone who works with large systems in Squeak, I'm always interested in _storage efficiency_ as much as execution efficiency.
DateAndTime, in particular, is a very common domain element with a high potential for there to be many millions of instances in a given domain model.
Apps which have millions of objects with merely a Date attribute can canonicalize them. And, apps which have millions of Time objects can canonicalize them.
But LargeInteger's are not easy to canonicalize (e.g., utcMicroseconds). So a database system with millions of DateAndTime's would have to do _two_ reads for every DateAndTime instance instead of just one today (because SmallIntegers are immediate, while LargeIntegers require their own storage buffer).
One thing I really like about the current implementation of DateAndTime is how it carefully avoids LargeIntegers by having large-grained "platforms" to arrive at the current time. e.g., each 'jdn' is a chunk of (1000000*60*60*24) microseconds. Your new implementation reflects an increase of 86 BILLION utcMicroseconds for every 1 jdn.
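To put rough numbers on the storage argument, here is a small sketch in Python (used for illustration only). It assumes a 32-bit image in which SmallInteger is a 31-bit signed value; the helper name and the sample values are made up for the example and are not taken from either implementation.

```python
# Assumption: 32-bit Squeak image, SmallInteger is a 31-bit signed value.
SMALL_INTEGER_MIN = -(2**30)
SMALL_INTEGER_MAX = 2**30 - 1          # 1073741823

# One 'jdn' step expressed in microseconds, as in the text above.
MICROS_PER_DAY = 1_000_000 * 60 * 60 * 24   # 86_400_000_000


def fits_small_integer(n: int) -> bool:
    """True if n could be stored as an immediate SmallInteger (hypothetical helper)."""
    return SMALL_INTEGER_MIN <= n <= SMALL_INTEGER_MAX


# Illustrative values for roughly the date of this thread.
jdn = 2_456_804                 # Julian day number around 2014-05-26
seconds_in_day = 86_399         # largest possible seconds value
nanos = 999_999_999             # largest possible nanos value
utc_microseconds = 1_401_127_746 * 1_000_000  # microseconds since the Posix epoch

for name, value in [("jdn", jdn), ("seconds", seconds_in_day),
                    ("nanos", nanos), ("utcMicroseconds", utc_microseconds)]:
    print(name, fits_small_integer(value))
```

Under that assumption each of jdn, seconds and nanos stays within the immediate range, while a single day's worth of microseconds already exceeds it.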
Understood. But to clarify: The name "utcMicroseconds" reflects only the precision of the time scale, it is not meant to imply what kind of number is used to represent it. In fact, a DateAndTime with nanosecond precision will typically appear as a Fraction rather than a LargeInteger. But microsecond precision is what is currently reported by the primitives, so these are LargeInteger relative to the Posix epoch.
For saving to a database, you could certainly shift the time origin and/or limit the precision of the time representation. That's more or less what the current jdn/seconds/nanos does.
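As a rough illustration of that idea, the following Python sketch splits a microsecond count into three small components for storage and joins them again on load. The function names and the (days, seconds, micros) layout are assumptions made for this example, not the actual DateAndTime or database code.

```python
from typing import Tuple

MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 60 * 60 * 24


def split_for_storage(utc_microseconds: int) -> Tuple[int, int, int]:
    """Split microseconds since the Posix epoch into (days, seconds, micros).

    Each component is small enough to fit a 31-bit signed integer for any
    realistic date. Floor division keeps seconds and micros non-negative
    even for instants before the epoch.
    """
    total_seconds, micros = divmod(utc_microseconds, MICROS_PER_SECOND)
    days, seconds = divmod(total_seconds, SECONDS_PER_DAY)
    return days, seconds, micros


def join_from_storage(days: int, seconds: int, micros: int) -> int:
    """Inverse of split_for_storage."""
    return (days * SECONDS_PER_DAY + seconds) * MICROS_PER_SECOND + micros


sample = 1_401_127_746_123_456   # illustrative instant in May 2014
parts = split_for_storage(sample)
print(parts)
print(join_from_storage(*parts) == sample)
```

The trade-off is the one discussed above: the in-memory object keeps a single large number, and only the stored form pays for the extra arithmetic.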
Small, all-in-memory benchmarks may run faster with the LI, but I'm concerned that large-scale apps might be significantly impacted in the opposite way.
Would it be possible to re-optimize this part of the representation while still maintaining the internal UTC representation to solve your concern about daylight savings?
Sure, but just to clarify: This is not something that I am proposing for Squeak trunk. It is a follow up project to my TimeZoneDatabase that I have been meaning to do for the last 15 years. I finally got around to trying it, so I figured I'd go ahead and publish the code :)
Dave
Thanks.
On Sun, May 25, 2014 at 12:48 PM, David T. Lewis lewis@mail.msen.com wrote:
I have been working on a variation of class DateAndTime that replaces its instance variables (seconds offset jdn nanos) with two instance variables: utcMicroseconds to represent microseconds elapsed since the Posix epoch, and localOffsetSeconds to represent the local time zone offset. When instantiating the current time, a single call to primitiveUtcWithOffset is used to obtain these two values atomically as reported by the underlying platform.
There are several advantages to this representation of DateAndTime, the most important of which is that its magnitude is unambiguous regardless of daylight savings transitions in local time zones.
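The point about magnitude can be sketched in a few lines of Python (for illustration only; this is not the Squeak code). The class below mirrors the two instance variables described above, and it assumes that equality and ordering depend on the UTC value alone. The class name, the method name, and the sample instants are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class UtcDateAndTime:
    """Hypothetical stand-in for the two-variable representation."""
    utc_microseconds: int
    # Assumption: the offset affects display only, so it is excluded
    # from equality and ordering.
    local_offset_seconds: int = field(default=0, compare=False)

    def local_microseconds(self) -> int:
        """Wall-clock reading in the local zone, as microseconds."""
        return self.utc_microseconds + self.local_offset_seconds * 1_000_000


# Two instants one hour apart across a fall-back transition
# (offset changes from -5h to -6h). Midnight UTC is an illustrative base.
midnight_utc = 1_414_886_400 * 1_000_000
before = UtcDateAndTime(midnight_utc + 23_400 * 1_000_000, -5 * 3600)
after = UtcDateAndTime(midnight_utc + 27_000 * 1_000_000, -6 * 3600)

# Both show the same local wall-clock time, yet their order is unambiguous.
print(before.local_microseconds() == after.local_microseconds())
print(before < after)
```

A representation that stores local time has to guess which of the two instants a repeated wall-clock reading refers to; storing the UTC value avoids the question.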
This is my attempt to address some historical baggage in Squeak. The VM reports time relative to the local time zone, and the image attempts to convert to UTC (sometimes incorrectly). A UTC based representation makes the implementation of time zone tables more straightforward (see for example the Olson time zone tables in TimeZoneDatabase on SqueakMap).
I am attaching the source code as a SAR file that can be loaded into a fully updated Squeak trunk image. The conversion process is slow, so be patient if you load it.
This can be run on either an interpreter VM or Cog, but if you use Cog, please use a version dated June 2013 or later (the VM in the Squeak 4.5 all-in-one is fine).
I am also attaching a copy of LXTestDateAndTimePerformance, which can be used to compare the performance of some basic DateAndTime functions.
Performance of the UTC based DateAndTime is generally favorable compared to the original. Here is what I see on my system (smaller numbers are better).
LXTestDateAndTimePerformance test results using the original Squeak DateAndTime on an interpreter VM: { #testNow->10143 . #testEquals->30986 . #testGreaterThan->80199 . #testLessThan->75912 . #testPrintString->10429 . #testStringAsDateAndTime->44657 }
LXTestDateAndTimePerformance test results using the new UTC based DateAndTime on an interpreter VM: { #testNow->6423 . #testEquals->31625 . #testGreaterThan->22999 . #testLessThan->18514 . #testPrintString->12502 . #testStringAsDateAndTime->32912 }
(CC to Brent Pinkney, author of the excellent Squeak Chronology package)
Dave