2014-05-26 20:09 GMT+02:00 Chris Muller asqueaker@gmail.com:
Hi Dave, as someone who works with large systems in Squeak, I'm always interested in _storage efficiency_ as much as execution efficiency.
DateAndTime, in particular, is a very common domain element with a high potential for there to be many millions of instances in a given domain model.
Apps which have millions of objects with merely a Date attribute can canonicalize them. And, apps which have millions of Time objects can canonicalize them.
But LargeIntegers are not easy to canonicalize (e.g., utcMicroseconds). So a database system with millions of DateAndTimes would have to do _two_ reads for every DateAndTime instance instead of just one as it does today (because SmallIntegers are immediate, while LargeIntegers require their own storage buffer).
One thing I really like about the current implementation of DateAndTime is how it carefully avoids LargeIntegers by having large-grained "platforms" to arrive at the current time. e.g., each 'jdn' is a chunk of (1000000*60*60*24) microseconds. Your new implementation reflects an increase of 86 BILLION utcMicroseconds for every 1 jdn.
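To put numbers on the size argument above, here is a rough arithmetic check, written in Python rather than Squeak. It assumes a 32-bit image where SmallInteger maxVal is 2^30 - 1; the sample timestamp and the helper name fits_small_integer are made up for illustration and are not part of either implementation.

```python
# Assumption: a 32-bit image, where SmallIntegers are 31-bit signed values.
SMALL_INTEGER_MAX_32BIT = 2**30 - 1

# The per-day chunk quoted in the message: 1000000*60*60*24 microseconds.
MICROS_PER_DAY = 1_000_000 * 60 * 60 * 24


def fits_small_integer(n, max_val=SMALL_INTEGER_MAX_32BIT):
    """Return True if n lies in the assumed SmallInteger range."""
    return -max_val - 1 <= n <= max_val


# Illustrative values for 2014-05-25 16:48 UTC (the date of Dave's message).
utc_microseconds = 1_401_036_480_000_000  # microseconds since the Posix epoch
jdn = 2_456_803                           # Julian day number of that date
seconds_in_day = 16 * 3600 + 48 * 60      # always below 86400
nanos = 999_999_999                       # largest possible nanosecond part

for name, value in [
    ("utcMicroseconds", utc_microseconds),
    ("one day in microseconds", MICROS_PER_DAY),
    ("jdn", jdn),
    ("seconds", seconds_in_day),
    ("nanos", nanos),
]:
    print(f"{name}: {value} fits in SmallInteger: {fits_small_integer(value)}")
```

Under that assumption, even a single day's worth of microseconds is already outside the SmallInteger range, while each of the jdn, seconds and nanos fields stays inside it.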
Small, all-in-memory benchmarks may run faster with the LargeInteger, but I'm concerned that large-scale apps might be significantly impacted in the opposite way.
Would it be possible to re-optimize this part of the representation while still maintaining internal UTC representation to solve your concern about daylight-savings?
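One possible reading of that question, sketched in Python: keep the value in UTC, but store it as three small fields instead of one large count. The function names and the three-way split are assumptions for illustration only, not something proposed in this thread. Note that a two-way split (days plus microseconds within the day) would not be enough on a 32-bit image, since a day holds 86,400,000,000 microseconds.

```python
MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400

# Assumption: 32-bit image, SmallInteger maxVal = 2^30 - 1.
SMALL_INTEGER_MAX_32BIT = 2**30 - 1


def split_utc_microseconds(utc_microseconds):
    """Split a UTC microsecond count into (days, seconds, micros).

    days    : whole days since the Posix epoch
    seconds : seconds within that day, 0..86399
    micros  : microseconds within that second, 0..999999
    """
    total_seconds, micros = divmod(utc_microseconds, MICROS_PER_SECOND)
    days, seconds = divmod(total_seconds, SECONDS_PER_DAY)
    return days, seconds, micros


def join_utc_microseconds(days, seconds, micros):
    """Inverse of split_utc_microseconds."""
    return (days * SECONDS_PER_DAY + seconds) * MICROS_PER_SECOND + micros


# Illustrative value: 2014-05-25 16:48:00.000123 UTC.
sample = 1_401_036_480_000_123
parts = split_utc_microseconds(sample)
print(parts)
print(all(p <= SMALL_INTEGER_MAX_32BIT for p in parts))
```

The cost of this layout is three slots per instance instead of one, and comparisons have to look at up to three fields, so whether it wins overall would need measuring.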
Thanks.
That's more or less the Pharo path.
On Sun, May 25, 2014 at 12:48 PM, David T. Lewis lewis@mail.msen.com wrote:
I have been working on a variation of class DateAndTime that replaces its instance variables (seconds offset jdn nanos) with two instance variables: utcMicroseconds, to represent microseconds elapsed since the Posix epoch, and localOffsetSeconds, to represent the local time zone offset. When instantiating the time now, a single call to primitiveUtcWithOffset is used to obtain these two values atomically as reported by the underlying platform.
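As a rough analogy outside Squeak, the same pair of values can be derived from one clock reading in Python. This is only a sketch of the idea: the function name utc_with_offset is made up here and is not the VM primitive, and Python's datetime stands in for the platform call.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_with_offset(now=None):
    """Return (utc_microseconds, local_offset_seconds) from ONE clock reading.

    Both values are derived from the same aware datetime, so they cannot
    disagree the way two separate clock queries could around a DST change.
    """
    if now is None:
        # One reading of the clock, expressed in the system's local zone.
        now = datetime.now(timezone.utc).astimezone()
    utc_microseconds = (now - EPOCH) // timedelta(microseconds=1)
    local_offset_seconds = int(now.utcoffset().total_seconds())
    return utc_microseconds, local_offset_seconds


# A fixed example: 12:48 local time at UTC-4, i.e. 16:48 UTC.
fixed = datetime(2014, 5, 25, 12, 48, tzinfo=timezone(timedelta(hours=-4)))
print(utc_with_offset(fixed))
print(utc_with_offset())
```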
There are several advantages to this representation of DateAndTime, the most important of which is that its magnitude is unambiguous regardless of daylight savings transitions in local time zones.
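A small Python sketch of why the magnitude stays unambiguous. The class UtcStamp and the helper stamp are hypothetical names, and the offsets are hard-coded to mimic a Central European style fall-back on 2014-10-26 (+2 hours before 01:00 UTC, +1 hour after); nothing here is taken from the Squeak code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class UtcStamp:
    utc_microseconds: int      # microseconds since the Posix epoch
    local_offset_seconds: int  # local zone offset at that instant

    def local_wall_clock(self):
        """Naive local date and time, as a wall clock would show it."""
        shifted = timedelta(microseconds=self.utc_microseconds,
                            seconds=self.local_offset_seconds)
        return (EPOCH_UTC + shifted).replace(tzinfo=None)

    def __lt__(self, other):
        # Ordering uses the UTC magnitude only; the offset plays no part.
        return self.utc_microseconds < other.utc_microseconds


def stamp(utc_dt, local_offset_seconds):
    micros = (utc_dt - EPOCH_UTC) // timedelta(microseconds=1)
    return UtcStamp(micros, local_offset_seconds)


# Two instants one hour apart that show the SAME local wall-clock time.
before_fallback = stamp(datetime(2014, 10, 26, 0, 30, tzinfo=timezone.utc), 7200)
after_fallback = stamp(datetime(2014, 10, 26, 1, 30, tzinfo=timezone.utc), 3600)

print(before_fallback.local_wall_clock(), after_fallback.local_wall_clock())
print(before_fallback < after_fallback)
```

Both stamps read 02:30 on the local wall clock, yet they still order correctly because the comparison never consults local time.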
This is my attempt to address some historical baggage in Squeak. The VM reports time related to the local time zone, and the image attempts to convert to UTC (sometimes incorrectly). A UTC based representation makes the implementation of time zone tables more straightforward (see for example the Olson time zone tables in TimeZoneDatabase on SqueakMap).
I am attaching the source code as a SAR file that can be loaded into a fully updated Squeak trunk image. The conversion process is slow, so be patient if you load it.
This can be run on either an interpreter VM or Cog, but if you use Cog, please use a version dated June 2013 or later (the VM in the Squeak 4.5 all-in-one is fine).
I am also attaching a copy of LXTestDateAndTimePerformance, which can be used to compare the performance of some basic DateAndTime functions.
Performance of the UTC based DateAndTime is generally favorable compared to the original. Here is what I see on my system (smaller numbers are better).
LXTestDateAndTimePerformance test results using the original Squeak DateAndTime on an interpreter VM:
{ #testNow->10143 . #testEquals->30986 . #testGreaterThan->80199 . #testLessThan->75912 . #testPrintString->10429 . #testStringAsDateAndTime->44657 }
LXTestDateAndTimePerformance test results using the new UTC based DateAndTime on an interpreter VM:
{ #testNow->6423 . #testEquals->31625 . #testGreaterThan->22999 . #testLessThan->18514 . #testPrintString->12502 . #testStringAsDateAndTime->32912 }
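For a quick side-by-side reading of the two result sets, the figures above can be turned into new/original ratios. This is just arithmetic on the posted numbers, done in Python; the units of the measurements are not stated in the message, so only the ratios are meaningful here.

```python
# Figures copied from the two result sets above (smaller is better).
original = {
    "testNow": 10143,
    "testEquals": 30986,
    "testGreaterThan": 80199,
    "testLessThan": 75912,
    "testPrintString": 10429,
    "testStringAsDateAndTime": 44657,
}
utc_based = {
    "testNow": 6423,
    "testEquals": 31625,
    "testGreaterThan": 22999,
    "testLessThan": 18514,
    "testPrintString": 12502,
    "testStringAsDateAndTime": 32912,
}

# Ratio below 1.0 means the UTC based version was faster on that test.
ratios = {name: utc_based[name] / original[name] for name in original}
slower = sorted(name for name, ratio in ratios.items() if ratio > 1.0)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print("slower with the UTC based version:", slower)
```

By these numbers the ordering comparisons improve the most, while testEquals and testPrintString come out slightly slower.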
(CC to Brent Pinkney, author of the excellent Squeak Chronology package)
Dave