[squeak-dev] A UTC based implementation of DateAndTime

Chris Muller ma.chris.m at gmail.com
Tue May 27 02:30:39 UTC 2014


The issue actually relates purely to Squeak domain models.  Consider
the case of an all-in-memory object model in Squeak, with no database
involved at all.  It is quite plausible that an app would want to
import a flat-file dataset that involves creating a few million
DateAndTime instances (along with other objects, of course), to the
point where memory constraints begin to be noticed.

When dealing with this level of proliferation potential for a
particular class, and for such a base data type that we don't want to
endure changing again, I want us to strongly scrutinize the internal
representation.

In this case, the use of 'utcMicroseconds' introduces a lot of
duplicate bit-patterns in memory that are very hard, if not
impossible, to share.

The simplest case is two equivalent instances of DateAndTime (read
from separate files).  Despite being equivalent, their
'utcMicroseconds' will be separate objects, each consuming separate
memory space.  There is no easy way to share the same
'utcMicroseconds' instance between them.
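
A minimal workspace sketch of what I mean, assuming a 32-bit image
(where SmallInteger maxVal is 2^30 - 1; the timestamp value is
illustrative):

  | t1 t2 |
  "roughly May 2014 as microseconds since the epoch, computed twice,
   as two separate file imports would"
  t1 := 1401157839 * 1000000.
  t2 := 1401157839 * 1000000.
  t1 class.   "LargePositiveInteger -- too big for a 32-bit SmallInteger"
  t1 = t2.    "true  -- equal values"
  t1 == t2.   "false -- two distinct heap objects, each with its own byte buffer"

Two DateAndTimes built from those values answer true to #=, yet each
one drags along its own LargePositiveInteger.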

But fully-equivalent DateAndTimes are not even half of the concern --
the high-order bits of every DateAndTime's 'utcMicroseconds'
duplicate the same bit pattern, again and again, eating up memory.
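
To see the scale of the duplication (same 32-bit assumption as above):

  | micros |
  micros := 1401157839 * 1000000.   "a contemporary timestamp in microseconds"
  micros highBit.                   "51 -- bits needed for the magnitude"
  SmallInteger maxVal highBit.      "30 -- the most a 32-bit SmallInteger holds"

Every contemporaneous instance stores all ~51 of those bits in its own
heap object, and for timestamps from the same era the high-order bits
are identical in every one of them.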

That doesn't happen when the internal representations are, or can be,
canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
original representation requires two additional slots per instance,
but the _contents_ of those slots are SmallIntegers -- shared memory.
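
The identity behavior that makes SmallIntegers effectively canonical
is easy to check in a workspace (any equal pair will do):

  | a b |
  a := 500000 + 500000.   "freshly computed"
  b := 1000000.
  a = b.    "true"
  a == b.   "true -- a SmallInteger is encoded in the oop itself; no heap object exists"

So a slot holding a SmallInteger costs nothing beyond the slot itself,
no matter how many instances hold the same value.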


On Mon, May 26, 2014 at 8:29 PM, David T. Lewis <lewis at mail.msen.com> wrote:
> On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:
>> Hi Dave, as someone who works with large systems in Squeak, I'm always
>> interested in _storage efficiency_ as much as execution efficiency.
>>
>> DateAndTime, in particular, is a very common domain element with a
>> high potential for there to be many millions of instances in a given
>> domain model.
>>
>> Apps which have millions of objects with merely a Date attribute can
>> canonicalize them.
>> And, apps which have millions of Time objects can canonicalize them.
>>
>> But LargeIntegers are not easy to canonicalize (e.g.,
>> utcMicroseconds).  So a database system with millions of DateAndTimes
>> would have to do _two_ reads for every DateAndTime instance instead of
>> just one today (because SmallIntegers are immediate, while
>> LargeIntegers require their own storage buffer).
>
> Hi Chris,
>
> I do not have a lot of experience with database systems, so I would
> like to better understand the issue for storage of large numeric values.
>
> I was under the impression that modern SQL databases provide direct
> support for large integer data types (e.g. bigint for SQL Server), and my
> assumption was that object databases such as Magma or GemStone would
> make this a non-issue. Why is it that a large (64 bit) integer should
> be any more or less difficult to persist than a small integer?
>
> This may be a dumb question but I am curious.
>
> Thanks,
> Dave
>

