[squeak-dev] A UTC based implementation of DateAndTime

Tue May 27 19:55:33 UTC 2014

On 2014-05-27 4:30 GMT+02:00, Chris Muller <ma.chris.m at gmail.com> wrote:

> The issue actually relates purely to Squeak domain models.  Consider
> the case of an all-in-memory object model in Squeak, with no database
> involved at all.  It is very feasible an app would want to import a
> flat-file dataset that involves creating a few million DateAndTime
> instances (along with other objects, of course) to the point where
> memory constraints begin to be noticed.
>
> When dealing with this level of proliferation potential for a
> particular class, and for such a base data-type we don't want to
> endure changing again, I want us to strongly scrutinize the internal
> representation.
>
> In this case, the use of 'utcMicroseconds' introduces a lot of
> duplicate bit-patterns in memory that are very hard, if not
> impossible, to share.
>
> The simplest case is two equivalent instances of DateAndTime (read
> from separate files).  Despite being equivalent, their
> 'utcMicroseconds' will be separate objects, each consuming separate
> memory space.  There is no easy way to share the same
> 'utcMicroseconds' instance between them.
>
> But fully equivalent DateAndTimes are not even half of the concern:
> the high-order bits of every DateAndTime's 'utcMicroseconds'
> duplicate the same bit pattern, again and again, eating up memory.
>
> That doesn't happen when the internal representations are, or can be,
> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> original representation requires two additional slots per instance,
> but the _contents_ of those slots are SmallIntegers -- shared memory.
>
>
Well, in the current 32-bit image format, SmallIntegers are not exactly shared;
they are immediate values.
Each consumes exactly 32 bits, the size of the slot that holds it.
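A quick way to see why 'utcMicroseconds' cannot stay immediate: in the 32-bit format a SmallInteger carries a 31-bit signed value, -2^30 to 2^30 - 1, which is only about 18 minutes' worth of microseconds. The Python sketch below is an illustration, not Squeak code. It computes the count for the date of this message, assuming (for illustration only) that the count runs from the Unix epoch. It also counts how many leading bits two timestamps a day apart have in common, which is the duplication Chris describes. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# 32-bit Squeak SmallInteger: a tagged 31-bit two's-complement value stored
# directly in the slot, so its range is -2**30 .. 2**30 - 1.
SMALLINTEGER_MIN = -(2 ** 30)
SMALLINTEGER_MAX = 2 ** 30 - 1

# Assumption for illustration: the microsecond count starts at the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_microseconds(moment: datetime) -> int:
    """Microseconds between EPOCH and an aware datetime, using exact integer math."""
    delta = moment - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def is_small_integer(value: int) -> bool:
    """True if value would fit in an immediate 32-bit SmallInteger."""
    return SMALLINTEGER_MIN <= value <= SMALLINTEGER_MAX


def shared_high_bits(a: int, b: int) -> int:
    """Number of leading bits two positive integers of equal width have in common."""
    if a.bit_length() != b.bit_length():
        return 0
    return a.bit_length() - (a ^ b).bit_length()


posted = datetime(2014, 5, 27, 19, 55, 33, tzinfo=timezone.utc)
next_day = datetime(2014, 5, 28, 19, 55, 33, tzinfo=timezone.utc)

us = utc_microseconds(posted)
print(us, us.bit_length(), is_small_integer(us))
print(shared_high_bits(us, utc_microseconds(next_day)))
```

Any epoch more than about 18 minutes in the past puts a present-day count outside the SmallInteger range. Each such instance therefore needs its own LargePositiveInteger, whose leading bits repeat across nearby timestamps.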

For a compact class like LargePositiveInteger (or LargeNegativeInteger), I don't
remember exactly what the header size is, but you get 64 bits for data, so I would
be surprised to see a major difference with respect to consumed memory.
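A back-of-the-envelope version of this comparison, written as a Python sketch, rests on three assumptions:

- a 4-byte word;
- byte objects rounded up to whole words;
- by default, a one-word header for a compact class.

The header size is exactly the uncertain detail, so it is a parameter. The function names are hypothetical, and allocator or GC overhead is ignored.

```python
WORD_BYTES = 4  # 32-bit image: each slot (pointer or immediate SmallInteger) is one word


def words_for_bytes(n: int) -> int:
    """Byte objects occupy whole words, so round up (ceiling division)."""
    return -(-n // WORD_BYTES)


def large_integer_bytes(value: int, header_words: int = 1) -> int:
    """Estimated size of a LargePositiveInteger holding value.

    header_words is an assumption: 1 models a compact-class header;
    pass 2 or 3 to model a longer header.
    """
    data_bytes = max(1, (value.bit_length() + 7) // 8)  # minimal magnitude bytes
    return (header_words + words_for_bytes(data_bytes)) * WORD_BYTES


def extra_smallinteger_slots_bytes(slots: int = 2) -> int:
    """Cost of extra SmallInteger-valued slots: one word each, no separate object."""
    return slots * WORD_BYTES


utc_us = 1_401_220_533_000_000  # 2014-05-27 19:55:33 UTC, microseconds since 1970
for header_words in (1, 2, 3):
    print(header_words, large_integer_bytes(utc_us, header_words),
          extra_smallinteger_slots_bytes())
```

Under these assumptions, the 51-bit count needs 7 data bytes, rounded up to 8. With a one-word header, the LargePositiveInteger therefore costs 12 bytes, against 8 bytes for two extra SmallInteger slots: a difference of the same order, not a multiple.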

Nicolas

> On Mon, May 26, 2014 at 8:29 PM, David T. Lewis <lewis at mail.msen.com>
> wrote:
> > On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:
> >> Hi Dave, as someone who works with large systems in Squeak, I'm always
> >> interested in _storage efficiency_ as much as execution efficiency.
> >>
> >> DateAndTime, in particular, is a very common domain element with a
> >> high potential for there to be many millions of instances in a given
> >> domain model.
> >>
> >> Apps which have millions of objects with merely a Date attribute can
> >> canonicalize them.
> >> And, apps which have millions of Time objects can canonicalize them.
> >>
> >> But LargeIntegers are not easy to canonicalize (e.g.,
> >> utcMicroseconds).  So a database system with millions of DateAndTimes
> >> would have to do _two_ reads for every DateAndTime instance instead of
> >> just one today (because SmallIntegers are immediate, while
> >> LargeIntegers require their own storage buffer).
> >
> > Hi Chris,
> >
> > I do not have a lot of experience with database systems, so I would
> > like to better understand the issue for storage of large numeric values.
> >
> > I was under the impression that modern SQL databases provide direct
> > support for large integer data types (e.g. bigint for SQL Server), and my
> > assumption was that object databases such as Magma or GemStone would
> > make this a non-issue. Why is it that a large (64-bit) integer should
> > be any more or less difficult to persist than a small integer?
> >
> > This may be a dumb question but I am curious.
> >
> > Thanks,
> > Dave
> >
>
>