[squeak-dev] A UTC based implementation of DateAndTime

David T. Lewis lewis at mail.msen.com
Wed May 28 00:04:58 UTC 2014


On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
> 2014-05-27 4:30 GMT+02:00 Chris Muller <ma.chris.m at gmail.com>:
> 
> > The issue actually relates purely to Squeak domain models.  Consider
> > the case of an all-in-memory object model in Squeak, with no database
> > involved at all.  It is very feasible an app would want to import a
> > flat-file dataset that involves creation of a few million DateAndTime
> > instances (along with other objects, of course) to the point where
> > memory constraints begin to be noticed.
> >
> > When dealing with this level of proliferation potential for a
> > particular class, and for such a base data-type we don't want to
> > endure changing again, I want us to strongly scrutinize the internal
> > representation.
> >
> > In this case, the use of 'utcMicroseconds' introduces a lot of
> > duplicate bit-patterns in memory that are very hard, if not
> > impossible, to share.
> >
> > The simplest case are two equivalent instances of DateAndTime (read
> > from separate files).  Despite being equivalent, their
> > 'utcMicroseconds' values will be separate objects, each consuming separate
> > memory space.  There is no easy way to share the same
> > 'utcMicroseconds' instance between them.
> >
> > But fully equivalent DateAndTimes are not even half of the concern:
> > the high-order bits of every DateAndTime's 'utcMicroseconds'
> > duplicate the same bit pattern, again and again, eating up memory.
> >
> > That doesn't happen when the internal representations are, or can be,
> > canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> > original representation requires two additional slots per instance,
> > but the _contents_ of those slots are SmallIntegers -- shared memory.
> >
> >
> Well, in the current 32-bit image format, SmallIntegers are not exactly
> shared; they are immediate values.
> Each consumes exactly 32 bits.
> 
> For a compact class like LargePositiveInteger or LargeNegativeInteger, I don't
> remember exactly what the header size is, but you get 64 bits for data, so I
> would be surprised to see a major difference with respect to consumed memory.
> 
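To make the "immediate value" point concrete, here is a minimal Python model of a tagged SmallInteger in a 32-bit word. It assumes the classic 32-bit Squeak (V3) encoding, in which the value is stored directly in the slot as (value << 1) | 1, giving a 31-bit signed range; the function names are hypothetical and only illustrate the encoding.

```python
# Hypothetical model of 32-bit Squeak (V3) SmallInteger tagging.
# Assumption: the slot word holds (value << 1) | 1, so values are 31-bit signed.
SMALLINT_MIN = -(2 ** 30)
SMALLINT_MAX = 2 ** 30 - 1


def is_small_integer_value(value: int) -> bool:
    """True if value fits in a 31-bit signed immediate."""
    return SMALLINT_MIN <= value <= SMALLINT_MAX


def encode_small_integer(value: int) -> int:
    """Return the 32-bit slot word that would hold this SmallInteger."""
    if not is_small_integer_value(value):
        raise OverflowError("needs a LargePositiveInteger/LargeNegativeInteger")
    return ((value << 1) | 1) & 0xFFFFFFFF


def decode_small_integer(word: int) -> int:
    """Recover the signed value from a tagged 32-bit slot word."""
    if not word & 1:
        raise ValueError("low tag bit clear: this is an object pointer")
    value = word >> 1          # drop the tag bit, leaving a 31-bit field
    if value >= 2 ** 30:       # sign-extend from bit 30
        value -= 2 ** 31
    return value
```

Because the number lives inside the slot itself, there is nothing to share: each SmallInteger slot costs one word whether or not other objects hold the same value.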

Smalltalk compactClassesArray includes: DateAndTime ==> false
Smalltalk compactClassesArray includes: LargePositiveInteger ==> true

So for the traditional DateAndTime implementation, an instance requires:

  2 words of header (64 bits)
  3 words for the small integer jdn/seconds/nanos variables
  1 word for the pointer to the offset object, which is an instance of Duration

In practice, most instances of DateAndTime within an image will share the
same offset object, so for purposes of estimation assume that this takes
no extra space.

Thus each instance requires 6 words of space in the object memory (maybe a bit
more on average if the DateAndTime instances are not sharing the same Duration
instance for one reason or another).
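The same tally, written as a small Python sketch. It only encodes the counts listed above for a 32-bit image (2-word header for a class that is not in compactClassesArray, one word per slot) and, as assumed above, treats a shared offset Duration as free. The unshared_duration_words parameter is a hypothetical knob for the unshared case; its real value would have to be measured in the image.

```python
# Word counts for one DateAndTime in a 32-bit image, traditional layout.
# Assumption: the numbers mirror the list above; nothing is measured here.
NON_COMPACT_HEADER_WORDS = 2  # DateAndTime is not a compact class


def traditional_instance_words(unshared_duration_words: int = 0) -> int:
    """Words for the jdn/seconds/nanos + offset-pointer layout.

    unshared_duration_words is a hypothetical extra cost charged when the
    offset Duration is not shared; 0 models the fully shared case.
    """
    small_integer_slots = 3   # jdn, seconds, nanos (immediate SmallIntegers)
    pointer_slots = 1         # reference to the offset Duration
    return (NON_COMPACT_HEADER_WORDS + small_integer_slots + pointer_slots
            + unshared_duration_words)
```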

For the UTC based implementation of DateAndTime, each instance requires:

  2 words of header
  1 word for the small integer localOffsetSeconds variable
  1 word for the pointer to the LargePositiveInteger representing utcMicroseconds
  1 word of header for the large positive integer
  2 words of data for the value of the large positive integer

Thus each instance requires 7 words of space in the object memory.
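A matching sketch for the UTC based layout also shows why utcMicroseconds ends up as a LargePositiveInteger: a microsecond count for a present-day instant is far above the 31-bit SmallInteger maximum, yet fits in two 32-bit words of data. The epoch used here (1970-01-01 UTC) is an assumption for illustration; counting from 1901 instead still gives two data words for current dates. Helper names are hypothetical.

```python
from datetime import datetime, timezone

WORD_BYTES = 4                  # 32-bit image
SMALLINT_MAX = 2 ** 30 - 1      # 31-bit signed immediate
NON_COMPACT_HEADER_WORDS = 2    # DateAndTime header
COMPACT_HEADER_WORDS = 1        # LargePositiveInteger is a compact class

# Assumption: microseconds are counted from the Posix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_microseconds(moment: datetime) -> int:
    """Whole microseconds from EPOCH to moment (exact integer arithmetic)."""
    delta = moment - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def large_integer_data_words(value: int) -> int:
    """Byte data of a non-negative large integer, rounded up to whole words."""
    nbytes = (value.bit_length() + 7) // 8
    return (nbytes + WORD_BYTES - 1) // WORD_BYTES


def utc_instance_words(micros: int) -> int:
    """Words for header + localOffsetSeconds + utcMicroseconds (non-negative)."""
    base = NON_COMPACT_HEADER_WORDS + 2   # header + two slots
    if micros <= SMALLINT_MAX:
        return base                       # immediate: no separate object
    return base + COMPACT_HEADER_WORDS + large_integer_data_words(micros)
```

For example, utc_instance_words(utc_microseconds(datetime(2014, 5, 28, tzinfo=timezone.utc))) gives 7, while only instants within roughly 18 minutes of the assumed epoch would fit in an immediate.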

So there is a difference, but it would probably not have a large effect on
overall space utilization, even assuming complete sharing of the offset
Duration instances.
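To put the one-word difference in terms of "a few million" imported instances, the arithmetic below multiplies it out for a 32-bit image (4-byte words). The instance count is hypothetical, and the figures cover only the DateAndTime objects themselves, not the rest of an application's object model.

```python
WORD_BYTES = 4  # 32-bit image


def extra_bytes(instances: int, old_words: int = 6, new_words: int = 7) -> int:
    """Additional bytes used by the UTC layout over the traditional one."""
    return instances * (new_words - old_words) * WORD_BYTES


def relative_growth(old_words: int = 6, new_words: int = 7) -> float:
    """Fractional size increase per DateAndTime instance."""
    return new_words / old_words - 1


instances = 3_000_000                   # hypothetical "few million" import
print(extra_bytes(instances))           # 12000000 bytes, about 11.4 MiB
print(f"{relative_growth():.1%}")       # 16.7% larger per DateAndTime
```

Each DateAndTime grows by about one sixth; how much that matters overall depends on how large the other objects in the imported data are.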

Dave


