[squeak-dev] A UTC based implementation of DateAndTime

David T. Lewis lewis at mail.msen.com
Mon Jun 2 23:41:42 UTC 2014


On Mon, Jun 02, 2014 at 09:21:06PM +0200, Levente Uzonyi wrote:
> On Tue, 27 May 2014, David T. Lewis wrote:
> 
> >On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
> >>2014-05-27 4:30 GMT+02:00 Chris Muller <ma.chris.m at gmail.com>:
> >>
> >>>The issue actually relates purely to Squeak domain models.  Consider
> >>>the case of an all-in-memory object model in Squeak, with no database
> >>>involved at all.  It is very feasible an app would want to import a
> >>>flat-file dataset that involves creating a few million DateAndTime
> >>>instances (along with other objects, of course) to the point where
> >>>memory constraints begin to be noticed.
> >>>
> >>>When dealing with this level of proliferation potential of a
> >>>particular class, and for such a base data-type we don't want to
> >>>endure changing again, I want us to strongly scrutinize the internal
> >>>representation.
> >>>
> >>>In this case, the use of 'utcMicroseconds' introduces a lot of
> >>>duplicate bit-patterns in memory that are very hard, if not
> >>>impossible, to share.
> >>>
> >>>The simplest case is two equivalent instances of DateAndTime (read
> >>>from separate files).  Despite being equivalent, their
> >>>'utcMicroseconds' will be separate objects each consuming separate
> >>>memory space.  There is no easy way to share the same
> >>>'utcMicroseconds' instance between them.
> >>>
> >>>But fully-equivalent DateAndTimes are not even half of the concern --
> >>>the high-order bits of every DateAndTime's 'utcMicroseconds'
> >>>duplicate the same bit pattern, again and again, eating up memory.
> >>>
> >>>That doesn't happen when the internal representations are, or can be,
> >>>canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
> >>>original representation requires two additional slots per instance,
> >>>but the _contents_ of those slots are SmallIntegers -- shared memory.
> >>>
> >>>
> >>Well, in the current 32 bit image format, SmallIntegers are not exactly shared,
> >>they are immediate values.
> >>Each consumes exactly 32 bits.
> >>
> >>For a compact class like LargePosOrNegInteger, I don't remember what the
> >>header size is exactly, but you get 64 bits for data, so I would be surprised
> >>to see a major difference wrt consumed memory.
> >>
> >
> >Smalltalk compactClassesArray includes: DateAndTime ==> false
> >Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
> >
> >So for the traditional DateAndTime implementation, an instance requires:
> >
> > 2 words of header (64 bits)
> > 3 words for the small integer jdn/seconds/nanos variables
> > 1 word for the pointer to the offset object, which is an instance of Duration
> >
> >In practice, most instances of DateAndTime within an image will share the
> >same offset object, so for purposes of estimation assume that this takes
> >no extra space.
> >
> >Thus each instance requires 6 words of space in the object memory (maybe a bit
> >more on average if the DateAndTime instances are not sharing the same Duration
> >instance for one reason or another).
> >
> >For the UTC based implementation of DateAndTime, each instance requires:
> >
> > 2 words of header
> > 1 word for the small integer localOffsetSeconds variable
> > 1 word for the pointer to the LargePositiveInteger representing utcMicroSeconds
> > 1 word of header for the large positive integer
> > 2 words of data for the value of the large positive integer
> >
> >Thus each instance requires 7 words of space in the object memory.
> >
> >So there is a difference, but it would probably not be a large effect on
> >overall space utilization, even assuming complete sharing of the offset
> >Duration instances.
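
As a rough cross-check of the two tallies quoted above, here is a small
sketch in Python (a stand-in, not Squeak code). The dictionary labels and the
4-byte word size are my own assumptions for a pre-Spur 32-bit image, and the
shared offset Duration is counted as free, as in the estimate.

```python
# Per-instance sizes in 32-bit words, taken from the figures quoted above.
WORD_BYTES = 4  # assumption: pre-Spur 32-bit object memory

traditional = {
    "object header": 2,
    "jdn, seconds, nanos (SmallIntegers)": 3,
    "pointer to shared offset Duration": 1,
}

utc_based = {
    "object header": 2,
    "localOffsetSeconds (SmallInteger)": 1,
    "pointer to utcMicroseconds": 1,
    "LargePositiveInteger header": 1,
    "LargePositiveInteger data": 2,
}


def words(layout):
    """Total words used by one instance with the given layout."""
    return sum(layout.values())


traditional_words = words(traditional)
utc_words = words(utc_based)

# Extra bytes needed for one million instances of the UTC based layout.
extra_bytes_per_million = (utc_words - traditional_words) * WORD_BYTES * 1_000_000

print(traditional_words, utc_words, extra_bytes_per_million)
```

Under these assumptions the UTC based layout costs one extra word per
instance, or about 4 MB per million instances.
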
> 
> I think it's possible to reduce the number of words to 5 at the cost of 
> reusing integer primitives. If DateAndTime is a variable byte class, then 
> it can hold the utcMicroSeconds in 8 variable slots (2 words). I don't 
> know if the LargeInteger primitives would work with it, but I think they 
> should, so comparison and arithmetic methods could be based on them.
> 

This probably would work, but I don't think that it would be a good thing
to do. The "microseconds" in the variable name is intended to indicate
the scale, not the actual numeric representation. For example, it would be
a Fraction in the case of parsing a DateAndTime with nanosecond precision
from a string.
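
To illustrate the point about scale versus representation, here is a minimal
sketch in Python, with fractions.Fraction standing in for a Squeak Fraction.
The function name utc_microseconds and the sample values are hypothetical,
chosen only for illustration; this is not the actual DateAndTime code.

```python
from fractions import Fraction


def utc_microseconds(seconds, nanos):
    """Scale a (seconds, nanoseconds) pair to microseconds without rounding.

    Hypothetical helper, not part of any DateAndTime implementation.
    """
    return Fraction(seconds) * 1_000_000 + Fraction(nanos, 1000)


# Microsecond precision: the value is a whole number of microseconds.
whole = utc_microseconds(1401752502, 123456000)

# Nanosecond precision: the value is no longer an integer, so it would not
# fit a fixed 8-byte integer field without losing the last three digits.
fine = utc_microseconds(1401752502, 123456789)

print(whole.denominator, fine.denominator)
```

With microsecond input the denominator is 1, while the nanosecond input keeps
a denominator of 1000, which is the case a fixed-width integer field could
not represent exactly.
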

> But it's probably not worth caring about this, because Spur will 
> change these things.
> 

Yes, for sure.

Dave


