[squeak-dev] A UTC based implementation of DateAndTime

Levente Uzonyi leves at elte.hu
Mon Jun 2 19:21:06 UTC 2014


On Tue, 27 May 2014, David T. Lewis wrote:

> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>> 2014-05-27 4:30 GMT+02:00 Chris Muller <ma.chris.m at gmail.com>:
>>
>>> The issue actually relates purely to Squeak domain models.  Consider
>>> the case of an all-in-memory object model in Squeak, with no database
>>> involved at all.  It is very feasible an app would want to import a
>>> flat-file dataset that involves creating a few million DateAndTime
>>> instances (along with other objects, of course) to the point where
>>> memory constraints begin to be noticed.
>>>
>>> When dealing with this level of proliferation potential for a
>>> particular class, and for such a base data-type we don't want to
>>> endure changing again, I want us to strongly scrutinize the internal
>>> representation.
>>>
>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>> duplicate bit-patterns in memory that are very hard, if not
>>> impossible, to share.
>>>
>>> The simplest case are two equivalent instances of DateAndTime (read
>>> from separate files).  Despite being equivalent, their
>>> 'utcMicroseconds' will be separate objects, each consuming separate
>>> memory space.  There is no easy way to share the same
>>> 'utcMicroseconds' instance between them.
>>>
>>> But fully equivalent DateAndTimes are not even half of the concern --
>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>> duplicate the same bit pattern, again and again, eating up memory.
>>>
>>> That doesn't happen when the internal representations are, or can be,
>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>> original representation requires two additional slots per instance,
>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>
>>>
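A small Python sketch (not Squeak code) makes the size argument concrete. It computes a utcMicroseconds value for a date in this thread, checks it against the 32-bit SmallInteger range, and counts how many high-order bytes two timestamps one day apart have in common. Assumptions: the count starts at the Posix epoch (an earlier epoch would only make the values larger), SmallInteger maxVal on a 32-bit V3 image is 2^30 - 1, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: utcMicroseconds counts from the Posix epoch.
POSIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SMALLINTEGER_MAXVAL_32BIT = 2**30 - 1  # SmallInteger maxVal on a 32-bit V3 image

def utc_microseconds(dt):
    """Microseconds between POSIX_EPOCH and an aware datetime (hypothetical helper)."""
    return (dt - POSIX_EPOCH) // timedelta(microseconds=1)

def fits_in_smallinteger(n):
    """True if n fits in a 31-bit signed immediate SmallInteger."""
    return -SMALLINTEGER_MAXVAL_32BIT - 1 <= n <= SMALLINTEGER_MAXVAL_32BIT

def shared_leading_bytes(a, b, width=8):
    """Number of high-order bytes two non-negative integers have in common
    when written as `width`-byte big-endian values."""
    count = 0
    for x, y in zip(a.to_bytes(width, "big"), b.to_bytes(width, "big")):
        if x != y:
            break
        count += 1
    return count

t1 = utc_microseconds(datetime(2014, 5, 27, 2, 30, tzinfo=timezone.utc))
t2 = utc_microseconds(datetime(2014, 5, 28, 2, 30, tzinfo=timezone.utc))

print(fits_in_smallinteger(t1))           # False: needs a LargePositiveInteger
print(t1.bit_length())                    # 51 bits, i.e. 7 bytes of digits
print(fits_in_smallinteger(999_999_999))  # True: max nanos still fits
print(shared_leading_bytes(t1, t2))       # 3 of the 8 high-order bytes match
```

By contrast, each of the traditional jdn/seconds/nanos values stays inside the immediate SmallInteger range, which is why that layout needs no separate integer objects.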
>> Well, in the current 32-bit image format, SmallIntegers are not exactly shared,
>> they are immediate values.
>> Each consumes exactly 32 bits.
>>
>> For a compact class like LargePosOrNegInteger, I don't remember exactly what
>> the header size is, but you get 64 bits for data, so I would be surprised to
>> see a major difference with respect to consumed memory.
>>
>
> Smalltalk compactClassesArray includes: DateAndTime ==> false
> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>
> So for the traditional DateAndTime implementation, an instance requires:
>
>  2 words of header (64 bits)
>  3 words for the small integer jdn/seconds/nanos variables
>  1 word for the pointer to the offset object, which is an instance of Duration
>
> In practice, most instances of DateAndTime within an image will share the
> same offset object, so for purposes of estimation assume that this takes
> no extra space.
>
> Thus each instance requires 6 words of space in the object memory (maybe a bit
> more on average if the DateAndTime instances are not sharing the same Duration
> instance for one reason or another).
>
> For the UTC based implementation of DateAndTime, each instance requires:
>
>  2 words of header
>  1 word for the small integer localOffsetSeconds variable
>  1 word for the pointer to the LargePositiveInteger representing utcMicroSeconds
>  1 word of header for the large positive integer
>  2 words of data for the value of the large positive integer
>
> Thus each instance requires 7 words of space in the object memory.
>
> So there is a difference, but it would probably not have a large effect on
> overall space utilization, even assuming complete sharing of the offset
> Duration instances.
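The two tallies above can be reproduced in a few lines of Python. This is only a model of the accounting in the quoted message, not a measurement. It assumes the 32-bit V3 format, a 2-word header for a non-compact class such as DateAndTime, a 1-word header for a compact class such as LargePositiveInteger, immediate SmallIntegers, and a shared Duration offset that is not counted. The function names are made up for illustration.

```python
WORD_BYTES = 4          # 32-bit image
NONCOMPACT_HEADER = 2   # DateAndTime is not in compactClassesArray
COMPACT_HEADER = 1      # LargePositiveInteger is

def large_positive_integer_words(value):
    """Header plus digit words for a LargePositiveInteger holding `value`."""
    digit_bytes = (value.bit_length() + 7) // 8
    return COMPACT_HEADER + (digit_bytes + WORD_BYTES - 1) // WORD_BYTES

def traditional_words():
    # header + jdn, seconds, nanos (immediate SmallIntegers)
    # + 1 pointer to the Duration offset (the Duration itself is assumed shared)
    return NONCOMPACT_HEADER + 3 + 1

def utc_words(utc_microseconds):
    # header + localOffsetSeconds (immediate) + 1 pointer to utcMicroseconds,
    # plus the separately allocated LargePositiveInteger itself
    return NONCOMPACT_HEADER + 1 + 1 + large_positive_integer_words(utc_microseconds)

print(traditional_words())               # 6
print(utc_words(1_401_157_800_000_000))  # 7 (a 2014 timestamp, 51 bits)
```

At 4 bytes per word, a million instances come to roughly 24 MB versus 28 MB under these assumptions.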

I think it's possible to reduce the number of words to 5 at the cost of 
reusing integer primitives. If DateAndTime is a variable byte class, then 
it can hold the utcMicroSeconds in 8 variable slots (2 words). I don't 
know if the LargeInteger primitives would work with it, but I think they 
should, so comparison and arithmetic methods could be based on them.
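A rough Python model of the byte-slot idea follows. Two assumptions are not from the message. First, the 8 bytes hold utcMicroseconds least significant byte first, the digit order Squeak's LargePositiveInteger uses. Second, since a byte-indexable object cannot also carry pointer instance variables, the local offset is packed as 4 more signed bytes. With a 2-word header that gives 2 + 3 = 5 words, which is one way to arrive at the figure above. The helper names are hypothetical.

```python
WORD_BYTES = 4
NONCOMPACT_HEADER = 2

def pack(utc_microseconds, local_offset_seconds):
    """12 byte slots: 8 for utcMicroseconds (LSB first), 4 for the signed offset."""
    return (utc_microseconds.to_bytes(8, "little")
            + local_offset_seconds.to_bytes(4, "little", signed=True))

def unpack(slots):
    """Recover (utcMicroseconds, localOffsetSeconds) from the byte slots."""
    micros = int.from_bytes(slots[:8], "little")
    offset = int.from_bytes(slots[8:12], "little", signed=True)
    return micros, offset

def packed_words(slots):
    """Header words plus the words needed for the byte slots."""
    return NONCOMPACT_HEADER + (len(slots) + WORD_BYTES - 1) // WORD_BYTES

def same_instant(a, b):
    # Comparing instants only needs the first 8 bytes, read as an integer;
    # this is the part the LargeInteger primitives would have to handle.
    return int.from_bytes(a[:8], "little") == int.from_bytes(b[:8], "little")

local = pack(1_401_157_800_000_000, 7200)  # UTC+2
utc = pack(1_401_157_800_000_000, 0)
print(packed_words(local))       # 5
print(same_instant(local, utc))  # True: same instant, different offsets
```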

But it's probably not worth caring about this, because Spur will 
change these things.


Levente

>
> Dave