[squeak-dev] A UTC based implementation of DateAndTime

Chris Muller asqueaker at gmail.com
Mon Jun 2 20:16:41 UTC 2014


It's probably possible to get it down to 3 words if DateAndTime were
represented as one canonicalized 'date', one canonicalized 'time'
(precise to the second), and one SmallInteger for millis or micros.

The more higher-level parts a DateAndTime can be constructed from,
the better the opportunity for memory optimization.  Conversely, the
more an implementation moves toward a 'binary data'
representation, the fewer of those bits can be shared and, therefore,
the more will be duplicated across instances.

On Mon, Jun 2, 2014 at 2:21 PM, Levente Uzonyi <leves at elte.hu> wrote:
> On Tue, 27 May 2014, David T. Lewis wrote:
>
>> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>>>
>>> 2014-05-27 4:30 GMT+02:00 Chris Muller <ma.chris.m at gmail.com>:
>>>
>>>> The issue actually relates purely to Squeak domain models.  Consider
>>>> the case of an all-in-memory object model in Squeak, with no database
>>>> involved at all.  It is quite feasible that an app would want to import a
>>>> flat-file dataset that involves creating a few million DateAndTime
>>>> instances (along with other objects, of course) to the point where
>>>> memory constraints begin to be noticed.
>>>>
>>>> When dealing with this level of proliferation potential for a
>>>> particular class, and for such a base data type that we don't want to
>>>> endure changing again, I want us to strongly scrutinize the internal
>>>> representation.
>>>>
>>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>>> duplicate bit-patterns in memory that are very hard, if not
>>>> impossible, to share.
>>>>
>>>> The simplest case is two equivalent instances of DateAndTime (read
>>>> from separate files).  Despite being equivalent, their
>>>> 'utcMicroseconds' values will be separate objects, each consuming separate
>>>> memory space.  There is no easy way to share the same
>>>> 'utcMicroseconds' instance between them.
>>>>
>>>> But fully equivalent DateAndTimes are not even half of the concern --
>>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>>> duplicate the same bit pattern, again and again, eating up memory.
>>>>
>>>> That doesn't happen when the internal representations are, or can be,
>>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>>> original representation requires two additional slots per instance,
>>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>>
>>>>
>>> Well, in the current 32-bit image format, SmallIntegers are not exactly
>>> shared, they are immediate values.
>>> Each consumes exactly 32 bits.
>>>
>>> For a compact class like LargePosOrNegInteger, I don't remember what the
>>> header size is exactly, but you get 64 bits for data, so I would be
>>> surprised to see a major difference wrt consumed memory.
>>>
>>
>> Smalltalk compactClassesArray includes: DateAndTime ==> false
>> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>>
>> So for the traditional DateAndTime implementation, an instance requires:
>>
>>  2 words of header (64 bits)
>>  3 words for the small integer jdn/seconds/nanos variables
>>  1 word for the pointer to the offset object, which is an instance of Duration
>>
>> In practice, most instances of DateAndTime within an image will share the
>> same offset object, so for purposes of estimation assume that this takes
>> no extra space.
>>
>> Thus each instance requires 6 words of space in the object memory (maybe a
>> bit more on average if the DateAndTime instances are not sharing the same
>> Duration instance for one reason or another).
>>
>> For the UTC based implementation of DateAndTime, each instance requires:
>>
>>  2 words of header
>>  1 word for the small integer localOffsetSeconds variable
>>  1 word for the pointer to the LargePositiveInteger representing utcMicroSeconds
>>  1 word of header for the large positive integer
>>  2 words of data for the value of the large positive integer
>>
>> Thus each instance requires 7 words of space in the object memory.
>>
>> So there is a difference, but it would probably not be a large effect on
>> overall space utilization, even assuming complete sharing of the offset
>> Duration instances.
>
>
> I think it's possible to reduce the number of words to 5 at the cost of
> reusing integer primitives. If DateAndTime is a variable byte class, then it
> can hold the utcMicroSeconds in 8 variable slots (2 words). I don't know if
> the LargeInteger primitives would work with it, but I think they should, so
> comparison and arithmetic methods could be based on them.
>
> But it's probably not worth caring about this, because Spur will change
> these things.
>
>
> Levente
>
>>
>> Dave
>>
>>
>>
>

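For a sense of scale, the word counts quoted above can be multiplied out.
This is only arithmetic on the figures from the thread, assuming a 32-bit
image with 4-byte words.  The instance count is a made-up stand-in for "a few
million", and the 5-word and 3-word figures are taken as stated, since no
breakdown was given for them.

```python
# Word counts per DateAndTime instance, as quoted in the thread
# (32-bit, pre-Spur object memory; one word is assumed to be 4 bytes).
WORD_BYTES = 4

layouts = {
    # 2 header + 3 SmallInteger slots (jdn/seconds/nanos) + 1 offset pointer
    "traditional": 2 + 3 + 1,
    # 2 header + 1 localOffsetSeconds + 1 pointer
    # + 1 LargePositiveInteger header + 2 words of integer data
    "utc_microseconds": 2 + 1 + 1 + 1 + 2,
    # Levente's estimate for a variable byte class (no breakdown given)
    "variable_byte": 5,
    # the date/time/sub-second idea at the top of this mail (taken as stated)
    "three_part": 3,
}


def mebibytes(words_per_instance, instances):
    """Approximate footprint in MiB, ignoring any shared objects."""
    return words_per_instance * WORD_BYTES * instances / (1024 * 1024)


INSTANCES = 3_000_000  # hypothetical stand-in for "a few million"

for name, words in layouts.items():
    print(f"{name:18s} {words} words  {mebibytes(words, INSTANCES):6.1f} MiB")
```

The shared Duration offset, and any canonical date or time objects, are left
out of these totals because their cost does not grow with the number of
instances.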
