[squeak-dev] A UTC based implementation of DateAndTime

Chris Muller asqueaker at gmail.com
Mon Jun 2 20:28:21 UTC 2014


On Mon, Jun 2, 2014 at 3:16 PM, Chris Muller <asqueaker at gmail.com> wrote:
> It's probably possible to get it down to 3 words if DateAndTime were
> represented as one canonicalized 'date', one canonicalized 'time'
> (precise to the second), and one SmallInteger for millis or micros.

I forgot about the offset, so, okay, 4 words.  Plus 2 words for the object
header, which makes 6.

Hmm, that is sounding very familiar!  :)
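
To make the sharing argument concrete, here is a minimal sketch in Python
rather than Squeak. Everything in it (CanonicalTable, make_stamp, the tuple
layout) is hypothetical and only illustrates the idea of a stamp built from
one canonicalized date, one canonicalized time, a micros value and an offset.

```python
# Hypothetical sketch: none of these names come from Squeak.
class CanonicalTable:
    """Hands out one shared object per distinct value."""

    def __init__(self):
        self._table = {}

    def intern(self, value):
        # setdefault returns the object already stored for an equal key,
        # otherwise it stores and returns the one passed in.
        return self._table.setdefault(value, value)

    def __len__(self):
        return len(self._table)


dates = CanonicalTable()
times = CanonicalTable()


def make_stamp(year, month, day, seconds_of_day, micros, offset_seconds):
    """Build a 4-slot stamp: shared date, shared time, micros, offset."""
    date = dates.intern((year, month, day))
    time = times.intern((seconds_of_day,))
    return (date, time, micros, offset_seconds)


# Two equal stamps, as if read from separate files.
a = make_stamp(2014, 6, 2, 73701, 250, 0)
b = make_stamp(2014, 6, 2, 73701, 250, 0)
shared_date = a[0] is b[0]  # True: one date object serves both stamps
```

How much this saves in practice depends on how many distinct dates and
times the data actually contains, which the sketch does not try to model.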

> The more and higher-level parts a DateAndTime can be constructed with,
> the better the opportunity for memory-optimization.  Conversely, the
> more an implementation moves toward a 'binary data'
> representation, the fewer of those bits can be shared and, therefore,
> the more that will be duplicated across instances.
>
> On Mon, Jun 2, 2014 at 2:21 PM, Levente Uzonyi <leves at elte.hu> wrote:
>> On Tue, 27 May 2014, David T. Lewis wrote:
>>
>>> On Tue, May 27, 2014 at 09:55:33PM +0200, Nicolas Cellier wrote:
>>>>
>>>> 2014-05-27 4:30 GMT+02:00 Chris Muller <ma.chris.m at gmail.com>:
>>>>
>>>>> The issue actually relates purely to Squeak domain models.  Consider
>>>>> the case of an all-in-memory object model in Squeak, with no database
>>>>> involved at all.  It is very feasible an app would want to import a
>>>>> flat-file dataset that involves creating a few million DateAndTime
>>>>> instances (along with other objects, of course) to the point where
>>>>> memory constraints begin to be noticed.
>>>>>
>>>>> When dealing with this level of proliferation potential of a
>>>>> particular class, and for such a base data-type we don't want to
>>>>> endure changing again, I want us to strongly scrutinize the internal
>>>>> representation.
>>>>>
>>>>> In this case, the use of 'utcMicroseconds' introduces a lot of
>>>>> duplicate bit-patterns in memory that are very hard, if not
>>>>> impossible, to share.
>>>>>
>>>>> The simplest case is two equivalent instances of DateAndTime (read
>>>>> from separate files).  Despite being equivalent, their
>>>>> 'utcMicroseconds' values will be separate objects, each consuming
>>>>> separate memory space.  There is no easy way to share the same
>>>>> 'utcMicroseconds' instance between them.
>>>>>
>>>>> But fully equivalent DateAndTimes are not even half of the concern --
>>>>> the high-order bits of every DateAndTime's 'utcMicroseconds'
>>>>> duplicate the same bit pattern, again and again, eating up memory.
>>>>>
>>>>> That doesn't happen when the internal representations are, or can be,
>>>>> canonicalized, as in the case of using SmallIntegers.  Yes, Brent's
>>>>> original representation requires two additional slots per instance,
>>>>> but the _contents_ of those slots are SmallIntegers -- shared memory.
>>>>>
>>>>>
>>>> Well, in the current 32-bit image format, SmallIntegers are not exactly
>>>> shared; they are immediate values.
>>>> Each consumes exactly 32 bits.
>>>>
>>>> For a compact class like LargePosOrNegInteger, I don't remember what the
>>>> header size is exactly, but you get 64 bits for data. I would be
>>>> surprised to see a major difference in consumed memory.
>>>>
>>>
>>> Smalltalk compactClassesArray includes: DateAndTime ==> false
>>> Smalltalk compactClassesArray includes: LargePositiveInteger ==> true
>>>
>>> So for the traditional DateAndTime implementation, an instance requires:
>>>
>>>  2 words of header (64 bits)
>>>  3 words for the small integer jdn/seconds/nanos variables
>>>  1 word for the pointer to the offset object (an instance of Duration)
>>>
>>> In practice, most instances of DateAndTime within an image will share the
>>> same offset object, so for purposes of estimation assume that this takes
>>> no extra space.
>>>
>>> Thus each instance requires 6 words of space in the object memory (maybe a
>>> bit more on average if the DateAndTime instances are not sharing the same
>>> Duration instance for one reason or another).
>>>
>>> For the UTC based implementation of DateAndTime, each instance requires:
>>>
>>>  2 words of header
>>>  1 word for the small integer localOffsetSeconds variable
>>>  1 word for the pointer to the LargePositiveInteger (utcMicroSeconds)
>>>  1 word of header for the large positive integer
>>>  2 words of data for the value of the large positive integer
>>>
>>> Thus each instance requires 7 words of space in the object memory.
>>>
>>> So there is a difference, but it would probably not be a large effect on
>>> overall space utilization, even assuming complete sharing of the offset
>>> Duration instances.
>>
>>
>> I think it's possible to reduce the number of words to 5 at the cost of
>> reusing integer primitives. If DateAndTime is a variable byte class, then it
>> can hold the utcMicroSeconds in 8 variable slots (2 words). I don't know if
>> the LargeInteger primitives would work with it, but I think they should, so
>> comparison and arithmetic methods could be based on them.
>>
>> But it's probably not worth caring about this, because Spur will change
>> these things.
>>
>>
>> Levente
>>
>>>
>>> Dave
>>>
>>>
>>>
>>
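
The per-instance word counts quoted above can be tallied with a rough
Python sketch. The word size follows the 32-bit image discussed in the
thread. The instance count of 3,000,000 is only an assumed stand-in for
"a few million", and the 5-word figure is taken as Levente's estimate
without breaking it down further.

```python
WORD_BYTES = 4  # 32-bit image, as discussed in the thread

# Words per instance, using the counts quoted in the thread.
layouts = {
    # header + jdn/seconds/nanos + pointer to a shared offset Duration
    "traditional": 2 + 3 + 1,
    # header + localOffsetSeconds + pointer
    #   + LargePositiveInteger header + LargePositiveInteger data
    "utc": 2 + 1 + 1 + 1 + 2,
    # Levente's estimate for a variable byte class (not decomposed here)
    "byte_class": 5,
}


def bytes_used(words_per_instance, instances):
    """Total bytes for the given number of instances."""
    return words_per_instance * WORD_BYTES * instances


INSTANCES = 3_000_000  # assumption: stand-in for "a few million"

for name, words in layouts.items():
    megabytes = bytes_used(words, INSTANCES) / 1_000_000
    print(f"{name}: {words} words per instance, {megabytes:.0f} MB total")
```

Under these assumptions, each extra word per instance costs 12 MB across
3,000,000 instances, which matches Dave's point that the difference is
real but modest.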

