The issue actually relates purely to Squeak domain models. Consider the case of an all-in-memory object model in Squeak, with no database involved at all. It is very feasible that an app would want to import a flat-file dataset that involves creating a few million DateAndTime instances (along with other objects, of course), to the point where memory constraints begin to be noticed.
When a class has this much potential to proliferate, and is such a base data type that we don't want to endure changing it again, I want us to strongly scrutinize the internal representation.
In this case, the use of 'utcMicroseconds' introduces a lot of duplicate bit-patterns in memory that are very hard, if not impossible, to share.
The simplest case is two equivalent instances of DateAndTime (read from separate files). Despite being equivalent, their 'utcMicroseconds' values will be separate objects, each consuming its own memory. There is no easy way to share the same 'utcMicroseconds' instance between them.
But fully-equivalent DateAndTimes are not even half of the concern -- the high-order bits of every DateAndTime's 'utcMicroseconds' duplicate the same bit pattern, again and again, eating up memory.
That doesn't happen when the internal representations are, or can be, canonicalized, as in the case of using SmallIntegers. Yes, Brent's original representation requires two additional slots per instance, but the _contents_ of those slots are SmallIntegers -- immediate values that consume no separate storage.
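As a rough analogy (in CPython, not Squeak -- the mechanisms differ, but the memory effect is similar): small integers are shared, while equal large integers parsed from separate inputs are distinct boxed objects, each with its own storage. This is a sketch of the duplication argument above, not Squeak code:

```python
import sys

# Two equal "microsecond clock" magnitudes parsed from separate inputs,
# as if read from two files.
a = int("1401150000000000")
b = int("1401150000000000")

print(a == b)    # True  -- equal in value
print(a is b)    # False -- distinct objects; CPython does not share large ints
print(sys.getsizeof(a))  # each copy pays its own per-object storage cost

# Small integers are cached and shared in CPython, loosely analogous to
# Squeak's SmallIntegers, which are immediate (encoded in the pointer itself).
x = int("100")
y = int("100")
print(x is y)    # True -- one shared object, no duplicated storage
```

In Squeak the sharing for SmallIntegers is even stronger, since the value lives in the object pointer itself and no heap object exists at all; the point is that LargeInteger-style boxed magnitudes get no such sharing.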
On Mon, May 26, 2014 at 8:29 PM, David T. Lewis lewis@mail.msen.com wrote:
On Mon, May 26, 2014 at 01:09:06PM -0500, Chris Muller wrote:
Hi Dave, as someone who works with large systems in Squeak, I'm always interested in _storage efficiency_ as much as execution efficiency.
DateAndTime, in particular, is a very common domain element with a high potential for there to be many millions of instances in a given domain model.
Apps which have millions of objects with merely a Date attribute can canonicalize them. And, apps which have millions of Time objects can canonicalize them.
But LargeIntegers are not easy to canonicalize (e.g., utcMicroseconds). So a database system with millions of DateAndTimes would have to do _two_ reads for every DateAndTime instance instead of just one today (because SmallIntegers are immediate, while LargeIntegers require their own storage buffer).
Hi Chris,
I do not have a lot of experience with database systems, so I would like to better understand the issue for storage of large numeric values.
I was under the impression that modern SQL databases provide direct support for large integer data types (e.g. bigint for SQL server), and my assumption was that object databases such as Magma or GemStone would make this a non-issue. Why is it that a large (64 bit) integer should be any more or less difficult to persist than a small integer?
This may be a dumb question but I am curious.
Thanks, Dave