[ANN] Large Collections (on SqueakMap)

Stephen Pair spair at acm.org
Fri Dec 6 19:11:01 UTC 2002


This package adds several classes to Squeak to enable it to better
handle collections that contain a very large number of items. These
classes are:

- LargeArray
- LargeIdentityDictionary
- ExtendedAttributeDictionary
- TupleAttributeDictionary

LargeArray forms the basis of the other large collections and can also
serve as the basis for other large collections (or existing collection
classes instantiated with a LargeArray serving as the base collection).

This package does some minor refactoring of the existing collection
classes to enable LargeArray to be 100% compatible with Array.

In Squeak, very large collections present a problem for the garbage
collector. When large objects exist in young space, these objects are
scanned very frequently by the incremental garbage collector. When a
reference to a young object is stored in an old object, incremental
garbage collection must also scan those old objects (which presents a
performance issue if those objects are very large). LargeArray addresses
this issue by "chunking" up the large array into a set of smaller
arrays.

One could argue that garbage collection should be made to handle large
objects better. It certainly wouldn't hurt if the cost in terms of
algorithmic complexity and performance are not too high. However, there
are other reasons for breaking large collections up into a set of
smaller ones. One key reason is to allow the individual chunks of a
large collections to be independently swapped in and out of object
memory by a cache manager. The net result is a "paging" effect on the
large collection.

LargeIdentityDictionary is just that. See the class comments for more
details.

ExtendedAttributeDictionary is designed for use when you want to
efficiently lookup extended state for a large number of objects.

TupleAttributeDictionary is useful for looking up state associated with
two or more objects.

For more details see the class comments for these classes.

As a performance comparision, tracing an image (with SystemTracer2) on
my machine took 9.7 minutes without large collections and 2.5 minutes
with large collection.




More information about the Squeak-dev mailing list