"lazy hash fields" is a way to get a larger range of identify hash values without significantly increasing the average object size. We have successfully used all of the techniques described below in various implementations.
Basic idea: Asume that the identify hash is not actually accessed for most objects. So, use a extra object slot to hold the hash value but only allocate this slot when and if the hash value is actually accessed. Exploit the fact that the garbage collector moves objects to simplify dynamically adding the hash slot.
Details: Allocate two bits (the hash tag) in the header with the following interpretation: 0 hash value has not been accessed, no hash value assigned. 1 hash value has been accessed, hash field not yet allocated 2 hash value has been accessed and the hash field has been added 3 unused, reserved, special, etc.
Most objects are initially allocated with a hash tag of 0. The first time an attempt is made to access the hash value of such an object, its hash tag is set to 1 and a hash value is assigned. The value must be assigned in such a way that it can be regenerated or retrieve on subsequent accesses. There are at least two possible ways to do this: a) use some function on the current address of the object as its hash value; b) assign an random value as the hash value and store that value in a "look-aside" table, keyed by the object's current address. (using the address as the value is simpler but may not provide a very good distribution of hash values)
When the hash value is accessed for an object whose hash tag is 1, the current address of the object is used to either recompute its hash function or to access the look-aside table, as appropriate to the implementation.
When the garbage collector needs to move an object it most check the value of the hash tag. If the object to be moved has a hash tag of 1 the GC must extent the object in a implementation defined way with an additional field to contain the hash value. Common approaches are to added the field at the end of the object or as a negative offset from the base of the object. After allocating the field the GC computes the hash value or retrieves it from the look-aside table and stores the value in the field. The hash tag of the object is set to 2. If a look-aside table is used, the object's table entry is freed at this time.
When the hash value is accessed for an object whose hash tag is 2, the hash value is simply retrieved from the extended hash field.
Various interesting elaboration are possible. For example, if there are permanent objects that can not move the extra hash tag value might be used to say use the object's address as the hash value. (Or, if the implementation does not use a look-aside table, just use tag 1 to indicate this.)
It is sometimes useful, to have the ability to explicitly set an object's identify hash to a specific value. For example, the world is simpler if an externalized object can be reconstituted with the same identify hash value it had when it was externalized. Note that it is perfectly safe for a program to assign arbitrary identify hash values to an object up until the first time that the object's identify hash is actually accessed. The look-aside table variation makes its very easy to do this. Simply add a set identity hash primitive that creates look-aside entries for objects whose hash tag is 0. (Fail for other tag values)
Happy hashing, Allen_Wirfs-Brock@instantiations.com
squeak-dev@lists.squeakfoundation.org