hashing with collections.

Peter van Rooijen peter at vanrooijen.com
Tue Dec 16 08:54:25 UTC 2003


From: "Ralph Boland" <rboland at porchlight.ca>
> Ralph Boland wrote:
> I just thought of a serious problem with my idea.  If you have
> collections within collections then you have the problem that
> when the contained collection's hash value is changed so does
> the hash value of the collection that contained it.
> This is clearly a major problem so the idea of storing hash
> values with collections can only work when the hash values of the
> elements in a collection become immutable once the collection is
> made an element of another collection.
> So we can't apply this idea to Smalltalk collections in general.

Yes, that seems correct.

> However, they are still useful when you know that this is the
> case.
> In fact this is the case for my applications:
> constructing finite state machines and LR based parsers.
> This suggests that having kinds of collections that store
> and update their hash value is useful but you better know
> what you are doing when you use them.

Yes.

> I needed them for my application.  I am not sure if they
> are useful enough to be made part of the standard
> smalltalk class libary.

You might not want to create <collection>WithConstantHashElements for many
collection classes, and you don't have the opportunity to "mix-in" this
property to existing collection classes. So, unless you only need this for a
very small number of kinds of collections (one, maybe two), perhaps it would
be best to make a collection wrapper that does this generically.

It would intercept the element addition/removal methods and the hash method
and add appropriate processing.

Perhaps you'd want to make two variants, an eager one that updates the hash
on each addition/removal, and a lazy one that simply marks the hash as
dirty/not-up-to-date and recalculates (and caches) it as soon as the hash
value is asked for.

Caching calculated values in wrappers when you know a calculated value can't
change during a certain operation, can work very well. I have had good
success that technique for sorting things on "aspect value" using composable
sort orders expressed as objects other than using complicated sortBlocks.
There's some administration that you have to get just right, but when it's
done, it's generic and it just works.

Below is a short description which explains where the wrappers are used for
this type of cached sorting.

Regards,

Peter

---------------------------------------
Here's an example of a non-compound sort order in use for sorting a list of
packages by name in a browser I wrote:

self
 setSortOrderIn: #listPackages
 to: (self newSortOrder onAscending: [:pack | self nameStringOfpackage:
pack])

Here's an example of a compound sort order:

newSortOrderMessagesByNameStringAscending

^
self newSortOrder
 name: 'by name ascending'
; onAscending: [:message | message nameString]
; thenOnAscending: [:message | message protocol nameString]

This is used for sorting message specifications in the ANSI Smalltalk
standard, which have a nameString (the message selector as a string), and
belong to a Protocol, which has a nameString as well (such as 'Array
factory').

Such SortOrder objects, which specify one or more aspects to sort on, are
then used by SortProcedure objects which do the sorting and these
SortProcedures use instances of AspectValueCollection which store all
objects in a (sub) sort that have the same aspect value.

So, the SortProcedure, when it uses the standard sorting algorithm borrowed
from the dialect, actually sorts AspectValueCollections, i.e., collections
of objects that have the same aspect value. All aspect values are calculated
at most once once for each element in the collection to be sorted, where the
standard sorting algorithm will do multiple applications of the sortBlock
for the typical element, duplicating many calculations.

Of course, for sorts that are already fast (e.g., because the sortBlock asks
for values that are already stored in the objects to be sorted), using this
system makes the sorts slower. Fortunately, the fast sorts are not usually
not the ones that worry us :-).




More information about the Squeak-dev mailing list