[BUG][Performance] Symbol>>hash too slow [was:
[BUG][FIX][Slowdown] Add preference too slow (Symbol>>hash too slow)]
Andrew P. Black
black at cse.ogi.edu
Tue Feb 20 07:27:55 UTC 2001
Here is Dan's original comment:
At 15:10 -0700 2000.09.18, Dan Ingalls wrote:
>
>2630-BadHashFix-SqR -- Andres Valloud & Luciano Notarfrancesco -- 3
>August 2000
>Removes Symbol>>hash.
>This is done since now symbols may be equal to strings and
>viceversa. When they were equal, though, their hashes were not and
>that's not good.
>Fixing this brought up some issues with the Interpreter. When the C
>code is generated, some methods are inlined. Such inlined methods
>have local variable names that mask global variable names. There is
>a check for this (CCodeGenerator>>prepareMethods). The methods
>dictionary had symbol keys, but it was queried for strings. Since
>the hashes for equal variable names were different, those
>overlapping names were never reported as errors by the
>CCodeGenerator. We changed the names of the local variables by
>prefixing them with 'local'.
>Also reimplements String>>hash. The previous implementations could
>provide only 14 bits of hash, and this was just at its best. It did
>not spread hash values enough either, so just about 12 bits were
>being used by all the Strings in the system. The new implementation
>of hash for Strings provides an excellent hash mentioned by Mark van
>Gulik, and it is now used in more places through the Collection
>hierarchy (check ByteArray>>hash and SequenceableCollection>>hash).
>Also reimplements Fraction>>hash so that it does not answer
>LargeIntegers. A small method called hashMultiply was added to
>SmallInteger to keep things clean. A convenient method was also
>added to Set class to rehash all sets except MethodDictionaries
>(takes too long and they are not affected by #hash).
>Finally, do not edit or fileOut this changeset! It contains hand
>made postscripts and initialization code that will rehash all Sets
>and Dictionaries in your system.
>A quick check to see if this changeset did its work properly is to
>do the following:
> * Try to get the Set class by typing 'Set' in a workspace and
>then doing a print-it
> * Open a workspace
>If that works, you are in good shape.
>SqR & len, 8/4/2000 14:41"
I half remember the original discussion too, but I don't recall the
rationale that said _why_ it is ever reasonable to expect a String
and a Symbol to ever be equal. Could someone enlighten me?
In the particular example mentioned in Dan's comment,
CCodeGenerator>>prepareMethods, the issue seems to be checking
whether a string matches any symbol in the set of globals or method
names. Clearly, interning the string (which involves hashing in to
the SymbolTable) is a waste of time, since if the string in question
is not in the SymbolTable, there is no point in adding it -- it can't
be in the globals or methods names sets either.
<Alert mode: 'Fools rush in where Angels fear to tread ...'>
So, it seems to me that the existing code, which computes the hash of
a string (by looking at all the characters) in order to compare it
against a set of symbols, should be used in exactly one place:
String>>asSymbol.
Having to look at every character in a Symbol to compute its hash
sounds like a bad idea. In my mind, the whole reason for the
existence of Symbols as a separate class is to make comparisons
between them fast (by permitting us to use Symbol identity as a fast
implementation of equality), which would lead to the very simple code
Symbol>>= another
^ self == another
Symbol>>hash
^self identityHash
This ought to maintain the required relationship between = and hash.
The fact that Doug Way had to work around the performance of
Symbol>>hash says that we are making the wrong tradeoff
<Alert endMode>
OK, what am I missing?
Andrew
More information about the Squeak-dev
mailing list
|