[squeak-dev] Re: 4.1 - hashed collections still a problem

Levente Uzonyi leves at elte.hu
Tue Mar 30 16:21:59 UTC 2010


On Tue, 30 Mar 2010, Andres Valloud wrote:

> I mentioned implementing identityHash as hash as an example given.  I also 
> think I mentioned the rule x == y => x identityHash = y identityHash, so I 
> hope it's clear that one shouldn't just blindly move ahead in these matters. 
> The point is that nothing prevents anybody from adding an instance variable 
> called identityHash to their objects, storing arbitrary (but well chosen!) 
> small integers in said instance variable, and then having the identityHash 
> message just answer said values.  If you do this for the cases in which you 
> have significantly more than 4096 objects, then you only pay the price to 
> hold better identityHash values for the objects that need them (as opposed to 
> every single object in the image when the header is expanded instead).  Or 
> perhaps you don't really need identity for the cases being discussed in 
> practice, and just using hash as identityHash is fine.  It's hard to tell 
> without concrete examples.  One way or the other, I do not think the size of 
> the identityHash field *must* result in poor hashed collection performance. 
> Such an implication does not follow.

The original problem is: pick 1000000 or more objects and associate an id 
to them.
- Using #hash and #== doesn't work help, because the objects state may 
change.
- Adding a slot to every object can't be done, because the class of the 
objects can be anything. Adding a slot to Object would break lots of code.
- Using #largeHash (12 bits from the object + 12 bits from the object's 
class) doesn't help, because there may be only a few different classes.


Levente

>
> On 3/30/10 1:48 , Andreas Raab wrote:
>> On 3/30/2010 1:09 AM, Andres Valloud wrote:
>> 
>>> Right, so provide a better identityHash implementation in the image
>>> (e.g.: implement hash and then have identityHash call hash instead), and
>>> problem solved...
>>> 
>> Except that #hash is not constant over the lifetime of most objects but
>> #identityHash is. So if you have a property associated with an object in
>> a IDDict and the #hash depends on a value of a variable it may change
>> over the lifetime of the object and your key gets invalid.
>> 
>> Cheers,
>>     - Andreas
>>
>> 
>>> On 3/26/10 3:37 , Levente Uzonyi wrote:
>>> 
>>>> On Thu, 25 Mar 2010, Andres Valloud wrote:
>>>>
>>>> 
>>>>> If lookups find the sought object in mostly one attempt, the
>>>>> primitive is
>>>>> overkill... most of the time, the real issue is the quality of the hash
>>>>> function.
>>>>> 
>>>> That's true, but this is not the case with the 4096 hash values.
>>>> 
>>>> 
>>>> Levente
>>>>
>>>> 
>>>>> On 3/25/10 1:27 , Levente Uzonyi wrote:
>>>>> 
>>>>>> On Thu, 25 Mar 2010, Igor Stasenko wrote:
>>>>>> 
>>>>>>
>>>>>> 
>>>>>>> i think that #pointsTo: is a cheat :), which you can use in Sets but
>>>>>>> not dictionaries, because
>>>>>>> it contains associations. Also, it works only for identity-based
>>>>>>> collections.
>>>>>>>
>>>>>>> 
>>>>>> Dictionaries don't have to use associations (for example
>>>>>> MethodDictionary
>>>>>> doesn't use them), that's why #pointsTo: works (MethodDictionary also
>>>>>> uses it).
>>>>>> 
>>>>>>
>>>>>> 
>>>>>>>> I wonder how LargeIdentityDictionary compares to your dictionaries'.
>>>>>>>> 
>>>>>>>>
>>>>>>>> 
>>>>>>> me too.
>>>>>>>
>>>>>>> 
>>>>>> If you give me a pointer to the source code, I can run the benchmarks.
>>>>>> 
>>>>>> 
>>>>>> Levente
>>>>>> 
>>>>>> 
>>>>>>
>>>>>>
>>>>>
>>>> 
>>>
>>> 
>> 
>>
>> 
>
>



More information about the Squeak-dev mailing list