[squeak-dev] Re: 4.1 - hashed collections still a problem

Andres Valloud avalloud at smalltalk.comcastbiz.net
Wed Mar 31 00:51:34 UTC 2010


What is the purpose of adding an id to so many objects?  Is this a real 
application problem?  Can you be more specific as to the context?

On 3/30/10 9:21 , Levente Uzonyi wrote:
> On Tue, 30 Mar 2010, Andres Valloud wrote:
>
>    
>> I mentioned implementing identityHash as hash as an example given.  I also
>> think I mentioned the rule x == y =>  x identityHash = y identityHash, so I
>> hope it's clear that one shouldn't just blindly move ahead in these matters.
>> The point is that nothing prevents anybody from adding an instance variable
>> called identityHash to their objects, storing arbitrary (but well chosen!)
>> small integers in said instance variable, and then having the identityHash
>> message just answer said values.  If you do this for the cases in which you
>> have significantly more than 4096 objects, then you only pay the price to
>> hold better identityHash values for the objects that need them (as opposed to
>> every single object in the image when the header is expanded instead).  Or
>> perhaps you don't really need identity for the cases being discussed in
>> practice, and just using hash as identityHash is fine.  It's hard to tell
>> without concrete examples.  One way or the other, I do not think the size of
>> the identityHash field *must* result in poor hashed collection performance.
>> Such an implication does not follow.
>>      
> The original problem is: pick 1000000 or more objects and associate an id
> to them.
> - Using #hash and #== doesn't work help, because the objects state may
> change.
> - Adding a slot to every object can't be done, because the class of the
> objects can be anything. Adding a slot to Object would break lots of code.
> - Using #largeHash (12 bits from the object + 12 bits from the object's
> class) doesn't help, because there may be only a few different classes.
>
>
> Levente
>
>    
>> On 3/30/10 1:48 , Andreas Raab wrote:
>>      
>>> On 3/30/2010 1:09 AM, Andres Valloud wrote:
>>>
>>>        
>>>> Right, so provide a better identityHash implementation in the image
>>>> (e.g.: implement hash and then have identityHash call hash instead), and
>>>> problem solved...
>>>>
>>>>          
>>> Except that #hash is not constant over the lifetime of most objects but
>>> #identityHash is. So if you have a property associated with an object in
>>> a IDDict and the #hash depends on a value of a variable it may change
>>> over the lifetime of the object and your key gets invalid.
>>>
>>> Cheers,
>>>      - Andreas
>>>
>>>
>>>        
>>>> On 3/26/10 3:37 , Levente Uzonyi wrote:
>>>>
>>>>          
>>>>> On Thu, 25 Mar 2010, Andres Valloud wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> If lookups find the sought object in mostly one attempt, the
>>>>>> primitive is
>>>>>> overkill... most of the time, the real issue is the quality of the hash
>>>>>> function.
>>>>>>
>>>>>>              
>>>>> That's true, but this is not the case with the 4096 hash values.
>>>>>
>>>>>
>>>>> Levente
>>>>>
>>>>>
>>>>>            
>>>>>> On 3/25/10 1:27 , Levente Uzonyi wrote:
>>>>>>
>>>>>>              
>>>>>>> On Thu, 25 Mar 2010, Igor Stasenko wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>>> i think that #pointsTo: is a cheat :), which you can use in Sets but
>>>>>>>> not dictionaries, because
>>>>>>>> it contains associations. Also, it works only for identity-based
>>>>>>>> collections.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>> Dictionaries don't have to use associations (for example
>>>>>>> MethodDictionary
>>>>>>> doesn't use them), that's why #pointsTo: works (MethodDictionary also
>>>>>>> uses it).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>>>> I wonder how LargeIdentityDictionary compares to your dictionaries'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>> me too.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>> If you give me a pointer to the source code, I can run the benchmarks.
>>>>>>>
>>>>>>>
>>>>>>> Levente
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>              
>>>>>            
>>>>
>>>>          
>>>
>>>
>>>        
>>
>>      
> .
>
>    



More information about the Squeak-dev mailing list