[squeak-dev] Re: 4.1 - hashed collections still a problem

Levente Uzonyi leves at elte.hu
Wed Mar 31 00:55:12 UTC 2010


On Tue, 30 Mar 2010, Andres Valloud wrote:

> What is the purpose of adding an id to so many objects?  Is this a real 
> application problem?  Can you be more specific as to the context?

AFAIK Magma uses it to assign unique ids to objects.


Levente

>
> On 3/30/10 9:21 , Levente Uzonyi wrote:
>> On Tue, 30 Mar 2010, Andres Valloud wrote:
>>
>> 
>>> I mentioned implementing identityHash as hash as an example given.  I also
>>> think I mentioned the rule x == y =>  x identityHash = y identityHash, so 
>>> I
>>> hope it's clear that one shouldn't just blindly move ahead in these 
>>> matters.
>>> The point is that nothing prevents anybody from adding an instance 
>>> variable
>>> called identityHash to their objects, storing arbitrary (but well chosen!)
>>> small integers in said instance variable, and then having the identityHash
>>> message just answer said values.  If you do this for the cases in which 
>>> you
>>> have significantly more than 4096 objects, then you only pay the price to
>>> hold better identityHash values for the objects that need them (as opposed 
>>> to
>>> every single object in the image when the header is expanded instead).  Or
>>> perhaps you don't really need identity for the cases being discussed in
>>> practice, and just using hash as identityHash is fine.  It's hard to tell
>>> without concrete examples.  One way or the other, I do not think the size 
>>> of
>>> the identityHash field *must* result in poor hashed collection 
>>> performance.
>>> Such an implication does not follow.
>>> 
>> The original problem is: pick 1000000 or more objects and associate an id
>> to them.
>> - Using #hash and #== doesn't work help, because the objects state may
>> change.
>> - Adding a slot to every object can't be done, because the class of the
>> objects can be anything. Adding a slot to Object would break lots of code.
>> - Using #largeHash (12 bits from the object + 12 bits from the object's
>> class) doesn't help, because there may be only a few different classes.
>> 
>> 
>> Levente
>>
>> 
>>> On 3/30/10 1:48 , Andreas Raab wrote:
>>> 
>>>> On 3/30/2010 1:09 AM, Andres Valloud wrote:
>>>>
>>>> 
>>>>> Right, so provide a better identityHash implementation in the image
>>>>> (e.g.: implement hash and then have identityHash call hash instead), and
>>>>> problem solved...
>>>>>
>>>>> 
>>>> Except that #hash is not constant over the lifetime of most objects but
>>>> #identityHash is. So if you have a property associated with an object in
>>>> a IDDict and the #hash depends on a value of a variable it may change
>>>> over the lifetime of the object and your key gets invalid.
>>>> 
>>>> Cheers,
>>>>      - Andreas
>>>> 
>>>>
>>>> 
>>>>> On 3/26/10 3:37 , Levente Uzonyi wrote:
>>>>>
>>>>> 
>>>>>> On Thu, 25 Mar 2010, Andres Valloud wrote:
>>>>>> 
>>>>>>
>>>>>> 
>>>>>>> If lookups find the sought object in mostly one attempt, the
>>>>>>> primitive is
>>>>>>> overkill... most of the time, the real issue is the quality of the 
>>>>>>> hash
>>>>>>> function.
>>>>>>>
>>>>>>> 
>>>>>> That's true, but this is not the case with the 4096 hash values.
>>>>>> 
>>>>>> 
>>>>>> Levente
>>>>>> 
>>>>>>
>>>>>> 
>>>>>>> On 3/25/10 1:27 , Levente Uzonyi wrote:
>>>>>>>
>>>>>>> 
>>>>>>>> On Thu, 25 Mar 2010, Igor Stasenko wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>
>>>>>>>> 
>>>>>>>>> i think that #pointsTo: is a cheat :), which you can use in Sets but
>>>>>>>>> not dictionaries, because
>>>>>>>>> it contains associations. Also, it works only for identity-based
>>>>>>>>> collections.
>>>>>>>>> 
>>>>>>>>>
>>>>>>>>> 
>>>>>>>> Dictionaries don't have to use associations (for example
>>>>>>>> MethodDictionary
>>>>>>>> doesn't use them), that's why #pointsTo: works (MethodDictionary also
>>>>>>>> uses it).
>>>>>>>> 
>>>>>>>> 
>>>>>>>>
>>>>>>>> 
>>>>>>>>>> I wonder how LargeIdentityDictionary compares to your 
>>>>>>>>>> dictionaries'.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>
>>>>>>>>>> 
>>>>>>>>> me too.
>>>>>>>>> 
>>>>>>>>>
>>>>>>>>> 
>>>>>>>> If you give me a pointer to the source code, I can run the 
>>>>>>>> benchmarks.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Levente
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> 
>>>>>
>>>>> 
>>>> 
>>>>
>>>> 
>>>
>>> 
>> .
>>
>> 
>
>



More information about the Squeak-dev mailing list