[squeak-dev] Re: 4.1 - hashed collections still a problem

Andres Valloud avalloud at smalltalk.comcastbiz.net
Fri Apr 2 00:43:06 UTC 2010


At the time I wrote that, I was assuming that Magma was a relational 
database mapper.  Sigh :(.  In any case, now you have the same "problem" 
GemStone has.

On 4/1/10 10:26 , Chris Muller wrote:
> Magma oids are allocated by the server, they're consecutive integers.
> identityHash has nothing to do with that.
>
> Andre wrote:
>
>    
>> The point is that nothing prevents anybody from adding an instance
>> variable called identityHash to their objects, storing arbitrary (but
>> well chosen!) small integers in said instance variable, and then...
>>      
> It has to be for any object; Morphs, Assocaitions, Sets, Strings,
> LargeIntegers, etc., not just my own application objects.
>
> You make it sound easy; but the other thing to remember is the
> importance of doing it very *quickly*, because if #identityHash is
> long to initialize or calculate, then you lose speed that way.
>
> So it seems we have a pretty good balanced solution; small
> object-header + relatively fast access..
>
>
>
> On Tue, Mar 30, 2010 at 6:55 PM, Levente Uzonyi<leves at elte.hu>  wrote:
>    
>> On Tue, 30 Mar 2010, Andres Valloud wrote:
>>
>>      
>>> What is the purpose of adding an id to so many objects?  Is this a real
>>> application problem?  Can you be more specific as to the context?
>>>        
>> AFAIK Magma uses it to assign unique ids to objects.
>>
>>
>> Levente
>>
>>      
>>> On 3/30/10 9:21 , Levente Uzonyi wrote:
>>>        
>>>> On Tue, 30 Mar 2010, Andres Valloud wrote:
>>>>
>>>>
>>>>          
>>>>> I mentioned implementing identityHash as hash as an example given.  I
>>>>> also
>>>>> think I mentioned the rule x == y =>    x identityHash = y identityHash,
>>>>> so I
>>>>> hope it's clear that one shouldn't just blindly move ahead in these
>>>>> matters.
>>>>> The point is that nothing prevents anybody from adding an instance
>>>>> variable
>>>>> called identityHash to their objects, storing arbitrary (but well
>>>>> chosen!)
>>>>> small integers in said instance variable, and then having the
>>>>> identityHash
>>>>> message just answer said values.  If you do this for the cases in which
>>>>> you
>>>>> have significantly more than 4096 objects, then you only pay the price
>>>>> to
>>>>> hold better identityHash values for the objects that need them (as
>>>>> opposed to
>>>>> every single object in the image when the header is expanded instead).
>>>>>   Or
>>>>> perhaps you don't really need identity for the cases being discussed in
>>>>> practice, and just using hash as identityHash is fine.  It's hard to
>>>>> tell
>>>>> without concrete examples.  One way or the other, I do not think the
>>>>> size of
>>>>> the identityHash field *must* result in poor hashed collection
>>>>> performance.
>>>>> Such an implication does not follow.
>>>>>
>>>>>            
>>>> The original problem is: pick 1000000 or more objects and associate an id
>>>> to them.
>>>> - Using #hash and #== doesn't work help, because the objects state may
>>>> change.
>>>> - Adding a slot to every object can't be done, because the class of the
>>>> objects can be anything. Adding a slot to Object would break lots of
>>>> code.
>>>> - Using #largeHash (12 bits from the object + 12 bits from the object's
>>>> class) doesn't help, because there may be only a few different classes.
>>>>
>>>>
>>>> Levente
>>>>
>>>>
>>>>          
>>>>> On 3/30/10 1:48 , Andreas Raab wrote:
>>>>>
>>>>>            
>>>>>> On 3/30/2010 1:09 AM, Andres Valloud wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Right, so provide a better identityHash implementation in the image
>>>>>>> (e.g.: implement hash and then have identityHash call hash instead),
>>>>>>> and
>>>>>>> problem solved...
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> Except that #hash is not constant over the lifetime of most objects but
>>>>>> #identityHash is. So if you have a property associated with an object
>>>>>> in
>>>>>> a IDDict and the #hash depends on a value of a variable it may change
>>>>>> over the lifetime of the object and your key gets invalid.
>>>>>>
>>>>>> Cheers,
>>>>>>      - Andreas
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> On 3/26/10 3:37 , Levente Uzonyi wrote:
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>>> On Thu, 25 Mar 2010, Andres Valloud wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>>>> If lookups find the sought object in mostly one attempt, the
>>>>>>>>> primitive is
>>>>>>>>> overkill... most of the time, the real issue is the quality of the
>>>>>>>>> hash
>>>>>>>>> function.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>> That's true, but this is not the case with the 4096 hash values.
>>>>>>>>
>>>>>>>>
>>>>>>>> Levente
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>>>> On 3/25/10 1:27 , Levente Uzonyi wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>>> On Thu, 25 Mar 2010, Igor Stasenko wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>>> i think that #pointsTo: is a cheat :), which you can use in Sets
>>>>>>>>>>> but
>>>>>>>>>>> not dictionaries, because
>>>>>>>>>>> it contains associations. Also, it works only for identity-based
>>>>>>>>>>> collections.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>> Dictionaries don't have to use associations (for example
>>>>>>>>>> MethodDictionary
>>>>>>>>>> doesn't use them), that's why #pointsTo: works (MethodDictionary
>>>>>>>>>> also
>>>>>>>>>> uses it).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>>>> I wonder how LargeIdentityDictionary compares to your
>>>>>>>>>>>> dictionaries'.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                          
>>>>>>>>>>> me too.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>> If you give me a pointer to the source code, I can run the
>>>>>>>>>> benchmarks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Levente
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>                    
>>>>>>>>                  
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>>
>>>>>>              
>>>>>
>>>>>            
>>>> .
>>>>
>>>>
>>>>          
>>>
>>>        
>>
>>      



More information about the Squeak-dev mailing list