[squeak-dev] Re: 4.1 - hashed collections still a problem

Chris Muller asqueaker at gmail.com
Thu Apr 1 17:26:33 UTC 2010


Magma oids are allocated by the server, they're consecutive integers.
identityHash has nothing to do with that.

Andre wrote:

> The point is that nothing prevents anybody from adding an instance
> variable called identityHash to their objects, storing arbitrary (but
> well chosen!) small integers in said instance variable, and then...

It has to be for any object; Morphs, Assocaitions, Sets, Strings,
LargeIntegers, etc., not just my own application objects.

You make it sound easy; but the other thing to remember is the
importance of doing it very *quickly*, because if #identityHash is
long to initialize or calculate, then you lose speed that way.

So it seems we have a pretty good balanced solution; small
object-header + relatively fast access..



On Tue, Mar 30, 2010 at 6:55 PM, Levente Uzonyi <leves at elte.hu> wrote:
> On Tue, 30 Mar 2010, Andres Valloud wrote:
>
>> What is the purpose of adding an id to so many objects?  Is this a real
>> application problem?  Can you be more specific as to the context?
>
> AFAIK Magma uses it to assign unique ids to objects.
>
>
> Levente
>
>>
>> On 3/30/10 9:21 , Levente Uzonyi wrote:
>>>
>>> On Tue, 30 Mar 2010, Andres Valloud wrote:
>>>
>>>
>>>> I mentioned implementing identityHash as hash as an example given.  I
>>>> also
>>>> think I mentioned the rule x == y =>  x identityHash = y identityHash,
>>>> so I
>>>> hope it's clear that one shouldn't just blindly move ahead in these
>>>> matters.
>>>> The point is that nothing prevents anybody from adding an instance
>>>> variable
>>>> called identityHash to their objects, storing arbitrary (but well
>>>> chosen!)
>>>> small integers in said instance variable, and then having the
>>>> identityHash
>>>> message just answer said values.  If you do this for the cases in which
>>>> you
>>>> have significantly more than 4096 objects, then you only pay the price
>>>> to
>>>> hold better identityHash values for the objects that need them (as
>>>> opposed to
>>>> every single object in the image when the header is expanded instead).
>>>>  Or
>>>> perhaps you don't really need identity for the cases being discussed in
>>>> practice, and just using hash as identityHash is fine.  It's hard to
>>>> tell
>>>> without concrete examples.  One way or the other, I do not think the
>>>> size of
>>>> the identityHash field *must* result in poor hashed collection
>>>> performance.
>>>> Such an implication does not follow.
>>>>
>>> The original problem is: pick 1000000 or more objects and associate an id
>>> to them.
>>> - Using #hash and #== doesn't work help, because the objects state may
>>> change.
>>> - Adding a slot to every object can't be done, because the class of the
>>> objects can be anything. Adding a slot to Object would break lots of
>>> code.
>>> - Using #largeHash (12 bits from the object + 12 bits from the object's
>>> class) doesn't help, because there may be only a few different classes.
>>>
>>>
>>> Levente
>>>
>>>
>>>> On 3/30/10 1:48 , Andreas Raab wrote:
>>>>
>>>>> On 3/30/2010 1:09 AM, Andres Valloud wrote:
>>>>>
>>>>>
>>>>>> Right, so provide a better identityHash implementation in the image
>>>>>> (e.g.: implement hash and then have identityHash call hash instead),
>>>>>> and
>>>>>> problem solved...
>>>>>>
>>>>>>
>>>>> Except that #hash is not constant over the lifetime of most objects but
>>>>> #identityHash is. So if you have a property associated with an object
>>>>> in
>>>>> a IDDict and the #hash depends on a value of a variable it may change
>>>>> over the lifetime of the object and your key gets invalid.
>>>>>
>>>>> Cheers,
>>>>>     - Andreas
>>>>>
>>>>>
>>>>>
>>>>>> On 3/26/10 3:37 , Levente Uzonyi wrote:
>>>>>>
>>>>>>
>>>>>>> On Thu, 25 Mar 2010, Andres Valloud wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> If lookups find the sought object in mostly one attempt, the
>>>>>>>> primitive is
>>>>>>>> overkill... most of the time, the real issue is the quality of the
>>>>>>>> hash
>>>>>>>> function.
>>>>>>>>
>>>>>>>>
>>>>>>> That's true, but this is not the case with the 4096 hash values.
>>>>>>>
>>>>>>>
>>>>>>> Levente
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 3/25/10 1:27 , Levente Uzonyi wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Thu, 25 Mar 2010, Igor Stasenko wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> i think that #pointsTo: is a cheat :), which you can use in Sets
>>>>>>>>>> but
>>>>>>>>>> not dictionaries, because
>>>>>>>>>> it contains associations. Also, it works only for identity-based
>>>>>>>>>> collections.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Dictionaries don't have to use associations (for example
>>>>>>>>> MethodDictionary
>>>>>>>>> doesn't use them), that's why #pointsTo: works (MethodDictionary
>>>>>>>>> also
>>>>>>>>> uses it).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> I wonder how LargeIdentityDictionary compares to your
>>>>>>>>>>> dictionaries'.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> me too.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> If you give me a pointer to the source code, I can run the
>>>>>>>>> benchmarks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Levente
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>> .
>>>
>>>
>>
>>
>
>



More information about the Squeak-dev mailing list