[Vm-dev] Do we have the new primitive?? [WAS] Re: [Pharo-project] IdentitySet but using #hash rather than #identityHash ?

Mariano Martinez Peck marianopeck at gmail.com
Sat Feb 25 16:45:52 UTC 2012


>> All I can say is that I am impressed by the numbers; it is really much
>> faster.
>> I still don't understand why I sent this email with a subject saying
>> IdentitySet, because what I really need is a fast/large IdentityDictionary
>> :( (Anyway, there's a place where we can use this LargeIdentitySet in
>> Fuel, I think.)
>>
>> So Levente, you say it is not possible to adapt this for a dictionary? Can
>> we contact Eliot to provide such a primitive?
>>
>
> As promised, I uploaded my LargeIdentityDictionary implementation to
> http://leves.web.elte.hu/squeak/LargeIdentityDictionary.st .
> The numbers will be a bit worse compared to LargeIdentitySet, because of
> the lack of the primitive, but it's still 2-3x faster than other solutions
> (IdentityDictionary, PluggableIdentityDictionary, subclassing, etc). I'm
> about to propose this primitive with other improvements on the vm-dev list.
>
>
Hi Eliot/Levente. What is the status of this? Do we already have the new
primitive? If so, how can we adapt LargeIdentitySet to use it?
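
(For context, here is a rough sketch of why a set gets by with the existing
primitive 132 while the dictionary Levente describes below needs a new,
index-returning one. The selectors and instance variables used here (buckets,
keyBuckets, valueBuckets) are only illustrative, not Levente's actual code,
and bucket setup and growth are omitted.)

includes: anObject
	"Set case: a per-bucket membership test is enough; primitive 132
	(#instVarsInclude: / #pointsTo:) already answers that Boolean."
	^ (buckets at: anObject basicIdentityHash + 1) instVarsInclude: anObject

at: key ifAbsent: aBlock
	"Dictionary case: the key's position inside its bucket is needed to fetch
	the value stored at the same index in a parallel array. #identityIndexOf:
	is currently a plain Smalltalk loop; the proposed primitive would answer
	this index (or 0 when absent) directly, like primitive 132 does for the
	Boolean test."
	| index |
	index := (keyBuckets at: key basicIdentityHash + 1) identityIndexOf: key.
	index = 0 ifTrue: [ ^ aBlock value ].
	^ (valueBuckets at: key basicIdentityHash + 1) at: index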

Thanks!

>
> Levente
>
>
>> thanks
>>
>> On Fri, Dec 16, 2011 at 3:28 PM, Levente Uzonyi <leves at elte.hu> wrote:
>>
>>  On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:
>>>
>>>  On 16.12.2011 03:26, Levente Uzonyi wrote:
>>>
>>>>
>>>>
>>>>> How about my numbers? :)
>>>>>
>>>>> "Preallocate objects, so we won't count gc time."
>>>>> n := 1000000.
>>>>> objects := Array new: n streamContents: [ :stream |
>>>>>   n timesRepeat: [ stream nextPut: Object new ] ].
>>>>>
>>>>> set := IdentitySet new: n.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "4949"
>>>>>
>>>>> set := LargeIdentitySet new.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "331"
>>>>>
>>>>> set := (PluggableSet new: n)
>>>>>   hashBlock: [ :object | object identityHash * 4096 + object class
>>>>> identityHash * 64 ]; "Change this to #basicIdentityHash in Pharo"
>>>>>   equalBlock: [ :a :b | a == b ];
>>>>>   yourself.
>>>>> Smalltalk garbageCollect.
>>>>> [1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "5511"
>>>>>
>>>>>
>>>>> I also have a LargeIdentityDictionary, which is relatively fast, but
>>>>> not
>>>>> as fast as LargeIdentitySet, because (for some unknown reason) we don't
>>>>> have a primitive that could support it. If we had a primitive like
>>>>> primitive 132 which would return the index of the element if found or
>>>>> 0 if
>>>>> not, then we could have a really fast LargeIdentityDictionary.
>>>>>
>>>>>
>>>>> Levente
>>>>>
>>>> Hehe yes, if writing a version fully exploiting the limited range, that's
>>>> probably the approach I would go for as well.
>>>> (Assuming it's the version at
>>>> http://leves.web.elte.hu/squeak/LargeIdentitySet.st )
>>>>
>>>> Mariano commented in the version at
>>>> http://www.squeaksource.com/FuelExperiments that it's slow for them,
>>>> which I guess is due to not adapting the #identityHash calls to
>>>> #basicIdentityHash calls for Pharo:
>>>> ((0 to: 4095) collect: [:each | each << 22 \\ 4096 ]) asSet size -> 1
>>>> So it basically uses 1 bucket instead of 4096... Whoops. :)
>>>>
>>>> Uploaded a new version to the MC repository which is adapted for Pharo;
>>>> on the same machine the numbers above were taken from, it does the same
>>>> test as above in 871 ms (181 with preallocation).
>>>>
>>>>
>>> Cool. One more thing: in Squeak the method using primitive 132 directly
>>> was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
>>> this was also added to Pharo, then the #pointsTo: sends should be changed
>>> to #instVarsInclude:, otherwise Array can be reported as included even if
>>> it wasn't added.
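
(To see the difference in a workspace, assuming an image where both selectors
exist and #pointsTo: includes the class check: because a bucket is itself an
Array instance, #pointsTo: answers true for the class Array even though it was
never added, while #instVarsInclude: only scans the slots.)

| bucket |
bucket := Array new: 8.
bucket pointsTo: Array.         "true: the receiver's class pointer also counts"
bucket instVarsInclude: Array.  "false: only named and indexed slots are scanned"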
>>> I'll upload my LargeIdentityDictionary implementation to the same place
>>> this evening, since it's still a factor of 2-3 faster than other solutions
>>> and there seems to be demand for it.
>>>
>>>
>>> Levente
>>>
>>>
>>>  Cheers,
>>>> Henry
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Mariano
>> http://marianopeck.wordpress.com
>>
>>
>


-- 
Mariano
http://marianopeck.wordpress.com

