On Fri, 16 Dec 2011, Henrik Sperre Johansen wrote:
On 16.12.2011 03:26, Levente Uzonyi wrote:
How about my numbers? :)
"Preallocate objects, so we won't count gc time."
n := 1000000.
objects := Array new: n streamContents: [ :stream |
n timesRepeat: [ stream nextPut: Object new ] ].
set := IdentitySet new: n.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "4949"
set := LargeIdentitySet new.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "331"
set := (PluggableSet new: n)
hashBlock: [ :object | object identityHash * 4096 + object class
identityHash * 64 ]; "Change this to #basicIdentityHash in Pharo"
equalBlock: [ :a :b | a == b ];
yourself.
Smalltalk garbageCollect.
[1 to: n do: [ :i | set add: (objects at: i) ] ] timeToRun. "5511"
I also have a LargeIdentityDictionary, which is relatively fast, but not
as fast as LargeIdentitySet, because (for some unknown reason) we don't
have a primitive that could support it. If we had a primitive like
primitive 132 which would return the index of the element if found or 0 if
not, then we could have a really fast LargeIdentityDictionary.
Levente
Hehe yes, if writing a version fully exploiting the limited range, that's
probably the approach I would go for as well.
(IAssuming it's the version at http://leves.web.elte.hu/**
squeak/LargeIdentitySet.st<http://leves.web.elte.hu/squeak/LargeIdentitySet.st>
)
Mariano commented in the version at http://www.squeaksource.com/**
FuelExperiments <http://www.squeaksource.com/FuelExperiments> that it's
slow for them, which I guess is due to not adopting #identityHash calls to
#basicIdentityHash calls for Pharo:
((0 to: 4095) collect: [:each | each << 22 \\ 4096 ]) asSet size -> 1
So it basically uses 1 bucket instead of 4096... Whoops. :)
Uploaded a new version to the MC repository which is adapted for Pharo,
on the same machine my numbers were taken from, it does the same test as I
used above in 871 ms. (181 with preallocation).
Cool. One more thing: in Squeak the method using primitive 132 directly
was renamed to #instVarsInclude:, so now #pointsTo: works as expected. If
this was also added to Pharo, then the #pointsTo: sends should be changed
to #instVarsInclude:, otherwise Array can be reported as included even if
it wasn't added.
I'll upload my LargeIdentityDictionary implementation to the same place
this evening, since it's still 2-3 factor faster than other solutionts and
there seem to be demand for it.
Levente
Cheers,
Henry