[squeak-dev] The Inbox: Collections-cmm.870.mcz
Chris Muller
ma.chris.m at gmail.com
Sat Jan 4 21:53:43 UTC 2020
Thanks again Levente, once again, I somehow got my brain stuck :) and
thought "Dictionary new" was the same as "Dictionary new: 7", but, in fact,
it's the same as "Dictionary new: 3". Again, I conflated the desired size
with the internal array's size needing extra space, sigh... It was this
finally cleared it up for me:
(1 to: 10) collect: [ : n | Dictionary sizeFor: n ]. " #(5 5 5 7 11
11 11 13 13 17)"
So, back to my "problem", it looks like I'm safe to write:
Dictionary new: runtimeVar
and, no matter what size, there will never be a performance hit over
Dictionary new. Whew!
Thanks,
Chris
On Fri, Jan 3, 2020 at 9:25 PM Levente Uzonyi <leves at caesar.elte.hu> wrote:
> Hi Chris,
>
> The problem with that change is that #sizeFor: returns incorrect values
> when its argument is 4 or 5. The change breaks #compact, and the contract
> of #new: and #sizeFor:, because when you want to store 4 or 5 elements,
> you'll get a HashedCollection which can only store 3 elements, so there'll
> either be an extra grow involved or the collection will be overfilled.
>
> Here's an example where the change makes #compact break the contract of
> HashedCollecions:
>
> (1 to: 5) asSet compact; includes: 6 "#includes: will raise an
> error"
>
> The default HashedCollection instances created with #new can hold 3
> elements without growing.
> If you need a fast way to create a Dictionary that can hold 5 elements
> without growing, then you can add a new method and use that in your code:
>
> HashedCollection class >> #newFor5
>
> ^self basicNew initialize: 7
>
> This way the method guarantees the capacity is a good prime just like
> #new does. And it won't affect anything else.
>
> Or we could change #new: to create a larger array by default, but then we
> would use more memory. In my image 68.7% of all the HashedCollections have
> the default capacity: 5. And 69% of all HashedCollections have fewer than
> 3 elements, so those could all be shrunk down to capacity 3, saving 232kB
> memory (64-bit image).
> If we changed the default capacity to 7, the size of the image would grow
> by 231kB, which is not much nowadays, but still something.
>
> Perhaps we could create a class variable to hold the default capacity:
>
> HashedCollection class >> #new
>
> ^self basicNew initialize: DefaultCapacity
>
> And then you could change it to 7 or 11 in your images.
>
>
> Levente
>
> On Fri, 3 Jan 2020, Chris Muller wrote:
>
> > Hi Levente,
> > Due to ancestry, the diff of this over Collections-ul.867 is only this:
> >
> > + nElements < 6 ifTrue: [ ^5 ].
> >
> > The issue I'm facing is, I don't know the size of the Dictionary at
> compile time, but I do at runtime.
> >
> > Dictionary new: runtimeVar
> >
> > However, I don't know what the runtimeVar values will be. It could be
> run millions of times with a value less than the default size given by
> "Dictionary new", or greater. The implementation is forcing me to optimize
> for one
> > or the other at compile time, but I can't. As developer, I'm "torn"
> whether to write:
> >
> > Dictionary new
> > Dictionary new: runtimeVar (44% speed of Dictionary new when
> runtimeVar < 6)
> > or (runtimeVar < 6) ifTrue: [ Dictionary new ] ifFalse: [ Dictionary
> new: runtimeVar ]
> >
> > That conflict is what this last tweak tries to solve. It improves the
> second case, above, to 89% of "Dictionary new".
> >
> > Everything about this looks good and would like to include it in 5.3.
> >
> > - Chris
> >
> > On Fri, Jan 3, 2020 at 7:59 PM <commits at source.squeak.org> wrote:
> > Chris Muller uploaded a new version of Collections to project The
> Inbox:
> > http://source.squeak.org/inbox/Collections-cmm.870.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Collections-cmm.870
> > Author: cmm
> > Time: 3 January 2020, 7:59:19.257837 pm
> > UUID: b3e2efab-cf35-426d-9910-adab44aa4ed4
> > Ancestors: Collections-fn.869, Collections-ul.867
> >
> > Building on Collections-ul.867, let "Dictionary new: anySize"
> _never_ be significantly slower than "Dictionary new".
> >
> > =============== Diff against Collections-fn.869 ===============
> >
> > Item was changed:
> > ----- Method: HashedCollection class>>goodPrimeAtLeast: (in
> category 'sizing') -----
> > goodPrimeAtLeast: lowerLimit
> > + "Answer the smallest good prime >= lowerlimit.
> > + If lowerLimit is larger than the largest known good prime,
> just make it odd.
> > + Use linear search, and exponential search to speed up
> cases when lowerLimit is small (<2500 and <100000, respectively)."
> > - "Answer the next good prime >= lowerlimit.
> > - If lowerLimit is larger than the largest known good prime,
> > - just make it odd."
> >
> > + | high mid low prime primes |
> > - | low mid high prime primes |
> > primes := self goodPrimes.
> > + (primes at: primes size) < lowerLimit ifTrue: [
> > - high := primes size.
> > - lowerLimit > (primes at: high) ifTrue: [
> > ^lowerLimit bitOr: 1 ].
> > + lowerLimit < 2500 ifTrue: [
> > + "Use linear search when the limit is small. The
> boundary is based on measurements."
> > + high := 1.
> > + [
> > + (prime := primes at: high) >= lowerLimit
> ifTrue: [ ^prime ].
> > + high := high + 1 ] repeat ].
> > + lowerLimit < 100000
> > + ifTrue: [
> > + "Use exponential search when the limit is
> not too large. The boundary is based on measurements."
> > + high := 1.
> > + [ (primes at: high) < lowerLimit ]
> whileTrue: [
> > + high := high * 2 ].
> > + low := high // 2 + 1. "high // 2 was
> smaller than lowerLimit" ]
> > + ifFalse: [
> > + "Regular binary search."
> > + low := 1.
> > + high := primes size ].
> > - low := 1.
> > [ high - low <= 1 ] whileFalse: [
> > mid := high + low // 2.
> > prime := primes at: mid.
> > lowerLimit < prime
> > ifTrue: [ high := mid ]
> > ifFalse: [
> > lowerLimit > prime
> > ifTrue: [ low := mid ]
> > ifFalse: [ ^prime ] ] ].
> > (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
> > ^primes at: high!
> >
> > Item was changed:
> > ----- Method: HashedCollection class>>sizeFor: (in category
> 'sizing') -----
> > sizeFor: nElements
> > "Large enough prime (or odd if too large) size to hold
> nElements with some slop (see fullCheck)"
> >
> > + nElements < 6 ifTrue: [ ^5 ].
> > - nElements < 4 ifTrue: [ ^5 ].
> > ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
> >
> > Item was changed:
> > ----- Method: HashedCollection>>compact (in category 'private')
> -----
> > compact
> > "Reduce the size of array so that the load factor will be
> ~75%."
> >
> > | newCapacity |
> > + newCapacity := self class sizeFor: tally.
> > - newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
> > self growTo: newCapacity!
> >
> >
> >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200104/bae6e82b/attachment.html>
More information about the Squeak-dev
mailing list
|