[squeak-dev] The Inbox: Collections-cmm.870.mcz

Chris Muller ma.chris.m at gmail.com
Sat Jan 4 21:53:43 UTC 2020


Thanks again Levente, once again, I somehow got my brain stuck :) and
thought "Dictionary new" was the same as "Dictionary new: 7", but, in fact,
it's the same as "Dictionary new: 3".  Again, I conflated the desired size
with the internal array's size needing extra space, sigh...   It was this
finally cleared it up for me:

   (1 to: 10) collect: [ : n | Dictionary sizeFor: n ].     " #(5 5 5 7 11
11 11 13 13 17)"

So, back to my "problem", it looks like I'm safe to write:

      Dictionary new: runtimeVar

and, no matter what size, there will never be a performance hit over
Dictionary new.  Whew!

Thanks,
  Chris

On Fri, Jan 3, 2020 at 9:25 PM Levente Uzonyi <leves at caesar.elte.hu> wrote:

> Hi Chris,
>
> The problem with that change is that #sizeFor: returns incorrect values
> when its argument is 4 or 5. The change breaks #compact, and the contract
> of #new: and #sizeFor:, because when you want to store 4 or 5 elements,
> you'll get a HashedCollection which can only store 3 elements, so there'll
> either be an extra grow involved or the collection will be overfilled.
>
> Here's an example where the change makes #compact break the contract of
> HashedCollecions:
>
>         (1 to: 5) asSet compact; includes: 6 "#includes: will raise an
> error"
>
> The default HashedCollection instances created with #new can hold 3
> elements without growing.
> If you need a fast way to create a Dictionary that can hold 5 elements
> without growing, then you can add a new method and use that in your code:
>
> HashedCollection class >> #newFor5
>
>         ^self basicNew initialize: 7
>
> This way the method guarantees the capacity is a good prime just like
> #new does. And it won't affect anything else.
>
> Or we could change #new: to create a larger array by default, but then we
> would use more memory. In my image 68.7% of all the HashedCollections have
> the default capacity: 5. And 69% of all HashedCollections have fewer than
> 3 elements, so those could all be shrunk down to capacity 3, saving 232kB
> memory (64-bit image).
> If we changed the default capacity to 7, the size of the image would grow
> by 231kB, which is not much nowadays, but still something.
>
> Perhaps we could create a class variable to hold the default capacity:
>
> HashedCollection class >> #new
>
>         ^self basicNew initialize: DefaultCapacity
>
> And then you could change it to 7 or 11 in your images.
>
>
> Levente
>
> On Fri, 3 Jan 2020, Chris Muller wrote:
>
> > Hi Levente,
> > Due to ancestry, the diff of this over Collections-ul.867 is only this:
> >
> >    +       nElements < 6 ifTrue: [ ^5 ].
> >
> > The issue I'm facing is, I don't know the size of the Dictionary at
> compile time, but I do at runtime.
> >
> >     Dictionary new: runtimeVar
> >
> > However, I don't know what the runtimeVar values will be.  It could be
> run millions of times with a value less than the default size given by
> "Dictionary new", or greater.  The implementation is forcing me to optimize
> for one
> > or the other at compile time, but I can't.  As developer, I'm "torn"
> whether to write:
> >
> >        Dictionary new
> >        Dictionary new: runtimeVar    (44% speed of Dictionary new when
> runtimeVar < 6)
> > or    (runtimeVar < 6) ifTrue: [ Dictionary new ] ifFalse: [ Dictionary
> new: runtimeVar ]
> >
> > That conflict is what this last tweak tries to solve.  It improves the
> second case, above, to 89% of "Dictionary new".
> >
> > Everything about this looks good and would like to include it in 5.3.
> >
> >  - Chris
> >
> > On Fri, Jan 3, 2020 at 7:59 PM <commits at source.squeak.org> wrote:
> >       Chris Muller uploaded a new version of Collections to project The
> Inbox:
> >       http://source.squeak.org/inbox/Collections-cmm.870.mcz
> >
> >       ==================== Summary ====================
> >
> >       Name: Collections-cmm.870
> >       Author: cmm
> >       Time: 3 January 2020, 7:59:19.257837 pm
> >       UUID: b3e2efab-cf35-426d-9910-adab44aa4ed4
> >       Ancestors: Collections-fn.869, Collections-ul.867
> >
> >       Building on Collections-ul.867, let "Dictionary new: anySize"
> _never_ be significantly slower than "Dictionary new".
> >
> >       =============== Diff against Collections-fn.869 ===============
> >
> >       Item was changed:
> >         ----- Method: HashedCollection class>>goodPrimeAtLeast: (in
> category 'sizing') -----
> >         goodPrimeAtLeast: lowerLimit
> >       +       "Answer the smallest good prime >= lowerlimit.
> >       +       If lowerLimit is larger than the largest known good prime,
> just make it odd.
> >       +       Use linear search, and exponential search to speed up
> cases when lowerLimit is small (<2500 and <100000, respectively)."
> >       -       "Answer the next good prime >= lowerlimit.
> >       -       If lowerLimit is larger than the largest known good prime,
> >       -       just make it odd."
> >
> >       +       | high mid low prime primes |
> >       -       | low mid high prime primes |
> >               primes := self goodPrimes.
> >       +       (primes at: primes size) < lowerLimit ifTrue: [
> >       -       high := primes size.
> >       -       lowerLimit > (primes at: high) ifTrue: [
> >                       ^lowerLimit bitOr: 1 ].
> >       +       lowerLimit < 2500 ifTrue: [
> >       +               "Use linear search when the limit is small. The
> boundary is based on measurements."
> >       +               high := 1.
> >       +               [
> >       +                       (prime := primes at: high) >= lowerLimit
> ifTrue: [ ^prime ].
> >       +                       high := high + 1 ] repeat ].
> >       +       lowerLimit < 100000
> >       +               ifTrue: [
> >       +                       "Use exponential search when the limit is
> not too large. The boundary is based on measurements."
> >       +                       high := 1.
> >       +                       [ (primes at: high) < lowerLimit ]
> whileTrue: [
> >       +                               high := high * 2 ].
> >       +                       low := high // 2 + 1. "high // 2 was
> smaller than lowerLimit" ]
> >       +               ifFalse: [
> >       +                       "Regular binary search."
> >       +                       low := 1.
> >       +                       high := primes size ].
> >       -       low := 1.
> >               [ high - low <= 1 ] whileFalse: [
> >                       mid := high + low // 2.
> >                       prime := primes at: mid.
> >                       lowerLimit < prime
> >                               ifTrue: [ high := mid ]
> >                               ifFalse: [
> >                                       lowerLimit > prime
> >                                               ifTrue: [ low := mid ]
> >                                               ifFalse: [ ^prime ] ] ].
> >               (primes at: low) >= lowerLimit ifTrue: [ ^primes at: low ].
> >               ^primes at: high!
> >
> >       Item was changed:
> >         ----- Method: HashedCollection class>>sizeFor: (in category
> 'sizing') -----
> >         sizeFor: nElements
> >               "Large enough prime (or odd if too large) size to hold
> nElements with some slop (see fullCheck)"
> >
> >       +       nElements < 6 ifTrue: [ ^5 ].
> >       -       nElements < 4 ifTrue: [ ^5 ].
> >               ^self goodPrimeAtLeast: nElements + 1 * 4 // 3!
> >
> >       Item was changed:
> >         ----- Method: HashedCollection>>compact (in category 'private')
> -----
> >         compact
> >               "Reduce the size of array so that the load factor will be
> ~75%."
> >
> >               | newCapacity |
> >       +       newCapacity := self class sizeFor: tally.
> >       -       newCapacity := self class goodPrimeAtLeast: tally * 4 // 3.
> >               self growTo: newCapacity!
> >
> >
> >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200104/bae6e82b/attachment.html>


More information about the Squeak-dev mailing list