[squeak-dev] The Trunk: Collections-eem.792.mcz

Eliot Miranda eliot.miranda at gmail.com
Mon May 7 20:35:02 UTC 2018


tangentally...

On Mon, May 7, 2018 at 1:02 PM, Chris Muller <ma.chris.m at gmail.com> wrote:

> Hi Levente,
>
>
>> meant.  My own criteria is "methods which are called thousands of times
>>> per minute" _today_.
>>>
>>
>
>> A rule of thumb could be that if a core method (e.g. anything in
>> Kernel/Collections) is part of an API and it's not there to support a very
>> specific task (e.g. Dictionary >> #associationDeclareAt:), then its
>> performance should be taken into account.
>>
>
> That sounds like it only exempts utility methods ("specific tasks").
> Technically, this should leave me afraid that you might want to inline
> Dictionary>>#at:, but I'm relatively sure you don't.  #at: is the most
> basic featured accessing method for understanding and using a Dictionary,
> so I hope you agree that's one that should retain its exemplar status.
>

But Chris, Dictionary>>#at: has already been heavily optimized via
scanFor:.  Here's the original Smalltalk-80 v2 code:

*Dictionary methods for accessing*
*at:* key
"Answer the value at key.  If key is not found, create an error message."

^self at: key ifAbsent: [self errorKeyNotFound]

*at:* key *ifAbsent:* aBlock
"Answer the value at key.  If key is not found, answer the
result of evaluating aBlock."

| index |
index _ self findKey: key ifAbsent: [^aBlock value].
^(self basicAt: index) value

*Dictionary methods for private*
*findKey:* key *ifAbsent:* aBlock
| index |
index _ self findKeyOrNil: key.
(self basicAt: index) == nil ifTrue: [^aBlock value].
^index

*findKeyOrNil:* key
| location length probe pass |
length _ self basicSize.
pass _ 1.
location _ key hash \\ length + 1.
[(probe _ self basicAt: location) == nil or: [probe key = key]]
whileFalse:
[(location _ location + 1) > length
ifTrue:
[location _ 1.
pass _ pass + 1.
pass > 2 ifTrue: [^self grow findKeyOrNil: key]]].
^location

Here's the current Squeak code

*at:* key
"Answer the value associated with the key."

^ self at: key ifAbsent: [self errorKeyNotFound: key]

*at:* key *ifAbsent:* aBlock
"Answer the value associated with the key or, if key isn't found,
answer the result of evaluating aBlock."

^((array at: (self scanFor: key)) ifNil: [ aBlock ]) value "Blocks and
Associations expect #value"

Dictionary methods for private
*scanFor:* anObject
"Scan the key array for the first slot containing either a nil (indicating
an empty slot) or an element that matches anObject. Answer the index of
that slot or raise an error if no slot is found. This method will be
overridden in various subclasses that have different interpretations for
matching elements."

| index start size |
index := start := anObject hash \\ (size := array size) + 1.
[
| element |
((element := array at: index) == nil or: [ anObject = element key ])
ifTrue: [ ^index ].
(index := index \\ size + 1) = start ] whileFalse.
self errorNoFreeSpace

In the v2.0 code, at: always creates a block, at:ifAbsent: always creates a
block (which means two block activations on error), and the scan for
elements is three activations deep.

In Squeak, at: always creates a block, but at:ifAbsent: never creates a
block (which means one block activation on error), and the scan is two
activations deep.  Note also that findKeyOrNil: wraps around, making more
than a single pass if it can't find an element whereas scanFor: makes at
most a single pass.

 The Squeak code is much better.  Has the optimization made the code any
the less reusable or elegant?  I find it no less reusable and more elegant.
 scanFor: is a better thought-out kernel than findKeyOrNil:.  So
optimization doesn't always make things less readable, reusable or elegant.

OTOH, #associationDeclareAt: and #isOctetString, by their nature, are only
> of interest to developers.  Users would never use those methods, so
> optimizing them would be fine I suppose.  Methods which take block
> arguments also tend to be not as usable by users, since passing a block
> requres writing some code.  So it seems we have _some_ concrete criteria
> starting to form simply by having looked at a few examples:
>
>    - methods which run hundreds of times per minute
>    - methods which take block as arguments
>    - methods which are concerned with internals instead of externals
>
> #isAsciiString seems like it could go either way, so I'm fine if you and
> Eliot prefer it to be optimized.  For me this discussion itself to get a
> slightly better understanding of each others' positions with the request to
> keep Dynabook users in mind when evaluating future such optimizations, is
> more important.
>
> Regards,
>   Chris
>
> On Fri, May 4, 2018 at 2:43 PM, Levente Uzonyi <leves at caesar.elte.hu>
> wrote:
>
>> On Thu, 3 May 2018, Chris Muller wrote:
>>
>>                   On a serious note, libary methods, like that method,
>>> ought to be fast,
>>>
>>>
>>>             Every method in the image is a library method, so what do
>>> you mean by
>>>
>>>
>>>       Really? Explain me how Browser >> #defaultBrowserTitle, a method I
>>> chose randomly, is a library method?
>>>
>>>
>>> Well, I was actually asking for clarification from _you_  for the
>>> meaning of "library method", since you were the one assigning an "ought" to
>>> them  :).  So I started out with it meaning, "every method in the base
>>> image", as a means to open up the question of what you
>>> meant.  My own criteria is "methods which are called thousands of times
>>> per minute" _today_.  But as the guy whose done more of these sorts of
>>> method optimizations than anyone (most of which I love), I am actually
>>> *truly curious* about YOUR criteria on this.  What
>>> methods in the image would Levente prefer to be more readable than
>>> performant, and why?
>>>
>>
>> I can't give you an exact definition of a "library method", but I'm sure
>> that String >> #isAsciiString is one, while Browser >> #defaultBrowserTitle
>> is not.
>> A rule of thumb could be that if a core method (e.g. anything in
>> Kernel/Collections) is part of an API and it's not there to support a very
>> specific task (e.g. Dictionary >> #associationDeclareAt:), then its
>> performance should be taken into account.
>>
>> Btw, please have a look at String >> #isOctetString.
>>
>> Levente
>>
>>
>>
>>>
>>>
>>> > ....
>>>
>>>
>>>
>>>                   because you can't know in what situation they will be
>>> used. Saying that it's
>>>                   not even used is therefore not a valid reason.
>>>
>>>
>>>             That's only true for the very lowest-level methods.
>>> Generally, it's
>>>             better to design and code for what's known, and NOT try to
>>> code for
>>>             the future / unknown.  IF and when that future comes, it'd
>>> be easy
>>>             enough for you to deal with any performance issues at THAT
>>> time.  In
>>>             the meantime, we "ought"  :) to keep as much beautiful
>>> minimal
>>>             elegance as we can.
>>>
>>>
>>>       So, if I decided to write a mail transfer agent, and I found that
>>> optimizing #isAsciiString would bump the throughput by, let's say, 10%,
>>> then and only then would I be allowed to change the implementation in
>>> Squeak? Or should I keep my optimized version in
>>>       my image, because it's not considered generally useful?
>>>
>>>
>>> Yes.  That's when you'll have the concrete context necessary to enable
>>> you to make the best implementation choice.  Until then, degrading code
>>> readability in exchange for no real-world performance benefit just... well,
>>> degrades the code...
>>>
>>
>
>
>
>


-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20180507/a449bbbc/attachment.html>


More information about the Squeak-dev mailing list