[squeak-dev] The Trunk: Collections-cmm.1016.mcz

Wed Jul 13 21:00:55 UTC 2022

Hi Chris,

On Wed, 13 Jul 2022, Chris Muller wrote:

> Hi Christoph,
> 
> Thanks for the review and excellent suggestions.  Please see Collections-cmm.1019, and let me know if you see anything else.
>
>       Regarding the implementation: Did you run any benchmarks and how massive is the slowdown? The previous implementation used #basicAt: to avoid comparing characters (which is not fast on all platforms) and sending
>       messages to them (in favor of inlining). I'm curious whether this could be avoided in the new implementation as well and how much performance could be won with that. #numberOrValue also looks very slow. Maybe it
>       would be worth providing two alternative branches depending on the type of the collection argument?
> 
> I tend to weigh toward a system defined tersely in terms of its own messages, and letting performance emanate from the _design_, as opposed to chasing an extra 5% of execution performance improvement at the expense of
> expressivity of the code.  If the point of that 5% is to "save time", it seems reasonable to consider the time of future readers of the code.
> 
> The method, #isAlphaNumeric, is a prime example.  Originally, its implementation beautifully matched its definition.
> _____
>     isAlphaNumeric
>         "Answer whether the receiver is a letter or a digit."
>         ^self isLetter or: [self isDigit]
> _____
> 
> Compare that to now, a complex, copy-and-pasted "implementation" which is a lot harder to understand and maintain, but only 10% faster in execution.  IMO, that seems past the point of diminishing returns of what a user of
> Smalltalk would expect.

If #isAlphaNumeric is too complex, then so are #isLetter and #isDigit. 
Please revert those as well to the _simple_ implementation and redo your 
benchmark.
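
A sketch of what that comparison could look like, assuming the stock 
Character tests and BlockClosure>>#bench; the sample characters are 
arbitrary, and each expression would be run once per implementation:

```smalltalk
"Time each character test separately; the sample characters are arbitrary."
[$a isLetter] bench.
[$5 isDigit] bench.
[$a isAlphaNumeric] bench.
```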

> 
> Having said that, Squeak's speed is sweet, and I can appreciate the desire to hyper-optimize at the bytecode level.  Here's Marcel's benchmark with the latest:

Wasn't it you who considered the weak dictionaries too slow and had to 
use a different implementation?

> ___
> ['Hello {1}!' format: { 'Squeak' }] bench.
> 
>  '3,450,000 per second. 290 nanoseconds per run. 1.35946 % GC time.'   <--- new
>  '3,820,000 per second. 262 nanoseconds per run. 4.22 % GC time.'     <--- old
> 
> 3450.0/3820   0.9031413612565445 

What if there are multiple substitutions instead of just one? Is it still 
just 10 percent slower?
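
Something along these lines would show it. The templates and arguments 
are made up for illustration; only the number of substitutions varies, 
and each line would be run against both the old and the new version:

```smalltalk
"Hypothetical templates; only the number of substitutions differs."
['Hello {1}!' format: { 'Squeak' }] bench.
['{1}, {2} and {3}!' format: { 'Squeak'. 'Pharo'. 'Cuis' }] bench.
['{1}{2}{3}{4}{5}{6}' format: #('a' 'b' 'c' 'd' 'e' 'f')] bench.
```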

> ___
> 
> Looks like about a 10% hit for this example.  Maybe it could be improved, but I doubt by very much.  Unfortunately using #basicAt: isn't convenient when alphanumeric tokens are possible.  

Why limit the tokens to alphanumeric ones? Why am I not allowed to write 
the following?

 	'{foo_bar}' format: ({ 'foo_bar' -> 1 })

Also, why do I get an error when I try this?

 	'{0x1}{0x2}' format: ({ '0x1' -> 1. '0x2' -> 2 } as: Dictionary)

Levente

> 
> If this is still too much, let me know, I'll take it back to the original numerals-only version.
> 
> Best,
>   Chris
> 
>