[squeak-dev] The Inbox: Collections-ul.844.mcz

Levente Uzonyi leves at caesar.elte.hu
Fri Jul 19 22:49:35 UTC 2019


On Fri, 19 Jul 2019, Tobias Pape wrote:

>
>> On 19.07.2019, at 15:54, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>> 
>> Hi Tobias,
>> 
>> On Fri, 19 Jul 2019, Tobias wrote:
>> 
>>> 
>>> Hi Levente
>>> 
>>> On Thu, 2019-07-18 at 22:13 +0000, commits at source.squeak.org wrote:
>>>> Levente Uzonyi uploaded a new version of Collections to project The
>>>> Inbox:
>>>> http://source.squeak.org/inbox/Collections-ul.844.mcz
>>>> 
>>>> ==================== Summary ====================
>>>> 
>>>> Name: Collections-ul.844
>>>> Author: ul
>>>> Time: 19 July 2019, 12:08:51.94435 am
>>>> UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
>>>> Ancestors: Collections-mt.843
>>>> 
>>>> - added String >> #atOrNil: which uses primitive 63 and returns
>>>> either the character at the given index or nil when the primtiive
>>>> fails. This is a faster alternative to #at:ifAbsent: when the absent
>>>> block would yield nil.
>>>> 
>>> 
>>> Could you tell me the use case here?
>> 
>> ShoutCore-ul.66 in the Inbox.
>
> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?

Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends 
there?

Why does it matter what the frequency of the out-of-range access case is?

> If so, we should really tackle _that_ instead of making pouring out nils fast.

What's wrong with nil here?

>
>> 
>>> This sounds like a convoluted range check.
>> 
>> Yes, it's a way to avoid the range check being done by the image.
>> 
>>> I mean, yes it would be faster to do
>>> 
>>> knorz := 'foo'
>>> index := 32
>>> ^ knorz atOrNil: index
>>> 
>>> but I would rather see
>>> 
>>> knorz := 'foo'
>>> index := 32
>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>> 
>> That's not enough. You have to check the other end of the range as well.
>> And that's what this method tries to avoid.
>
> But its telling a different story.

What story is that?

> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?

Performance is the benefit. You see it, but you don't think you need it.

>
>
>> 
>>> 
>>> or more robust even
>>> 
>>> 
>>> knorz := 'foo'
>>> index := 32
>>> ^ knorz atPin: 32
>> 
>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>
> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.

So, how would you use #atPin: with Shout?

>
> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.

I'm sure many disagree with you on the boring part.
(IIRC the parser in Pharo replaced all three of those, but its 
performance is still worse than Parser's. I remember seeing numbers around 
3x slower, but that may have changed).

>
> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.

I take that (and the rest of your mail) as a -1 from you for 
#atOrNil:.

Levente

>
> Best regards
> 	-Tobias :)
>
>
>
> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>> 
>> Levente
>> 
>>> 
>>> (even though that changes the semantic)
>>> 
>>> So, can you tell me the use case here?
>>> Best regards
>>> 	-Tobias
>>> 
>>> PS: That said, I think using #at: for String in the first place will
>>> bite us down the road, whether we want to change the String
>>> representation at some point (ropes, immutables, values, ...) or just
>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>> or $'è'


More information about the Squeak-dev mailing list