[squeak-dev] The Inbox: Collections-ul.844.mcz

Fri Jul 19 23:12:13 UTC 2019

Hi Levente :)

Thanks for bearing with me here..

> On 20.07.2019, at 00:49, Levente Uzonyi <leves at caesar.elte.hu> wrote:
> 
> On Fri, 19 Jul 2019, Tobias Pape wrote:
> 
>> 
>>> On 19.07.2019, at 15:54, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>>> Hi Tobias,
>>>>> […]
>>>> Could you tell me the use case here?
>>> ShoutCore-ul.66 in the Inbox.
>> 
>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
> 
> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?

Yes, and I wanted to convey that I disagree.

> 
> Why does it matter what the frequency of the out-of-range access case is?

Because optimizing it only makes sense if it is very frequent.

> 
>> If so, we should really tackle _that_ instead of making pouring out nils fast.
> 
> What's wrong with nil here?

What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.
I don't think we're this deep in system level programming to condone using nil.

I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.

> 
>> 
>>>> This sounds like a convoluted range check.
>>> Yes, it's a way to avoid the range check being done by the image.
>>>> I mean, yes it would be faster to do
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atOrNil: index
>>>> but I would rather see
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>>> That's not enough. You have to check the other end of the range as well.
>>> And that's what this method tries to avoid.
>> 
>> But its telling a different story.
> 
> What story is that?
> 

The method conveys "get something from me or a system-level thingything, whysoever".
Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.

>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
> 
> Performance is the benefit. You see it, but you don't think you need it.

We trade readability and understandability as well as simplicity for performance here. But we don't even know if we 
need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text? 
I know Tim complained, and probably rightfully so.
But I am still not convinced that this trade-off will pay out…

> 
>> 
>> 
>>>> or more robust even
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atPin: 32
>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>> 
>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
> 
> So, how would you use #atPin: with Shout?

With shout? probably not at all. This was a more general comment.
We're introducing a public API here, and it should have benefit beyond shout.

> 
>> 
>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
> 
> I'm sure many disagree with you on the boring part.

Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…

> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).

I'm aware of the parser there. I have no performance infos at hand. 
What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.
But that's a different thread :)

> 
>> 
>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
> 
> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
> 

more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't

Best regards
	-Tobias

PS:

What about

at: aNumber orOffLimitSentinel: anObject
	<primitive: 63>
	^ anObject

Or would the primitive bail?

> Levente
> 
>> 
>> Best regards
>> 	-Tobias :)
>> 
>> 
>> 
>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>> Levente
>>>> (even though that changes the semantic)
>>>> So, can you tell me the use case here?
>>>> Best regards
>>>> 	-Tobias
>>>> PS: That said, I think using #at: for String in the first place will
>>>> bite us down the road, whether we want to change the String
>>>> representation at some point (ropes, immutables, values, ...) or just
>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>> or $'è'