[squeak-dev] The Inbox: Collections-ul.844.mcz

Levente Uzonyi leves at caesar.elte.hu
Fri Jul 19 23:46:27 UTC 2019


On Sat, 20 Jul 2019, Tobias Pape wrote:

> Hi Levente :)
>
>
> Thanks for bearing with me here..

:)

>
>
>> On 20.07.2019, at 00:49, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>> 
>> On Fri, 19 Jul 2019, Tobias Pape wrote:
>> 
>>> 
>>>> On 19.07.2019, at 15:54, Levente Uzonyi <leves at caesar.elte.hu> wrote:
>>>> Hi Tobias,
>>>>>> […]
>>>>> Could you tell me the use case here?
>>>> ShoutCore-ul.66 in the Inbox.
>>> 
>>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
>> 
>> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?
>
> Yes, and I wanted to convey that I disagree.
>
>> 
>> Why does it matter what the frequency of the out-of-range access case is?
>
> Because optimizing it only makes sense if it is very frequent.

This optimizes both cases, but the real benefit comes from optimizing the 
most common case: when there's a character at the given index.
So, even if it didn't optimize the out-of-range case, it would still be a 
lot faster.

>
>> 
>>> If so, we should really tackle _that_ instead of making pouring out nils fast.
>> 
>> What's wrong with nil here?
>
> What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.

I think nil is a pretty good extremal element here:
1. it's not a character
2. it's a unique object with a global accessor
3. comparison is extremely quick
4. the compiler supports it with highly optimized branch statements

And the knowledge is not really lost, as all senders handle the extremal 
case. So, all uses are localized.

I could use a custom EndOfStream object, but compared to nil, it would 
lack property 4, so the code would be more verbose, and the performance 
would be marginally worse.

> I don't think we're this deep in system level programming to condone using nil.

It might resemble system programming, but the main difference is that 
there's no nil there. :)

>
> I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.
>
>> 
>>> 
>>>>> This sounds like a convoluted range check.
>>>> Yes, it's a way to avoid the range check being done by the image.
>>>>> I mean, yes it would be faster to do
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ knorz atOrNil: index
>>>>> but I would rather see
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>>>> That's not enough. You have to check the other end of the range as well.
>>>> And that's what this method tries to avoid.
>>> 
>>> But its telling a different story.
>> 
>> What story is that?
>> 
>
> The method conveys "get something from me or a system-level thingything, whysoever".
> Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.

Would you be fine with it if the method were called #atOrNilWhenTheIndexIsOutOfBounds:?

>
>>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
>> 
>> Performance is the benefit. You see it, but you don't think you need it.
>
> We trade readability and understandability as well as simplicity for performance here. But we don't even know if we

I don't think legibility got worse, but if it did, please point out where 
it happened.

> need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text? 
> I know Tim complained, and probably rightfully so.
> But I am still not convinced that this trade-off will pay out…

You can't know it without measuring it.
If you don't feel like rolling your own benchmark, good news: I just wrote 
one. Print it: SHParserST80 benchmark.

>
>> 
>>> 
>>> 
>>>>> or more robust even
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ knorz atPin: 32
>>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>>> 
>>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
>> 
>> So, how would you use #atPin: with Shout?
>
> With shout? probably not at all. This was a more general comment.
> We're introducing a public API here, and it should have benefit beyond shout.

Well, I simply copied this method from another project of mine, which had 
it implemented on SequenceableCollection and used it for all kinds of 
arrays.
For example, it could be used to speed up String's #at:ifAbsent: :).

>
>> 
>>> 
>>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
>> 
>> I'm sure many disagree with you on the boring part.
>
> Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…
>
>> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).
>
> I'm aware of the parser there. I have no performance infos at hand. 
> What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.

The three parsers are specialized on different areas. Shout's goal was to 
be fast and to produce enough output for styling. So, improving its 
performance is a natural thing, at least to me.

> But that's a different thread :)
>
>> 
>>> 
>>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
>> 
>> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
>> 
>
> more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't
>
> Best regards
> 	-Tobias
>
> PS:
>
> What about
>
> at: aNumber orOffLimitSentinel: anObject
> 	<primitive: 63>
> 	^ anObject
>
> Or would the primitive bail?

The primitive fails when the number of arguments is not exactly one.
I tried that trick long ago with #at:ifAbsent:, even suggested changing 
the VM code, but now, with the JIT, that would be a bit harder.

Levente

P.S.: I'm considering a different approach which uses a similar trick, but 
no one would bother because it wouldn't introduce any new methods (and 
that trick is used in many other places). But, I didn't want to rewrite 
Shout...

>
>
>> Levente
>> 
>>> 
>>> Best regards
>>> 	-Tobias :)
>>> 
>>> 
>>> 
>>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>>> Levente
>>>>> (even though that changes the semantic)
>>>>> So, can you tell me the use case here?
>>>>> Best regards
>>>>> 	-Tobias
>>>>> PS: That said, I think using #at: for String in the first place will
>>>>> bite us down the road, whether we want to change the String
>>>>> representation at some point (ropes, immutables, values, ...) or just
>>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>>> or $'è'


More information about the Squeak-dev mailing list