[squeak-dev] The Inbox: Collections-ct.851.mcz

Levente Uzonyi leves at caesar.elte.hu
Thu Aug 15 23:23:44 UTC 2019


On Thu, 15 Aug 2019, Thiede, Christoph wrote:

> 
> Thanks for your thoughts!
> 
> 
> > The new variant also changes the semantics, because it accepts unicode 
> 
> > digits. For example, the following won't raise an error but yield 9:
>> >          '{', (Character value: 16r1d7eb) asString, '}' format: (1 to: 10).
> 
> Interesting, but do you really think this is an undesired behavior?

It sounds fun until you notice the side effects.

> I changed the method with the aim to increase the readability, which was also impaired by the hardcoded numbers.

Indeed, that method is a prime example of trading legibility for performance, 
even though the comments helped you out quite a bit, didn't they? :)

> In my eyes it is a nice side effect to support other kinds of Unicode values - NumberParser does the same.

IMO, it opens a can of worms:
- WideStrings use 4x as much memory as ByteStings, and they lack the VM support ByteStrings have, so many operations are significantly slower with them.
- WideStrings spread like plague:
 	- Wrote a WideString into a stream? your stream's buffer is now a WideString.
 	- Did some operation with a WideString, e.g. #,? The result is very likely a WideString.
- Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9 in there... Oh, wait, it's a WideString with a "Mathematical sans-serif digit nine".


Levente

> 
> Christoph
> 
> _________________________________________________________________________________________________________________________________________________________________________________________________________________________________
> Von: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im Auftrag von Levente Uzonyi <leves at caesar.elte.hu>
> Gesendet: Freitag, 16. August 2019 00:51:14
> An: squeak-dev at lists.squeakfoundation.org
> Betreff: Re: [squeak-dev] The Inbox: Collections-ct.851.mcz  
> On Thu, 15 Aug 2019, commits at source.squeak.org wrote:
> 
> > A new version of Collections was added to project The Inbox:
> > http://source.squeak.org/inbox/Collections-ct.851.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Collections-ct.851
> > Author: ct
> > Time: 15 August 2019, 11:36:23.694735 pm
> > UUID: 6c7113c6-9d09-bf4c-9b32-e2001870bb56
> > Ancestors: Collections-ct.850
> >
> > Refactor String>>#format: according to Text>>#format:
> >
> > I could not find any significant performance impacts.
> 
> Difference only appears when there are substitutions to be done. The more
> substitutions you have, the higher the difference in performance will be.
> This new version takes up to ~40% longer than the current implementation,
> which is still not optimal btw.
> The new variant also changes the semantics, because it accepts unicode
> digits. For example, the following won't raise an error but yield 9:
> 
>          '{', (Character value: 16r1d7eb) asString, '}' format: (1 to: 10).
> 
> Levente
> 
> >
> > We have some duplication between both #format: implementations. Do you thing this is a problem at the current scale?
> >
> > =============== Diff against Collections-ct.850 ===============
> >
> > Item was changed:
> >  ----- Method: String>>format: (in category 'formatting') -----
> > + format: arguments
> > +      "format the receiver with arguments
> > - format: aCollection
> > -      "format the receiver with aCollection
> >
> >        simplest example:
> >        'foo {1} bar' format: {Date today}.
> >
> >        complete example:
> >        '\{ \} \\ foo {1} bar {2}' format: {12. 'string'}.
> >        "
> > +      ^self class new: self size * 11 // 10 streamContents: [ :output |
> > -      ^self class new: self size * 11 // 10 "+10%" streamContents: [ :output |
> >                | lastIndex nextIndex |
> >                lastIndex := 1.
> >                [ (nextIndex := self indexOfAnyOf: FormatCharacterSet startingAt: lastIndex) = 0 ] whileFalse: [
> >                        nextIndex = lastIndex ifFalse: [
> >                                output next: nextIndex - lastIndex putAll: self startingAt: lastIndex ].
> > +                      (self at: nextIndex) caseOf: {
> > +                              [$\] -> [ output nextPut: (self at: (nextIndex := nextIndex + 1)) ].
> > +                              [${] -> [
> > -                      (self at: nextIndex) == $\
> > -                              ifTrue: [ output nextPut: (self at: (nextIndex := nextIndex + 1)) ]
> > -                              ifFalse: [ "${"
> >                                        "Parse the index - a positive integer in base 10."
> > +                                      | character collectionIndex |
> > -                                      | digitValue collectionIndex |
> >                                        collectionIndex := 0.
> > +                                      [ (character := self at: (nextIndex := nextIndex + 1)) isDigit ] whileTrue: [
> > +                                              collectionIndex := collectionIndex * 10 + character digitValue ].
> > +                                      character = $} ifFalse: [ self error: '$} expected' ].
> > +                                      output nextPutAll: (arguments at: collectionIndex) asString ] }.
> > -                                      [ (digitValue := self basicAt: (nextIndex := nextIndex + 1)) between: 48 "$0 asciiValue" and: 57 "$9 asciiValue" ] whileTrue: [
> > -                                              collectionIndex := collectionIndex * 10 + digitValue - 48. "$0 asciiValue" ].
> > -                                      digitValue =  125 "$} asciiValue" ifFalse: [ self error: '$} expected' ].
> > -                                      output nextPutAll: (aCollection at: collectionIndex) asString ].
> >                        lastIndex := nextIndex + 1 ].
> >                lastIndex <= self size ifTrue: [
> >                        output next: self size - lastIndex + 1 putAll: self startingAt: lastIndex ] ]!
> 
> 
>


More information about the Squeak-dev mailing list