[squeak-dev] The Inbox: Collections-ct.851.mcz
Levente Uzonyi
leves at caesar.elte.hu
Thu Aug 15 23:23:44 UTC 2019
On Thu, 15 Aug 2019, Thiede, Christoph wrote:
>
> Thanks for your thoughts!
>
>
> > The new variant also changes the semantics, because it accepts unicode
>
> > digits. For example, the following won't raise an error but yield 9:
> >
> > '{', (Character value: 16r1d7eb) asString, '}' format: (1 to: 10).
>
> Interesting, but do you really think this is an undesired behavior?
It sounds fun until you notice the side effects.
> I changed the method with the aim to increase the readability, which was also impaired by the hardcoded numbers.
Indeed, that method is a prime example of trading legibility for performance,
even though the comments helped you out quite a bit, didn't they? :)
> In my eyes it is a nice side effect to support other kinds of Unicode values - NumberParser does the same.
IMO, it opens a can of worms:
- WideStrings use 4x as much memory as ByteStings, and they lack the VM support ByteStrings have, so many operations are significantly slower with them.
- WideStrings spread like plague:
- Wrote a WideString into a stream? your stream's buffer is now a WideString.
- Did some operation with a WideString, e.g. #,? The result is very likely a WideString.
- Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9 in there... Oh, wait, it's a WideString with a "Mathematical sans-serif digit nine".
Levente
>
> Christoph
>
> _________________________________________________________________________________________________________________________________________________________________________________________________________________________________
> Von: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im Auftrag von Levente Uzonyi <leves at caesar.elte.hu>
> Gesendet: Freitag, 16. August 2019 00:51:14
> An: squeak-dev at lists.squeakfoundation.org
> Betreff: Re: [squeak-dev] The Inbox: Collections-ct.851.mcz
> On Thu, 15 Aug 2019, commits at source.squeak.org wrote:
>
> > A new version of Collections was added to project The Inbox:
> > http://source.squeak.org/inbox/Collections-ct.851.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Collections-ct.851
> > Author: ct
> > Time: 15 August 2019, 11:36:23.694735 pm
> > UUID: 6c7113c6-9d09-bf4c-9b32-e2001870bb56
> > Ancestors: Collections-ct.850
> >
> > Refactor String>>#format: according to Text>>#format:
> >
> > I could not find any significant performance impacts.
>
> Difference only appears when there are substitutions to be done. The more
> substitutions you have, the higher the difference in performance will be.
> This new version takes up to ~40% longer than the current implementation,
> which is still not optimal btw.
> The new variant also changes the semantics, because it accepts unicode
> digits. For example, the following won't raise an error but yield 9:
>
> '{', (Character value: 16r1d7eb) asString, '}' format: (1 to: 10).
>
> Levente
>
> >
> > We have some duplication between both #format: implementations. Do you thing this is a problem at the current scale?
> >
> > =============== Diff against Collections-ct.850 ===============
> >
> > Item was changed:
> > ----- Method: String>>format: (in category 'formatting') -----
> > + format: arguments
> > + "format the receiver with arguments
> > - format: aCollection
> > - "format the receiver with aCollection
> >
> > simplest example:
> > 'foo {1} bar' format: {Date today}.
> >
> > complete example:
> > '\{ \} \\ foo {1} bar {2}' format: {12. 'string'}.
> > "
> > + ^self class new: self size * 11 // 10 streamContents: [ :output |
> > - ^self class new: self size * 11 // 10 "+10%" streamContents: [ :output |
> > | lastIndex nextIndex |
> > lastIndex := 1.
> > [ (nextIndex := self indexOfAnyOf: FormatCharacterSet startingAt: lastIndex) = 0 ] whileFalse: [
> > nextIndex = lastIndex ifFalse: [
> > output next: nextIndex - lastIndex putAll: self startingAt: lastIndex ].
> > + (self at: nextIndex) caseOf: {
> > + [$\] -> [ output nextPut: (self at: (nextIndex := nextIndex + 1)) ].
> > + [${] -> [
> > - (self at: nextIndex) == $\
> > - ifTrue: [ output nextPut: (self at: (nextIndex := nextIndex + 1)) ]
> > - ifFalse: [ "${"
> > "Parse the index - a positive integer in base 10."
> > + | character collectionIndex |
> > - | digitValue collectionIndex |
> > collectionIndex := 0.
> > + [ (character := self at: (nextIndex := nextIndex + 1)) isDigit ] whileTrue: [
> > + collectionIndex := collectionIndex * 10 + character digitValue ].
> > + character = $} ifFalse: [ self error: '$} expected' ].
> > + output nextPutAll: (arguments at: collectionIndex) asString ] }.
> > - [ (digitValue := self basicAt: (nextIndex := nextIndex + 1)) between: 48 "$0 asciiValue" and: 57 "$9 asciiValue" ] whileTrue: [
> > - collectionIndex := collectionIndex * 10 + digitValue - 48. "$0 asciiValue" ].
> > - digitValue = 125 "$} asciiValue" ifFalse: [ self error: '$} expected' ].
> > - output nextPutAll: (aCollection at: collectionIndex) asString ].
> > lastIndex := nextIndex + 1 ].
> > lastIndex <= self size ifTrue: [
> > output next: self size - lastIndex + 1 putAll: self startingAt: lastIndex ] ]!
>
>
>
More information about the Squeak-dev
mailing list
|