Removing OneCharacterSymbols (was: Re: [squeak-dev] The Inbox: Collections-nice.530.mcz)

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Wed Sep 11 21:14:47 UTC 2013


After verification, xColon is not so rare, it is invoked for every block
argument...
But we can safely replace (token := self step asSymbol) with (token :=
String with: self step), no problem.

We can even change it to (token := self step) if we are willing to remove
last scories of alternate selectors beginning with $:
Indeed, #: and #:x are obsolete squeakism which would not compile in other
dialects (#':' and #':x' is the universally correct syntax).

Removing this feature consists in
- removing #colon reference in xLitQuote scanLitWord scanLitVec
scanAllTokenPosition
- removing asSymbol send in xColon.

But that would imply that methods with #: or #:x literals would not compile
anymore
And that #( :x ) would result in #( #':' #x ) instead of #( #':x' ).
After my last hacking this would be a safe change in trunk (at least
Compiler recompileAll works) and I'm all for it,
... but who knows what could happen to exotic third-party code.
Having code load request resulting in a SyntaxError popping up is just one
of the most annoying thing...
And having interpretation of a literal changing without any sort of
indication is not much better!

#: would still be supported with (Scanner prefAllowUnicharSymbol: true) but
not #:x
Maybe we could make it a preference too for smoother migration?
Or some sort of disambiguating request if interactive and conservative
behavior if not, like I did for 1 at -2?

Thoughts?

2013/9/11 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>

> Ah right, thanks Bert.
> I think that the send of xColon is rare anyway (assignment cases := apart,
> in which case we do not send asSymbol)
>
> #':' is not referred directly in the Compiler stuff:
> Let's see: (SystemNavigation default browseAllCallsOn: #':') -> Oops yet
> another bug
> See fix attached
>
> Though I think asSymbol is necessary given implementation of scanLitWord,
> I'll look deeper later
>
>
>
>
> 2013/9/11 Bert Freudenberg <bert at freudenbergs.de>
>
>> Another sender used to be #xColon. Not sure if we need to worry about
>> that one.
>>
>> CharAsSymbolSenders sortedCounts
>> ==> {6474->#scanToken . 141->#xColon}
>>
>> (this was from before your compiler changes)
>>
>> - Bert -
>>
>> On 2013-09-06, at 01:58, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>> Thanks Bert, I just published a Compiler version that does not...
>>
>>
>> 2013/9/5 Bert Freudenberg <bert at freudenbergs.de>
>>
>>>
>>> On 2013-09-04, at 21:41, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>>
>>> A rapid review of senders indicates that Character asSymbol seems rather
>>> rare in a trunk image.
>>>
>>> I found
>>> #initTextConstants
>>> #inOutdent:delta: (Character tab asSymbol asText which is rather crooked
>>> way of avoiding an allocation - String tab would allocate a new Object).
>>>
>>>
>>> Scanner>>scanToken appears to be the most often used sender, and may
>>> well be the original reason for caching the 1-char symbols.
>>>
>>> It might be a good idea to time compilation of complex methods with and
>>> without OneCharacterSymbols. (my guess is it wouldn't make much of a
>>> difference these days)
>>>
>>> - Bert -
>>>
>>>
>>>
>>>
>>> 2013/9/4 Levente Uzonyi <leves at elte.hu>
>>>
>>>> There's another user of OneCharacterSymbols: Character >> #asSymbol. I
>>>> don't know how often it's used (my guess is that it's rarely used), or how
>>>> much slower it would be, if we were removing OneCharacterSymbols. The
>>>> implementation would change from
>>>>
>>>>         ^Symbol internCharacter: self
>>>>
>>>> to
>>>>
>>>>         ^Symbol intern: self asString
>>>>
>>>> Which means that an extra String will be created, and the lookup is
>>>> done in the WeakSets, instead of an Array/WeakArray.
>>>>
>>>>
>>>> Levente
>>>>
>>>> On Wed, 4 Sep 2013, Nicolas Cellier wrote:
>>>>
>>>>  Well, as you were the last modifier, I prefer to have your advice :)
>>>>>
>>>>> Nicolas
>>>>>
>>>>>
>>>>> 2013/9/4 Levente Uzonyi <leves at elte.hu>
>>>>>       On Wed, 4 Sep 2013, commits at source.squeak.org wrote:
>>>>>
>>>>>             A new version of Collections was added to project The
>>>>> Inbox:
>>>>>             http://source.squeak.org/**inbox/Collections-nice.530.mcz<http://source.squeak.org/inbox/Collections-nice.530.mcz>
>>>>>
>>>>>             ==================== Summary ====================
>>>>>
>>>>>             Name: Collections-nice.530
>>>>>             Author: nice
>>>>>             Time: 4 September 2013, 2:21:28.852 am
>>>>>             UUID: d732aa23-1c77-4c1e-abc0-**d8d4506b7f9f
>>>>>             Ancestors: Collections-ul.529
>>>>>
>>>>>             Fix this:
>>>>>              self assert: (allSymbols select: [:s | s = '+']) size = 1.
>>>>>
>>>>>             1) It is un-necessary to enumerate the 1-char symbols,
>>>>> they already are in SymbolTable (the first time they were created via
>>>>> #findIntern: or by virtue of last #rehash) !
>>>>>
>>>>>             2) We can use a WeekArray rather than an Array because
>>>>> there is no point in keeping a reference to unused 1-char symbols.
>>>>>
>>>>>             Maybe we should get rid of OneCharacterSymbols class var,
>>>>> but I didn't dare...
>>>>>
>>>>>
>>>>> I say we should nuke it. The only user is Symbol class >>
>>>>> #selectorsContaining:, but that method is pretty much broken:
>>>>> - it uses OneCharacterSymbols only for infix selectors
>>>>> - it can't find infix selectors longer than 2 characters
>>>>> - it tries to optimize some stuff, but wastes cycles on other stuff
>>>>>
>>>>>
>>>>> Levente
>>>>>
>>>>>
>>>>>       =============== Diff against Collections-ul.529 ===============
>>>>>
>>>>>       Item was changed:
>>>>>        ----- Method: Symbol class>>allSymbols (in category 'access')
>>>>> -----
>>>>>        allSymbols
>>>>>               "Answer all interned symbols"
>>>>>               ^Array streamContents:[:s|
>>>>>                       s nextPutAll: NewSymbols.
>>>>>       -               s nextPutAll: OneCharacterSymbols.
>>>>>                       s nextPutAll: SymbolTable.
>>>>>               ].
>>>>>        !
>>>>>
>>>>>       Item was changed:
>>>>>        ----- Method: Symbol class>>initialize (in category 'class
>>>>> initialization') -----
>>>>>        initialize
>>>>>
>>>>>               "Symbol initialize"
>>>>>
>>>>>               Symbol rehash.
>>>>>       +       OneCharacterSymbols := WeakArray new: 256.
>>>>>       +       (0 to: 255) do: [ :byte | byte asCharacter asSymbol].
>>>>>       -       OneCharacterSymbols := nil.
>>>>>       -       OneCharacterSymbols := (1 to: 256) collect: [ :i | (i -
>>>>> 1) asCharacter asSymbol].
>>>>>               Smalltalk addToShutDownList: self.
>>>>>        !
>>>>>
>>>>>       Item was changed:
>>>>>        ----- Method: Symbol class>>internCharacter: (in category
>>>>> 'instance creation') -----
>>>>>        internCharacter: aCharacter
>>>>>               aCharacter asciiValue > 256 ifTrue:[^self intern:
>>>>> aCharacter asString].
>>>>>       +       ^(OneCharacterSymbols at: aCharacter asciiValue + 1)
>>>>> ifNil: [OneCharacterSymbols at: aCharacter asciiValue + 1 put: (self
>>>>> intern: aCharacter asString)]
>>>>>       -       OneCharacterSymbols ifNil: [^self intern: aCharacter
>>>>> asString].
>>>>>       -       ^OneCharacterSymbols at: aCharacter asciiValue + 1
>>>>>        !
>>>>>
>>>>>       Item was changed:
>>>>>        ----- Method: Symbol class>>selectorsContaining: (in category
>>>>> 'access') -----
>>>>>        selectorsContaining: aString
>>>>>               "Answer a list of selectors that contain aString within
>>>>> them. Case-insensitive.  Does return symbols that begin with a capital
>>>>> letter."
>>>>>
>>>>>               | size selectorList ascii |
>>>>>
>>>>>               selectorList := OrderedCollection new.
>>>>>               (size := aString size) = 0 ifTrue: [^selectorList].
>>>>>
>>>>>               aString size = 1 ifTrue:
>>>>>                       [
>>>>>                               ascii := aString first asciiValue.
>>>>>       +                       ascii < 128 ifTrue:
>>>>> [(OneCharacterSymbols at: ascii+1) ifNotNil: [:s | selectorList add: s]]
>>>>>       -                       ascii < 128 ifTrue: [selectorList add:
>>>>> (OneCharacterSymbols at: ascii+1)]
>>>>>                       ].
>>>>>
>>>>>               (aString first isLetter or: [aString first isDigit])
>>>>> ifFalse:
>>>>>                       [
>>>>>                               aString size = 2 ifTrue:
>>>>>                                       [Symbol hasInterned: aString
>>>>> ifTrue:
>>>>>                                               [:s | selectorList add:
>>>>> s]].
>>>>>                               ^selectorList
>>>>>                       ].
>>>>>
>>>>>               selectorList := selectorList copyFrom: 2 to:
>>>>> selectorList size.
>>>>>
>>>>>               self allSymbolTablesDo: [:each |
>>>>>                       each size >= size ifTrue:
>>>>>                               [(each findSubstring: aString in: each
>>>>> startingAt: 1
>>>>>                                       matchTable:
>>>>> CaseInsensitiveOrder) > 0
>>>>>                                                       ifTrue:
>>>>> [selectorList add: each]]].
>>>>>
>>>>>               ^selectorList reject: [:each | "reject non-selectors,
>>>>> but keep ones that begin with an uppercase"
>>>>>                       each numArgs < 0 and: [each asString
>>>>> withFirstCharacterDownshifted numArgs < 0]].
>>>>>
>>>>>        "Symbol selectorsContaining: 'scon'"!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20130911/90833380/attachment.htm


More information about the Squeak-dev mailing list