Removing OneCharacterSymbols (was: Re: [squeak-dev] The Inbox:
Collections-nice.530.mcz)
Bert Freudenberg
bert at freudenbergs.de
Thu Sep 12 15:41:15 UTC 2013
On 2013-09-11, at 23:14, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> After verification, xColon is not so rare, it is invoked for every block argument...
> But we can safely replace (token := self step asSymbol) with (token := String with: self step), no problem.
>
> We can even change it to (token := self step) if we are willing to remove last scories of alternate selectors beginning with $:
> Indeed, #: and #:x are obsolete squeakism which would not compile in other dialects (#':' and #':x' is the universally correct syntax).
Well, all Squeak-derived dialects still do support it. Pharo changed the printString of #: to #':', while Squeak and Cuis still print #:. And they all accept #:, #:x, and #:x:::.
> Removing this feature consists in
> - removing #colon reference in xLitQuote scanLitWord scanLitVec scanAllTokenPosition
> - removing asSymbol send in xColon.
+ changing printing for Symbol
> But that would imply that methods with #: or #:x literals would not compile anymore
> And that #( :x ) would result in #( #':' #x ) instead of #( #':x' ).
> After my last hacking this would be a safe change in trunk (at least Compiler recompileAll works) and I'm all for it,
> ... but who knows what could happen to exotic third-party code.
> Having code load request resulting in a SyntaxError popping up is just one of the most annoying thing...
> And having interpretation of a literal changing without any sort of indication is not much better!
>
> #: would still be supported with (Scanner prefAllowUnicharSymbol: true) but not #:x
What else would the pref allow? #: is the only thing I've seen "in the wild" (occurring in Etoys, OMeta, and DrGeo)
> Maybe we could make it a preference too for smoother migration?
> Or some sort of disambiguating request if interactive and conservative behavior if not, like I did for 1 at -2?
That would seem like the user-friendliest option, better than a preference IMHO.
- Bert -
> Thoughts?
>
> 2013/9/11 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>
> Ah right, thanks Bert.
> I think that the send of xColon is rare anyway (assignment cases := apart, in which case we do not send asSymbol)
>
> #':' is not referred directly in the Compiler stuff:
> Let's see: (SystemNavigation default browseAllCallsOn: #':') -> Oops yet another bug
> See fix attached
>
> Though I think asSymbol is necessary given implementation of scanLitWord, I'll look deeper later
>
>
>
>
> 2013/9/11 Bert Freudenberg <bert at freudenbergs.de>
> Another sender used to be #xColon. Not sure if we need to worry about that one.
>
> CharAsSymbolSenders sortedCounts
> ==> {6474->#scanToken . 141->#xColon}
>
> (this was from before your compiler changes)
>
> - Bert -
>
> On 2013-09-06, at 01:58, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> Thanks Bert, I just published a Compiler version that does not...
>>
>>
>> 2013/9/5 Bert Freudenberg <bert at freudenbergs.de>
>>
>> On 2013-09-04, at 21:41, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>>> A rapid review of senders indicates that Character asSymbol seems rather rare in a trunk image.
>>>
>>> I found
>>> #initTextConstants
>>> #inOutdent:delta: (Character tab asSymbol asText which is rather crooked way of avoiding an allocation - String tab would allocate a new Object).
>>
>> Scanner>>scanToken appears to be the most often used sender, and may well be the original reason for caching the 1-char symbols.
>>
>> It might be a good idea to time compilation of complex methods with and without OneCharacterSymbols. (my guess is it wouldn't make much of a difference these days)
>>
>> - Bert -
>>
>>
>>>
>>>
>>> 2013/9/4 Levente Uzonyi <leves at elte.hu>
>>> There's another user of OneCharacterSymbols: Character >> #asSymbol. I don't know how often it's used (my guess is that it's rarely used), or how much slower it would be, if we were removing OneCharacterSymbols. The implementation would change from
>>>
>>> ^Symbol internCharacter: self
>>>
>>> to
>>>
>>> ^Symbol intern: self asString
>>>
>>> Which means that an extra String will be created, and the lookup is done in the WeakSets, instead of an Array/WeakArray.
>>>
>>>
>>> Levente
>>>
>>> On Wed, 4 Sep 2013, Nicolas Cellier wrote:
>>>
>>> Well, as you were the last modifier, I prefer to have your advice :)
>>>
>>> Nicolas
>>>
>>>
>>> 2013/9/4 Levente Uzonyi <leves at elte.hu>
>>> On Wed, 4 Sep 2013, commits at source.squeak.org wrote:
>>>
>>> A new version of Collections was added to project The Inbox:
>>> http://source.squeak.org/inbox/Collections-nice.530.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-nice.530
>>> Author: nice
>>> Time: 4 September 2013, 2:21:28.852 am
>>> UUID: d732aa23-1c77-4c1e-abc0-d8d4506b7f9f
>>> Ancestors: Collections-ul.529
>>>
>>> Fix this:
>>> self assert: (allSymbols select: [:s | s = '+']) size = 1.
>>>
>>> 1) It is un-necessary to enumerate the 1-char symbols, they already are in SymbolTable (the first time they were created via #findIntern: or by virtue of last #rehash) !
>>>
>>> 2) We can use a WeekArray rather than an Array because there is no point in keeping a reference to unused 1-char symbols.
>>>
>>> Maybe we should get rid of OneCharacterSymbols class var, but I didn't dare...
>>>
>>>
>>> I say we should nuke it. The only user is Symbol class >> #selectorsContaining:, but that method is pretty much broken:
>>> - it uses OneCharacterSymbols only for infix selectors
>>> - it can't find infix selectors longer than 2 characters
>>> - it tries to optimize some stuff, but wastes cycles on other stuff
>>>
>>>
>>> Levente
>>>
>>>
>>> =============== Diff against Collections-ul.529 ===============
>>>
>>> Item was changed:
>>> ----- Method: Symbol class>>allSymbols (in category 'access') -----
>>> allSymbols
>>> "Answer all interned symbols"
>>> ^Array streamContents:[:s|
>>> s nextPutAll: NewSymbols.
>>> - s nextPutAll: OneCharacterSymbols.
>>> s nextPutAll: SymbolTable.
>>> ].
>>> !
>>>
>>> Item was changed:
>>> ----- Method: Symbol class>>initialize (in category 'class initialization') -----
>>> initialize
>>>
>>> "Symbol initialize"
>>>
>>> Symbol rehash.
>>> + OneCharacterSymbols := WeakArray new: 256.
>>> + (0 to: 255) do: [ :byte | byte asCharacter asSymbol].
>>> - OneCharacterSymbols := nil.
>>> - OneCharacterSymbols := (1 to: 256) collect: [ :i | (i - 1) asCharacter asSymbol].
>>> Smalltalk addToShutDownList: self.
>>> !
>>>
>>> Item was changed:
>>> ----- Method: Symbol class>>internCharacter: (in category 'instance creation') -----
>>> internCharacter: aCharacter
>>> aCharacter asciiValue > 256 ifTrue:[^self intern: aCharacter asString].
>>> + ^(OneCharacterSymbols at: aCharacter asciiValue + 1) ifNil: [OneCharacterSymbols at: aCharacter asciiValue + 1 put: (self intern: aCharacter asString)]
>>> - OneCharacterSymbols ifNil: [^self intern: aCharacter asString].
>>> - ^OneCharacterSymbols at: aCharacter asciiValue + 1
>>> !
>>>
>>> Item was changed:
>>> ----- Method: Symbol class>>selectorsContaining: (in category 'access') -----
>>> selectorsContaining: aString
>>> "Answer a list of selectors that contain aString within them. Case-insensitive. Does return symbols that begin with a capital letter."
>>>
>>> | size selectorList ascii |
>>>
>>> selectorList := OrderedCollection new.
>>> (size := aString size) = 0 ifTrue: [^selectorList].
>>>
>>> aString size = 1 ifTrue:
>>> [
>>> ascii := aString first asciiValue.
>>> + ascii < 128 ifTrue: [(OneCharacterSymbols at: ascii+1) ifNotNil: [:s | selectorList add: s]]
>>> - ascii < 128 ifTrue: [selectorList add: (OneCharacterSymbols at: ascii+1)]
>>> ].
>>>
>>> (aString first isLetter or: [aString first isDigit]) ifFalse:
>>> [
>>> aString size = 2 ifTrue:
>>> [Symbol hasInterned: aString ifTrue:
>>> [:s | selectorList add: s]].
>>> ^selectorList
>>> ].
>>>
>>> selectorList := selectorList copyFrom: 2 to: selectorList size.
>>>
>>> self allSymbolTablesDo: [:each |
>>> each size >= size ifTrue:
>>> [(each findSubstring: aString in: each startingAt: 1
>>> matchTable: CaseInsensitiveOrder) > 0
>>> ifTrue: [selectorList add: each]]].
>>>
>>> ^selectorList reject: [:each | "reject non-selectors, but keep ones that begin with an uppercase"
>>> each numArgs < 0 and: [each asString withFirstCharacterDownshifted numArgs < 0]].
>>>
>>> "Symbol selectorsContaining: 'scon'"!
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20130912/e810e39f/attachment.htm
More information about the Squeak-dev
mailing list
|