[squeak-dev] The Inbox: ShoutCore-ul.41.mcz
Levente Uzonyi
leves at elte.hu
Mon Feb 10 19:16:49 UTC 2014
Some of these optimizations are not nice, nor necessary for modern
machines, but with these changes Shout can parse ~50% more methods
according to the benchmark below.
| p sources |
sources := OrderedCollection new.
CurrentReadOnlySourceFiles cacheDuring: [
Smalltalk allClassesDo: [ :class |
class selectors collect: [ :selector |
sources add: class -> (class >> selector) getSource asString ] ] ].
p := SHParserST80 new.
{
[
| previousClass |
previousClass := nil.
sources do: [ :source |
| class |
class := source key.
previousClass == class ifFalse: [
p classOrMetaClass: source key ].
p
source: source value;
parse;
ranges ] ] timeToRun.
sources size }
My pc can parse 8 methods per millisecond using Cog, which might be enough
for Shout to become usable on a really slow machine like the Raspberry Pi.
Levente
On Mon, 10 Feb 2014, commits at source.squeak.org wrote:
> A new version of ShoutCore was added to project The Inbox:
> http://source.squeak.org/inbox/ShoutCore-ul.41.mcz
>
> ==================== Summary ====================
>
> Name: ShoutCore-ul.41
> Author: ul
> Time: 10 February 2014, 6:48:33.606 pm
> UUID: b71b4d71-9420-4b18-a82a-dc2842617797
> Ancestors: ShoutCore-cwp.40
>
> SHParserST80 optimizations:
> - don't collect 'answer' in #parseString, because it's very slow, and the result is never used
> - assume that the whole source uses the same EncodedCharSet. Cache it in the parser and use it to decide if a character is alphanumeric, a letter or a digit. (requires Multilingual-ul.195)
> - avoid evaluating blocks in #currentChar and #peekChar
> - use symbols for comparison wherever possible
> - unrolled some loops
>
> =============== Diff against ShoutCore-cwp.40 ===============
>
> Item was changed:
> Object subclass: #SHParserST80
> + instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment encodedCharSet'
> - instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment'
> classVariableNames: ''
> poolDictionaries: ''
> category: 'ShoutCore-Parsing'!
>
> !SHParserST80 commentStamp: 'tween 8/16/2004 15:44' prior: 0!
> I am a Smalltalk method / expression parser.
>
> Rather than creating an Abstract Syntax Tree, I create a sequence of SHRanges (in my 'ranges' instance variable), which represent the tokens within the String I am parsing.
>
> I am used by a SHTextStylerST80 to parse method source strings.
> I am able to parse incomplete / incorrect methods, and so can be used to parse methods that are being edited.
>
> My 'source' instance variable should be set to the string to be parsed.
>
> My 'classOrMetaClass' instance var must be set to the class or metaClass for the method source so that I can correctly resolve identifiers within the source. If this is nil , I parse the source as an expression (i.e. a doIt expression).
>
> My 'workspace' instance variable can be set to a Workspace, so that I can resolve workspace variables.
>
> My 'environment' instance variable is the global namespace (this is initialized to Smalltalk, but can be set to a different environment).
>
> Example 1.
> ranges := SHParserST80 new
> classOrMetaClass: Object;
> source: 'testMethod ^self';
> parse;
> ranges
>
> !
>
> Item was changed:
> ----- Method: SHParserST80>>currentChar (in category 'scan') -----
> currentChar
> + ^source at: sourcePosition ifAbsent: nil!
> - ^source at: sourcePosition ifAbsent: [nil]!
>
> Item was changed:
> ----- Method: SHParserST80>>isKeyword (in category 'token testing') -----
> isKeyword
> "This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
>
> + ^((encodedCharSet isLetter: currentTokenFirst) or: [
> - ^(currentTokenFirst isLetter or: [
> currentTokenFirst == $_ and: [
> currentToken notNil and: [
> currentToken size > 1 and: [
> self allowUnderscoreSelectors ] ] ] ]) and: [
> currentToken last == $: ]!
>
> Item was changed:
> ----- Method: SHParserST80>>isName (in category 'token testing') -----
> isName
> "This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
>
> + ^((encodedCharSet isLetter: currentTokenFirst)
> + or: [ currentTokenFirst == $_
> + and: [ currentToken notNil
> + and: [ currentToken size > 1
> + and: [ self allowUnderscoreSelectors ] ] ] ])
> + and: [ (encodedCharSet isAlphaNumeric: currentToken last)
> + or: [ currentToken last == $_
> + and: [ self allowUnderscoreSelectors ] ] ] !
> - ^(currentTokenFirst isLetter or: [
> - currentTokenFirst == $_ and: [
> - currentToken notNil and: [
> - currentToken size > 1 and: [
> - self allowUnderscoreSelectors ] ] ] ]) and: [
> - currentToken last isAlphaNumeric or: [
> - currentToken last == $_ and: [
> - self allowUnderscoreSelectors ] ] ] !
>
> Item was changed:
> ----- Method: SHParserST80>>isSelectorCharacter: (in category 'character testing') -----
> isSelectorCharacter: aCharacter
>
> + (encodedCharSet isAlphaNumeric: aCharacter) ifTrue: [^false].
> - aCharacter isAlphaNumeric ifTrue: [^false].
> aCharacter isSeparator ifTrue:[^false].
> ('"#$'':().;[]{}^_' includes: aCharacter)
> ifTrue:[^false].
> aCharacter asciiValue = 30 ifTrue: [^false "the doIt char"].
> aCharacter asciiValue = 0 ifTrue: [^false].
> "Any other char is ok as a binary selector char."
> ^true
> !
>
> Item was changed:
> ----- Method: SHParserST80>>parseByteArray (in category 'parse') -----
> parseByteArray
> [currentTokenFirst == $]] whileFalse: [
> + (encodedCharSet isDigit: currentTokenFirst)
> - currentTokenFirst isDigit
> ifTrue: [
> "do not parse the number, can be time consuming"
> self scanPast: #number]
> ifFalse: [
> self failWhen: currentTokenFirst == $. .
> self error]].
> self scanPast: #byteArrayEnd!
>
> Item was changed:
> ----- Method: SHParserST80>>parseExternalCall (in category 'parse') -----
> parseExternalCall
> [self scanNext.
> ((Smalltalk at: #ExternalFunction) callingConventionModifierFor: currentToken) notNil]
> whileTrue.
> self failUnless: currentToken notNil.
> self scanPast: #externalCallType.
> currentToken = '*'
> ifTrue: [self scanPast: #externalCallTypePointerIndicator].
> + (encodedCharSet isDigit: currentTokenFirst)
> - currentTokenFirst isDigit
> ifTrue: [self scanPast: #integer]
> ifFalse: [
> self failUnless: currentTokenFirst == $'.
> self parseString].
> self failUnless: currentTokenFirst == $(.
> self scanPast: #leftParenthesis.
> [currentTokenFirst ~= $)]
> whileTrue: [
> self failUnless: currentToken notNil.
> self scanPast: #externalCallType.
> currentToken = '*'
> ifTrue: [self scanPast: #externalCallTypePointerIndicator]].
> self scanPast: #rightParenthesis.
> currentToken = 'module:'
> ifTrue: [
> self scanPast: #module.
> self parseStringOrSymbol ].
> currentToken = 'error:' ifTrue: [
> self scanPast: #primitive. "there's no rangeType for error"
> self isName
> ifTrue: [ self scanPast: #patternTempVar ]
> ifFalse: [ self parseStringOrSymbol ] ].
> self failUnless: currentToken = '>'.
> self scanPast: #primitiveOrExternalCallEnd!
>
> Item was changed:
> ----- Method: SHParserST80>>parseLiteral: (in category 'parse') -----
> parseLiteral: inArray
> currentTokenFirst == $$
> ifTrue: [
> | pos |
> self failWhen: self currentChar isNil.
> self rangeType: #'$'.
> pos := currentTokenSourcePosition + 1.
> self nextChar.
> ^self scanPast: #character start: pos end: pos].
> + (encodedCharSet isDigit: currentTokenFirst)
> - currentTokenFirst isDigit
> ifTrue: [
> "do not parse the number, can be time consuming"
> ^self scanPast: #number].
> currentToken = '-'
> ifTrue: [
> | c |
> c := self currentChar.
> + (inArray and: [c isNil or: [(encodedCharSet isDigit: c) not]])
> - (inArray and: [c isNil or: [c isDigit not]])
> ifTrue: [
> "single - can be a symbol in an Array"
> ^self scanPast: #symbol].
> self scanPast: #-.
> self failWhen: currentToken isNil.
> "token isNil ifTrue: [self error: 'Unexpected End Of Input']."
> "do not parse the number, can be time consuming"
> ^self scanPast: #number].
> currentTokenFirst == $' ifTrue: [^self parseString].
> currentTokenFirst == $# ifTrue: [^self parseSymbol].
> (inArray and: [currentToken notNil]) ifTrue: [^self scanPast: #symbol].
> self failWhen: currentTokenFirst == $. .
> self error ": 'argument missing'"!
>
> Item was changed:
> ----- Method: SHParserST80>>parseLiteralArrayElement (in category 'parse') -----
> parseLiteralArrayElement
> + (encodedCharSet isLetter: currentTokenFirst)
> - currentTokenFirst isLetter
> ifTrue: [
> + #true = currentToken ifTrue: [ ^self scanPast: #true ].
> + #false = currentToken ifTrue: [ ^self scanPast: #false ].
> + #nil = currentToken ifTrue: [ ^self scanPast: #nil ].
> + ^self scanPast: #symbol ].
> - | type |
> - type := (#('true' 'false' 'nil') includes: currentToken)
> - ifTrue: [currentToken asSymbol]
> - ifFalse: [#symbol].
> - ^self scanPast: type].
> currentTokenFirst == $(
> ifTrue: [
> self scanPast: #arrayStart.
> ^self parseArray].
> ^self parseLiteral: true!
>
> Item was changed:
> ----- Method: SHParserST80>>parsePrimitive (in category 'parse') -----
> parsePrimitive
>
> self scanNext.
> + (encodedCharSet isDigit: currentTokenFirst)
> - currentTokenFirst isDigit
> ifTrue: [ self scanPast: #integer ]
> ifFalse: [
> self parseStringOrSymbol.
> currentToken = 'module:' ifTrue: [
> self scanPast: #module.
> self parseStringOrSymbol ] ].
> currentToken = 'error:' ifTrue: [
> self scanPast: #primitive. "there's no rangeType for error"
> self isName
> ifTrue: [ self scanPast: #patternTempVar ]
> ifFalse: [ self parseStringOrSymbol ] ].
> self failUnless: currentToken = '>'.
> self scanPast: #primitiveOrExternalCallEnd!
>
> Item was changed:
> ----- Method: SHParserST80>>parseString (in category 'parse') -----
> parseString
> + | first c last |
> - | first c answer last |
> first := sourcePosition.
> - answer := ''.
>
> [(c := self currentChar)
> ifNil: [
> self rangeType: #unfinishedString start: first - 1 end: source size.
> self error ": 'unfinished string'"].
> + c ~~ $' or: [
> - (c == $'
> - ifFalse: [answer := answer copyWith: c. true]
> - ifTrue: [false]
> - ) or: [
> self peekChar == $'
> ifTrue: [
> sourcePosition := sourcePosition + 1.
> - answer := answer copyWith: $'.
> true]
> ifFalse: [false]]
> ] whileTrue: [sourcePosition := sourcePosition + 1].
> last := sourcePosition.
> self nextChar.
> + self scanPast: #string start: first - 1 end: last!
> - self scanPast: #string start: first - 1 end: last.
> - ^answer!
>
> Item was changed:
> ----- Method: SHParserST80>>parseSymbol (in category 'parse') -----
> parseSymbol
> | c |
> currentToken = '#'
> ifTrue: [
> "if token is just the #, then scan whitespace and comments
> and then process the next character.
> Squeak allows space between the # and the start of the symbol
> e.g. # (), # a, # 'sym' "
> self rangeType: #symbol.
> self scanWhitespace].
> c := self currentChar.
> self failWhen: (c isNil or: [c isSeparator]).
> c == $(
> ifTrue: [
> self nextChar.
> self scanPast: #arrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
> ^self parseArray].
> c == $' ifTrue: [^self parseSymbolString].
> c == $[ ifTrue: [
> self nextChar.
> self scanPast: #byteArrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
> ^self parseByteArray].
> ((self isSelectorCharacter: c) or: [c == $-])
> ifTrue: [^self parseSymbolSelector].
> + ((encodedCharSet isLetter: c) or: [
> - (c isLetter or: [
> c == $: or: [
> c == $_ and: [self allowUnderscoreSelectors] ] ])
> ifTrue: [^self parseSymbolIdentifier].
> ^self parseCharSymbol!
>
> Item was changed:
> ----- Method: SHParserST80>>parseSymbolIdentifier (in category 'parse') -----
> parseSymbolIdentifier
> | c start end |
> c := self currentChar.
> + self failUnless: ((encodedCharSet isLetter: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
> - self failUnless: (c isLetter or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
> start := sourcePosition.
> [c := self nextChar.
> + (encodedCharSet isAlphaNumeric: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
> - c isAlphaNumeric or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
> end := sourcePosition - 1.
> c := source copyFrom: start - 1 to: end.
> self scanPast: #symbol start: start - 1 end: end.
> ^c!
>
> Item was changed:
> ----- Method: SHParserST80>>peekChar (in category 'scan') -----
> peekChar
> + ^source at: sourcePosition + 1 ifAbsent: $ !
> - ^source at: sourcePosition + 1 ifAbsent: [$ ]!
>
> Item was changed:
> ----- Method: SHParserST80>>resolve: (in category 'identifier testing') -----
> + resolve: aString
> +
> + aString = #self ifTrue: [ ^#self ].
> + aString = #true ifTrue: [ ^#true ].
> + aString = #false ifTrue: [ ^#false ].
> + aString = #nil ifTrue: [ ^#nil ].
> + aString = #super ifTrue: [ ^#super ].
> + aString = #thisContext ifTrue: [ ^#thisContext ].
> - resolve: aString
> - (#('self' 'super' 'true' 'false' 'nil' 'thisContext') includes: aString)
> - ifTrue: [^aString asSymbol].
> (self isBlockTempName: aString) ifTrue: [^#blockTempVar].
> (self isBlockArgName: aString) ifTrue: [^#blockArg].
> (self isMethodTempName: aString) ifTrue: [^#tempVar].
> (self isMethodArgName: aString) ifTrue: [^#methodArg].
> (instanceVariables includes: aString) ifTrue: [^#instVar].
> workspace
> ifNotNil: [(workspace hasBindingOf: aString) ifTrue: [^#workspaceVar]].
> Symbol hasInterned: aString ifTrue: [:sym |
> classOrMetaClass isBehavior
> ifTrue: [
> classOrMetaClass theNonMetaClass withAllSuperclasses do: [:c |
> (c classPool bindingOf: sym) ifNotNil: [^#classVar].
> c sharedPools do: [:p | (p bindingOf: sym) ifNotNil: [^#poolConstant]].
> (c environment bindingOf: sym) ifNotNil: [^#globalVar]]]
> ifFalse: [(environment bindingOf: sym) ifNotNil: [^#globalVar]]].
> ^self resolvePartial: aString!
>
> Item was changed:
> ----- Method: SHParserST80>>scanIdentifier (in category 'scan') -----
> scanIdentifier
> | c start |
> start := sourcePosition.
> + [(encodedCharSet isAlphaNumeric: (c := self nextChar)) or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
> - [(c := self nextChar) isAlphaNumeric or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
> (c == $: and: [(self isSelectorCharacter: self peekChar) not])
> ifTrue: [self nextChar].
> currentToken := source copyFrom: start to: sourcePosition - 1.
> currentTokenSourcePosition := start!
>
> Item was changed:
> ----- Method: SHParserST80>>scanNext (in category 'scan') -----
> scanNext
> self scanWhitespace.
> currentTokenFirst := self currentChar.
> currentTokenFirst
> ifNil: [" end of input "
> currentTokenFirst := $ .
> currentTokenSourcePosition := nil.
> currentToken := nil.
> ^nil].
> + (encodedCharSet isDigit: currentTokenFirst) ifTrue: [^self scanNumber].
> + ((encodedCharSet isLetter: currentTokenFirst) or: [
> - currentTokenFirst isDigit ifTrue: [^self scanNumber].
> - (currentTokenFirst isLetter or: [
> currentTokenFirst == $_ and: [ self allowUnderscoreSelectors ] ])
> ifTrue: [^self scanIdentifier].
> ^self scanBinary!
>
> Item was changed:
> ----- Method: SHParserST80>>scanNumber (in category 'scan') -----
> scanNumber
> | start c nc base |
> start := sourcePosition.
> self skipDigits.
> c := self currentChar.
> c == $r
> ifTrue: [
> base := Integer readFrom: (ReadStream on: (source copyFrom: start to: sourcePosition - 1)).
> self peekChar == $- ifTrue:[self nextChar].
> self skipBigDigits: base.
> c := self currentChar.
> c == $.
> ifTrue: [
> (self isBigDigit: self nextChar base: base)
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipBigDigits: base]].
> c := self currentChar.
> ('deq'includes: c)
> ifTrue: [
> + ((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $- and:[(encodedCharSet isDigit: self peekChar)]])
> - ((nc := self nextChar) isDigit or: [nc == $- and:[self peekChar isDigit]])
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipDigits]].
> c == $s
> ifTrue: [
> + (encodedCharSet isDigit: self nextChar)
> - self nextChar isDigit
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipDigits]].
> currentToken := source copyFrom: start to: sourcePosition - 1.
> ^currentTokenSourcePosition := start].
> c == $s
> ifTrue: [
> + (encodedCharSet isDigit: self nextChar)
> - self nextChar isDigit
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipDigits.].
> currentToken := source copyFrom: start to: sourcePosition - 1.
> ^currentTokenSourcePosition := start].
> c == $.
> ifTrue: [
> + (encodedCharSet isDigit: self nextChar)
> - self nextChar isDigit
> ifFalse: [
> sourcePosition := sourcePosition - 1.
> currentToken := source copyFrom: start to: sourcePosition - 1.
> ^currentTokenSourcePosition := start]
> ifTrue: [self skipDigits]].
> c := self currentChar.
> ('deq' includes: c)
> ifTrue: [
> + ((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $- and:[(encodedCharSet isDigit: self peekChar)]])
> - ((nc := self nextChar) isDigit or: [nc == $- and:[self peekChar isDigit]])
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipDigits]].
> c == $s
> ifTrue: [
> + (encodedCharSet isDigit: self nextChar)
> - self nextChar isDigit
> ifFalse: [sourcePosition := sourcePosition - 1]
> ifTrue: [self skipDigits]].
> currentToken := source copyFrom: start to: sourcePosition - 1.
> ^currentTokenSourcePosition := start!
>
> Item was changed:
> ----- Method: SHParserST80>>scanPast: (in category 'scan') -----
> scanPast: rangeType
> "record rangeType for current token .
> record argument and temp declarations.
> scan and answer the next token"
> + rangeType == #blockPatternArg ifTrue: [self pushArgument: currentToken].
> + rangeType == #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
> + rangeType == #patternArg ifTrue: [self pushArgument: currentToken].
> + rangeType == #patternTempVar ifTrue: [self pushTemporary: currentToken].
> - rangeType = #blockPatternArg ifTrue: [self pushArgument: currentToken].
> - rangeType = #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
> - rangeType = #patternArg ifTrue: [self pushArgument: currentToken].
> - rangeType = #patternTempVar ifTrue: [self pushTemporary: currentToken].
> ^self
> rangeType: rangeType;
> scanNext!
>
> Item was changed:
> ----- Method: SHParserST80>>skipDigits (in category 'scan') -----
> skipDigits
> +
> + [ (encodedCharSet isDigit: self nextChar) ] whileTrue!
> - [self nextChar isDigit]
> - whileTrue: []!
>
> Item was changed:
> ----- Method: SHParserST80>>source: (in category 'accessing') -----
> source: aString
> +
> + source := aString.
> + encodedCharSet := EncodedCharSet charsetAt: (source at: 1 ifAbsent: [ $0 ]) leadingChar
> + !
> - source := aString!
>
>
>
More information about the Squeak-dev
mailing list
|