[squeak-dev] The Inbox: ShoutCore-ul.41.mcz

Levente Uzonyi leves at elte.hu
Mon Feb 10 19:16:49 UTC 2014


Some of these optimizations are not nice, nor necessary for modern 
machines, but with these changes Shout can parse ~50% more methods 
according to the benchmark below.

| p sources |
sources := OrderedCollection new.
CurrentReadOnlySourceFiles cacheDuring: [
 	Smalltalk allClassesDo: [ :class |
 		class selectors collect: [ :selector |
 			sources add: class -> (class >> selector) getSource asString ] ] ].
p := SHParserST80 new.
{
 	[
 	| previousClass |
 	previousClass := nil.
 	sources do: [ :source |
 		| class |
 		class := source key.
 		previousClass == class ifFalse: [
 			p classOrMetaClass: source key ].
 		p
 			source: source value;
 			parse;
 			ranges ] ] timeToRun.
 	sources size }

My pc can parse 8 methods per millisecond using Cog, which might be enough 
for Shout to become usable on a really slow machine like the Raspberry Pi.


Levente

On Mon, 10 Feb 2014, commits at source.squeak.org wrote:

> A new version of ShoutCore was added to project The Inbox:
> http://source.squeak.org/inbox/ShoutCore-ul.41.mcz
>
> ==================== Summary ====================
>
> Name: ShoutCore-ul.41
> Author: ul
> Time: 10 February 2014, 6:48:33.606 pm
> UUID: b71b4d71-9420-4b18-a82a-dc2842617797
> Ancestors: ShoutCore-cwp.40
>
> SHParserST80 optimizations:
> - don't collect 'answer' in #parseString, because it's very slow, and the result is never used
> - assume that the whole source uses the same EncodedCharSet. Cache it in the parser and use it to decide if a character is alphanumeric, a letter or a digit. (requires Multilingual-ul.195)
> - avoid evaluating blocks in #currentChar and #peekChar
> - use symbols for comparison wherever possible
> - unrolled some loops
>
> =============== Diff against ShoutCore-cwp.40 ===============
>
> Item was changed:
>  Object subclass: #SHParserST80
> + 	instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment encodedCharSet'
> - 	instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment'
>  	classVariableNames: ''
>  	poolDictionaries: ''
>  	category: 'ShoutCore-Parsing'!
>
>  !SHParserST80 commentStamp: 'tween 8/16/2004 15:44' prior: 0!
>  I am a Smalltalk method / expression parser.
>
>  Rather than creating an Abstract Syntax Tree, I create a sequence of SHRanges (in my 'ranges' instance variable), which represent the tokens within the String I am parsing.
>
>  I am used by a SHTextStylerST80 to parse method source strings.
>  I am able to parse incomplete / incorrect methods, and so can be used to parse methods that are being edited.
>
>  My 'source' instance variable should be set to the string to be parsed.
>
>  My 'classOrMetaClass' instance var must be set to the class or metaClass for the method source so that I can correctly resolve identifiers within the source. If this is nil , I parse the source as an expression (i.e. a doIt expression).
>
>  My 'workspace' instance variable can be set to a Workspace, so that I can resolve workspace variables.
>
>  My 'environment' instance variable is the global namespace (this is initialized to Smalltalk, but can be set to a different environment).
>
>  Example 1.
>  	ranges := SHParserST80 new
>  		classOrMetaClass: Object;
>  		source: 'testMethod ^self';
>  		parse;
>  		ranges
>
>  !
>
> Item was changed:
>  ----- Method: SHParserST80>>currentChar (in category 'scan') -----
>  currentChar
> + 	^source at: sourcePosition ifAbsent: nil!
> - 	^source at: sourcePosition ifAbsent: [nil]!
>
> Item was changed:
>  ----- Method: SHParserST80>>isKeyword (in category 'token testing') -----
>  isKeyword
>  	"This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
>
> + 	^((encodedCharSet isLetter: currentTokenFirst) or: [
> - 	^(currentTokenFirst isLetter or: [
>  		currentTokenFirst == $_ and: [
>  			currentToken notNil and: [
>  			currentToken size > 1 and: [
>  			self allowUnderscoreSelectors ] ] ] ]) and: [
>  		currentToken last == $: ]!
>
> Item was changed:
>  ----- Method: SHParserST80>>isName (in category 'token testing') -----
>  isName
>  	"This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
>
> + 	^((encodedCharSet isLetter: currentTokenFirst)
> + 		or: [ currentTokenFirst == $_
> + 			and: [ currentToken notNil
> + 			and: [ currentToken size > 1
> + 			and: [ self allowUnderscoreSelectors ] ] ] ])
> + 		and: [ (encodedCharSet isAlphaNumeric: currentToken last)
> + 			or: [ currentToken last == $_
> + 				and: [ self allowUnderscoreSelectors ] ] ] !
> - 	^(currentTokenFirst isLetter or: [
> - 		currentTokenFirst == $_ and: [
> - 			currentToken notNil and: [
> - 			currentToken size > 1 and: [
> - 			self allowUnderscoreSelectors ] ] ] ]) and: [
> - 		currentToken last isAlphaNumeric or: [
> - 			currentToken last == $_ and: [
> - 				self allowUnderscoreSelectors ] ] ] !
>
> Item was changed:
>  ----- Method: SHParserST80>>isSelectorCharacter: (in category 'character testing') -----
>  isSelectorCharacter: aCharacter
>
> + 	(encodedCharSet isAlphaNumeric: aCharacter) ifTrue: [^false].
> - 	aCharacter isAlphaNumeric ifTrue: [^false].
>  	aCharacter isSeparator ifTrue:[^false].
>  	('"#$'':().;[]{}^_'  includes: aCharacter)
>  		ifTrue:[^false].
>  	aCharacter asciiValue = 30 ifTrue: [^false "the doIt char"].
>  	aCharacter asciiValue = 0 ifTrue: [^false].
>  	"Any other char is ok as a binary selector char."
>  	^true
>  !
>
> Item was changed:
>  ----- Method: SHParserST80>>parseByteArray (in category 'parse') -----
>  parseByteArray
>  	[currentTokenFirst == $]] whileFalse: [
> + 		(encodedCharSet isDigit: currentTokenFirst)
> - 		currentTokenFirst isDigit
>  			ifTrue: [
>  				"do not parse the number, can be time consuming"
>  				self scanPast: #number]
>  			ifFalse: [
>  				self failWhen: currentTokenFirst == $. .
>  				self error]].
>  	self scanPast: #byteArrayEnd!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseExternalCall (in category 'parse') -----
>  parseExternalCall
>  	[self scanNext.
>  	((Smalltalk at: #ExternalFunction) callingConventionModifierFor: currentToken) notNil]
>  		whileTrue.
>  	self failUnless: currentToken notNil.
>  	self scanPast: #externalCallType.
>  	currentToken = '*'
>  		ifTrue: [self scanPast: #externalCallTypePointerIndicator].
> + 	(encodedCharSet isDigit: currentTokenFirst)
> - 	currentTokenFirst isDigit
>  		ifTrue: [self scanPast: #integer]
>  		ifFalse: 	[
>  			self failUnless: currentTokenFirst == $'.
>  			self parseString].
>  	self failUnless: currentTokenFirst == $(.
>  	self scanPast: #leftParenthesis.
>  	[currentTokenFirst ~= $)]
>  		whileTrue: [
>  			self failUnless: currentToken notNil.
>  			self scanPast: #externalCallType.
>  			currentToken = '*'
>  				ifTrue: [self scanPast: #externalCallTypePointerIndicator]].
>  	self scanPast: #rightParenthesis.
>  	currentToken = 'module:'
>  		ifTrue: [
>  			self scanPast: #module.
>  			self parseStringOrSymbol ].
>  	currentToken = 'error:' ifTrue: [
>  		self scanPast: #primitive. "there's no rangeType for error"
>  		self isName
>  			ifTrue: [ self scanPast: #patternTempVar ]
>  			ifFalse: [ self parseStringOrSymbol ] ].
>  	self failUnless: currentToken = '>'.
>  	self scanPast: #primitiveOrExternalCallEnd!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseLiteral: (in category 'parse') -----
>  parseLiteral: inArray
>  	currentTokenFirst == $$
>  		ifTrue: [
>  			| pos |
>  			self failWhen: self currentChar isNil.
>  			self rangeType: #'$'.
>  			pos := currentTokenSourcePosition + 1.
>  			self nextChar.
>  			^self scanPast: #character start: pos end: pos].
> + 	(encodedCharSet isDigit: currentTokenFirst)
> - 	currentTokenFirst isDigit
>  		ifTrue: [
>  			"do not parse the number, can be time consuming"
>  			^self scanPast: #number].
>  	currentToken = '-'
>  		ifTrue: [
>  			| c |
>  			c := self currentChar.
> + 			(inArray and: [c isNil or: [(encodedCharSet isDigit: c) not]])
> - 			(inArray and: [c isNil or: [c isDigit not]])
>  				ifTrue: [
>  					"single - can be a symbol in an Array"
>  					^self scanPast: #symbol].
>  			self scanPast: #-.
>  			self failWhen: currentToken isNil.
>  			"token isNil ifTrue: [self error: 'Unexpected End Of Input']."
>  			"do not parse the number, can be time consuming"
>  			^self scanPast: #number].
>  	currentTokenFirst == $' ifTrue: [^self parseString].
>  	currentTokenFirst == $# ifTrue: [^self parseSymbol].
>  	(inArray and: [currentToken notNil]) ifTrue: [^self scanPast: #symbol].
>  	self failWhen: currentTokenFirst == $. .
>  	self error	": 'argument missing'"!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseLiteralArrayElement (in category 'parse') -----
>  parseLiteralArrayElement
> + 	(encodedCharSet isLetter: currentTokenFirst)
> - 	currentTokenFirst isLetter
>  		ifTrue: [
> + 			#true = currentToken ifTrue: [ ^self scanPast: #true ].
> + 			#false = currentToken ifTrue: [ ^self scanPast: #false ].
> + 			#nil = currentToken ifTrue: [ ^self scanPast: #nil ].
> + 			^self scanPast: #symbol ].
> - 			| type |
> - 			type := (#('true' 'false' 'nil') includes: currentToken)
> - 				ifTrue: [currentToken asSymbol]
> - 				ifFalse: [#symbol].
> - 			^self scanPast: type].
>  	currentTokenFirst == $(
>  		ifTrue: [
>  			self scanPast: #arrayStart.
>  			^self parseArray].
>  	^self parseLiteral: true!
>
> Item was changed:
>  ----- Method: SHParserST80>>parsePrimitive (in category 'parse') -----
>  parsePrimitive
>
>  	self scanNext.
> + 	(encodedCharSet isDigit: currentTokenFirst)
> - 	currentTokenFirst isDigit
>  		ifTrue: [ self scanPast: #integer ]
>  		ifFalse: [
>  			self parseStringOrSymbol.
>  			currentToken = 'module:' ifTrue: [
>  				self scanPast: #module.
>  				self parseStringOrSymbol ] ].
>  	currentToken = 'error:' ifTrue: [
>  		self scanPast: #primitive. "there's no rangeType for error"
>  		self isName
>  			ifTrue: [ self scanPast: #patternTempVar ]
>  			ifFalse: [ self parseStringOrSymbol ] ].
>  	self failUnless: currentToken = '>'.
>  	self scanPast: #primitiveOrExternalCallEnd!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseString (in category 'parse') -----
>  parseString
> + 	| first c last |
> - 	| first c answer last |
>  	first := sourcePosition.
> - 	answer := ''.
>
>  	[(c := self currentChar)
>  		ifNil: [
>  			self rangeType: #unfinishedString start: first - 1 end: source size.
>  			self error	": 'unfinished string'"].
> + 	c ~~ $' or: [
> - 	(c == $'
> - 		ifFalse: [answer := answer copyWith: c. true]
> - 		ifTrue: [false]
> - 	) or: [
>  		self peekChar == $'
>  			ifTrue: [
>  				sourcePosition := sourcePosition + 1.
> - 				answer := answer copyWith: $'.
>  				true]
>  			ifFalse: [false]]
>  	] whileTrue: [sourcePosition := sourcePosition + 1].
>  	last := sourcePosition.
>  	self nextChar.
> + 	self scanPast: #string start: first - 1 end: last!
> - 	self scanPast: #string start: first - 1 end: last.
> - 	^answer!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseSymbol (in category 'parse') -----
>  parseSymbol
>  	| c |
>  	currentToken = '#'
>  		ifTrue: [
>  			"if token is just the #, then scan whitespace and comments
>  			and then process the next character.
>  			Squeak allows space between the # and the start of the symbol
>  			e.g. # (),  #  a, #  'sym' "
>  			self rangeType: #symbol.
>  			self scanWhitespace].
>  	c := self currentChar.
>  	self failWhen: (c isNil or: [c isSeparator]).
>  	c == $(
>  		ifTrue: [
>  			self nextChar.
>  			self scanPast: #arrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
>  			^self parseArray].
>  	c == $' ifTrue: [^self parseSymbolString].
>  	c == $[ ifTrue: [
>  			self nextChar.
>  			self scanPast: #byteArrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
>  			^self parseByteArray].
>  	((self isSelectorCharacter: c) or: [c == $-])
>  		ifTrue: [^self parseSymbolSelector].
> + 	((encodedCharSet isLetter: c) or: [
> - 	(c isLetter or: [
>  		c == $: or: [
>  		c == $_ and: [self allowUnderscoreSelectors] ] ])
>  			ifTrue: [^self parseSymbolIdentifier].
>  	^self parseCharSymbol!
>
> Item was changed:
>  ----- Method: SHParserST80>>parseSymbolIdentifier (in category 'parse') -----
>  parseSymbolIdentifier
>  	| c start end |
>  	c := self currentChar.
> + 	self failUnless: ((encodedCharSet isLetter: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
> - 	self failUnless: (c isLetter or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
>  	start := sourcePosition.
>  	[c := self nextChar.
> + 	(encodedCharSet isAlphaNumeric: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
> - 	c isAlphaNumeric or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
>  	end := sourcePosition - 1.
>  	c := source copyFrom: start - 1 to: end.
>  	self scanPast: #symbol start: start - 1 end: end.
>  	^c!
>
> Item was changed:
>  ----- Method: SHParserST80>>peekChar (in category 'scan') -----
>  peekChar
> + 	^source at: sourcePosition + 1 ifAbsent: $ !
> - 	^source at: sourcePosition + 1 ifAbsent: [$ ]!
>
> Item was changed:
>  ----- Method: SHParserST80>>resolve: (in category 'identifier testing') -----
> + resolve: aString
> +
> + 	aString = #self ifTrue: [ ^#self ].
> + 	aString = #true ifTrue: [ ^#true ].
> + 	aString = #false ifTrue: [ ^#false ].
> + 	aString = #nil ifTrue: [ ^#nil ].
> + 	aString = #super ifTrue: [ ^#super ].
> + 	aString = #thisContext ifTrue: [ ^#thisContext ].
> - resolve: aString
> - 	(#('self' 'super' 'true' 'false' 'nil' 'thisContext') includes: aString)
> - 		ifTrue: [^aString asSymbol].
>  	(self isBlockTempName: aString) ifTrue: [^#blockTempVar].
>  	(self isBlockArgName: aString) ifTrue: [^#blockArg].
>  	(self isMethodTempName: aString) ifTrue: [^#tempVar].
>  	(self isMethodArgName: aString) ifTrue: [^#methodArg].
>  	(instanceVariables includes: aString) ifTrue: [^#instVar].
>  	workspace
>  		ifNotNil: [(workspace hasBindingOf: aString) ifTrue: [^#workspaceVar]].
>  	Symbol hasInterned: aString ifTrue: [:sym |
>  		classOrMetaClass isBehavior
>  			ifTrue: [
>  				classOrMetaClass theNonMetaClass withAllSuperclasses do: [:c |
>  					(c classPool bindingOf: sym) ifNotNil: [^#classVar].
>  					c sharedPools do: [:p | (p bindingOf: sym) ifNotNil: [^#poolConstant]].
>  					(c environment bindingOf: sym) ifNotNil: [^#globalVar]]]
>  			ifFalse: [(environment bindingOf: sym) ifNotNil: [^#globalVar]]].
>  	^self resolvePartial: aString!
>
> Item was changed:
>  ----- Method: SHParserST80>>scanIdentifier (in category 'scan') -----
>  scanIdentifier
>  	| c start |
>  	start := sourcePosition.
> + 	[(encodedCharSet isAlphaNumeric: (c := self nextChar)) or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
> - 	[(c := self nextChar) isAlphaNumeric or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
>  	(c == $: and: [(self isSelectorCharacter: self peekChar) not])
>  		ifTrue: [self nextChar].
>  	currentToken := source copyFrom: start to: sourcePosition - 1.
>  	currentTokenSourcePosition := start!
>
> Item was changed:
>  ----- Method: SHParserST80>>scanNext (in category 'scan') -----
>  scanNext
>  	self scanWhitespace.
>  	currentTokenFirst := self currentChar.
>  	currentTokenFirst
>  		ifNil: [" end of input "
>  			currentTokenFirst := $ .
>  			currentTokenSourcePosition := nil.
>  			currentToken := nil.
>  			^nil].
> + 	(encodedCharSet isDigit: currentTokenFirst) ifTrue: [^self scanNumber].
> + 	((encodedCharSet isLetter: currentTokenFirst) or: [
> - 	currentTokenFirst isDigit ifTrue: [^self scanNumber].
> - 	(currentTokenFirst isLetter or: [
>  		currentTokenFirst == $_ and: [ self allowUnderscoreSelectors ] ])
>  			ifTrue: [^self scanIdentifier].
>  	^self scanBinary!
>
> Item was changed:
>  ----- Method: SHParserST80>>scanNumber (in category 'scan') -----
>  scanNumber
>  	| start c nc base |
>  	start := sourcePosition.
>  	self skipDigits.
>  	c := self currentChar.
>  	c == $r
>  		ifTrue: [
>  			base := Integer readFrom: (ReadStream on: (source copyFrom: start to: sourcePosition - 1)).
>  			self peekChar == $- ifTrue:[self nextChar].
>  			self skipBigDigits: base.
>  			c := self currentChar.
>  			c == $.
>  				ifTrue: [
>  					(self isBigDigit: self nextChar base: base)
>  						ifFalse: [sourcePosition := sourcePosition - 1]
>  						ifTrue: [self skipBigDigits: base]].
>  			c := self currentChar.
>  			('deq'includes: c)
>  				ifTrue: [
> + 					((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $- and:[(encodedCharSet isDigit: self peekChar)]])
> - 					((nc := self nextChar) isDigit or: [nc == $- and:[self peekChar isDigit]])
>  						ifFalse: [sourcePosition := sourcePosition - 1]
>  						ifTrue: [self skipDigits]].
>  			c == $s
>  				ifTrue: [
> + 					(encodedCharSet isDigit: self nextChar)
> - 					self nextChar isDigit
>  						ifFalse: [sourcePosition := sourcePosition - 1]
>  						ifTrue: [self skipDigits]].
>  			currentToken := source copyFrom: start to: sourcePosition - 1.
>  			^currentTokenSourcePosition := start].
>  	c == $s
>  		ifTrue: [
> + 			(encodedCharSet isDigit: self nextChar)
> - 			self nextChar isDigit
>  				ifFalse: [sourcePosition := sourcePosition - 1]
>  				ifTrue: [self skipDigits.].
>  			currentToken := source copyFrom: start to: sourcePosition - 1.
>  			^currentTokenSourcePosition := start].
>  	c == $.
>  		ifTrue: [
> + 			(encodedCharSet isDigit: self nextChar)
> - 			self nextChar isDigit
>  				ifFalse: [
>  					sourcePosition := sourcePosition - 1.
>  					currentToken := source copyFrom: start to: sourcePosition - 1.
>  					^currentTokenSourcePosition := start]
>  				ifTrue: [self skipDigits]].
>  	c := self currentChar.
>  	('deq' includes: c)
>  		ifTrue: [
> + 			((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $-  and:[(encodedCharSet isDigit: self peekChar)]])
> - 			((nc := self nextChar) isDigit or: [nc == $-  and:[self peekChar isDigit]])
>  				ifFalse: [sourcePosition := sourcePosition - 1]
>  				ifTrue: [self skipDigits]].
>  	c == $s
>  		ifTrue: [
> + 			(encodedCharSet isDigit: self nextChar)
> - 			self nextChar isDigit
>  				ifFalse: [sourcePosition := sourcePosition - 1]
>  				ifTrue: [self skipDigits]].
>  	currentToken := source copyFrom: start to: sourcePosition - 1.
>  	^currentTokenSourcePosition := start!
>
> Item was changed:
>  ----- Method: SHParserST80>>scanPast: (in category 'scan') -----
>  scanPast: rangeType
>  	"record rangeType for current token .
>  	record argument and temp declarations.
>  	scan and answer the next token"
> + 	rangeType == #blockPatternArg ifTrue: [self pushArgument: currentToken].
> + 	rangeType == #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
> + 	rangeType == #patternArg ifTrue: [self pushArgument: currentToken].
> + 	rangeType == #patternTempVar ifTrue: [self pushTemporary: currentToken].
> - 	rangeType = #blockPatternArg ifTrue: [self pushArgument: currentToken].
> - 	rangeType = #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
> - 	rangeType = #patternArg ifTrue: [self pushArgument: currentToken].
> - 	rangeType = #patternTempVar ifTrue: [self pushTemporary: currentToken].
>  	^self
>  		rangeType: rangeType;
>  		scanNext!
>
> Item was changed:
>  ----- Method: SHParserST80>>skipDigits (in category 'scan') -----
>  skipDigits
> +
> + 	[ (encodedCharSet isDigit: self nextChar) ] whileTrue!
> - 	[self nextChar isDigit]
> - 		whileTrue: []!
>
> Item was changed:
>  ----- Method: SHParserST80>>source: (in category 'accessing') -----
>  source: aString
> +
> + 	source := aString.
> + 	encodedCharSet := EncodedCharSet charsetAt: (source at: 1 ifAbsent: [ $0 ]) leadingChar
> + 	!
> -     source := aString!
>
>
>


More information about the Squeak-dev mailing list