[squeak-dev] The Inbox: ShoutCore-ul.41.mcz

commits at source.squeak.org commits at source.squeak.org
Mon Feb 10 19:03:51 UTC 2014


A new version of ShoutCore was added to project The Inbox:
http://source.squeak.org/inbox/ShoutCore-ul.41.mcz

==================== Summary ====================

Name: ShoutCore-ul.41
Author: ul
Time: 10 February 2014, 6:48:33.606 pm
UUID: b71b4d71-9420-4b18-a82a-dc2842617797
Ancestors: ShoutCore-cwp.40

SHParserST80 optimizations:
- don't collect 'answer' in #parseString, because it's very slow, and the result is never used
- assume that the whole source uses the same EncodedCharSet. Cache it in the parser and use it to decide if a character is alphanumeric, a letter or a digit. (requires Multilingual-ul.195)
- avoid evaluating blocks in #currentChar and #peekChar
- use symbols for comparison wherever possible
- unrolled some loops

=============== Diff against ShoutCore-cwp.40 ===============

Item was changed:
  Object subclass: #SHParserST80
+ 	instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment encodedCharSet'
- 	instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition blockDepth bracketDepth ranges environment'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'ShoutCore-Parsing'!
  
  !SHParserST80 commentStamp: 'tween 8/16/2004 15:44' prior: 0!
  I am a Smalltalk method / expression parser.
  
  Rather than creating an Abstract Syntax Tree, I create a sequence of SHRanges (in my 'ranges' instance variable), which represent the tokens within the String I am parsing.
  
  I am used by a SHTextStylerST80 to parse method source strings.
  I am able to parse incomplete / incorrect methods, and so can be used to parse methods that are being edited.
  
  My 'source' instance variable should be set to the string to be parsed.
  
  My 'classOrMetaClass' instance var must be set to the class or metaClass for the method source so that I can correctly resolve identifiers within the source. If this is nil , I parse the source as an expression (i.e. a doIt expression).
  
  My 'workspace' instance variable can be set to a Workspace, so that I can resolve workspace variables.
  
  My 'environment' instance variable is the global namespace (this is initialized to Smalltalk, but can be set to a different environment).
  
  Example 1.
  	ranges := SHParserST80 new
  		classOrMetaClass: Object;
  		source: 'testMethod ^self';
  		parse;
  		ranges
  		
  !

Item was changed:
  ----- Method: SHParserST80>>currentChar (in category 'scan') -----
  currentChar
+ 	^source at: sourcePosition ifAbsent: nil!
- 	^source at: sourcePosition ifAbsent: [nil]!

Item was changed:
  ----- Method: SHParserST80>>isKeyword (in category 'token testing') -----
  isKeyword
  	"This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
  
+ 	^((encodedCharSet isLetter: currentTokenFirst) or: [
- 	^(currentTokenFirst isLetter or: [
  		currentTokenFirst == $_ and: [
  			currentToken notNil and: [
  			currentToken size > 1 and: [
  			self allowUnderscoreSelectors ] ] ] ]) and: [
  		currentToken last == $: ]!

Item was changed:
  ----- Method: SHParserST80>>isName (in category 'token testing') -----
  isName
  	"This method assumes that currentTokenFirst is a non-letter character when currentToken is nil."
  
+ 	^((encodedCharSet isLetter: currentTokenFirst)
+ 		or: [ currentTokenFirst == $_ 
+ 			and: [ currentToken notNil
+ 			and: [ currentToken size > 1
+ 			and: [ self allowUnderscoreSelectors ] ] ] ]) 
+ 		and: [ (encodedCharSet isAlphaNumeric: currentToken last)
+ 			or: [ currentToken last == $_
+ 				and: [ self allowUnderscoreSelectors ] ] ] !
- 	^(currentTokenFirst isLetter or: [ 
- 		currentTokenFirst == $_ and: [
- 			currentToken notNil and: [
- 			currentToken size > 1 and: [
- 			self allowUnderscoreSelectors ] ] ] ]) and: [
- 		currentToken last isAlphaNumeric or: [
- 			currentToken last == $_ and: [
- 				self allowUnderscoreSelectors ] ] ] !

Item was changed:
  ----- Method: SHParserST80>>isSelectorCharacter: (in category 'character testing') -----
  isSelectorCharacter: aCharacter
  
+ 	(encodedCharSet isAlphaNumeric: aCharacter) ifTrue: [^false].
- 	aCharacter isAlphaNumeric ifTrue: [^false].
  	aCharacter isSeparator ifTrue:[^false].
  	('"#$'':().;[]{}^_'  includes: aCharacter) 
  		ifTrue:[^false].
  	aCharacter asciiValue = 30 ifTrue: [^false "the doIt char"].
  	aCharacter asciiValue = 0 ifTrue: [^false].
  	"Any other char is ok as a binary selector char."
  	^true
  !

Item was changed:
  ----- Method: SHParserST80>>parseByteArray (in category 'parse') -----
  parseByteArray
  	[currentTokenFirst == $]] whileFalse: [
+ 		(encodedCharSet isDigit: currentTokenFirst)
- 		currentTokenFirst isDigit 
  			ifTrue: [
  				"do not parse the number, can be time consuming"
  				self scanPast: #number]
  			ifFalse: [
  				self failWhen: currentTokenFirst == $. .
  				self error]].
  	self scanPast: #byteArrayEnd!

Item was changed:
  ----- Method: SHParserST80>>parseExternalCall (in category 'parse') -----
  parseExternalCall
  	[self scanNext.
  	((Smalltalk at: #ExternalFunction) callingConventionModifierFor: currentToken) notNil]
  		whileTrue.
  	self failUnless: currentToken notNil.
  	self scanPast: #externalCallType.
  	currentToken = '*' 
  		ifTrue: [self scanPast: #externalCallTypePointerIndicator].
+ 	(encodedCharSet isDigit: currentTokenFirst)
- 	currentTokenFirst isDigit 
  		ifTrue: [self scanPast: #integer]
  		ifFalse: 	[
  			self failUnless: currentTokenFirst == $'.
  			self parseString].
  	self failUnless: currentTokenFirst == $(.
  	self scanPast: #leftParenthesis.
  	[currentTokenFirst ~= $)] 
  		whileTrue: [
  			self failUnless: currentToken notNil.
  			self scanPast: #externalCallType.
  			currentToken = '*' 
  				ifTrue: [self scanPast: #externalCallTypePointerIndicator]].
  	self scanPast: #rightParenthesis.
  	currentToken = 'module:' 
  		ifTrue: [
  			self scanPast: #module.
  			self parseStringOrSymbol ].
  	currentToken = 'error:' ifTrue: [
  		self scanPast: #primitive. "there's no rangeType for error"
  		self isName
  			ifTrue: [ self scanPast: #patternTempVar ]
  			ifFalse: [ self parseStringOrSymbol ] ].
  	self failUnless: currentToken = '>'.
  	self scanPast: #primitiveOrExternalCallEnd!

Item was changed:
  ----- Method: SHParserST80>>parseLiteral: (in category 'parse') -----
  parseLiteral: inArray 
  	currentTokenFirst == $$ 
  		ifTrue: [
  			| pos |
  			self failWhen: self currentChar isNil.
  			self rangeType: #'$'.
  			pos := currentTokenSourcePosition + 1.
  			self nextChar.
  			^self scanPast: #character start: pos end: pos].
+ 	(encodedCharSet isDigit: currentTokenFirst) 
- 	currentTokenFirst isDigit 
  		ifTrue: [
  			"do not parse the number, can be time consuming"
  			^self scanPast: #number].
  	currentToken = '-' 
  		ifTrue: [
  			| c |
  			c := self currentChar.
+ 			(inArray and: [c isNil or: [(encodedCharSet isDigit: c) not]]) 
- 			(inArray and: [c isNil or: [c isDigit not]]) 
  				ifTrue: [
  					"single - can be a symbol in an Array"
  					^self scanPast: #symbol].
  			self scanPast: #-.
  			self failWhen: currentToken isNil.
  			"token isNil ifTrue: [self error: 'Unexpected End Of Input']."
  			"do not parse the number, can be time consuming"
  			^self scanPast: #number].
  	currentTokenFirst == $' ifTrue: [^self parseString].
  	currentTokenFirst == $# ifTrue: [^self parseSymbol].
  	(inArray and: [currentToken notNil]) ifTrue: [^self scanPast: #symbol].
  	self failWhen: currentTokenFirst == $. .
  	self error	": 'argument missing'"!

Item was changed:
  ----- Method: SHParserST80>>parseLiteralArrayElement (in category 'parse') -----
  parseLiteralArrayElement
+ 	(encodedCharSet isLetter: currentTokenFirst)
- 	currentTokenFirst isLetter 
  		ifTrue: [
+ 			#true = currentToken ifTrue: [ ^self scanPast: #true ].
+ 			#false = currentToken ifTrue: [ ^self scanPast: #false ].
+ 			#nil = currentToken ifTrue: [ ^self scanPast: #nil ].
+ 			^self scanPast: #symbol ].
- 			| type |
- 			type := (#('true' 'false' 'nil') includes: currentToken) 
- 				ifTrue: [currentToken asSymbol]
- 				ifFalse: [#symbol].
- 			^self scanPast: type].
  	currentTokenFirst == $( 
  		ifTrue: [
  			self scanPast: #arrayStart.
  			^self parseArray].
  	^self parseLiteral: true!

Item was changed:
  ----- Method: SHParserST80>>parsePrimitive (in category 'parse') -----
  parsePrimitive
  
  	self scanNext.
+ 	(encodedCharSet isDigit: currentTokenFirst)
- 	currentTokenFirst isDigit
  		ifTrue: [ self scanPast: #integer ]
  		ifFalse: [
  			self parseStringOrSymbol.
  			currentToken = 'module:' ifTrue: [
  				self scanPast: #module.
  				self parseStringOrSymbol ] ].
  	currentToken = 'error:' ifTrue: [
  		self scanPast: #primitive. "there's no rangeType for error"
  		self isName
  			ifTrue: [ self scanPast: #patternTempVar ]
  			ifFalse: [ self parseStringOrSymbol ] ].
  	self failUnless: currentToken = '>'.
  	self scanPast: #primitiveOrExternalCallEnd!

Item was changed:
  ----- Method: SHParserST80>>parseString (in category 'parse') -----
  parseString
+ 	| first c last |
- 	| first c answer last |
  	first := sourcePosition.
- 	answer := ''.
  	
  	[(c := self currentChar)
  		ifNil: [
  			self rangeType: #unfinishedString start: first - 1 end: source size.
  			self error	": 'unfinished string'"].
+ 	c ~~ $' or: [
- 	(c == $' 	
- 		ifFalse: [answer := answer copyWith: c. true] 
- 		ifTrue: [false]
- 	) or: [
  		self peekChar == $' 
  			ifTrue: [
  				sourcePosition := sourcePosition + 1.
- 				answer := answer copyWith: $'.
  				true]
  			ifFalse: [false]]
  	] whileTrue: [sourcePosition := sourcePosition + 1].
  	last := sourcePosition.
  	self nextChar.
+ 	self scanPast: #string start: first - 1 end: last!
- 	self scanPast: #string start: first - 1 end: last.
- 	^answer!

Item was changed:
  ----- Method: SHParserST80>>parseSymbol (in category 'parse') -----
  parseSymbol
  	| c |
  	currentToken = '#' 
  		ifTrue: [
  			"if token is just the #, then scan whitespace and comments
  			and then process the next character.
  			Squeak allows space between the # and the start of the symbol 
  			e.g. # (),  #  a, #  'sym' "
  			self rangeType: #symbol.
  			self scanWhitespace].
  	c := self currentChar.
  	self failWhen: (c isNil or: [c isSeparator]).
  	c == $( 
  		ifTrue: [
  			self nextChar.
  			self scanPast: #arrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
  			^self parseArray].
  	c == $' ifTrue: [^self parseSymbolString].
  	c == $[ ifTrue: [
  			self nextChar.
  			self scanPast: #byteArrayStart start: currentTokenSourcePosition end: currentTokenSourcePosition + 1.
  			^self parseByteArray].
  	((self isSelectorCharacter: c) or: [c == $-]) 
  		ifTrue: [^self parseSymbolSelector].
+ 	((encodedCharSet isLetter: c) or: [
- 	(c isLetter or: [
  		c == $: or: [ 
  		c == $_ and: [self allowUnderscoreSelectors] ] ]) 
  			ifTrue: [^self parseSymbolIdentifier].
  	^self parseCharSymbol!

Item was changed:
  ----- Method: SHParserST80>>parseSymbolIdentifier (in category 'parse') -----
  parseSymbolIdentifier
  	| c start end |
  	c := self currentChar.
+ 	self failUnless: ((encodedCharSet isLetter: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
- 	self failUnless: (c isLetter or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]).
  	start := sourcePosition.	
  	[c := self nextChar.
+ 	(encodedCharSet isAlphaNumeric: c) or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
- 	c isAlphaNumeric or: [c == $: or: [c == $_ and: [self allowUnderscoreSelectors]]]] whileTrue.
  	end := sourcePosition - 1.
  	c := source copyFrom: start - 1 to: end.
  	self scanPast: #symbol start: start - 1 end: end.
  	^c!

Item was changed:
  ----- Method: SHParserST80>>peekChar (in category 'scan') -----
  peekChar
+ 	^source at: sourcePosition + 1 ifAbsent: $ !
- 	^source at: sourcePosition + 1 ifAbsent: [$ ]!

Item was changed:
  ----- Method: SHParserST80>>resolve: (in category 'identifier testing') -----
+ resolve: aString
+ 
+ 	aString = #self ifTrue: [ ^#self ].
+ 	aString = #true ifTrue: [ ^#true ].
+ 	aString = #false ifTrue: [ ^#false ].
+ 	aString = #nil ifTrue: [ ^#nil ].
+ 	aString = #super ifTrue: [ ^#super ].
+ 	aString = #thisContext ifTrue: [ ^#thisContext ].
- resolve: aString 
- 	(#('self' 'super' 'true' 'false' 'nil' 'thisContext') includes: aString) 
- 		ifTrue: [^aString asSymbol].
  	(self isBlockTempName: aString) ifTrue: [^#blockTempVar].
  	(self isBlockArgName: aString) ifTrue: [^#blockArg].
  	(self isMethodTempName: aString) ifTrue: [^#tempVar].
  	(self isMethodArgName: aString) ifTrue: [^#methodArg].
  	(instanceVariables includes: aString) ifTrue: [^#instVar].
  	workspace 
  		ifNotNil: [(workspace hasBindingOf: aString) ifTrue: [^#workspaceVar]].
  	Symbol hasInterned: aString ifTrue: [:sym | 
  		classOrMetaClass isBehavior 
  			ifTrue: [
  				classOrMetaClass theNonMetaClass withAllSuperclasses do: [:c | 
  					(c classPool bindingOf: sym) ifNotNil: [^#classVar].
  					c sharedPools do: [:p | (p bindingOf: sym) ifNotNil: [^#poolConstant]].
  					(c environment bindingOf: sym) ifNotNil: [^#globalVar]]]
  			ifFalse: [(environment bindingOf: sym) ifNotNil: [^#globalVar]]].
  	^self resolvePartial: aString!

Item was changed:
  ----- Method: SHParserST80>>scanIdentifier (in category 'scan') -----
  scanIdentifier
  	| c start |
  	start := sourcePosition.
+ 	[(encodedCharSet isAlphaNumeric: (c := self nextChar)) or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
- 	[(c := self nextChar) isAlphaNumeric or: [c == $_ and: [self allowUnderscoreSelectors]]] whileTrue.
  	(c == $: and: [(self isSelectorCharacter: self peekChar) not]) 
  		ifTrue: [self nextChar].
  	currentToken := source copyFrom: start to: sourcePosition - 1.
  	currentTokenSourcePosition := start!

Item was changed:
  ----- Method: SHParserST80>>scanNext (in category 'scan') -----
  scanNext
  	self scanWhitespace.
  	currentTokenFirst := self currentChar.
  	currentTokenFirst 
  		ifNil: [" end of input "
  			currentTokenFirst := $ .
  			currentTokenSourcePosition := nil.
  			currentToken := nil.
  			^nil].
+ 	(encodedCharSet isDigit: currentTokenFirst) ifTrue: [^self scanNumber].
+ 	((encodedCharSet isLetter: currentTokenFirst) or: [
- 	currentTokenFirst isDigit ifTrue: [^self scanNumber].
- 	(currentTokenFirst isLetter or: [
  		currentTokenFirst == $_ and: [ self allowUnderscoreSelectors ] ])
  			ifTrue: [^self scanIdentifier].
  	^self scanBinary!

Item was changed:
  ----- Method: SHParserST80>>scanNumber (in category 'scan') -----
  scanNumber
  	| start c nc base |
  	start := sourcePosition.
  	self skipDigits.
  	c := self currentChar.
  	c == $r 
  		ifTrue: [
  			base := Integer readFrom: (ReadStream on: (source copyFrom: start to: sourcePosition - 1)).
  			self peekChar == $- ifTrue:[self nextChar].
  			self skipBigDigits: base.
  			c := self currentChar.
  			c == $. 
  				ifTrue: [
  					(self isBigDigit: self nextChar base: base) 
  						ifFalse: [sourcePosition := sourcePosition - 1]
  						ifTrue: [self skipBigDigits: base]].
  			c := self currentChar.
  			('deq'includes: c) 
  				ifTrue: [
+ 					((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $- and:[(encodedCharSet isDigit: self peekChar)]]) 
- 					((nc := self nextChar) isDigit or: [nc == $- and:[self peekChar isDigit]]) 
  						ifFalse: [sourcePosition := sourcePosition - 1]
  						ifTrue: [self skipDigits]].
  			c == $s 
  				ifTrue: [
+ 					(encodedCharSet isDigit: self nextChar)
- 					self nextChar isDigit 
  						ifFalse: [sourcePosition := sourcePosition - 1]
  						ifTrue: [self skipDigits]].
  			currentToken := source copyFrom: start to: sourcePosition - 1.
  			^currentTokenSourcePosition := start].
  	c == $s 
  		ifTrue: [
+ 			(encodedCharSet isDigit: self nextChar)
- 			self nextChar isDigit 
  				ifFalse: [sourcePosition := sourcePosition - 1]
  				ifTrue: [self skipDigits.].
  			currentToken := source copyFrom: start to: sourcePosition - 1.
  			^currentTokenSourcePosition := start].
  	c == $. 
  		ifTrue: [
+ 			(encodedCharSet isDigit: self nextChar)
- 			self nextChar isDigit 
  				ifFalse: [
  					sourcePosition := sourcePosition - 1.
  					currentToken := source copyFrom: start to: sourcePosition - 1.
  					^currentTokenSourcePosition := start]
  				ifTrue: [self skipDigits]].
  	c := self currentChar.
  	('deq' includes: c) 
  		ifTrue: [
+ 			((encodedCharSet isDigit: (nc := self nextChar)) or: [nc == $-  and:[(encodedCharSet isDigit: self peekChar)]]) 
- 			((nc := self nextChar) isDigit or: [nc == $-  and:[self peekChar isDigit]]) 
  				ifFalse: [sourcePosition := sourcePosition - 1]
  				ifTrue: [self skipDigits]].
  	c == $s 
  		ifTrue: [
+ 			(encodedCharSet isDigit: self nextChar)
- 			self nextChar isDigit 
  				ifFalse: [sourcePosition := sourcePosition - 1]
  				ifTrue: [self skipDigits]].
  	currentToken := source copyFrom: start to: sourcePosition - 1.
  	^currentTokenSourcePosition := start!

Item was changed:
  ----- Method: SHParserST80>>scanPast: (in category 'scan') -----
  scanPast: rangeType 
  	"record rangeType for current token .
  	record argument and temp declarations.
  	scan and answer the next token"
+ 	rangeType == #blockPatternArg ifTrue: [self pushArgument: currentToken].
+ 	rangeType == #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
+ 	rangeType == #patternArg ifTrue: [self pushArgument: currentToken].
+ 	rangeType == #patternTempVar ifTrue: [self pushTemporary: currentToken].
- 	rangeType = #blockPatternArg ifTrue: [self pushArgument: currentToken].
- 	rangeType = #blockPatternTempVar ifTrue: [self pushTemporary: currentToken].
- 	rangeType = #patternArg ifTrue: [self pushArgument: currentToken].
- 	rangeType = #patternTempVar ifTrue: [self pushTemporary: currentToken].
  	^self
  		rangeType: rangeType;
  		scanNext!

Item was changed:
  ----- Method: SHParserST80>>skipDigits (in category 'scan') -----
  skipDigits
+ 
+ 	[ (encodedCharSet isDigit: self nextChar) ] whileTrue!
- 	[self nextChar isDigit] 
- 		whileTrue: []!

Item was changed:
  ----- Method: SHParserST80>>source: (in category 'accessing') -----
  source: aString
+ 	
+ 	source := aString.
+ 	encodedCharSet := EncodedCharSet charsetAt: (source at: 1 ifAbsent: [ $0 ]) leadingChar
+ 	!
-     source := aString!



More information about the Squeak-dev mailing list