[Pkg] The Trunk: Compiler-nice.223.mcz

Thu Feb 9 15:05:46 UTC 2012

Nicolas Cellier uploaded a new version of Compiler to project The Trunk:
http://source.squeak.org/trunk/Compiler-nice.223.mcz

==================== Summary ====================

Name: Compiler-nice.223
Author: nice
Time: 9 February 2012, 4:05:21.21 pm
UUID: 8c21622a-6317-4c71-ad1a-607ba1683e6c
Ancestors: Compiler-nice.222

Correct a bug (see http://code.google.com/p/pharo/issues/detail?id=4650)

self should: [Compiler evaluate: '$'] raise: Error.

I also simplified all the redundancy (aheadChar == DoItCharacter and: [source atEnd]) and replaced with aheadChar == DoItCharacter.
Indeed, these were uggly incomplete guards trying to distinguish a DoItCharacter marking an end of stream from a (Character value: 30 "ASCII character RS means Record Separator") encountered in source...
But they would not even handle the case when DoItCharacter is the last source character., except maybe the uggliest contorsions in xDigit...

Instead I replaced the DoItCharacter marking the endOfStream by a character that we should never encounter in source.
I have chosen 16r10FFFF which is the last unicode and will never be used to encode a character (as all ending in FFFE and FFFF).
A different strategy would be to use a value greater than the last unicode, like 16r110000, and would also work...
Or use a different Object. In this later case, the object would have to understand charCode or we would have to change more Scanner methods (at least typeTableAt:).

Note that with current Character implementation, (Character value: 16r10FFFF) ~~ (Character value: 16r10FFFF).
Since all tests are written with identity test aheadChar or hereChar == DoItCharacter, even if such Character were encountered in source, it wouldn't be interpreted as an endOfStream mark, thus any Character code > 255 could have been used, but this would be more fragile.

Consequently, I also modified the character type table to not interpret (Character value: 30) as end of source (#doIt).
I think that it was previously possible to insert a (Character value: 30) in source, and everything after that would have been ignored (if not between quotes) and could potentially store meta information. But this was both undocumented and AFAIK unused. It's easy to go back if we want to, by restoring previous version of #initializeTypeTable.

While doing this, I noticed that (Character value: 30) will now be interpreted as binary as many other characters including invisible control characters...
It could thus be used in a binary selector! That's crazy and I suggest using xIllegal in the Character typeTable, since I had previously prepared this method...
But one change at a time...

=============== Diff against Compiler-nice.222 ===============

Item was changed:
  ----- Method: Parser>>advance (in category 'scanning') -----
  advance
  	| this |
  	prevMark := hereMark.
  	prevEnd := hereEnd.
  	this := here.
  	here := token.
  	hereType := tokenType.
  	hereMark := mark.
+ 	hereEnd := source position - (aheadChar == DoItCharacter
+ 		ifTrue: [hereChar == DoItCharacter
- 	hereEnd := source position - ((aheadChar == 30 asCharacter and: [source atEnd])
- 		ifTrue: [hereChar == 30 asCharacter
  			ifTrue: [0]
  			ifFalse: [1]]
  		ifFalse: [2]).
  	self scanToken.
  	"Transcript show: 'here: ', here printString, ' mark: ', hereMark printString, ' end: ', hereEnd printString; cr."
  	^this!

Item was changed:
  ----- Method: Scanner class>>initialize (in category 'initialization') -----
  initialize

  	self initializeTypeTable.
+ 	"The unicode ending with FFFE or FFFF are non characters and can be used by applications if they wish.
+ 	We use last legal unicode 16r10FFFF to encode the end of source stream"
+ 	DoItCharacter := Character value: 16r10FFFF!
- 	DoItCharacter := Character value: 30!

Item was changed:
  ----- Method: Scanner class>>initializeTypeTable (in category 'initialization') -----
  initializeTypeTable
  	"self initializeTypeTable"

  	| newTable |
  	newTable := Array new: 256 withAll: #xBinary. "default"
  	newTable atAll: #(9 10 12 13 32 ) put: #xDelimiter. "tab lf ff cr space"
  	newTable atAll: ($0 asciiValue to: $9 asciiValue) put: #xDigit.

  	1 to: 255
  		do: [:index |
  			(Character value: index) isLetter
  				ifTrue: [newTable at: index put: #xLetter]].

- 	newTable at: 30 put: #doIt.
  	newTable at: $" asciiValue put: #xDoubleQuote.
  	newTable at: $# asciiValue put: #xLitQuote.
  	newTable at: $$ asciiValue put: #xDollar.
  	newTable at: $' asciiValue put: #xSingleQuote.
  	newTable at: $: asciiValue put: #xColon.
  	newTable at: $( asciiValue put: #leftParenthesis.
  	newTable at: $) asciiValue put: #rightParenthesis.
  	newTable at: $. asciiValue put: #period.
  	newTable at: $; asciiValue put: #semicolon.
  	newTable at: $[ asciiValue put: #leftBracket.
  	newTable at: $] asciiValue put: #rightBracket.
  	newTable at: ${ asciiValue put: #leftBrace.
  	newTable at: $} asciiValue put: #rightBrace.
  	newTable at: $^ asciiValue put: #upArrow.
  	newTable at: $_ asciiValue put: #xUnderscore.
  	newTable at: $| asciiValue put: #verticalBar.
  	TypeTable := newTable "bon voyage!!"!

Item was changed:
  ----- Method: Scanner>>scanToken (in category 'expression types') -----
  scanToken

  	[(tokenType := self typeTableAt: hereChar) == #xDelimiter]
  		whileTrue: [self step].  "Skip delimiters fast, there almost always is one."
+ 	mark := aheadChar == DoItCharacter
- 	mark := (aheadChar == DoItCharacter and: [source atEnd])
  		ifTrue: [source position]
  		ifFalse: [source position - 1].
  	(tokenType at: 1) == $x "x as first letter"
  		ifTrue: [self perform: tokenType "means perform to compute token & type"]
  		ifFalse: [token := self step asSymbol "else just unique the first char"].
  	^token!

Item was changed:
  ----- Method: Scanner>>typeTableAt: (in category 'multi-character scans') -----
  typeTableAt: aCharacter
+ 	^typeTable
+ 		at: aCharacter charCode
+ 		ifAbsent:
+ 			[aCharacter == DoItCharacter
+ 				ifTrue: [#doIt]
+ 				ifFalse: [#xLetter]]!
- 	^typeTable at: aCharacter charCode ifAbsent:[#xLetter]!

Item was changed:
  ----- Method: Scanner>>xDigit (in category 'multi-character scans') -----
  xDigit
  	"Form a number."

  	tokenType := #number.
+ 	aheadChar == DoItCharacter
- 	(aheadChar == DoItCharacter and: [source atEnd
- 			and:  [source skip: -1. source next ~~ DoItCharacter]])
  		ifTrue: [source skip: -1 "Read off the end last time"]
  		ifFalse: [source skip: -2].
  	token := (SqNumberParser on: source)
  		failBlock: [:errorString :position | self notify: errorString at:position];
  		nextNumber.
  	self step; step!

Item was changed:
  ----- Method: Scanner>>xDollar (in category 'multi-character scans') -----
  xDollar
  	"Form a Character literal."

+ 	aheadChar == DoItCharacter ifTrue: [^self offEnd: 'A Character was expected'].
  	self step. "pass over $"
  	token := self step.
  	tokenType := #character!

Item was changed:
  ----- Method: Scanner>>xDoubleQuote (in category 'multi-character scans') -----
  xDoubleQuote
      "Collect a comment."
-     "wod 1/10/98: Allow 'empty' comments by testing the first character
- for $"" rather than blindly adding it to the comment being collected."

  	buffer reset.
  	self step.
  	[ hereChar == $" ] whileFalse: [
+ 		hereChar == DoItCharacter ifTrue: [
- 		(hereChar == DoItCharacter and: [ source atEnd ]) ifTrue: [
  			^self offEnd: 'Unmatched comment quote' ].
  		buffer nextPut: self step ].
  	self step.
  	(currentComment ifNil: [ 
  		currentComment := OrderedCollection new ])
  			add: buffer contents.
  	self scanToken!

Item was changed:
  ----- Method: Scanner>>xLetter (in category 'multi-character scans') -----
  xLetter
  	"Form a word or keyword."

  	| type |
  	buffer reset.
  	[(type := self typeTableAt: hereChar) == #xLetter
  		or: [type == #xDigit
  		or: [type == #xUnderscore and:[self allowUnderscoreSelectors]]]] whileTrue:
  			["open code step for speed"
  			buffer nextPut: hereChar.
  			hereChar := aheadChar.
  			aheadChar := source atEnd
+ 							ifTrue: [DoItCharacter]
- 							ifTrue: [30 asCharacter "doit"]
  							ifFalse: [source next]].
+ 	tokenType := (type == #xColon and: [aheadChar ~~ $=])
- 	tokenType := (type == #colon or: [type == #xColon and: [aheadChar ~~ $=]])
  					ifTrue: 
  						[buffer nextPut: self step.
  						"Allow any number of embedded colons in literal symbols"
  						[(self typeTableAt: hereChar) == #xColon] whileTrue:
  							[buffer nextPut: self step].
  						#keyword]
  					ifFalse: 
  						[#word].
  	token := buffer contents!

Item was changed:
  ----- Method: Scanner>>xSingleQuote (in category 'multi-character scans') -----
  xSingleQuote
  	"String."

  	self step.
  	buffer reset.
  	[hereChar == $' 
  		and: [aheadChar == $' 
  				ifTrue: [self step. false]
  				ifFalse: [true]]]
  		whileFalse: 
+ 			[hereChar == DoItCharacter
+ 				ifTrue: [^self offEnd: 'Unmatched string quote'].
+ 			buffer nextPut: self step].
- 			[buffer nextPut: self step.
- 			(hereChar == DoItCharacter and: [source atEnd])
- 				ifTrue: [^self offEnd: 'Unmatched string quote']].
  	self step.
  	token := buffer contents.
  	tokenType := #string!