[squeak-dev] The Trunk: XML-Parser-ul.43.mcz

commits at source.squeak.org commits at source.squeak.org
Sun Mar 19 20:54:54 UTC 2017


Levente Uzonyi uploaded a new version of XML-Parser to project The Trunk:
http://source.squeak.org/trunk/XML-Parser-ul.43.mcz

==================== Summary ====================

Name: XML-Parser-ul.43
Author: ul
Time: 19 March 2017, 9:24:35.17355 pm
UUID: c424f066-2c05-4c84-ab09-71c5099790b9
Ancestors: XML-Parser-dtl.42

Various optimizations to XMLTokenizer and SAXDriver.
The most notable change is that the attributes of xml nodes will be set to nil instead of a larger empty Dictionary when they have no attributes. But since its accessor will initialize it to an empty dictionary anyway, we can safely save initializing it while parsing the document.

=============== Diff against XML-Parser-dtl.42 ===============

Item was changed:
  ----- Method: SAXDriver>>handleStartTag:attributes:namespaces: (in category 'handling tokens') -----
  handleStartTag: elementName attributes: attributeList namespaces: namespaces
  
  	| localName namespace namespaceURI |
+ 	attributeList ifNotNil: [
+ 		(attributeList includesKey: 'xml:lang') ifTrue: [
+ 			languageEnvironment := LanguageEnvironment localeID: (LocaleID isoString: (attributeList at: 'xml:lang')) ] ].
  
+ 	self usesNamespaces ifFalse: [
+ 		"call the handler"
+ 		self saxHandler
+ 			checkEOD; 
+ 			startElement: elementName namespaceURI: nil namespace: nil attributeList: attributeList.
+ 		^self ].
+ 	
+ 	self scope enterScope.
+ 	"declare any namespaces"
+ 	namespaces ifNotNil: [
+ 		namespaces keysAndValuesDo: [:ns :uri |
+ 			self scope declareNamespace: ns uri: uri] ].
- 	(attributeList includesKey: 'xml:lang')
- 		ifTrue: [languageEnvironment := LanguageEnvironment localeID: (LocaleID isoString: (attributeList at: 'xml:lang'))].
- 	self usesNamespaces
- 		ifTrue: [
- 			self scope enterScope.
- 				"declare any namespaces"	
- 				namespaces keysAndValuesDo: [:ns :uri |
- 					self scope declareNamespace: ns uri: uri].
  
+ 	self splitName: elementName into: [:ns :ln |
+ 		namespace := ns.
+ 		localName := ln].
- 			self splitName: elementName into: [:ns :ln |
- 				namespace := ns.
- 				localName := ln].
  
+ 	namespaceURI := self scope 
+ 		namespaceURIOf: namespace
+ 		ifAbsent: [self parseError: 'Start tag ' , elementName , ' refers to undefined namespace ' , namespace asString].
- 			namespaceURI := self scope 
- 				namespaceURIOf: namespace
- 				ifAbsent: [self parseError: 'Start tag ' , elementName , ' refers to undefined namespace ' , namespace asString].
  
+ 	attributeList ifNotNil: [
+ 		self validatesAttributes ifTrue: [
+ 			self scope validateAttributes: attributeList ] ].
+ 	"call the handler"
+ 	self saxHandler
+ 		checkEOD; 
+ 		startElement: localName namespaceURI: namespaceURI namespace: namespace attributeList: attributeList!
- 			self validatesAttributes
- 				ifTrue: [self scope validateAttributes: attributeList].
- 			"call the handler"
- 			self saxHandler
- 				checkEOD; 
- 				startElement: localName namespaceURI: namespaceURI namespace: namespace attributeList: attributeList]
- 		ifFalse: [
- 			"call the handler"
- 			self saxHandler
- 				checkEOD; 
- 				startElement: elementName namespaceURI: nil namespace: nil attributeList: attributeList]!

Item was changed:
  Object subclass: #XMLTokenizer
+ 	instanceVariableNames: 'stream nestedStreams entities externalEntities parameterEntities parsingMarkup markedPosition peekChar validating stringBuffer stringBufferStack'
+ 	classVariableNames: 'CharEscapes DigitTable LiteralChars NameDelimiters'
- 	instanceVariableNames: 'stream nestedStreams entities externalEntities parameterEntities parsingMarkup markedPosition peekChar validating nameBuffer attributeBuffer'
- 	classVariableNames: 'CharEscapes DigitTable LiteralChars NameDelimiters SeparatorTable'
  	poolDictionaries: ''
  	category: 'XML-Parser'!
  
  !XMLTokenizer commentStamp: '<historical>' prior: 0!
  XMLTokenizer
  
  bolot at cc.gatech.edu
  
  breaks the stream of characters into a stream of XMLnodes (aka token stream)
  token stream is used by XMLparser to generate XMLdocument tree!

Item was changed:
  ----- Method: XMLTokenizer class>>initialize (in category 'class initialization') -----
  initialize
  	"XMLTokenizer initialize"
  
  	CharEscapes := CharacterSet newFrom: #( $& $" $' $> $< ).
  
- 	SeparatorTable  := CharacterSet new.
- 	#(9 10 12 13 32) do: [:each | SeparatorTable add: each asCharacter].
- 
  	LiteralChars := CharacterSet newFrom: #( $: $- $_ $= $.).
  	0 to: 255 do: [:i | 
  		| char |
  		char := i asCharacter.
  		(char isDigit or: [char isLetter])
  		ifTrue: [LiteralChars add: char]].
  
  	NameDelimiters := CharacterSet new.
  	#(9 10 12 13 32 61 "$= asInteger 61" 62 "$> asInteger" 47 "$/ asInteger")
  		do: [:each | NameDelimiters add: each asCharacter].
  
  	DigitTable := Array new: 256.
  	DigitTable atAllPut: -1.
  	($0 to: $9) do: [:each | DigitTable at: each asciiValue put: each digitValue].
  	($a to: $f) do: [:each | DigitTable at: each asciiValue put: each digitValue].
  	($A to: $F) do: [:each | DigitTable at: each asciiValue put: each digitValue].
  !

Item was changed:
  ----- Method: XMLTokenizer>>atEnd (in category 'streaming') -----
  atEnd
+ 
+ 	nestedStreams ifNil: [
+ 		peekChar ifNotNil: [ ^false ].
+ 		^stream atEnd ].
+ 	stream atEnd ifFalse: [ ^false ].
+ 	^self 
+ 		popNestingLevel;
+ 		atEnd!
- 	nestedStreams == nil
- 		ifTrue: [^peekChar == nil and: [stream atEnd]].
- 	^stream atEnd
- 		ifTrue: [
- 			self popNestingLevel.
- 			self atEnd]
- 		ifFalse: [false]!

Item was removed:
- ----- Method: XMLTokenizer>>fastStreamStringContents: (in category 'private') -----
- fastStreamStringContents: writeStream
- 	| newSize |
- 	newSize := writeStream position.
- 	^(String new: newSize)
- 		replaceFrom: 1
- 		to: newSize
- 		with: writeStream originalContents
- 		startingAt: 1!

Item was changed:
  ----- Method: XMLTokenizer>>initialize (in category 'initialize') -----
  initialize
  	parsingMarkup := false.
  	validating := false.
+ 	stringBuffer := (String new: 128) writeStream.
+ 	stringBufferStack := OrderedCollection new!
- 	attributeBuffer := WriteStream on: (String new: 128).
- 	nameBuffer := WriteStream on: (String new: 128)!

Item was changed:
  ----- Method: XMLTokenizer>>next (in category 'streaming') -----
  next
  	"Return the next character from the current input stream. If the current stream is at end pop to next nesting level if there is one.
  	Due to the potential nesting of original document, included documents and replacment texts the streams are held in a stack representing the nested streams. The current stream is the top one."
+ 
  	| nextChar |
+ 	peekChar ifNotNil: [
+ 		nextChar := peekChar.
+ 		peekChar := nil.
+ 		^nextChar ].
+ 	nestedStreams ifNotNil: [ self checkNestedStream ].
+ 	^stream next!
- 	peekChar
- 		ifNil: [
- 			nestedStreams ifNotNil: [self checkNestedStream].
- 			^nextChar := stream next]
- 		ifNotNil: [
- 			nextChar := peekChar.
- 			peekChar := nil.
- 			^nextChar].
- 	!

Item was changed:
  ----- Method: XMLTokenizer>>nextAttributeValue (in category 'tokenizing') -----
  nextAttributeValue
+ 
+ 	| delimiterChar nextChar nextPeek referenceString entity entityValue |
- 	| delimiterChar attributeValueStream nextChar nextPeek referenceString entity entityValue |
  	delimiterChar := self next.
  	(delimiterChar == $"
  		or: [delimiterChar == $'])
  		ifFalse: [self errorExpected: 'Attribute value delimiter expected.'].
+ 	self pushNewStringBuffer.
- 	attributeValueStream := attributeBuffer reset.
  	[
  	nextPeek := nextChar := self next.
  	nextChar ifNil: [self errorExpected: 'Character expected.'].
  	nextChar == $&
  		ifTrue: [
  			self peek == $#
  				ifTrue: [
  					nextPeek := nil.
  					nextChar := self nextCharReference]
  				ifFalse: [
  					referenceString := self nextLiteral.
  					self next == $;
  						ifFalse: [self errorExpected: ';'].
  					entity := self entity: referenceString.
  					entityValue := entity valueForContext: #content.
  					(self class isCharEscape: entityValue)
  						ifTrue: [
  							nextPeek := nil.
  							nextChar := entityValue first]
  						ifFalse: [
  							entityValue := entityValue asString.
  							entityValue isEmpty
  								ifTrue: [nextPeek := nextChar := nil]
  								ifFalse: [
  									self pushStream: (ReadStream on: entityValue asString).
  									nextPeek := nextChar := self next]]]].
  	nextPeek == delimiterChar]
  		whileFalse: [
+ 			nextChar ifNotNil: [stringBuffer nextPut: nextChar]].
+ 	^self popStringBuffer!
- 			nextChar ifNotNil: [attributeValueStream nextPut: nextChar]].
- 	^self fastStreamStringContents: attributeValueStream
- "	^attributeValueStream contents"!

Item was changed:
  ----- Method: XMLTokenizer>>nextEntity (in category 'tokenizing') -----
  nextEntity
  	"return the next XMLnode, or nil if there are no more.
  	Fixed to retain leading whitespace when PCDATA is detected."
  
  	|whitespace|
  	"branch, depending on what the first character is"
  	whitespace := self nextWhitespace.
  	self atEnd ifTrue: [self handleEndDocument. ^ nil].
  	self checkAndExpandReference: (self parsingMarkup ifTrue: [#dtd] ifFalse: [#content]).
+ 	self peek == $< ifTrue: [ ^self nextNode ].
+ 	whitespace isEmpty ifFalse: [ self pushBack: whitespace ].
+ 	^self nextPCData!
- 	^self peek = $<
- 		ifTrue: [self nextNode]
- 		ifFalse: [whitespace isEmpty
- 					ifFalse: [self pushBack: whitespace].
- 				self nextPCData]!

Item was changed:
  ----- Method: XMLTokenizer>>nextName (in category 'tokenizing') -----
  nextName
+ 
+ 	self pushNewStringBuffer.
+ 	self peek == $. ifTrue: [ self malformedError: 'Character expected.' ].
+ 	[
+ 		self peek ifNil: [ self errorExpected: 'Character expected.' ].
+ 		NameDelimiters includes: peekChar ] whileFalse: [
+ 			stringBuffer nextPut: self next ].
+ 	^self popStringBuffer!
- 	| nextChar |
- 	nameBuffer reset.
- 	self peek == $.
- 		ifTrue: [self malformedError: 'Character expected.'].
- 	[(nextChar := self peek)
- 		ifNil: [self errorExpected: 'Character expected.'].
- 	NameDelimiters includes: nextChar] whileFalse: [
- 			nameBuffer nextPut: self next].
- 	^self fastStreamStringContents: nameBuffer!

Item was changed:
  ----- Method: XMLTokenizer>>nextNode (in category 'tokenizing') -----
  nextNode
  	| nextChar |
  	"Skip < "
  	self next.
+ 	self peek == $!! ifTrue: [
- 	nextChar := self peek.
- 	nextChar == $!! ifTrue: [
  		"Skip !!"
  		self next.
  		nextChar := self peek.
  		nextChar == $- ifTrue: [^self nextComment].
  		nextChar == $[ ifTrue: [^self nextCDataOrConditional].
+ 		self parsingMarkup 	ifTrue: [ ^self nextMarkupDeclaration ].
+ 		^self nextDocType ].
+ 	peekChar == $? ifTrue: [^self nextPI].
- 		^self parsingMarkup
- 			ifTrue: [self nextMarkupDeclaration]
- 			ifFalse: [self nextDocType]].
- 	nextChar == $? ifTrue: [^self nextPI].
  	^self nextTag!

Item was changed:
  ----- Method: XMLTokenizer>>nextPCData (in category 'tokenizing') -----
  nextPCData
- 	| resultStream nextChar referenceString entity entityValue nextPeek |
- 	resultStream := (String new: 10) writeStream.
- 	self validating
- 		ifFalse: [
- 			[self peek == $<]
- 				whileFalse: [resultStream nextPut: self next].
- 			^self handlePCData: resultStream contents].
  
+ 	| nextChar referenceString entity entityValue nextPeek |
+ 	self validating ifFalse: [
+ 		self peek == $< ifTrue: [ ^self handlePCData: '' ].
+ 		self pushNewStringBuffer.
+ 		[ self peek == $< ]
+ 			whileFalse: [ stringBuffer nextPut: self next ].
+ 		^self handlePCData: self popStringBuffer ].
+ 
+ 	self pushNewStringBuffer.
  	[
  	nextPeek := nextChar := self peek.
  	nextChar ifNil: [self errorExpected: 'Character expected.'].
  	nextChar == $&
  		ifTrue: [
  			self next.
  			self peek == $#
  				ifTrue: [
  					nextPeek := nil.
  					nextChar := self nextCharReference]
  				ifFalse: [
  					referenceString := self nextLiteral.
  					self next == $;
  						ifFalse: [self errorExpected: ';'].
  					entity := self entity: referenceString.
  					entityValue := entity valueForContext: #content.
  					(self class isCharEscape: entityValue)
  						ifTrue: [
  							nextPeek := nil.
  							nextChar := entityValue first]
  						ifFalse: [
  							entityValue := entityValue asString.
  							entityValue isEmpty
  								ifTrue: [nextPeek := nextChar := nil]
  								ifFalse: [
+ 									self pushStream: entityValue readStream.
- 									self pushStream: (ReadStream on: entityValue asString).
  									nextPeek := nextChar := self peek]]]]
  		ifFalse: [nextPeek == $< ifFalse: [self next]].
  	nextPeek == $<]
  		whileFalse: [
+ 			nextChar ifNotNil: [stringBuffer nextPut: nextChar]].
+ 	self handlePCData: self popStringBuffer!
- 			nextChar ifNotNil: [resultStream nextPut: nextChar]].
- 	self handlePCData: resultStream contents!

Item was changed:
  ----- Method: XMLTokenizer>>nextTag (in category 'tokenizing') -----
  nextTag
+ 	
+ 	| tagName attributes namespaces |
+ 	self peek == $/ ifTrue: [^self nextEndTag].
- 	| tagName attributes nextChar namespaces |
- 	(self peek = $/)
- 		ifTrue: [^self nextEndTag].
  	tagName := self nextName.
  	self skipSeparators.
+ 	attributes := nil.
+ 	namespaces := nil.
+ 	[ self peek == $> or: [ peekChar == $/ ] ] whileFalse: [
+ 		self 
+ 			checkAndExpandReference: #content;
+ 			nextAttributeInto: (attributes ifNil: [ attributes := Dictionary new ])
+ 				namespaces: (namespaces ifNil: [ namespaces := Dictionary new ]);
+ 			skipSeparators ].
- 	attributes := Dictionary new: 33.
- 	namespaces := Dictionary new: 5.
- 	[(nextChar := self peek) == $> or: [nextChar == $/]] whileFalse: [
- 		self checkAndExpandReference: #content.
- 		self nextAttributeInto: attributes namespaces: namespaces.
- 		self skipSeparators.].
  	self handleStartTag: tagName attributes: attributes namespaces: namespaces.
+ 	self next == $/ ifTrue: [
+ 		self
+ 			handleEndTag: tagName;
+ 			next ]!
- 	self next == $/
- 		ifTrue: [
- 			self handleEndTag: tagName.
- 			self next].
- 	!

Item was changed:
  ----- Method: XMLTokenizer>>nextTrimmedBlanksUpTo: (in category 'streaming') -----
  nextTrimmedBlanksUpTo: delimiter
+ 
+ 	| nextChar |
+ 	self pushNewStringBuffer.
- 	| resultStream nextChar |
- 	resultStream := WriteStream on: (String new: 10).
- 	nextChar := nil.
  	[(nextChar := self next) == delimiter]
  		whileFalse: [
  			nextChar == $  ifFalse: [
+ 				stringBuffer nextPut: nextChar]].
- 				resultStream nextPut: nextChar]].
  	nextChar == delimiter
  		ifFalse: [self parseError: 'XML no delimiting ' , delimiter printString , ' found'].
+ 	^self popStringBuffer
- 	^resultStream contents
  !

Item was changed:
  ----- Method: XMLTokenizer>>nextWhitespace (in category 'tokenizing') -----
  nextWhitespace
+ 
+ 	| resultString |
+ 	"Optimize the most common case away."
+ 	self peek ifNil: [ ^'' ].
+ 	peekChar isSeparator ifFalse: [ ^'' ].	
+ 		
+ 	self pushNewStringBuffer.
+ 	[ self peek
+ 		ifNil: [ false ]
+ 		ifNotNil: [ peekChar isSeparator ] ]
+ 		whileTrue: [ stringBuffer nextPut: self next ].
- 	| nextChar resultStream resultString|
- 	resultStream := (String new: 10) writeStream.
- 	[((nextChar := self peek) ~~ nil)
- 		and: [SeparatorTable includes: nextChar]]
- 		whileTrue: [resultStream nextPut: nextChar. self next].
  	(nestedStreams == nil or: [self atEnd not])
  		ifFalse: [self checkNestedStream.
  				self nextWhitespace].
+ 	resultString := self popStringBuffer.
- 	resultString := resultStream contents.
  	resultString isEmpty ifFalse: [self handleWhitespace: resultString].
  	^resultString!

Item was changed:
  ----- Method: XMLTokenizer>>peek (in category 'streaming') -----
  peek
  	"Return the next character from the current input stream. If the current stream poop to next nesting level if there is one.
  	Due to the potential nesting of original document, included documents and replacment texts the streams are held in a stack representing the nested streams. The current stream is the top one."
+ 	
+ 	^peekChar 	ifNil: [
+ 		nestedStreams ifNotNil: [ self checkNestedStream ].
+ 		peekChar := stream next ]!
- 	peekChar
- 		ifNil: [
- 			nestedStreams ifNotNil: [self checkNestedStream].
- 			^peekChar := stream next]
- 		ifNotNil: [^peekChar]!

Item was added:
+ ----- Method: XMLTokenizer>>popStringBuffer (in category 'private - buffering') -----
+ popStringBuffer
+ 
+ 	| previousPosition result |
+ 	previousPosition := stringBufferStack removeLast.
+ 	result := stringBuffer originalContents
+ 		copyFrom: previousPosition + 1
+ 		to: stringBuffer position.
+ 	stringBuffer position: previousPosition.
+ 	^result!

Item was added:
+ ----- Method: XMLTokenizer>>pushNewStringBuffer (in category 'private - buffering') -----
+ pushNewStringBuffer
+ 
+ 	stringBufferStack addLast: stringBuffer position
+ 	!

Item was changed:
  ----- Method: XMLTokenizer>>skipSeparators (in category 'streaming') -----
  skipSeparators
+ 
+ 	[ self peek
+ 		ifNil: [ false ]
+ 		ifNotNil: [ peekChar isSeparator ] ]
+ 		whileTrue: [ self next ].
+ 	nestedStreams ifNil: [ ^self ].
+ 	self atEnd ifTrue: [
+ 		self
+ 			checkNestedStream;
+ 			skipSeparators ]!
- 	| nextChar |
- 	[((nextChar := self peek) ~~ nil)
- 		and: [SeparatorTable includes: nextChar]]
- 		whileTrue: [self next].
- 	(nestedStreams == nil or: [self atEnd not])
- 		ifFalse: [
- 			self checkNestedStream.
- 			self skipSeparators]!



More information about the Squeak-dev mailing list