[squeak-dev] The Trunk: Collections-pre.762.mcz

commits at source.squeak.org
Tue Aug 29 14:50:19 UTC 2017


Patrick Rein uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-pre.762.mcz

==================== Summary ====================

Name: Collections-pre.762
Author: pre
Time: 29 August 2017, 4:50:11.458834 pm
UUID: d7838b91-7ce4-c34c-ac5a-c46cee281140
Ancestors: Collections-bf.761

Changes HtmlReadWriter to handle nested tags, and their mapping to text attributes, correctly. Also adds a class comment.

=============== Diff against Collections-bf.761 ===============

Item was changed:
  TextReadWriter subclass: #HtmlReadWriter
  	instanceVariableNames: 'count offset runStack runArray string breakLines'
  	classVariableNames: ''
  	poolDictionaries: ''
  	category: 'Collections-Text'!
+ 
+ !HtmlReadWriter commentStamp: 'pre 8/29/2017 16:14' prior: 0!
+ An HtmlReadWriter reads a Text object from a string containing HTML, or writes a Text object to a string with HTML tags representing its text attributes.
+ 
+ It currently does two things:
+ 1) Setting text attributes when a tag opens, e.g. setting a bold text attribute when it encounters a <b> tag.
+ 2) Changing the resulting string, e.g. replacing a <br> with a Character cr.
+ 
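+ The implementation works by pushing attributes on a stack on every opening tag. Before the new entry is pushed, the run accumulated so far by the enclosing entry is written to an array of attribute runs. On the corresponding closing tag, the entry's own run is written to that array and the entry is popped from the stack. The final Text is built from the collected characters and this run array.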
+ 
+ ## Notes on the implementation
+ - The final run array is built entirely while parsing, so the run lengths must already be correct. There is no consolidation step apart from merging neighboring runs that carry the same attributes.
+ - The *count* variable advances with every character read from the source string and with every character the reader inserts itself (e.g. a Character cr for <br>); *offset* is the number of skipped source characters, for example those that make up a tag. Hence count - offset is the current length of the resulting string.
+ - The stack contains elements which are of the form: {text attributes. current start index. original start}!
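
The stack-of-runs scheme described in the class comment can be illustrated with a small Python sketch rather than Squeak code, so that it runs standalone. It assumes well-formed input and a deliberately tiny HTML subset. Escapes, comments, <br>/<img> and empty tags are not handled. Each tag name simply stands in for its text attribute, where the real reader calls mapTagToAttribute:. The names read_html and flush_top are made up for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer: a tag is "<...>", everything else is plain text.
# Escapes (&amp;), comments and the <br>/<img> special cases are not handled.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

Run = Tuple[Tuple[str, ...], int]  # (attributes, run length)


def read_html(source: str) -> Tuple[str, List[Run]]:
    """Return the plain string and its attribute runs for a small HTML subset."""
    out: List[str] = []
    runs: List[Run] = []
    # Entries mirror {text attributes. current start index. original start},
    # using 0-based indices into `out` instead of Smalltalk's 1-based ones.
    stack = [[[], 0, 0]]

    def flush_top() -> None:
        """Write the run accumulated by the stack top (cf. processRunStackTop)."""
        attrs, start, _original_start = stack[-1]
        length = len(out) - start
        if length <= 0:
            return
        key = tuple(attrs)
        if runs and runs[-1][0] == key:
            # Merge neighbouring runs with equal attributes immediately;
            # the Smalltalk code does this at the end with runArray coalesce.
            runs[-1] = (key, runs[-1][1] + length)
        else:
            runs.append((key, length))

    for token in TOKEN.findall(source):
        if token.endswith("/>"):
            continue  # empty tags carry no attributes in this sketch
        if token.startswith("</"):
            if len(stack) == 1:
                continue  # stray closing tag: ignore it
            flush_top()  # write the closing element's own run
            stack.pop()
            stack[-1][1] = len(out)  # the parent resumes after the child
        elif token.startswith("<"):
            flush_top()  # write the parent's run before the child adds attributes
            name = (token[1:-1].split() or [""])[0]
            here = len(out)
            stack.append([stack[-1][0] + [name], here, here])
        else:
            out.extend(token)
    flush_top()  # add the last run
    return "".join(out), runs
```

For example, read_html("plain <b>bold <i>both</i></b> end") yields the string "plain bold both end" with runs of lengths 6, 5, 4 and 4 carrying (), (b), (b, i) and (). Because every opening tag first flushes the enclosing run, adjacent start tags such as <b><i> need no special counting in this sketch.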

Item was added:
+ ----- Method: HtmlReadWriter>>addCharacter: (in category 'private') -----
+ addCharacter: aCharacter
+ 
+ 	string add: aCharacter.
+ 	count := count + 1.!

Item was added:
+ ----- Method: HtmlReadWriter>>addString: (in category 'private') -----
+ addString: aString
+ 
+ 	string addAll: aString.
+ 	count := count + aString size.!

Item was changed:
  ----- Method: HtmlReadWriter>>isTagIgnored: (in category 'testing') -----
  isTagIgnored: aTag
  
  	| space t |
+ 	t := aTag copyWithoutAll: '</>'.
+ 	space := t indexOf: Character space.
- 	space := aTag indexOf: Character space.
  	t := space > 0
+ 		ifTrue: [t copyFrom: 1 to: space - 1]
+ 		ifFalse: [t].
- 		ifTrue: [aTag copyFrom: 2 to: space - 1]
- 		ifFalse: [aTag copyFrom: 2 to: aTag size - 1].
  	^ self ignoredTags includes: t!
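
The revised isTagIgnored: first removes every '<', '/' and '>' character. It then keeps only the text before the first space. As a result, it accepts a start tag, an end tag, an empty tag, or a bare tag name; processEndTag: now passes the bare name. The Python sketch below mirrors that normalisation. The set IGNORED_TAGS is a made-up stand-in for whatever ignoredTags actually returns.

```python
# Sketch of the tag-name normalisation in the new isTagIgnored: (not Squeak code).
IGNORED_TAGS = {"head", "meta", "style"}  # hypothetical; the real list comes from ignoredTags


def tag_name(tag: str) -> str:
    """Drop every '<', '/' and '>' character, then keep the text before the first space."""
    stripped = "".join(ch for ch in tag if ch not in "</>")
    space = stripped.find(" ")
    return stripped[:space] if space >= 0 else stripped


def is_tag_ignored(tag: str) -> bool:
    """True if the normalised tag name is in the (hypothetical) ignore list."""
    return tag_name(tag) in IGNORED_TAGS
```

Slashes inside attribute values (e.g. in an href) are removed as well. This does not change the result, because only the name before the first space is compared.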

Item was changed:
  ----- Method: HtmlReadWriter>>mapCloseCodeTag (in category 'mapping') -----
  mapCloseCodeTag
  
  	| theDoIt |
  	theDoIt := runStack top first
  		detect: [:attribute | attribute isKindOf: TextDoIt]
  		ifNone: [^ self "nothing found, ignore"].
+ 	theDoIt evalString: (String withAll: (string copyFrom: runStack top third to: string size)).!
- 	theDoIt evalString: (String withAll: (string copyFrom: runStack top second to: string size)).!

Item was changed:
+ ----- Method: HtmlReadWriter>>nextPutText: (in category 'private') -----
- ----- Method: HtmlReadWriter>>nextPutText: (in category 'accessing') -----
  nextPutText: aText
  
  	| previous |
  	previous := #().
  	self activateAttributesEnding: #() starting: previous. "for consistency"
  	aText runs
  		withStartStopAndValueDo: [:start :stop :attributes | 
  			self
  				deactivateAttributesEnding: previous starting: attributes;
  				activateAttributesEnding: previous starting: attributes;
  				writeContent: (aText string copyFrom: start to: stop).
  			previous := attributes].
  	self deactivateAttributesEnding: previous starting: #().!

Item was changed:
+ ----- Method: HtmlReadWriter>>nextText (in category 'private') -----
- ----- Method: HtmlReadWriter>>nextText (in category 'accessing') -----
  nextText
  
  	count := 0.
  	offset := 0. "To ignore characters in the input string that are used by tags."
  	
  	runStack := Stack new.
  	
  	runArray := RunArray new.
  	string := OrderedCollection new.
  	
+ 	"{text attributes. current start index. original start}"
+ 	runStack push: {OrderedCollection new. 1. 1}.
- 	"{text attributes. start index. end index. number of open tags}"
- 	runStack push: {OrderedCollection new. 1. nil. 0}.
  
  	[stream atEnd] whileFalse: [self processNextTag].
  	self processRunStackTop. "Add last run."
  
  	string := String withAll: string.
  	runArray coalesce.
  	
  	^ Text
  		string: string
  		runs: runArray!

Item was changed:
  ----- Method: HtmlReadWriter>>processEmptyTag: (in category 'reading') -----
  processEmptyTag: aTag
  
  	(aTag beginsWith: '<br') ifTrue: [
+ 		self addCharacter: Character cr.
- 		string add: Character cr.
- 		count := count + 1.
  		^ self].
  	
+ 	(self isTagIgnored: aTag)
- 	(self ignoredTags includes: (aTag copyFrom: 2 to: aTag size - 3))
  		ifTrue: [^ self].
  		
+ 	"TODO... what?"!
- 	"TODO..."!

Item was changed:
  ----- Method: HtmlReadWriter>>processEndTag: (in category 'reading') -----
  processEndTag: aTag
  
  	| index tagName |
  	index := count - offset.
  	tagName := aTag copyFrom: 3 to: aTag size - 1.
  
+ 	(self isTagIgnored: tagName) ifTrue: [^ self].
+ 	
- 	(self ignoredTags includes: tagName) ifTrue: [^ self].
  	tagName = 'code' ifTrue: [self mapCloseCodeTag].
  	tagName = 'pre' ifTrue: [self breakLines: true].
- 
- 	"De-Accumulate adjacent tags."
- 	runStack top at: 4 put: runStack top fourth - 1.
- 	runStack top fourth > 0
- 		ifTrue: [^ self "not yet"].
  		
  	self processRunStackTop.
  
  	runStack pop.
  	runStack top at: 2 put: index + 1.!

Item was changed:
  ----- Method: HtmlReadWriter>>processHtmlEscape: (in category 'reading') -----
  processHtmlEscape: aString
  	| escapeSequence |
  	escapeSequence := aString copyFrom: 2 to: aString size - 1.
  	escapeSequence first = $# ifTrue: [^ self processHtmlEscapeNumber: escapeSequence allButFirst].
  	(String htmlEntities at: (aString copyFrom: 2 to: aString size - 1) ifAbsent: [])
  		ifNotNil: [:char | 
+ 			self addCharacter: char].!
- 			string add: char.
- 			count := count + 1].!

Item was changed:
+ ----- Method: HtmlReadWriter>>processHtmlEscapeNumber: (in category 'private') -----
- ----- Method: HtmlReadWriter>>processHtmlEscapeNumber: (in category 'reading') -----
  processHtmlEscapeNumber: aString
  	| number |
  	number := aString first = $x
  		ifTrue: [ '16r', aString allButFirst ]
  		ifFalse: [ aString ].
+ 	self addCharacter: number asNumber asCharacter.
+ 	!
- 	string add: number asNumber asCharacter!

Item was changed:
  ----- Method: HtmlReadWriter>>processNextTag (in category 'reading') -----
  processNextTag
  
  	| tag htmlEscape lookForNewTag lookForHtmlEscape tagFound valid inComment inTagString |
  	lookForNewTag := true.
  	lookForHtmlEscape := false.
  	tagFound := false.
  	tag := OrderedCollection new.
  	htmlEscape := OrderedCollection new.
  	inComment := false.
  	inTagString := false.
  	
  	[stream atEnd not and: [tagFound not]] whileTrue: [
  		| character |
  		character := stream next.
  		valid := (#(10 13) includes: character asciiValue) not.
  		count := count + 1.
  	
  		character = $< ifTrue: [lookForNewTag := false].
+ 		character = $& ifTrue: [inComment ifFalse: [lookForHtmlEscape := true]].
- 		character = $& ifTrue: [
- 			inComment ifFalse: [lookForHtmlEscape := true]].
  		
  		lookForNewTag
  			ifTrue: [
  				lookForHtmlEscape
  					ifFalse: [
  						(valid or: [self breakLines not])
  							ifTrue: [string add: character]
  							ifFalse: [offset := offset + 1]]
  					ifTrue: [valid ifTrue: [htmlEscape add: character]. offset := offset + 1]]
  			ifFalse: [valid ifTrue: [tag add: character]. offset := offset + 1].
  
  		"Toggle within tag string/text."
  		(character = $" and: [lookForNewTag not])
  			ifTrue: [inTagString := inTagString not].
  		
  		inComment := ((lookForNewTag not and: [tag size >= 4])
  			and: [tag beginsWith: '<!!--'])
  			and: [(tag endsWith: '-->') not].
  
  		(((character = $> and: [inComment not]) and: [lookForNewTag not]) and: [inTagString not]) ifTrue: [
  			lookForNewTag := true.
  			(tag beginsWith: '<!!--')
  				ifTrue: [self processComment: (String withAll: tag)]
  				ifFalse: [tag second ~= $/
  					ifTrue: [
  						(tag atLast: 2) == $/
  							ifTrue: [self processEmptyTag: (String withAll: tag)]
  							ifFalse: [self processStartTag: (String withAll: tag)]]
  					ifFalse: [self processEndTag: (String withAll: tag)]].			
  			tagFound := true].
  
  		(((character = $; and: [lookForNewTag])
  			and: [htmlEscape notEmpty]) and: [htmlEscape first = $&]) ifTrue: [
  				lookForHtmlEscape := false.
  				self processHtmlEscape: (String withAll: htmlEscape).
  				htmlEscape := OrderedCollection new]].
  !

Item was changed:
  ----- Method: HtmlReadWriter>>processRunStackTop (in category 'reading') -----
  processRunStackTop
  	"Write accumulated attributes to run array."
  	
+ 	| currentIndex start attrs |
+ 	currentIndex := count - offset.
- 	| index start end attrs |
- 	index := count - offset.
- 	
- 	"Set end index."
- 	runStack top at: 3 put: index.
- 	"Write to run array."
  	start := runStack top second.
- 	end := runStack top third.
  	attrs := runStack top first.
  	runArray
  		addLast: attrs asArray
+ 		times: currentIndex - start + 1.!
- 		times: end - start + 1.!

Item was changed:
  ----- Method: HtmlReadWriter>>processStartTag: (in category 'reading') -----
  processStartTag: aTag
  
  	| index |
  	(self isTagIgnored: aTag) ifTrue: [^ self].
  
  	index := count - offset.
  
  	aTag = '<br>' ifTrue: [
+ 		self addCharacter: Character cr.
- 		string add: Character cr.
- 		count := count + 1.
  		^ self].
  	(aTag beginsWith: '<img') ifTrue: [
+ 		self addString: '[image]'.
- 		string addAll: '[image]'.
- 		count := count + 7.
  		^ self].
  	
+ 	self processRunStackTop. "To add all attributes before the next tag adds some."
- 	"Accumulate adjacent tags."
- 	(runStack size > 1 and: [runStack top second = (index + 1) "= adjacent start tags"])
- 		ifTrue: [
- 			runStack top at: 1 put: (runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself).
- 			runStack top at: 4 put: (runStack top fourth + 1). "increase number of open tags"
- 			^self].
- 	
- 	self processRunStackTop.
  
- 	"Remove start/end info to reuse attributes later."
- 	runStack top at: 2 put: nil.
- 	runStack top at: 3 put: nil.
  	"Copy attr list and add new attr."
+ 	runStack push: ({runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself. index + 1 . index + 1}).
+ 	!
- 	runStack push: ({runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself. index + 1. nil. 1}).!


