Christoph Thiede uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ct.1038.mcz
==================== Summary ====================
Name: Collections-ct.1038
Author: ct
Time: 20 May 2023, 6:14:31.154984 pm
UUID: 94300876-0004-ba4a-8787-ba1d7419c776
Ancestors: Collections-ct.1037
Makes HtmlReadWriter capable of rejecting downloadable or executable resources during parsing, such as URLs or code:// URIs in <img> tags. By default, a HtmlReadWriter has full permissions. May be useful for performance or security concerns. Squeak Inbox Talk needs this. :-)
=============== Diff against Collections-ct.1037 ===============
Item was changed:
TextReadWriter subclass: #HtmlReadWriter
+ instanceVariableNames: 'count offset runStack runArray string breakLines permissions indent preformattingLevel exclusionLevel lastFixedWhitespace'
- instanceVariableNames: 'count offset runStack runArray string breakLines indent preformattingLevel exclusionLevel lastFixedWhitespace'
classVariableNames: ''
poolDictionaries: ''
category: 'Collections-Text'!
!HtmlReadWriter commentStamp: 'pre 8/29/2017 16:14' prior: 0!
A HtmlReadWriter is used to read a Text object from a string containing HTML or writing a Text object to a string with HTML tags representing the text attributes.
It does two things currently:
1) Setting text attributes on the beginning of tags, e.g. setting a bold text attribute when seeing a <b> tag.
2) Changing the resulting string, e.g. replacing a <br> with a Character cr.
The implementation works by pushing attributes on a stack on every opening tag. On the corresponding closing tag, the attribute is poped from the stack and stored in an array of attribute runs. From this array the final string is constructed.
## Notes on the implementation
- The final run array is completely constructed while parsing so it has to be correct with regard to the length of the runs. There is no consolidation except for merging neighboring runs which include the same attributes.
- The *count* variable is the position in the source string, the *offset* is the number of skipped characters, for example ones that denote a tag.
- The stack contains elements which are of the form: {text attributes. current start index. original start}!
Item was added:
+ ----- Method: HtmlReadWriter>>defaultPermissions (in category 'initialize-release') -----
+ defaultPermissions
+
+ ^ Set with: #downloadResources with: #evaluateResources!
Item was changed:
----- Method: HtmlReadWriter>>getImage: (in category 'private') -----
getImage: uri
(uri beginsWith: 'cid:' caseSensitive: false) ifTrue: [
"Content-ID, used in nested MIMEDocuments, e.g., for emails with inline images"
(Smalltalk classNamed: #MIMEContentRequest) ifNotNil: [:class |
| cid |
cid := uri allButFirst: 4.
(class signal: cid) ifNotNil: [:document |
^ [document image] ifError: [nil]]].
^ nil].
(uri beginsWith: 'code://' caseSensitive: false) ifTrue: [
| expression |
"Same support for Smalltalk expressions as in TextURL >> #actOnClickFor:."
+ self shallEvaluateResources ifFalse: [^ nil].
expression := uri allButFirst: 7.
^ ([Compiler evaluate: expression] ifError: [nil])
ifNotNil: [:object | object isForm ifTrue: [object] ifFalse: [nil]]].
(uri beginsWith: 'data:' caseSensitive: false) ifTrue: [ | data mediaType separator |
separator := uri indexOf: $, ifAbsent: [^ nil].
mediaType := uri copyFrom: 6 to: separator - 1.
data := uri allButFirst: separator.
data := (mediaType endsWith: ';base64' caseSensitive: false)
ifTrue: [
mediaType := mediaType allButLast: 7.
[Base64MimeConverter mimeDecodeToBytes: data readStream] ifError: [nil]]
ifFalse: [data asByteArray readStream].
^ [ImageReadWriter formFromStream: data] ifError: [nil]].
+ self shallDownloadResources ifFalse: [^ nil].
^ (Smalltalk classNamed: #WebClient) ifNotNil: [:client |
"Maybe we can have this via an AppRegistry at some point. Maybe extend WebBrowser."
([client httpGet: uri] ifError: [nil]) ifNotNil: [:response |
response code = 200 ifFalse: [nil] ifTrue: [
[Form fromBinaryStream: response content asByteArray readStream]
ifError: [nil]]]]!
Item was added:
+ ----- Method: HtmlReadWriter>>hasPermission: (in category 'accessing - private') -----
+ hasPermission: aSymbol
+
+ ^ permissions includes: aSymbol!
Item was changed:
----- Method: HtmlReadWriter>>initialize (in category 'initialize-release') -----
initialize
super initialize.
preformattingLevel := 0.
exclusionLevel := 0.
lastFixedWhitespace := 0.
+ permissions := self defaultPermissions.
self indent: 0.
self breakLines: true.!
Item was added:
+ ----- Method: HtmlReadWriter>>setPermission:to: (in category 'accessing - private') -----
+ setPermission: aSymbol to: aBoolean
+
+ aBoolean
+ ifTrue: [permissions add: aSymbol]
+ ifFalse: [permissions remove: aSymbol ifAbsent: []].!
Item was added:
+ ----- Method: HtmlReadWriter>>shallDownloadResources (in category 'accessing') -----
+ shallDownloadResources
+ "Indicate whether the receiver shall download resources during parsing, such as an image URL in "'<img src="https://squeak.org/static/img/squeak.png">'." May be useful for performance or security concerns."
+
+ ^ self hasPermission: #downloadResources!
Item was added:
+ ----- Method: HtmlReadWriter>>shallDownloadResources: (in category 'accessing') -----
+ shallDownloadResources: aBoolean
+ "Set whether the receiver shall download resources during parsing, such as an image URL in "'<img src="https://squeak.org/static/img/squeak.png">'." May be useful for performance or security concerns."
+
+ ^ self setPermission: #downloadResources to: aBoolean!
Item was added:
+ ----- Method: HtmlReadWriter>>shallEvaluateResources (in category 'accessing') -----
+ shallEvaluateResources
+ "Indicate whether the receiver shall evaluate resources during parsing, such as an code URI in "'<img src="code://MenuIcons squeakIcon">'." May be useful for performance or security concerns."
+
+ ^ self hasPermission: #downloadResources!
Item was added:
+ ----- Method: HtmlReadWriter>>shallEvaluateResources: (in category 'accessing') -----
+ shallEvaluateResources: aBoolean
+ "Set whether the receiver shall evaluate resources during parsing, such as an code URI in "'<img src="code://MenuIcons squeakIcon">'." May be useful for performance or security concerns."
+
+ ^ self setPermission: #downloadResources to: aBoolean!
Item was changed:
(PackageInfo named: 'Collections') postscript: '"Make sure the symbol table consists of immutable sets"
#(SymbolTable NewSymbols) do: [ :variableName |
(Symbol classPool at: variableName) beReadOnlyObject ].
"Add new instvars to HtmlReadWriter"
HtmlReadWriter allSubInstancesDo: [:ea |
+ (ea instVarNamed: ''permissions'') ifNil: [
+ ea instVarNamed: ''permissions'' put: ea defaultPermissions].
+
(ea instVarNamed: ''indent'') ifNil: [
ea instVarNamed: ''indent'' put: 0].
(ea instVarNamed: ''preformattingLevel'') ifNil: [
ea instVarNamed: ''preformattingLevel'' put:
(ea breakLines ifTrue: [0] ifFalse: [1])].
(ea instVarNamed: ''exclusionLevel'') ifNil: [
ea instVarNamed: ''exclusionLevel'' put: 0].
(ea instVarNamed: ''lastFixedWhitespace'') ifNil: [
ea instVarNamed: ''lastFixedWhitespace'' put: 0]].'!
Christoph Thiede uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ct.1037.mcz
==================== Summary ====================
Name: Collections-ct.1037
Author: ct
Time: 20 May 2023, 5:55:52.664861 pm
UUID: 46ac344d-db85-cc46-892f-a4829536c23e
Ancestors: Collections-ct.1036
Adds <blockquote> tag to HtmlReadWriter and maps it to TextIndent. Also implements #= on TextIndent.
=============== Diff against Collections-ct.1036 ===============
Item was changed:
TextReadWriter subclass: #HtmlReadWriter
+ instanceVariableNames: 'count offset runStack runArray string breakLines indent preformattingLevel exclusionLevel lastFixedWhitespace'
- instanceVariableNames: 'count offset runStack runArray string breakLines preformattingLevel exclusionLevel lastFixedWhitespace'
classVariableNames: ''
poolDictionaries: ''
category: 'Collections-Text'!
!HtmlReadWriter commentStamp: 'pre 8/29/2017 16:14' prior: 0!
A HtmlReadWriter is used to read a Text object from a string containing HTML or writing a Text object to a string with HTML tags representing the text attributes.
It does two things currently:
1) Setting text attributes on the beginning of tags, e.g. setting a bold text attribute when seeing a <b> tag.
2) Changing the resulting string, e.g. replacing a <br> with a Character cr.
The implementation works by pushing attributes on a stack on every opening tag. On the corresponding closing tag, the attribute is poped from the stack and stored in an array of attribute runs. From this array the final string is constructed.
## Notes on the implementation
- The final run array is completely constructed while parsing so it has to be correct with regard to the length of the runs. There is no consolidation except for merging neighboring runs which include the same attributes.
- The *count* variable is the position in the source string, the *offset* is the number of skipped characters, for example ones that denote a tag.
- The stack contains elements which are of the form: {text attributes. current start index. original start}!
Item was added:
+ ----- Method: HtmlReadWriter>>indent (in category 'accessing') -----
+ indent
+
+ ^ indent!
Item was added:
+ ----- Method: HtmlReadWriter>>indent: (in category 'accessing') -----
+ indent: anInteger
+
+ indent := anInteger.!
Item was changed:
----- Method: HtmlReadWriter>>initialize (in category 'initialize-release') -----
initialize
super initialize.
preformattingLevel := 0.
exclusionLevel := 0.
lastFixedWhitespace := 0.
+ self indent: 0.
-
self breakLines: true.!
Item was changed:
----- Method: HtmlReadWriter>>mapAlignmentTag: (in category 'mapping') -----
mapAlignmentTag: aTag
" special html case ".
(self hasTag: aTag name: '<center') ifTrue: [^ {TextAlignment centered}].
+ (self hasTag: aTag name: '<blockquote') ifTrue: [^ self mapBlockquoteTag].
"<div align=justify> or <div align=""right"">"
(self searchTag: aTag forAttribute: #align) ifNotNil: [:alignment |
^ self mapAlignmentValue: alignment asLowercase].
^ #()!
Item was added:
+ ----- Method: HtmlReadWriter>>mapBlockquoteTag (in category 'mapping') -----
+ mapBlockquoteTag
+
+ ^ {TextIndent tabs: (indent := indent + 1)}!
Item was added:
+ ----- Method: HtmlReadWriter>>mapCloseBlockquoteTag (in category 'mapping') -----
+ mapCloseBlockquoteTag
+
+ indent := indent - 1.!
Item was changed:
----- Method: HtmlReadWriter>>processEndTag:eagerly: (in category 'reading') -----
processEndTag: tagName eagerly: eagerly
| index |
(self isTagIgnored: tagName) ifTrue: [^ self].
(self isExcludedTag: tagName) ifTrue: [
exclusionLevel := exclusionLevel - 1.
^ self].
eagerly ifFalse: [
"Void tags such as <img> have no closing tag in HTML5, thus we already simulate their end in #processStartTag:. If we reach here anyway for XHTML style tags, we must not process the end tag again."
(self isVoidTag: tagName) ifTrue: [^ self]].
(tagName = 'p' or: [tagName = 'div'] or: [tagName = 'pre']) ifTrue: [
self assureLinebreak].
tagName = 'hr' ifTrue: [self addCharacter: Character cr].
+ tagName = 'blockquote' ifTrue: [self mapCloseBlockquoteTag].
tagName = 'code' ifTrue: [self mapCloseCodeTag].
tagName = 'pre' ifTrue: [self mapClosePreformattingTag].
self processRunStackTop.
index := count - offset.
runStack pop.
runStack top at: 2 put: index + 1.!
Item was added:
+ ----- Method: TextIndent>>= (in category 'comparing') -----
+ = other
+
+ ^ (other class == self class)
+ and: [other amount = self amount]!
Item was added:
+ ----- Method: TextIndent>>closeHtmlOn: (in category 'html') -----
+ closeHtmlOn: aStream
+
+ aStream nextPutAll: '</blockquote>'.!
Item was added:
+ ----- Method: TextIndent>>openHtmlOn: (in category 'html') -----
+ openHtmlOn: aStream
+
+ aStream nextPutAll: '<blockquote>'.!
Item was changed:
(PackageInfo named: 'Collections') postscript: '"Make sure the symbol table consists of immutable sets"
#(SymbolTable NewSymbols) do: [ :variableName |
(Symbol classPool at: variableName) beReadOnlyObject ].
"Add new instvars to HtmlReadWriter"
HtmlReadWriter allSubInstancesDo: [:ea |
+ (ea instVarNamed: ''indent'') ifNil: [
+ ea instVarNamed: ''indent'' put: 0].
+
(ea instVarNamed: ''preformattingLevel'') ifNil: [
ea instVarNamed: ''preformattingLevel'' put:
(ea breakLines ifTrue: [0] ifFalse: [1])].
(ea instVarNamed: ''exclusionLevel'') ifNil: [
ea instVarNamed: ''exclusionLevel'' put: 0].
(ea instVarNamed: ''lastFixedWhitespace'') ifNil: [
ea instVarNamed: ''lastFixedWhitespace'' put: 0]].'!
Christoph Thiede uploaded a new version of CollectionsTests to project The Trunk:
http://source.squeak.org/trunk/CollectionsTests-ct.357.mcz
==================== Summary ====================
Name: CollectionsTests-ct.357
Author: ct
Time: 6 May 2021, 10:10:49.328835 pm
UUID: 4127e94a-ff43-c34b-937d-0760d63c4226
Ancestors: CollectionsTests-nice.354
Adds test for Collections-ct.945 (empty end tag </img>).
=============== Diff against CollectionsTests-nice.354 ===============
Item was added:
+ ----- Method: HtmlReadWriterTest>>test17EmptyEndImgTag (in category 'tests') -----
+ test17EmptyEndImgTag
+ "Empty end tags are disallowed in XHTML but required in HTML5."
+
+ self convertHtml: 'a<img></img>z'.
+ self assert: ({$a. Character value: 1. $z} as: String) equals: text string.
+ self assert: (RunArray new: 3 withAll: #()) equals: text runs!
Christoph Thiede uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ct.945.mcz
==================== Summary ====================
Name: Collections-ct.945
Author: ct
Time: 6 May 2021, 10:08:28.643835 pm
UUID: 4a526dd8-6418-c44f-aa41-3de63a54b393
Ancestors: Collections-mt.943
Makes HtmlReadWriter robust against HTML5 void tags. As opposed to XHTML tags, they need to be closed manually.
In the past, HTML strings such as the following failed with an "error: this stack is empty":
'<img src="code://Form fromDisplay: World bounds"></img>' asTextFromHtml
This problem is now solved by ignoring void tags in #processEndTag:.
=============== Diff against Collections-mt.943 ===============
Item was added:
+ ----- Method: HtmlReadWriter>>isVoidTag: (in category 'testing') -----
+ isVoidTag: aTag
+
+ ^ self voidTags includes: aTag!
Item was changed:
----- Method: HtmlReadWriter>>processEndTag: (in category 'reading') -----
processEndTag: aTag
| index tagName |
index := count - offset.
tagName := aTag copyFrom: 3 to: aTag size - 1.
+
-
(self isTagIgnored: tagName) ifTrue: [^ self].
+ (self isVoidTag: tagName) ifTrue: [^ self].
tagName = 'code' ifTrue: [self mapCloseCodeTag].
tagName = 'pre' ifTrue: [self breakLines: true].
+
-
self processRunStackTop.
+
-
runStack pop.
+ runStack top at: 2 put: index + 1!
- runStack top at: 2 put: index + 1.!
Item was removed:
- ----- Method: HtmlReadWriter>>processEndTagEagerly: (in category 'reading') -----
- processEndTagEagerly: aTag
- "Not all tags need an end tag. Simulate that here."
-
- (aTag beginsWith: '<img')
- ifTrue: [^ self processEndTag: '</img>'].!
Item was changed:
----- Method: HtmlReadWriter>>processStartTag: (in category 'reading') -----
processStartTag: aTag
+ | tagName index |
+ tagName := (aTag copyWithoutAll: '</>') copyUpTo: Character space.
+ (self isTagIgnored: tagName) ifTrue: [^ self].
+
- | index |
- (self isTagIgnored: aTag) ifTrue: [^ self].
-
index := count - offset.
+
+ tagName = 'br' ifTrue: [
-
- aTag = '<br>' ifTrue: [
self addCharacter: Character cr.
^ self].
+
+ tagName = 'img' ifTrue: [
-
- (aTag beginsWith: '<img') ifTrue: [
self addString: Character startOfHeader asString.
offset := offset + 1.
index := index - 1].
+ self processRunStackTop.
+ "To add all attributes before the next tag adds some."
- self processRunStackTop. "To add all attributes before the next tag adds some."
-
"Copy attr list and add new attr."
+ runStack push: {
+ runStack top first copy
+ addAll: (self mapTagToAttribute: aTag);
+ yourself.
+ index + 1.
+ index + 1}.
+
+ "For void tags such as <img>, we should simulate the closing tag because in case of HTML5 there won't be any."
+ (self isVoidTag: tagName) ifTrue: [self processEndTag: tagName]!
- runStack push: ({runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself. index + 1 . index + 1}).
-
- "For tags such as <img>, we should simulate the closing tag because there won't be any."
- self processEndTagEagerly: aTag.!
Item was added:
+ ----- Method: HtmlReadWriter>>voidTags (in category 'accessing') -----
+ voidTags
+ "Tags that are empty and won't be closed in HTML5."
+
+ ^ #(#img)!
Christoph Thiede uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ct.944.mcz
==================== Summary ====================
Name: Collections-ct.944
Author: ct
Time: 6 May 2021, 7:19:19.626252 pm
UUID: ff2ccbdf-62a6-9040-ba1a-5c7ae932a1c9
Ancestors: Collections-mt.943
Fixes a slip in HtmlReadWriter when encountering an empty CSS value.
Note that according to W3C, empty CSS values are not permitted, but let's not make our converter fail for such a trivial reason, in particular since I met such a tag in the wild. Also, the check already exists anyway. :-)
=============== Diff against Collections-mt.943 ===============
Item was changed:
----- Method: HtmlReadWriter>>mapContainerTag: (in category 'mapping') -----
mapContainerTag: aTag
| result styleStart styleEnd styleAttributes |
result := OrderedCollection new.
styleStart := (aTag findString: 'style="' ) + 7.
styleStart <= 7 ifTrue: [^#()].
styleEnd := (aTag findString: '"' startingAt: styleStart) - 1.
styleAttributes := (aTag copyFrom: styleStart to: styleEnd) subStrings: ';'.
styleAttributes do: [:ea | |keyValue key value|
keyValue := (ea subStrings: ':') collect: [:s | s withBlanksTrimmed].
key := keyValue first asLowercase.
- value := keyValue second.
keyValue size = 2 ifTrue: [
+ value := keyValue second.
key = 'color' ifTrue: [result add: (TextColor color: (Color fromString: value))].
(key beginsWith: 'font') ifTrue: [
(value includesSubstring: 'bold')
ifTrue: [result add: TextEmphasis bold].
(value includesSubstring: 'italic')
ifTrue: [result add: TextEmphasis italic]]]].
^ result!