Please object within one week before I merge this into the Trunk. Support for text fragments would be pretty useful for SqueakInboxTalk. :-)<br>
<br>
Best,<br>
Christoph<br>
<br>
<b>=============== Summary ===============</b><br>
<br>
Change Set: url-fragment-encoding<br>
Date: 25 November 2022<br>
Author: Christoph Thiede<br>
<br>
Fixes encoding and decoding of URL fragments and adds rough support for fragment directives. Adds tests.<br>
<br>
Concretely, this patch enables the following things:<br>
* parse an encoded URL fragment like #See%20also (previously, the fragment was not decoded but encoded again)<br>
* print a URL with fragment directive (previously, the directive prefix was also encoded, and text directives were encoded incorrectly)<br>
<br>
Adds new accessors for fragment parts. Note that there is only rough support for directives at the moment, i.e., we don't do a deep parse of the directive but only make sure that we don't break the directive during decoding and reencoding (see #testAbsoluteHTTP). Improved support might follow in the future, but for now, just let's make it possible to open a URL with fragment directive in your browser. For instance, SqueakInboxTalk/ExternalWebBrowser needs this. Baby steps. ;-). For more details, see: https://wicg.github.io/scroll-to-text-fragment/<br>
<br>
<b>=============== Diff ===============</b><br>
<br>
<b>FileUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)</b><br>
printOn: aStream<br>
"Return the FileUrl according to RFC1738 plus supporting fragments:<br>
'file://<host>/<path>#<fragment>'<br>
Note that <host> being '' is equivalent to 'localhost'.<br>
Note: The pathString can not start with a leading $/<br>
to indicate an 'absolute' file path.<br>
This is not according to RFC1738 where the path should have<br>
no leading or trailing slashes, and always<br>
be considered absolute relative to the filesystem."<br>
<br>
aStream nextPutAll: self schemeName, '://'.<br>
<br>
host ifNotNil: [aStream nextPutAll: host].<br>
<br>
aStream<br>
nextPut: $/;<br>
nextPutAll: self pathString.<br>
<br>
<s><font color="#0000FF">- fragment ifNotNil:<br>
- [aStream<br>
- nextPut: $#;<br>
- nextPutAll: fragment encodeForHTTP].<br>
</font></s><font color="#FF0000">+ self printFragmentOn: aStream.</font><br>
<br>
<b>GenericUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)</b><br>
printOn: aStream<br>
<br>
aStream nextPutAll: self schemeName.<br>
aStream nextPut: $:.<br>
aStream nextPutAll: self locator.<br>
<br>
<s><font color="#0000FF">- self fragment ifNotNil:<br>
- [aStream nextPut: $#.<br>
- aStream nextPutAll: self fragment].<br>
</font></s><font color="#FF0000">+ self printFragmentOn: aStream.</font><br>
<br>
<b>HierarchicalUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)</b><br>
printOn: aStream<br>
<br>
aStream nextPutAll: self schemeName.<br>
aStream nextPutAll: '://'.<br>
self username ifNotNil: [<br>
aStream nextPutAll: self username encodeForHTTP.<br>
self password ifNotNil: [<br>
aStream nextPutAll: ':'.<br>
aStream nextPutAll: self password encodeForHTTP].<br>
aStream nextPutAll: '@' ].<br>
aStream nextPutAll: self authority.<br>
port ifNotNil: [aStream nextPut: $:; print: port].<br>
path do: [ :pathElem |<br>
aStream nextPut: $/.<br>
aStream nextPutAll: pathElem encodeForHTTP. ].<br>
self query isNil ifFalse: [ <br>
aStream nextPut: $?.<br>
aStream nextPutAll: self query. ].<br>
<s><font color="#0000FF">- self fragment isNil ifFalse: [<br>
- aStream nextPut: $#.<br>
- aStream nextPutAll: self fragment encodeForHTTP. ].<br>
</font></s><font color="#FF0000">+ self printFragmentOn: aStream.</font><br>
<br>
<b>Url class>>absoluteFromText: {parsing} · ct 11/25/2022 19:03 (changed)</b><br>
absoluteFromText: aString<br>
"Return a URL from a string and handle<br>
a String without a scheme as a HttpUrl."<br>
<br>
"Url absoluteFromText: 'http://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#part'" <br>
"Url absoluteFromText: 'msw://chaos.resnet.gatech.edu:9000/testbook?top'"<br>
"Url absoluteFromText: 'telnet:chaos.resnet.gatech.edu'"<br>
"Url absoluteFromText: 'file:/etc/passwd'"<br>
<br>
| remainder index scheme fragment newUrl |<br>
"trim surrounding whitespace"<br>
remainder := aString withBlanksTrimmed. <br>
<br>
"extract the fragment, if any"<br>
index := remainder indexOf: $#.<br>
index > 0 ifTrue: [<br>
<s><font color="#0000FF">- fragment := remainder copyFrom: index + 1 to: remainder size.<br>
</font></s><font color="#FF0000">+ fragment := (remainder copyFrom: index + 1 to: remainder size) unescapePercents.<br>
</font> remainder := remainder copyFrom: 1 to: index - 1].<br>
<br>
"choose class based on the scheme name, and let that class do the bulk of the parsing"<br>
scheme := self schemeNameForString: remainder.<br>
newUrl := (self urlClassForScheme: scheme) new privateInitializeFromText: remainder.<br>
newUrl privateFragment: fragment.<br>
^newUrl<br>
<br>
<b>Url>>fragmentDirective {fragment} · ct 11/25/2022 19:52</b><br>
<font color="#FF0000">+ fragmentDirective<br>
+ <br>
+ ^ self<br>
+ splitFragmentDirectiveDo: [:fragment :directive | directive]<br>
+ otherwise: [:fragment | nil]</font><br>
<br>
<b>Url>>fragmentDirectivePrefix {private} · ct 11/25/2022 20:40</b><br>
<font color="#FF0000">+ fragmentDirectivePrefix<br>
+ "See comment in #splitFragmentDirectiveDo:otherwise:."<br>
+ <br>
+ ^ ':~:'</font><br>
<br>
<b>Url>>fragmentWithoutDirective {fragment} · ct 11/25/2022 19:51</b><br>
<font color="#FF0000">+ fragmentWithoutDirective<br>
+ <br>
+ ^ self<br>
+ splitFragmentDirectiveDo: [:fragment :directive | fragment]<br>
+ otherwise: [:fragment | fragment]</font><br>
<br>
<b>Url>>newFromRelativeText: {parsing} · ct 11/25/2022 19:07 (changed)</b><br>
newFromRelativeText: aString<br>
"return a URL relative to the current one, given by aString. For instance, if self is 'http://host/dir/file', and aString is '/dir2/file2', then the return will be a Url for 'http://host/dir2/file2'"<br>
<br>
"if the scheme is the same, or not specified, then use the same class"<br>
<br>
| newSchemeName remainder fragmentStart newFragment newUrl bare |<br>
<br>
bare := aString withBlanksTrimmed.<br>
newSchemeName := Url schemeNameForString: bare.<br>
(newSchemeName isNil not and: [ newSchemeName ~= self schemeName ]) ifTrue: [<br>
"different scheme -- start from scratch"<br>
^Url absoluteFromText: aString ].<br>
<br>
remainder := bare.<br>
<br>
"remove the fragment, if any"<br>
fragmentStart := remainder indexOf: $#.<br>
fragmentStart > 0 ifTrue: [<br>
<s><font color="#0000FF">- newFragment := remainder copyFrom: fragmentStart+1 to: remainder size. <br>
</font></s><font color="#FF0000">+ newFragment := (remainder copyFrom: fragmentStart+1 to: remainder size) unescapePercents. <br>
</font> remainder := remainder copyFrom: 1 to: fragmentStart-1].<br>
<br>
"remove the scheme name"<br>
newSchemeName ifNotNil: [<br>
remainder := remainder copyFrom: (newSchemeName size + 2) to: remainder size ].<br>
<br>
"create and initialize the new url"<br>
newUrl := self class new privateInitializeFromText: remainder relativeTo: self.<br>
<br>
<br>
"set the fragment"<br>
newUrl privateFragment: newFragment.<br>
<br>
<br>
^newUrl<br>
<br>
<b>Url>>printFragmentOn: {printing} · ct 11/25/2022 20:42</b><br>
<font color="#FF0000">+ printFragmentOn: aStream<br>
+ <br>
+ self fragment ifNil: [^ self].<br>
+ aStream nextPut: $#.<br>
+ <br>
+ self<br>
+ splitFragmentDirectiveDo: [:fragmentWithoutDirective :directive |<br>
+ | index |<br>
+ aStream<br>
+ nextPutAll: fragmentWithoutDirective encodeForHTTP;<br>
+ nextPutAll: self fragmentDirectivePrefix "do not encode!".<br>
+ (index := directive indexOf: $=) > 0<br>
+ ifTrue: [ "TextDirective"<br>
+ aStream<br>
+ nextPutAll: (directive first: index) "do not encode!";<br>
+ nextPutAll: ((directive allButFirst: index)<br>
+ encodeForHTTPWithTextEncoding: 'utf-8'<br>
+ conditionBlock: [:character |<br>
+ "TextDirectiveExplicitChar"<br>
+ character isSafeForHTTP and: [('&-,' includes: character) not]])]<br>
+ ifFalse: [ "UnknownDirective"<br>
+ aStream nextPutAll: directive encodeForHTTP]]<br>
+ otherwise: [:totalFragment |<br>
+ aStream nextPutAll: totalFragment encodeForHTTP].</font><br>
<br>
<b>Url>>splitFragmentDirectiveDo:otherwise: {fragment} · ct 11/25/2022 20:40</b><br>
<font color="#FF0000">+ splitFragmentDirectiveDo: fragmentDirectiveBlock otherwise: fragmentBlock<br>
+ "Search the fragment for a directive. If one was found, evaluate fragmentDirectiveBlock with the fragment and the directive separated; if the fragment exists but does not have a directive, evaluate fragmentBlock instead. Note that there is only rough support for directives at the moment, i.e., we don't do a deep parse of the directive but only make sure that we don't break the directive during decoding and reencoding (see #testAbsoluteHTTP).<br>
+ <br>
+ For more information on fragment directives, see: https://wicg.github.io/scroll-to-text-fragment/"<br>
+ <br>
+ | directiveIndex |<br>
+ fragment ifNil: [^ nil].<br>
+ <br>
+ directiveIndex := fragment findString: self fragmentDirectivePrefix.<br>
+ directiveIndex = 0 ifTrue: [^ fragmentBlock value: fragment].<br>
+ ^ fragmentDirectiveBlock<br>
+ value: (fragment first: directiveIndex - 1)<br>
+ value: (fragment allButFirst: directiveIndex - 1 + self fragmentDirectivePrefix size)</font><br>
<br>
<b>UrlTest>>testAbsoluteHTTP {tests - absolute urls} · ct 11/25/2022 20:34 (changed)</b><br>
testAbsoluteHTTP<br>
<br>
<s><font color="#0000FF">- url := 'hTTp://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#part' asUrl.<br>
</font></s><font color="#FF0000">+ url := 'hTTp://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#a%20part :~:text=text-start%20' asUrl.<br>
</font><br>
self assert: url schemeName = 'http'.<br>
self assert: url authority = 'chaos.resnet.gatech.edu'.<br>
self assert: url path first = 'docs'.<br>
self assert: url path size = 3.<br>
self assert: url query = 'A%20query%20'.<br>
<s><font color="#0000FF">- self assert: url fragment = 'part'.<br>
</font></s><font color="#FF0000">+ self assert: url fragment = 'a part :~:text=text-start '.<br>
+ self assert: url fragmentWithoutDirective = 'a part '.<br>
+ self assert: url fragmentDirective = 'text=text-start '.<br>
+ <br>
+ self assert: url asString = 'http://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#a%20part%20:~:text=text%2Dstart%20'.</font><br>
<br>
<b>UrlTest>>testRelativeHTTP {tests - relative} · ct 11/25/2022 20:03 (changed)</b><br>
testRelativeHTTP<br>
<br>
baseUrl := 'http://some.where/some/dir?query1#fragment1' asUrl.<br>
<s><font color="#0000FF">- url := baseUrl newFromRelativeText: '../another/dir/?query2#fragment2'.<br>
</font></s><font color="#FF0000">+ url := baseUrl newFromRelativeText: '../another/dir/?query%202#fragment%202'.<br>
</font><br>
<s><font color="#0000FF">- self assert: url asString = 'http://some.where/another/dir/?query2#fragment2'.<br>
</font></s><font color="#FF0000">+ self assert: url asString = 'http://some.where/another/dir/?query%202#fragment%202'.</font><br>
<br>
<font color="#808080">---<br>
</font><font color="#808080"><i>Sent from </i></font><font color="#808080"><i><a href="https://github.com/hpi-swa-lab/squeak-inbox-talk"><u><font color="#808080">Squeak Inbox Talk</font></u></a></i></font><br>
["url-fragment-encoding.1.cs"]