[squeak-dev] Review Request: url-fragment-encoding.1.cs

christoph.thiede at student.hpi.uni-potsdam.de christoph.thiede at student.hpi.uni-potsdam.de
Fri Nov 25 19:49:27 UTC 2022


Please object within one week before I merge this into the Trunk. Support for text fragments would be pretty useful for SqueakInboxTalk. :-)

Best,
Christoph

=============== Summary ===============

Change Set:        url-fragment-encoding
Date:            25 November 2022
Author:            Christoph Thiede

Fixes encoding and decoding of URL fragments and adds rough support for fragment directives. Adds tests.

Concretely, this patch enables the following things:
* parse an encoded URL fragment like #See%20also (previously, the fragment was not decoded but encoded again)
* print a URL with fragment directive (previously, the directive prefix was also encoded, and text directives were encoded incorrectly)

Adds new accessors for fragment parts. Note that there is only rough support for directives at the moment, i.e., we don't do a deep parse of the directive but only make sure that we don't break the directive during decoding and reencoding (see #testAbsoluteHTTP). Improved support might follow in the future, but for now, just let's make it possible to open a URL with fragment directive in your browser. For instance, SqueakInboxTalk/ExternalWebBrowser needs this. Baby steps. ;-). For more details, see: https://wicg.github.io/scroll-to-text-fragment/

=============== Diff ===============

FileUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)
printOn: aStream
    "Return the FileUrl according to RFC1738 plus supporting fragments:
        'file://<host>/<path>#<fragment>'
    Note that <host> being '' is equivalent to 'localhost'.
    Note: The pathString can not start with a leading $/
    to indicate an 'absolute' file path.
    This is not according to RFC1738 where the path should have
    no leading or trailing slashes, and always
    be considered absolute relative to the filesystem."

    aStream nextPutAll: self schemeName, '://'.

    host ifNotNil: [aStream nextPutAll: host].

    aStream
        nextPut: $/;
        nextPutAll: self pathString.

-     fragment ifNotNil:
-         [aStream
-             nextPut: $#;
-             nextPutAll: fragment encodeForHTTP].
+     self printFragmentOn: aStream.

GenericUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)
printOn: aStream

    aStream nextPutAll: self schemeName.
    aStream nextPut: $:.
    aStream nextPutAll: self locator.

-     self fragment ifNotNil:
-         [aStream nextPut: $#.
-         aStream nextPutAll: self fragment].
+     self printFragmentOn: aStream.

HierarchicalUrl>>printOn: {printing} · ct 11/25/2022 19:29 (changed)
printOn: aStream

    aStream nextPutAll: self schemeName.
    aStream nextPutAll: '://'.
    self username ifNotNil: [
        aStream nextPutAll: self username encodeForHTTP.
        self password ifNotNil: [
            aStream nextPutAll: ':'.
            aStream nextPutAll: self password encodeForHTTP].
        aStream nextPutAll: '@' ].
    aStream nextPutAll: self authority.
    port ifNotNil: [aStream nextPut: $:; print: port].
    path do: [ :pathElem |
        aStream nextPut: $/.
        aStream nextPutAll: pathElem encodeForHTTP. ].
    self query isNil ifFalse: [ 
        aStream nextPut: $?.
        aStream nextPutAll: self query. ].
-     self fragment isNil ifFalse: [
-         aStream nextPut: $#.
-         aStream nextPutAll: self fragment encodeForHTTP. ].
+     self printFragmentOn: aStream.

Url class>>absoluteFromText: {parsing} · ct 11/25/2022 19:03 (changed)
absoluteFromText: aString
    "Return a URL from a string and handle
    a String without a scheme as a HttpUrl."

    "Url absoluteFromText: 'http://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#part'" 
    "Url absoluteFromText: 'msw://chaos.resnet.gatech.edu:9000/testbook?top'"
    "Url absoluteFromText: 'telnet:chaos.resnet.gatech.edu'"
    "Url absoluteFromText: 'file:/etc/passwd'"

    | remainder index scheme fragment newUrl |
    "trim surrounding whitespace"
    remainder := aString withBlanksTrimmed.    

    "extract the fragment, if any"
    index := remainder indexOf: $#.
    index > 0 ifTrue: [
-         fragment := remainder copyFrom: index + 1 to: remainder size.
+         fragment := (remainder copyFrom: index + 1 to: remainder size) unescapePercents.
        remainder := remainder copyFrom: 1 to: index - 1].

    "choose class based on the scheme name, and let that class do the bulk of the parsing"
    scheme := self schemeNameForString: remainder.
    newUrl := (self urlClassForScheme: scheme) new privateInitializeFromText: remainder.
    newUrl privateFragment: fragment.
    ^newUrl

Url>>fragmentDirective {fragment} · ct 11/25/2022 19:52
+ fragmentDirective
+ 
+     ^ self
+         splitFragmentDirectiveDo: [:fragment :directive | directive]
+         otherwise: [:fragment | nil]

Url>>fragmentDirectivePrefix {private} · ct 11/25/2022 20:40
+ fragmentDirectivePrefix
+     "See comment in #splitFragmentDirectiveDo:otherwise:."
+ 
+     ^ ':~:'

Url>>fragmentWithoutDirective {fragment} · ct 11/25/2022 19:51
+ fragmentWithoutDirective
+ 
+     ^ self
+         splitFragmentDirectiveDo: [:fragment :directive | fragment]
+         otherwise: [:fragment | fragment]

Url>>newFromRelativeText: {parsing} · ct 11/25/2022 19:07 (changed)
newFromRelativeText: aString
    "return a URL relative to the current one, given by aString.  For instance, if self is 'http://host/dir/file', and aString is '/dir2/file2', then the return will be a Url for 'http://host/dir2/file2'"

    "if the scheme is the same, or not specified, then use the same class"

    | newSchemeName remainder fragmentStart newFragment newUrl bare |

    bare := aString withBlanksTrimmed.
    newSchemeName := Url schemeNameForString: bare.
    (newSchemeName isNil not and: [ newSchemeName ~= self schemeName ]) ifTrue: [
        "different scheme -- start from scratch"
        ^Url absoluteFromText: aString ].

    remainder := bare.

    "remove the fragment, if any"
    fragmentStart := remainder indexOf: $#.
    fragmentStart > 0 ifTrue: [
-         newFragment := remainder copyFrom: fragmentStart+1 to: remainder size. 
+         newFragment := (remainder copyFrom: fragmentStart+1 to: remainder size) unescapePercents. 
        remainder := remainder copyFrom: 1 to: fragmentStart-1].

    "remove the scheme name"
    newSchemeName ifNotNil: [
        remainder := remainder copyFrom: (newSchemeName size + 2) to: remainder size ].

    "create and initialize the new url"
    newUrl := self class new privateInitializeFromText: remainder  relativeTo: self.


    "set the fragment"
    newUrl privateFragment: newFragment.


    ^newUrl

Url>>printFragmentOn: {printing} · ct 11/25/2022 20:42
+ printFragmentOn: aStream
+ 
+     self fragment ifNil: [^ self].
+     aStream nextPut: $#.
+     
+     self
+         splitFragmentDirectiveDo: [:fragmentWithoutDirective :directive |
+             | index |
+             aStream
+                 nextPutAll: fragmentWithoutDirective encodeForHTTP;
+                 nextPutAll: self fragmentDirectivePrefix "do not encode!".
+             (index := directive indexOf: $=) > 0
+                 ifTrue: [ "TextDirective"
+                     aStream
+                         nextPutAll: (directive first: index) "do not encode!";
+                         nextPutAll: ((directive allButFirst: index)
+                             encodeForHTTPWithTextEncoding: 'utf-8'
+                             conditionBlock: [:character |
+                                 "TextDirectiveExplicitChar"
+                                 character isSafeForHTTP and: [('&-,' includes: character) not]])]
+                 ifFalse: [ "UnknownDirective"
+                     aStream nextPutAll: directive encodeForHTTP]]
+         otherwise: [:totalFragment |
+             aStream nextPutAll: totalFragment encodeForHTTP].

Url>>splitFragmentDirectiveDo:otherwise: {fragment} · ct 11/25/2022 20:40
+ splitFragmentDirectiveDo: fragmentDirectiveBlock otherwise: fragmentBlock
+     "Search the fragment for a directive. If one was found, evaluate fragmentDirectiveBlock with the fragment and the directive separated; if the fragment exists but does not have a directive, evaluate fragmentBlock instead. Note that there is only rough support for directives at the moment, i.e., we don't do a deep parse of the directive but only make sure that we don't break the directive during decoding and reencoding (see #testAbsoluteHTTP).
+     
+     For more information on fragment directives, see: https://wicg.github.io/scroll-to-text-fragment/"
+ 
+     | directiveIndex |
+     fragment ifNil: [^ nil].
+     
+     directiveIndex := fragment findString: self fragmentDirectivePrefix.
+     directiveIndex = 0 ifTrue: [^ fragmentBlock value: fragment].
+     ^ fragmentDirectiveBlock
+         value: (fragment first: directiveIndex - 1)
+         value: (fragment allButFirst: directiveIndex - 1 + self fragmentDirectivePrefix size)

UrlTest>>testAbsoluteHTTP {tests - absolute urls} · ct 11/25/2022 20:34 (changed)
testAbsoluteHTTP
    
-     url := 'hTTp://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#part' asUrl.
+     url := 'hTTp://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#a%20part :~:text=text-start%20' asUrl.

    self assert: url schemeName = 'http'.
    self assert: url authority = 'chaos.resnet.gatech.edu'.
    self assert: url path first = 'docs'.
    self assert: url path size = 3.
    self assert: url query = 'A%20query%20'.
-     self assert: url fragment = 'part'.
+     self assert: url fragment = 'a part :~:text=text-start '.
+     self assert: url fragmentWithoutDirective = 'a part '.
+     self assert: url fragmentDirective = 'text=text-start '.
+     
+     self assert: url asString = 'http://chaos.resnet.gatech.edu:8000/docs/java/index.html?A%20query%20#a%20part%20:~:text=text%2Dstart%20'.

UrlTest>>testRelativeHTTP {tests - relative} · ct 11/25/2022 20:03 (changed)
testRelativeHTTP
    
    baseUrl := 'http://some.where/some/dir?query1#fragment1' asUrl.
-     url := baseUrl newFromRelativeText: '../another/dir/?query2#fragment2'.
+     url := baseUrl newFromRelativeText: '../another/dir/?query%202#fragment%202'.

-     self assert: url asString =  'http://some.where/another/dir/?query2#fragment2'.
+     self assert: url asString =  'http://some.where/another/dir/?query%202#fragment%202'.

---
Sent from Squeak Inbox Talk
["url-fragment-encoding.1.cs"]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20221125/c2047d9a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: url-fragment-encoding.1.cs
Type: application/octet-stream
Size: 8611 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20221125/c2047d9a/attachment.obj>


More information about the Squeak-dev mailing list