The Trunk: Collections-pre.857.mcz

commits＠source.squeak.org

4 Oct 2019

9:04 a.m.

Patrick Rein uploaded a new version of Collections to project The Trunk: http://source.squeak.org/trunk/Collections-pre.857.mcz

==================== Summary ====================

Name: Collections-pre.857 Author: pre Time: 4 October 2019, 11:04:30.363303 am UUID: 5ef00b65-3884-c445-b276-0cc01f0b10a1 Ancestors: Collections-pre.856

Adds startOfHeader to Character, adds empty abstract implementations of scanFrom:, writeScanOn: to TextAttribute to allow for Texts which include TextAttributes which do not implement serialization to still be serialized, adds a comment to these methods.

=============== Diff against Collections-pre.856 ===============

Item was added:
+ ----- Method: Character class>>startOfHeader (in category 'accessing untypeable characters') -----
+ startOfHeader
+ 
+ 	^ self value: 1 !

Item was added:
+ ----- Method: TextAttribute class>>scanFrom: (in category 'fileIn/Out') -----
+ scanFrom: strm
+ 	"Read the text attribute properties from the stream. When this method has
+ 	been called the concrete TextAttribute class has already been selected via
+ 	scanCharacter. (see TextAttribute class>>#newFrom:).
+ 	For writing the format see TextAttribute>>#writeScanOn:"!

Item was added:
+ ----- Method: TextAttribute>>writeScanOn: (in category 'fileIn/fileOut') -----
+ writeScanOn: strm
+ 	"Implement this method for a text attribute to define how it it should be written
+ 	to a serialized form of a text object. The form should correspond to the source
+ 	file format, i.e. use a scan character to denote its subclass.
+ 	As TextAttributes are stored in RunArrays, this method is mostly called from RunArray>>#write scan.
+ 	For reading the written information see TextAttribute class>>#scanFrom:"
+ 
+ 	"Do nothing because of abstract class"!

patrick.rein＠hpi.uni-potsdam.de

4 Oct

11:01 a.m.

Hi everyone,

in the context of the new text anchor layouting infrastructure there is still one thing missing. Currently start of header is not included in Character class>>#separators. This leads to problems with text editing Morphs. As start of header is not printed at all (not even as white space) I would rather classify it as a separator and add it to the list in Character class>>#separators and to Character>>#isSeparator. The advantage of this approach is that we would not need any special case in the text editing morphs. The disadvantage is that the list of separators will be less obvious to understand.

Any thoughts about this?

Bests Patrick

P.S.: Conceptually start of header (or heading) is a control character. ietf says: " A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information. Such a sequence is referred to as the "heading." An STX character has the effect of terminating a heading." (https://tools.ietf.org/html/rfc20#section-5.2)
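To make the proposal concrete, the following workspace sketch shows one way to check how start of header is classified in a given image. It uses `Character value: 1` so that it does not depend on Collections-pre.857 being loaded. The answers depend on whether the proposed change to #separators and #isSeparator has been applied, so no expected output is given.

```smalltalk
"Workspace sketch: inspect how start of header (code point 1) is classified.
 Assumes a Squeak image; results differ before and after the proposed change."
| soh |
soh := Character value: 1.	"equivalent to Character startOfHeader after Collections-pre.857"
Transcript
	show: 'isSeparator: ', soh isSeparator printString; cr;
	show: 'in separators: ', (Character separators includes: soh) printString; cr.
```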

Levente Uzonyi

5 Oct

12:27 p.m.

Are there too many senders of these methods that need to identify start of header? Or is it too hard to identify these methods? If the answer is no to both questions, then I suggest using different method names. E.g.: #isTextSeparator, #textSeparators.

Levente
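A rough sketch of what the separately named variants could look like, written as workspace blocks so that it runs without changing the image. The names isTextSeparator and textSeparators follow the suggestion above; they are hypothetical and, as far as this thread shows, not part of Squeak.

```smalltalk
"Workspace sketch of hypothetical text-specific separator tests.
 Nothing here is installed into Character; the names are illustrative only."
| soh isTextSeparator textSeparators |
soh := Character value: 1.	"start of header"

"A character counts as a text separator if it is a regular separator or SOH."
isTextSeparator := [:char | char isSeparator or: [char = soh]].

"The regular separators plus SOH. A Set avoids a duplicate entry
 in images where #separators already contains SOH."
textSeparators := Character separators asSet add: soh; yourself.
```

Senders that need the text editing behaviour would use these variants, while all other senders of #separators and #isSeparator keep the existing definition.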

christoph.thiede＠student.hpi.uni-potsdam.de

20 Jun

9:51 a.m.

...

Character class>>separators
    "Answer a collection of the standard ASCII separator characters."
    ^ #(32 "space"
        13 "cr"
        9 "tab"
        10 "line feed"
        12 "form feed"
        1 "text separator")
        collect: [:v | Character value: v] as: String

This seems simple and clear to me.

+1, I just stumbled upon this missing separator right now, too. See also: http://forum.world.st/ENH-isSeparator-tp5129517.html :-)

Hm, why don't we have #separators depend on #isSeparator and use a cache? It should not be slower but we would save some duplication ...

Best, Christoph

David T. Lewis

4 Oct

1:26 p.m.

This sounds like a good approach to me. If I understand correctly, it amounts to this:

Character class>>separators
    "Answer a collection of the standard ASCII separator characters."
    ^ #(32 "space"
        13 "cr"
        9 "tab"
        10 "line feed"
        12 "form feed"
        1 "text separator")
        collect: [:v | Character value: v] as: String

This seems simple and clear to me.

There are a lot of senders of #separators in the image, so it is possible that it might have some unintended side effect. But that seems unlikely.

Dave

Tobias Pape

21 Jun

10:37 a.m.

Hi Dave

On 4. Oct 2019, at 15:26, David T. Lewis lewis@mail.msen.com wrote:

This seems simple and clear to me.

It is! But do we really need a collect for this static list? We could put that code in a comment and just return the resulting string…

Best regards -Tobias

Thiede, Christoph

1:47 p.m.

My suggestion would be

^ Separators ifNil: [Separators := self allCharacters select: [:ea | ea isSeparator]]

Then we won't need to duplicate the logic.
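The lazy-initialization idea can be tried out in a workspace as sketched below. The temporary variable cache stands in for the Separators class variable, which the one-liner above assumes has been added to Character; that variable, and any method to reset it, are assumptions and not something this thread shows in the Trunk.

```smalltalk
"Workspace sketch of the cached #separators idea.
 'cache' plays the role of the assumed class variable Separators."
| cache separators |
cache := nil.

"Compute the separators once by asking every byte character, then reuse the result."
separators := [cache ifNil: [
	cache := Character allCharacters select: [:each | each isSeparator]]].

separators value.	"first call fills the cache"
separators value.	"later calls answer the same object"
```

With this design #isSeparator is the single source of truth. The cost is that the cache would need to be cleared whenever the definition of #isSeparator changes.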

Best, Christoph

Thiede, Christoph

19 Aug

6:11 p.m.

Hi all,

two years later, I still would love to see Patrick's proposal being accepted in the Trunk.

My concrete problem with SOH (start of header) not being in Character separators is that text anchors in Smalltalk source code currently mix up the Shout styler, which is due to the send to CharacterSet nonSeparators from SHParserST80 scanWhitespace. Now one might argue that we could introduce a separate CharacterSet notAtAllSeparators/nonUnicodeSeparator etc. (which would also exclude SOH), but I would rather dislike this proposal because it would force us to maintain multiple different definitions of the term "character" and increase the overall domain complexity. I can't see what would be wrong with treating all character instances according to the Unicode standard (as other frameworks such as .NET seem to do, too).

I have been using Dave's version of Character separators from above for the last few months and I did not experience any unintended side effects of the change.

Could we please integrate Patrick's change, or are there any major objections? It would be great to get this kind of stuff working in Babylonian & Co. :-)

Best, Christoph

PS: See also: http://forum.world.st/ENH-isSeparator-td5129517.html

David T. Lewis

21 Aug

6:34 p.m.

Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak inbox Talk utility :-)

Dave

David T. Lewis

26 Aug

9:10 p.m.

Done. Updated in Collections-dtl.954.

Dave

On Sat, Aug 21, 2021 at 02:34:52PM -0400, David T. Lewis wrote:

...

Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak inbox Talk utility :-)

Dave

On Thu, Aug 19, 2021 at 06:11:14PM +0000, Thiede, Christoph wrote:

...
Hi all,

two years later, I still would love to see Patrick's proposal being accepted in the Trunk.

My concrete problem with SOH (start of header) not being in Character separators is that text anchors in Smalltalk source code currently confuse the Shout styler, because SHParserST80 scanWhitespace sends CharacterSet nonSeparators. Now one might argue that we could introduce a separate CharacterSet notAtAllSeparators/nonUnicodeSeparator etc. (which would also exclude SOH), but I would rather dislike this proposal because it would force us to maintain multiple different definitions of the term "character" and increase the overall domain complexity. I can't see what would be wrong with treating all character instances according to the Unicode standard (as other frameworks such as .NET seem to do, too).

I have been using Dave's version of Character separators from above for the last few months and I did not experience any unintended side effects of the change.

Could we please integrate Patrick's change, or are there any major objections? It would be great to get this kind of stuff working in Babylonian & Co. :-)

Best, Christoph

PS: See also: http://forum.world.st/ENH-isSeparator-td5129517.html
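
The classification described in this message can be inspected from a workspace. The following is only a minimal sketch, assuming a Squeak image in which `Character>>isSeparator` and `CharacterSet class>>nonSeparators` are available as named in the thread; the temporary variable `soh` is introduced here for illustration, and the answers depend on which version of Collections is loaded.

```smalltalk
"Hypothetical workspace check; 'soh' is an illustrative name."
| soh |
soh := Character value: 1. "SOH, start of heading"
Transcript
	show: 'isSeparator: ', soh isSeparator printString; cr;
	show: 'in CharacterSet nonSeparators: ', (CharacterSet nonSeparators includes: soh) printString; cr.
```

If the second line prints true, a scanner that skips whitespace by searching for the next non-separator would stop at SOH, which is the behavior the message attributes to the styler.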

From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of Thiede, Christoph
Sent: Monday, 21 June 2021 15:47:49
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

> It is! but do we really need a collect for this static list? We could put that code in a comment and just return the resulting string?

My suggestion would be

^ Separators ifNil: [Separators := self allCharacters select: [:ea | ea isSeparator]]

Then we won't need to duplicate the logic.

Best, Christoph

________________________________
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of Tobias Pape Das.Linux@gmx.de
Sent: Monday, 21 June 2021 12:37:13
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Dave

...
On 4. Oct 2019, at 15:26, David T. Lewis lewis@mail.msen.com wrote:

+1

This sounds like a good approach to me. If I understand correctly, it amounts to this:

Character class>>separators "Answer a collection of the standard ASCII separator characters."

^ #(32 "space" 13 "cr" 9 "tab" 10 "line feed" 12 "form feed" 1 "text separator") collect: [:v | Character value: v] as: String

This seems simple and clear to me.

It is! but do we really need a collect for this static list? We could put that code in a comment and just return the resulting string?

Best regards -Tobias

...
There are a lot of senders of #separators in the image, so it is possible that it might have some unintended side effect. But that seems unlikely.

Dave

On Fri, Oct 04, 2019 at 01:01:30PM +0200, patrick.rein@hpi.uni-potsdam.de wrote:

...
Hi everyone,

in the context of the new text anchor layouting infrastructure there is still one thing missing. Currently start of header is not included in Character class>>#separators. This leads to problems with text editing Morphs. As start of header is not printed at all (not even as white space) I would rather classify it as a separator and add it to the list in Character class>>#separators and to Character>>#isSeparator. The advantage of this approach is that we would not need any special case in the text editing morphs. The disadvantage is that the list of separators will be less obvious to understand.

Any thoughts about this?

Bests Patrick

P.S.: Conceptually, start of header (or heading) is a control character. The IETF (RFC 20) says: "A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information. Such a sequence is referred to as the 'heading.' An STX character has the effect of terminating a heading." (https://tools.ietf.org/html/rfc20#section-5.2)

...

'From Squeak6.0alpha of 13 August 2021 [latest update: #20601] on 21 August 2021 at 2:25:03 pm'!

!Character class methodsFor: 'instance creation' stamp: 'dtl 8/21/2021 14:16'!
separators
	"Answer a collection of the standard ASCII separator characters."
	^ {
		Character value: 32. "space"
		Character value: 13. "cr"
		Character value: 9. "tab"
		Character value: 10. "line feed"
		Character value: 12. "form feed"
		Character value: 1. "start of heading"
	} as: String! !

christoph.thiede＠student.hpi.uni-potsdam.de

30 Aug 2021

9:30 a.m.

Great, thank you, Dave! :-)

Best, Christoph

--- Sent from Squeak Inbox Talk


Thiede, Christoph

6 Sep 2021

10:55 a.m.

Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph
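
The mismatch mentioned here, where `Character separators` has been updated but the cached `CharacterSet separators` has not, can be observed side by side. This is a hedged workspace sketch, assuming both class-side accessors exist as named in the thread; `soh` is an illustrative variable, and no particular output is implied, since it depends on whether the cache has been flushed in the image at hand.

```smalltalk
"Hypothetical workspace check comparing the two separator collections."
| soh |
soh := Character value: 1. "SOH, start of heading"
Transcript
	show: 'Character separators includes SOH: ', (Character separators includes: soh) printString; cr;
	show: 'CharacterSet separators includes SOH: ', (CharacterSet separators includes: soh) printString; cr.
```

Two different answers would indicate the stale-cache situation that Collections-ct.956 is said to address.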


David T. Lewis

4:58 p.m.

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave
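
For readers who want to see how trimming interacts with the separator definition, the following is a small sketch rather than the actual body of StringTest>>testWithBlanksTrimmed, which is not shown in this thread. It assumes `String>>withBlanksTrimmed` is present in the image; the variable `padded` is illustrative.

```smalltalk
"Hypothetical workspace check; not the real test method."
| padded |
padded := (String with: (Character value: 1)), 'abc', (String with: Character space).
Transcript show: padded withBlanksTrimmed size printString; cr.
```

A printed size of 3 would mean that both the leading SOH and the trailing space were stripped; a larger size would mean SOH is still treated as a non-separator by the trimming code.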


Thiede, Christoph

5:11 p.m.

Thank you for the fast follow-up! :-)

Best,

Christoph

________________________________ Von: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org im Auftrag von David T. Lewis lewis@mail.msen.com Gesendet: Montag, 6. September 2021 18:58:25 An: The general-purpose Squeak developers list Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave

On Mon, Sep 06, 2021 at 10:55:10AM +0000, Thiede, Christoph wrote:

...

Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph

Von: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org im Auftrag von Thiede, Christoph Gesendet: Montag, 30. August 2021 11:30:47 An: squeak-dev@lists.squeakfoundation.org; lewis@mail.msen.com Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Great, thank you, Dave! :-)

Best, Christoph

Sent from Squeak Inbox Talkhttps://github.com/hpi-swa-lab/squeak-inbox-talk

On 2021-08-26T17:10:40-04:00, lewis@mail.msen.com wrote:

...
Done. Updated in Collections-dtl.954.

Dave

On Sat, Aug 21, 2021 at 02:34:52PM -0400, David T. Lewis wrote:

...
Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak inbox Talk utility :-)

Dave

On Thu, Aug 19, 2021 at 06:11:14PM +0000, Thiede, Christoph wrote:

...
Hi all,

two years later, I still would love to see Patrick's proposal being accepted in the Trunk.

My concrete problem with SOH (start of header) not being in Character separators is that text anchors in Smalltalk source code currently mix up the Shout styler, which is due to the send to CharacterSet nonSeparators from SHParserST80 scanWhitespace. Now one might argue that we could introduce a separate CharacterSet notAtAllSeparators/nonUnicodeSeparator autc. (which would also exclude SOH), but I would rather dislike this proposal because it would force us to maintain multiple different definitions of the term "character" and increase the overall domain complexity. I can't see what would be wrong with treating all character instances according to the Unicode standard (as other frameworks such as .NET seem to do, too).

I have been using Dave's version of Character separators from above for the latest months and I did not experience any unintended side effects of the change.

Could we please integrate Patrick's change, or are there any major objections? It would be great to get this kind of stuff working in Babylonian & Co. :-)

Best, Christoph

PS: See also: http://forum.world.st/ENH-isSeparator-td5129517.html

From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Thiede, Christoph
Sent: Monday, 21 June 2021 15:47:49
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

> It is! but do we really need a collect for this static list?
>
> We could put that code in a comment and just return the resulting string?

My suggestion would be


^ Separators ifNil: [Separators := self allCharacters select: [:ea | ea isSeparator]]

Then we won't need to duplicate the logic.
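The one-liner above can be read as the body of a lazily initialized class-side accessor. A minimal sketch, assuming that Separators is declared as a class variable of Character (the snippet implies this, but the thread does not show the declaration):

```smalltalk
Character class >> separators
	"Answer the separator characters, computing them on first use.
	Assumption: Separators is a class variable of Character.
	The list is derived from #isSeparator, so the logic lives in one place."

	^ Separators ifNil: [
		Separators := self allCharacters select: [:ea | ea isSeparator]]
```

The trade-off of this design is that the cached answer goes stale whenever #isSeparator changes, so the variable would have to be reset to nil after such a change.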

Best, Christoph

________________________________
From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux at gmx.de>
Sent: Monday, 21 June 2021 12:37:13
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Dave

> On 4. Oct 2019, at 15:26, David T. Lewis <lewis at mail.msen.com> wrote:
>
> +1
>
> This sounds like a good approach to me. If I understand correctly, it amounts to this:
>
> Character class>>separators
>     "Answer a collection of the standard ASCII separator characters."
>
>     ^ #(32 "space"
>         13 "cr"
>         9 "tab"
>         10 "line feed"
>         12 "form feed"
>         1 "text separator")
>             collect: [:v | Character value: v] as: String
>
> This seems simple and clear to me.

It is! But do we really need a collect for this static list? We could put that code in a comment and just return the resulting string?

Best regards
-Tobias

> There are a lot of senders of #separators in the image, so it is possible that it might have some unintended side effect. But that seems unlikely.
>
> Dave

On Fri, Oct 04, 2019 at 01:01:30PM +0200, patrick.rein at hpi.uni-potsdam.de wrote:

...
Hi everyone,

in the context of the new text anchor layout infrastructure, there is still one thing missing. Currently, start of header is not included in Character class>>#separators. This leads to problems with text editing Morphs. As start of header is not printed at all (not even as white space), I would rather classify it as a separator and add it to the list in Character class>>#separators and to Character>>#isSeparator. The advantage of this approach is that we would not need any special case in the text editing morphs. The disadvantage is that the list of separators will be less obvious to understand.

Any thoughts about this?

Bests Patrick

P.S.: Conceptually, start of header (or heading) is a control character. The IETF says: "A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information. Such a sequence is referred to as the "heading." An STX character has the effect of terminating a heading." (https://tools.ietf.org/html/rfc20#section-5.2)


...
...
'From Squeak6.0alpha of 13 August 2021 [latest update: #20601] on 21 August 2021 at 2:25:03 pm'!

!Character class methodsFor: 'instance creation' stamp: 'dtl 8/21/2021 14:16'!
separators
	"Answer a collection of the standard ASCII separator characters."
	^ {
		Character value: 32. "space"
		Character value: 13. "cr"
		Character value: 9. "tab"
		Character value: 10. "line feed"
		Character value: 12. "form feed"
		Character value: 1. "start of heading"
	} as: String! !

...

christoph.thiede＠student.hpi.uni-potsdam.de

1 Jun 1 Jun

6:46 p.m.

Hi Patrick, hi all,

even though I advocated making SOH a separator myself, it is now coming back to bite me:

'Hi! <img src="code://MenuIcons squeakIcon">' asTextFromHtml withBlanksTrimmed becomes 'Hi!'. The image is removed from the text because it is represented as a SOH in the string.
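The effect can be checked without the HTML conversion. The following workspace sketch assumes only that an embedded image ends up as character value 1 (SOH) in the string, as described above; the variable names are illustrative:

```smalltalk
"Workspace sketch; evaluate the last three lines one by one with 'print it'."
| soh sample |
soh := Character value: 1.	"SOH, standing in for the embedded image"
sample := 'Hi!', (String with: Character space with: soh).

soh isSeparator.	"true only in images where SOH counts as a separator"
sample size.	"5: three letters, one space, one SOH"
sample withBlanksTrimmed size.	"expected to be 3 if SOH is trimmed as a separator, 5 if it is not"
```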

Is this really what we want? See Collections-ct.1039 where I started adding an extra check after sending #isSeparator to work around this. I am now experiencing further issues like that in Squeak Inbox Talk ...

Unfortunately, I cannot reproduce the issue in Shout when SOH is not treated as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

How should we fix that issue? Do we really need SOH to be a separator? I would not really consider it satisfying to leave it up to clients to check for #isSeparator AND for SOH. Would it be appropriate to change #withBlanksTrimmed et al. on Text? Then, myText first isSeparator would still be confusing. Do we have to have multiple variants of separators as discussed earlier?
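To make the client-side option concrete, here is a sketch of what "check for #isSeparator AND for SOH" could look like when trimming trailing blanks by hand. The block name isBlank and the manual loop are hypothetical and are not taken from Collections-ct.1039:

```smalltalk
"Workspace sketch: trim trailing separators but keep SOH."
| soh isBlank text end |
soh := Character value: 1.

"Hypothetical predicate: a separator that is not SOH."
isBlank := [:c | c isSeparator and: [c ~= soh]].

text := 'Hi!', (String with: Character space with: soh with: Character space).

"Walk back from the end while the predicate holds."
end := text size.
[end > 0 and: [isBlank value: (text at: end)]] whileTrue: [end := end - 1].

text copyFrom: 1 to: end.	"drops the final space, keeps the space and the SOH before it"
```

Every client that trims or splits text would need the same extra condition, which is the duplication the paragraph above objects to.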

Best, Christoph

--- Sent from Squeak Inbox Talk

On 2021-09-06T17:11:05+00:00, christoph.thiede@student.hpi.uni-potsdam.de wrote:

...

Thank you for the fast follow-up! :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of David T. Lewis <lewis(a)mail.msen.com>
Sent: Monday, 6 September 2021 18:58:25
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave

On Mon, Sep 06, 2021 at 10:55:10AM +0000, Thiede, Christoph wrote:

...
Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of Thiede, Christoph
Sent: Monday, 30 August 2021 11:30:47
To: squeak-dev(a)lists.squeakfoundation.org; lewis(a)mail.msen.com
Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Great, thank you, Dave! :-)

Best, Christoph

Sent from Squeak Inbox Talk (https://github.com/hpi-swa-lab/squeak-inbox-talk)

On 2021-08-26T17:10:40-04:00, lewis(a)mail.msen.com wrote:

...
Done. Updated in Collections-dtl.954.

Dave

Rein, Patrick

2 Jun 2 Jun

5:21 a.m.

Technically, it should be a control character, so we might add control characters as a new concept to Character. This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

> Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...
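Before reinitializing anything, a quick way to see whether a cached set disagrees with the per-character answer is to compare the two directly. A workspace sketch using only the selectors mentioned in this thread plus #includes: (the thread does not name the method that resets the caches, so none is shown here):

```smalltalk
"Workspace sketch: evaluate with 'print it' to get an Array of three Booleans."
| soh |
soh := Character value: 1.
{
	soh isSeparator.
	CharacterSet separators includes: soh.
	CharacterSet nonSeparators includes: soh
}
"In a consistent image the first two answers agree and the third is their negation."
```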

Cheers, Patrick

christoph.thiede＠student.hpi.uni-potsdam.de

4:07 p.m.

> This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

What places are you thinking of? :-)

>> Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?
>
> I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...

Thank you! I did that, but everything still looks fine to me. At first glance, I was also unable to find a bug in Babylonian. It would be great if you had another pointer for me to any behavior that depends on SOH being detected as a separator. :-)

Best, Christoph

--- Sent from Squeak Inbox Talk

On 2023-06-02T05:21:16+00:00, patrick.rein@hpi.de wrote:

...

Technically it should be a control character, so we might add them as a new concept to Characters. This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

...
Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...

Cheers, Patrick ________________________________________ From: Thiede, Christoph Sent: Thursday, June 1, 2023 8:46:45 PM To: squeak-dev(a)lists.squeakfoundation.org; Rein, Patrick Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Patrick, hi all,

even though I called myself for making SOH a separator, it's now falling on my feet:

'Hi! <img src="code://MenuIcons squeakIcon">' asTextFromHtml withBlanksTrimmed becomes 'Hi!'. The image is removed from the text because it is represented as a SOH in the string.

Is this really what we want? See Collections-ct.1039 where I started adding an extra check after sending #isSeparator to work around this. I am no experiencing further issues like that in Squeak Inbox Talk ...

Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

How should we fix that issue? Do we really need SOH to be a separator? I would not really consider it satisfying to leave it up to clients to check for #isSeparator AND for SOH. Would it be appropriate to change #withBlanksTrimmed et al. on Text? Then, myText first isSeparator would still be confusing. Do we have to have multiple variants of separators as discussed earlier?

Best, Christoph

Sent from Squeak Inbox Talk

On 2021-09-06T17:11:05+00:00, christoph.thiede(a)student.hpi.uni-potsdam.de wrote:

...
Thank you for the fast follow-up! :-)

Best,

Christoph

Von: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> im Auftrag von David T. Lewis <lewis(a)mail.msen.com> Gesendet: Montag, 6. September 2021 18:58:25 An: The general-purpose Squeak developers list Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave

On Mon, Sep 06, 2021 at 10:55:10AM +0000, Thiede, Christoph wrote:

...
Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph

Von: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> im Auftrag von Thiede, Christoph Gesendet: Montag, 30. August 2021 11:30:47 An: squeak-dev(a)lists.squeakfoundation.org; lewis(a)mail.msen.com Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Great, thank you, Dave! :-)

Best, Christoph

Sent from Squeak Inbox Talkhttps://github.com/hpi-swa-lab/squeak-inbox-talk

On 2021-08-26T17:10:40-04:00, lewis(a)mail.msen.com wrote:

...
Done. Updated in Collections-dtl.954.

Dave

On Sat, Aug 21, 2021 at 02:34:52PM -0400, David T. Lewis wrote:

...
Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak inbox Talk utility :-)

Dave

On Thu, Aug 19, 2021 at 06:11:14PM +0000, Thiede, Christoph wrote:

...
Hi all,

two years later, I still would love to see Patrick's proposal being accepted in the Trunk.

My concrete problem with SOH (start of header) not being in Character separators is that text anchors in Smalltalk source code currently mix up the Shout styler, which is due to the send to CharacterSet nonSeparators from SHParserST80 scanWhitespace. Now one might argue that we could introduce a separate CharacterSet notAtAllSeparators/nonUnicodeSeparator autc. (which would also exclude SOH), but I would rather dislike this proposal because it would force us to maintain multiple different definitions of the term "character" and increase the overall domain complexity. I can't see what would be wrong with treating all character instances according to the Unicode standard (as other frameworks such as .NET seem to do, too).

I have been using Dave's version of Character separators from above for the latest months and I did not experience any unintended side effects of the change.

Could we please integrate Patrick's change, or are there any major objections? It would be great to get this kind of stuff working in Babylonian & Co. :-)

Best, Christoph

PS: See also: http://forum.world.st/ENH-isSeparator-td5129517.html

Von: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im Auftrag von Thiede, Christoph Gesendet: Montag, 21. Juni 2021 15:47:49 An: The general-purpose Squeak developers list Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

> It is! but do we really need a collect for this static list?

> We could put that code in a comment and just return the resulting string?

My suggestion would be

http://www.hpi.de/

^ Separators ifNil: [Separators := self allCharacters select: [:ea | ea isSeparator]]

Then we won't need to duplicate the logic.

Best, Christoph ________________________________ Von: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im Auftrag von Tobias Pape <Das.Linux at gmx.de> Gesendet: Montag, 21. Juni 2021 12:37:13 An: The general-purpose Squeak developers list Betreff: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Dave

> On 4. Oct 2019, at 15:26, David T. Lewis <lewis at mail.msen.com> wrote: > > +1 > > This sounds like a good approach to me. If I understand correctly, it > amounts to this: > > Character class>>separators > "Answer a collection of the standard ASCII separator characters." > > ^ #(32 "space" > 13 "cr" > 9 "tab" > 10 "line feed" > 12 "form feed" > 1 "text separator") > collect: [:v | Character value: v] as: String > > This seems simple and clear to me.

It is! but do we really need a collect for this static list? We could put that code in a comment and just return the resulting string?

Best regards -Tobias

>
> There are a lot of senders of #separators in the image, so it is
> possible that it might have some unintended side effect. But that
> seems unlikely.
>
> Dave
>
>
> On Fri, Oct 04, 2019 at 01:01:30PM +0200, patrick.rein at hpi.uni-potsdam.de wrote:
>> Hi everyone,
>>
>> in the context of the new text anchor layouting infrastructure there is still one thing missing. Currently start of header is not included in Character class>>#separators. This leads to problems with text editing Morphs. As start of header is not printed at all (not even as white space) I would rather classify it as a separator and add it to the list in Character class>>#separators and to Character>>#isSeparator. The advantage of this approach is that we would not need any special case in the text editing morphs. The disadvantage is that the list of separators will be less obvious to understand.
>>
>> Any thoughts about this?
>>
>> Bests
>> Patrick
>>
>> P.S.: Conceptually start of header (or heading) is a control character. ietf says: " A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information. Such a sequence is referred to as the "heading." An STX character has the effect of terminating a heading."
>> (https://tools.ietf.org/html/rfc20#section-5.2)
>>
>>> Patrick Rein uploaded a new version of Collections to project The Trunk:
>>> http://source.squeak.org/trunk/Collections-pre.857.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-pre.857
>>> Author: pre
>>> Time: 4 October 2019, 11:04:30.363303 am
>>> UUID: 5ef00b65-3884-c445-b276-0cc01f0b10a1
>>> Ancestors: Collections-pre.856
>>>
>>> Adds startOfHeader to Character, adds empty abstract implementations of scanFrom:, writeScanOn: to TextAttribute to allow for Texts which include TextAttributes which do not implement serialization to still be serialized, adds a comment to these methods.
>>>
>>> =============== Diff against Collections-pre.856 ===============
>>>
>>> Item was added:
>>> + ----- Method: Character class>>startOfHeader (in category 'accessing untypeable characters') -----
>>> + startOfHeader
>>> +
>>> + ^ self value: 1 !
>>>
>>> Item was added:
>>> + ----- Method: TextAttribute class>>scanFrom: (in category 'fileIn/Out') -----
>>> + scanFrom: strm
>>> + "Read the text attribute properties from the stream. When this method has
>>> + been called the concrete TextAttribute class has already been selected via
>>> + scanCharacter. (see TextAttribute class>>#newFrom:).
>>> + For writing the format see TextAttribute>>#writeScanOn:"!
>>>
>>> Item was added:
>>> + ----- Method: TextAttribute>>writeScanOn: (in category 'fileIn/fileOut') -----
>>> + writeScanOn: strm
>>> + "Implement this method for a text attribute to define how it it should be written
>>> + to a serialized form of a text object. The form should correspond to the source
>>> + file format, i.e. use a scan character to denote its subclass.
>>> + As TextAttributes are stored in RunArrays, this method is mostly called from RunArray>>#write scan.
>>> + For reading the written information see TextAttribute class>>#scanFrom:"
>>> +
>>> + "Do nothing because of abstract class"!
>>>
>>>
>>
>

...
...
'From Squeak6.0alpha of 13 August 2021 [latest update: #20601] on 21 August 2021 at 2:25:03 pm'!

!Character class methodsFor: 'instance creation' stamp: 'dtl 8/21/2021 14:16'!
separators
    "Answer a collection of the standard ASCII separator characters."

    ^ {
        Character value: 32. "space"
        Character value: 13. "cr"
        Character value: 9. "tab"
        Character value: 10. "line feed"
        Character value: 12. "form feed"
        Character value: 1. "start of heading"
    } as: String! !

...

Rein, Patrick

5 Jun

7:36 a.m.

...

Thank you! I did that, but everything still looks fine to me. At a first glance, I also was unable to find a bug in Babylonian. It would be great if you had another pointer for me to any behavior that depends on SOH being detected as a separator. :-)

Well, that is kind of awkward now :D Babylonian does not use SOH anymore but a minor tweak to the Scanners that allows me to place Annotations without an SOH. The rationale being that the annotations are not text elements but decorations to it, so they should not be represented in the character stream.

That means that Babylonian will not be affected either way by the change.

I just tried removing SOH from Character class>>#separators and it does break when using the example in the class comment of TextAnchor, so the issue is still there.

...

...
This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

What places are you thinking of? :-)

My sentence was inexact: I do not know how many places would need to check for that; I was referring generally to the problem that clients now have to distinguish between the two (in line with your opinion). But in fact, I do not know how many places would be affected. Honestly, the number of senders of #separators and #nonSeparators is not too overwhelming.

I still see the issue of blurring the line between separator and controlCharacter though due to the ambivalent nature of CR or LF.

Cheers, Patrick ________________________________________ From: Thiede, Christoph Sent: Friday, June 2, 2023 6:07:47 PM To: squeak-dev@lists.squeakfoundation.org; Rein, Patrick Subject: Re: The Trunk: Collections-pre.857.mcz

...

This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

What places are you thinking of? :-)

...

...
Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...
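The cache reset described above can be sketched as a workspace snippet. This is only an illustration under assumptions: it assumes that `CharacterSet class>>cleanUp:` drops the cached separator sets in your image (check the class side of CharacterSet before relying on it). `Scanner initializeTypeTable` is taken from the message itself.

```smalltalk
"Workspace sketch, not a verified procedure.
 Assumption: CharacterSet class>>cleanUp: clears the cached
 separators/nonSeparators sets so they are rebuilt on next access."
| soh |
soh := Character value: 1.	"SOH, same as Character startOfHeader"

CharacterSet cleanUp: true.
Scanner initializeTypeTable.	"mentioned in the thread; reason not documented there"

"Report what the freshly built sets say about SOH."
Transcript
	show: 'CharacterSet separators includes SOH: ',
		(CharacterSet separators includes: soh) printString; cr;
	show: 'CharacterSet nonSeparators includes SOH: ',
		(CharacterSet nonSeparators includes: soh) printString; cr.
```

If the two answers still disagree with `Character separators` after running this, the cached sets were probably not the ones being reset, and the assumption above does not hold for that image.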

Best, Christoph

--- Sent from Squeak Inbox Talk

On 2023-06-02T05:21:16+00:00, patrick.rein@hpi.de wrote:

...

Technically it should be a control character, so we might add them as a new concept to Characters. This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

...
Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...

Cheers, Patrick ________________________________________ From: Thiede, Christoph Sent: Thursday, June 1, 2023 8:46:45 PM To: squeak-dev(a)lists.squeakfoundation.org; Rein, Patrick Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Patrick, hi all,

even though I myself argued for making SOH a separator, it has now come back to bite me:

'Hi! <img src="code://MenuIcons squeakIcon">' asTextFromHtml withBlanksTrimmed becomes 'Hi!'. The image is removed from the text because it is represented as a SOH in the string.
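The effect can be checked without any HTML by using a plain String with a trailing SOH as a stand-in for the text anchor. This is a simplified sketch; the result depends on whether the image at hand treats SOH as a separator, so no expected output is given.

```smalltalk
"Workspace sketch: a trailing SOH stands in for the embedded image."
| soh sample trimmed |
soh := Character value: 1.
sample := 'Hi! ', (String with: soh).
trimmed := sample withBlanksTrimmed.

Transcript
	show: 'SOH isSeparator: ', soh isSeparator printString; cr;
	show: 'SOH survives withBlanksTrimmed: ',
		(trimmed includes: soh) printString; cr.
```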

Is this really what we want? See Collections-ct.1039, where I started adding an extra check after sending #isSeparator to work around this. I am now experiencing further issues like that in Squeak Inbox Talk ...

Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

How should we fix that issue? Do we really need SOH to be a separator? I would not really consider it satisfying to leave it up to clients to check for #isSeparator AND for SOH. Would it be appropriate to change #withBlanksTrimmed et al. on Text? Then, myText first isSeparator would still be confusing. Do we have to have multiple variants of separators as discussed earlier?
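To make the "check for #isSeparator AND for SOH" option concrete, here is a hypothetical client-side trim that treats SOH as content. It is not the code from Collections-ct.1039; the block name `isBlank` and the manual trimming are assumptions made for illustration only.

```smalltalk
"Hypothetical sketch: trim blanks but keep SOH, whatever isSeparator says."
| soh isBlank sample first last trimmed |
soh := Character value: 1.

"A character is blank if it is a separator other than SOH."
isBlank := [:c | c isSeparator and: [c ~= soh]].

sample := '  Hi! ', (String with: soh), '  '.

"findFirst:/findLast: answer 0 when no element matches."
first := sample findFirst: [:c | (isBlank value: c) not].
last := sample findLast: [:c | (isBlank value: c) not].

trimmed := first = 0
	ifTrue: ['']
	ifFalse: [sample copyFrom: first to: last].

Transcript show: 'SOH kept: ', (trimmed includes: soh) printString; cr.
```

The sketch also shows the cost raised in the message: every client that trims or splits text would need its own predicate like `isBlank`.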

Best, Christoph

Sent from Squeak Inbox Talk

On 2021-09-06T17:11:05+00:00, christoph.thiede(a)student.hpi.uni-potsdam.de wrote:

...
Thank you for the fast follow-up! :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of David T. Lewis <lewis(a)mail.msen.com> Sent: Monday, 6 September 2021 18:58:25 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave

On Mon, Sep 06, 2021 at 10:55:10AM +0000, Thiede, Christoph wrote:

...
Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of Thiede, Christoph Sent: Monday, 30 August 2021 11:30:47 To: squeak-dev(a)lists.squeakfoundation.org; lewis(a)mail.msen.com Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Great, thank you, Dave! :-)

Best, Christoph

Sent from Squeak Inbox Talk (https://github.com/hpi-swa-lab/squeak-inbox-talk)

On 2021-08-26T17:10:40-04:00, lewis(a)mail.msen.com wrote:

...
Done. Updated in Collections-dtl.954.

Dave

On Sat, Aug 21, 2021 at 02:34:52PM -0400, David T. Lewis wrote:

...
Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak Inbox Talk utility :-)

Dave

On Thu, Aug 19, 2021 at 06:11:14PM +0000, Thiede, Christoph wrote:

...
Hi all,

two years later, I still would love to see Patrick's proposal being accepted in the Trunk.

My concrete problem with SOH (start of header) not being in Character separators is that text anchors in Smalltalk source code currently confuse the Shout styler, which is due to the send to CharacterSet nonSeparators from SHParserST80 scanWhitespace. Now one might argue that we could introduce a separate CharacterSet notAtAllSeparators/nonUnicodeSeparator etc. (which would also exclude SOH), but I would rather dislike this proposal because it would force us to maintain multiple different definitions of the term "character" and increase the overall domain complexity. I can't see what would be wrong with treating all character instances according to the Unicode standard (as other frameworks such as .NET seem to do, too).

I have been using Dave's version of Character separators from above for the last few months and have not experienced any unintended side effects of the change.

Could we please integrate Patrick's change, or are there any major objections? It would be great to get this kind of stuff working in Babylonian & Co. :-)

Best, Christoph

PS: See also: http://forum.world.st/ENH-isSeparator-td5129517.html

From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Thiede, Christoph Sent: Monday, 21 June 2021 15:47:49 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

> It is! but do we really need a collect for this static list?

> We could put that code in a comment and just return the resulting string?

My suggestion would be

^ Separators ifNil: [Separators := self allCharacters select: [:ea | ea isSeparator]]

Then we won't need to duplicate the logic.
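A quick way to see whether the hand-written list and the predicate agree is to derive the separators from `isSeparator` and compare. The sketch below restricts itself to byte characters (0 to 255), which is an assumption made here to keep it self-contained; it does not use the `Separators` variable from the suggestion.

```smalltalk
"Workspace sketch: compare the derived list with Character separators."
| derived |
derived := ((0 to: 255) collect: [:v | Character value: v])
	select: [:ea | ea isSeparator].

Transcript
	show: 'derived codes: ',
		(derived collect: [:ea | ea asInteger]) printString; cr;
	show: 'same as Character separators: ',
		(derived asSet = Character separators asSet) printString; cr.
```

If the second line prints false, the list in `Character class>>separators` and the logic in `Character>>isSeparator` have drifted apart, which is the duplication the suggestion tries to avoid.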

Best, Christoph ________________________________ From: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on behalf of Tobias Pape <Das.Linux at gmx.de> Sent: Monday, 21 June 2021 12:37:13 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Dave

> On 4. Oct 2019, at 15:26, David T. Lewis <lewis at mail.msen.com> wrote:
>
> +1
>
> This sounds like a good approach to me. If I understand correctly, it
> amounts to this:
>
> Character class>>separators
>     "Answer a collection of the standard ASCII separator characters."
>
>     ^ #(32 "space"
>         13 "cr"
>         9 "tab"
>         10 "line feed"
>         12 "form feed"
>         1 "text separator")
>             collect: [:v | Character value: v] as: String
>
> This seems simple and clear to me.

It is! but do we really need a collect for this static list? We could put that code in a comment and just return the resulting string?

Best regards -Tobias


christoph.thiede＠student.hpi.uni-potsdam.de

9 Nov

10:24 p.m.

Hi Patrick,

sorry for the huge delay, my inbox is chaos :(

...

I just tried removing SOH from Character class>>#separators and it does break when using the example in the class comment of TextAnchor, so the issue is still there.

I cannot reproduce that, unfortunately. After patching Character class>>#separators and also resetting the caches in CharacterSet class>>#separators and CharacterSet class>>#nonSeparators, both examples from TextAnchor comment still seem to work for me. How do they look for you?

Best, Christoph

--- Sent from Squeak Inbox Talk

On 2023-06-05T07:36:56+00:00, patrick.rein@hpi.de wrote:

...

...
Thank you! I did that, but everything still looks fine to me. At a first glance, I also was unable to find a bug in Babylonian. It would be great if you had another pointer for me to any behavior that depends on SOH being detected as a separator. :-)

Well, that is kind of awkward now :D Babylonian does not use SOH anymore but a minor tweak to the Scanners that allows me to place Annotations without an SOH. The rationale being that the annotations are not text elements but decorations to it, so they should not be represented in the character stream.

That means that Babylonian will not be affected either way by the change.

I just tried removing SOH from Character class>>#separators and it does break when using the example in the class comment of TextAnchor, so the issue is still there.

...
...
This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

What places are you thinking of? :-)

My sentence was inexact: I do not know how many places would need to check for that; I was referring generally to the problem that clients now have to distinguish between the two (in line with your opinion). But in fact, I do not know how many places would be affected. Honestly, the number of senders of #separators and #nonSeparators is not too overwhelming.

I still see the issue of blurring the line between separator and controlCharacter though due to the ambivalent nature of CR or LF.

Cheers, Patrick ________________________________________ From: Thiede, Christoph Sent: Friday, June 2, 2023 6:07:47 PM To: squeak-dev(a)lists.squeakfoundation.org; Rein, Patrick Subject: Re: The Trunk: Collections-pre.857.mcz

...
This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

What places are you thinking of? :-)

...
...
Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...

Thank you! I did that, but everything still looks fine to me. At a first glance, I also was unable to find a bug in Babylonian. It would be great if you had another pointer for me to any behavior that depends on SOH being detected as a separator. :-)

Best, Christoph

Sent from Squeak Inbox Talk

On 2023-06-02T05:21:16+00:00, patrick.rein(a)hpi.de wrote:

...
Technically it should be a control character, so we might add them as a new concept to Characters. This would still lead to a bunch of places having to check for separators and control characters, but at least it would be less specific than just SOH.

...
Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

I guess that one of the cached CharacterSets still has SOH as a separator listed. You will have to reinitialize CharacterSet>>#separators and #nonSeparators. Also I remember that you had to do `Scanner initializeTypeTable.`, but not sure why...

Cheers, Patrick ________________________________________ From: Thiede, Christoph Sent: Thursday, June 1, 2023 8:46:45 PM To: squeak-dev(a)lists.squeakfoundation.org; Rein, Patrick Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Patrick, hi all,

even though I myself argued for making SOH a separator, it has now come back to bite me:

'Hi! <img src="code://MenuIcons squeakIcon">' asTextFromHtml withBlanksTrimmed becomes 'Hi!'. The image is removed from the text because it is represented as a SOH in the string.

Is this really what we want? See Collections-ct.1039, where I started adding an extra check after sending #isSeparator to work around this. I am now experiencing further issues like that in Squeak Inbox Talk ...

Unfortunately, I cannot reproduce the issue in Shout with not treating SOH as a separator (shame on me for not documenting the issue more precisely). Patrick, can you remember?

How should we fix that issue? Do we really need SOH to be a separator? I would not really consider it satisfying to leave it up to clients to check for #isSeparator AND for SOH. Would it be appropriate to change #withBlanksTrimmed et al. on Text? Then, myText first isSeparator would still be confusing. Do we have to have multiple variants of separators as discussed earlier?

Best, Christoph

Sent from Squeak Inbox Talk

On 2021-09-06T17:11:05+00:00, christoph.thiede(a)student.hpi.uni-potsdam.de wrote:

...
Thank you for the fast follow-up! :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of David T. Lewis <lewis(a)mail.msen.com> Sent: Monday, 6 September 2021 18:58:25 To: The general-purpose Squeak developers list Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Hi Christoph,

Done.

I'll note that this fixes a failing test StringTest>>testWithBlanksTrimmed (which I should have noticed when applying the changes in the first place).

Thank you,

Dave

On Mon, Sep 06, 2021 at 10:55:10AM +0000, Thiede, Christoph wrote:

...
Hi Dave,

would you (or someone else) mind merging Collections-ct.956 which flushes the caches in CharacterSet? At the moment, CharacterSet separators still does not contain the SOH character. :-)

Best,

Christoph

From: Squeak-dev <squeak-dev-bounces(a)lists.squeakfoundation.org> on behalf of Thiede, Christoph Sent: Monday, 30 August 2021 11:30:47 To: squeak-dev(a)lists.squeakfoundation.org; lewis(a)mail.msen.com Subject: Re: [squeak-dev] The Trunk: Collections-pre.857.mcz

Great, thank you, Dave! :-)

Best, Christoph

Sent from Squeak Inbox Talk (https://github.com/hpi-swa-lab/squeak-inbox-talk)

On 2021-08-26T17:10:40-04:00, lewis(a)mail.msen.com wrote:

...
Done. Updated in Collections-dtl.954.

Dave

On Sat, Aug 21, 2021 at 02:34:52PM -0400, David T. Lewis wrote:

...
Hi Christoph,

I just tried this again, but it results in a new test failure for CharacterSetTest>>testIntersectionOfLazy. I'm not sure I understand the implications, but I am attaching the change in case someone wants to have a look at it.

BTW, it's very nice reviewing issues like this in your Squeak Inbox Talk utility :-)

Dave



participants (8)

christoph.thiede＠student.hpi.uni-potsdam.de
commits＠source.squeak.org
David T. Lewis
Levente Uzonyi
patrick.rein＠hpi.uni-potsdam.de
Rein, Patrick
Thiede, Christoph
Tobias Pape