<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>


<style type="text/css" style="">

<!--

p

        {margin-top:0;

        margin-bottom:0}

-->

</style>

<div dir="ltr">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">

<p>Hi Levente,</p>

<p><br>

</p>

<p>thanks for the pointer. As far as I can see from the linked discussion, Tobias' proposal was never rejected, only postponed because of the upcoming release. I also see your point about performance, but IMHO correctness is more important than performance. If necessary, we could still hard-code the relevant code points into #isSeparator.</p>
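<p>To illustrate what hard-coding the code points could look like, here is a minimal sketch. It is written in Python rather than Squeak Smalltalk and is not taken from the change set. It assumes that "separator" means the Unicode White_Space property (25 code points), and the names <code>UNICODE_WHITESPACE</code> and <code>is_separator</code> are hypothetical. With a precomputed set, checking U+00A0 is a single hashed lookup, just like checking an ASCII space:</p>

<pre>
```python
# Illustrative sketch in Python, not Squeak Smalltalk: a hard-coded separator test.
# ASSUMPTION: "separator" means the Unicode White_Space property (25 code points).
# UNICODE_WHITESPACE and is_separator are hypothetical names, not from isSeparator.cs.

UNICODE_WHITESPACE = frozenset(
    list(range(0x0009, 0x000E))                   # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]            # SPACE, NEL, NO-BREAK SPACE, OGHAM SPACE MARK
    + list(range(0x2000, 0x200B))                 # EN QUAD .. HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]    # LINE SEP, PARAGRAPH SEP, NNBSP, MMSP, IDEOGRAPHIC SPACE
)


def is_separator(code_point: int) -> bool:
    """Return True if code_point is in the hard-coded White_Space set."""
    # One hashed set lookup, regardless of whether the character is ASCII.
    return code_point in UNICODE_WHITESPACE


if __name__ == "__main__":
    for cp in (0x20, 0xA0, 0x3000, ord("a"), 0x200B):
        print(f"U+{cp:04X}", is_separator(cp))
```
</pre>

<p>This set also contains characters such as U+000B (vertical tab) and U+0085 (next line). Adopting it therefore answers the "what is a separator?" question in one particular way.</p>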

<p><br>

</p>

<p>&gt; - consistency: CharacterSet separators would differ from the rest with your change set.</p>

<div><br>

</div>

<div id="x_Signature">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,EmojiFont,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;,NotoColorEmoji,&quot;Segoe UI Symbol&quot;,&quot;Android Emoji&quot;,EmojiSymbols">

<div name="x_divtagdefaultwrapper" style="font-family:Calibri,Arial,Helvetica,sans-serif; margin:0">

<div>

<div class="x__rp_T4" id="x_Item.MessagePartBody">Fair point, but I think we should instead fix the definitions of the Character(Set) constants to respect the encoding as well. By the way, Character alphabet and Character allCharacters do not do this at the moment either.


</div>

<div class="x__rp_T4" id="x_Item.MessagePartBody"><br>

</div>

<div class="x__rp_T4" id="x_Item.MessagePartBody">Of course, all your concerns are valid and need to be discussed, but it would be a pity if we failed to finally establish current standards in our Character library. I doubt that any modern parser for JSON or similar formats mishandles Unicode space characters, and such parsers are still satisfyingly fast. I think we should be able to keep pace with them in Squeak as well. :-)</div>

<div class="x__rp_T4" id="x_Item.MessagePartBody"><br>

</div>

<div class="x__rp_T4" id="x_Item.MessagePartBody">Best,</div>

<div class="x__rp_T4" id="x_Item.MessagePartBody">Christoph</div>

</div>


</div>

</div>

</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Squeak-dev &lt;squeak-dev-bounces@lists.squeakfoundation.org&gt; on behalf of Levente Uzonyi &lt;leves@caesar.elte.hu&gt;<br>

<b>Sent:</b> Friday, May 7, 2021 22:01:18<br>

<b>To:</b> The general-purpose Squeak developers list<br>

<b>Subject:</b> Re: [squeak-dev] [ENH] isSeparator</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Hi Christoph,<br>

<br>

There was a discussion on this subject before:<br>

<a href="http://forum.world.st/The-Trunk-Collections-topa-806-mcz-td5084658.html">http://forum.world.st/The-Trunk-Collections-topa-806-mcz-td5084658.html</a><br>

The main concerns are:<br>

- definition: What is a separator?<br>

- consistency: CharacterSet separators would differ from the rest with your change set.<br>

- performance: I haven't measured it, but I wouldn't be surprised if #isSeparator became an order of magnitude slower with that implementation.<br>

<br>

<br>

Levente<br>

<br>

On Thu, 6 May 2021, christoph.thiede@student.hpi.uni-potsdam.de wrote:<br>

<br>

> Hi all,<br>

><br>

> here is a tiny changeset for you: isSeparator.cs adds proper encoding-aware support for testing separator characters. Unlike the former implementation, it also identifies non-ASCII characters such as the no-break space (U+00A0) correctly.<br>

><br>

> Please review and merge! :-)<br>

><br>

> Best,<br>

> Christoph<br>

><br>

> ["isSeparator.cs.gz"]<br>

<br>

</div>

</span></font>

</body>

</html>