<div id="__MailbirdStyleContent" style="font-size: 10pt;font-family: Arial;color: #000000;text-align: left" dir="ltr">

                                        This reminds me of our #asNumber (or number parser) discussion, where we agreed not to parse number-like Unicode characters into Integers. :-)<div><br></div><div>Instead of modifying CharacterSet etc., one could extend TextConverter to support encoding-aware identification of separators etc. and also provide an encoding-aware #trim.</div><div><br></div><div>Best,</div><div>Marcel</div><div class="mb_sig"></div>
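Here is a rough illustration of the encoding-aware #trim idea. It is written in Python rather than Squeak, so TextConverter itself is not shown, and the helper name encoding_aware_trim is hypothetical. A byte-level trim only knows the ASCII whitespace bytes. Decoding with the right encoding first lets the trim recognize the no-break space, which is the single byte 0xA0 in Latin-1 but the two-byte sequence 0xC2 0xA0 in UTF-8.

```python
def encoding_aware_trim(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode using the given encoding, then trim with
    Unicode whitespace rules (str.strip() with no arguments)."""
    return data.decode(encoding).strip()


latin1 = b"\xa0hello\xa0"                        # no-break space in Latin-1
utf8 = "\u00a0hello\u00a0".encode("utf-8")       # b'\xc2\xa0hello\xc2\xa0'

print(latin1.strip())                            # b'\xa0hello\xa0' (ASCII-only trim)
print(encoding_aware_trim(latin1, "latin-1"))    # hello
print(encoding_aware_trim(utf8, "utf-8"))        # hello
```

The same bytes would be trimmed differently, or decoded incorrectly, under the wrong encoding. This is why the encoding has to be known before separators can be identified.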

                                        <blockquote class="history_container" type="cite" style="border-left-style: solid;border-width: 1px;margin-top: 20px;margin-left: 0px;padding-left: 10px;min-width: 500px">

                        <p style="color: #AAAAAA; margin-top: 10px;">On 08.05.2021 04:12:21, Levente Uzonyi <leves@caesar.elte.hu> wrote:</p><div style="font-family:Arial,Helvetica,sans-serif">On Fri, 7 May 2021, Thiede, Christoph wrote:

<br>

<br>> 

<br>> Hi Levente,

<br>> 


<br>> thanks for the pointer. As far as I can see from the linked discussion, Tobias' proposal was never rejected, only postponed because of the upcoming release. I also see your point about performance, but IMHO correctness is more important than performance. If necessary, we could still hard-code the relevant code points into #isSeparator.

<br>> 


> > - consistency: CharacterSet separators would differ from the rest with your change set.

<br>> 


> Fair point, but I think we should instead fix the definitions of the Character(Set) constants to respect the encoding as well. By the way, Character alphabet and Character allCharacters don't do this at the moment either.

<br>> 

<br>> Of course, all your concerns are valid points and need to be discussed, but I would be sorry if we failed to finally establish current standards in our Character library. I doubt that any modern parser for JSON or similar formats treats Unicode space characters incorrectly, and they are still satisfyingly fast. I think we should be able to keep pace with them in Squeak as well. :-)

<br>

Well, you ignored my question "What is a separator?".

<br>IMO a separator is a whitespace character that separates tokens in source code.

Would you want to use a zero-width space as a separator? Not likely.

<br>#isSeparator is deeply embedded in the system. Changing it would mean changing other code your changeset doesn't touch, e.g., the parsers.

<br>

<br>The method you propose is welcome, but IMO it shouldn't be called #isSeparator. #isWhitespace is a much better fit.
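The following minimal Python sketch (not Squeak code) makes the separator/whitespace distinction concrete. It assumes the classic #isSeparator set is space, tab, line feed, form feed and carriage return. It uses Python's Unicode-based str.isspace() as a stand-in for a hypothetical #isWhitespace. Both function names are illustrative only. The no-break space counts as Unicode whitespace, but the zero-width space and the zero-width no-break space do not.

```python
import unicodedata

# Assumption: the classic Squeak separator set (space, tab, LF, FF, CR).
ASCII_SEPARATORS = frozenset(" \t\n\f\r")


def is_separator(ch: str) -> bool:
    """Token separator in the narrow, parser-oriented sense (ASCII only)."""
    return ch in ASCII_SEPARATORS


def is_whitespace(ch: str) -> bool:
    """Unicode-aware test: str.isspace() is true for general category Zs
    and for the bidirectional classes WS, B and S."""
    return ch.isspace()


for cp in (0x20, 0x09, 0xA0, 0x2003, 0x200B, 0xFEFF):
    ch = chr(cp)
    name = unicodedata.name(ch, "?")   # control characters have no name
    print(f"U+{cp:04X} {name:<26} separator={is_separator(ch)!s:<5} "
          f"whitespace={is_whitespace(ch)}")
```

Keeping two selectors lets parsers continue to use the narrow, fast set. Text-processing code can then opt into the Unicode definition.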

<br>


<br>Levente

<br>

<br>> 

<br>> Best,

<br>> Christoph

<br>> 

<br>> _________________________________________________________________________________________________________________________________________________________________________________________________________________________________

<br>> From: Squeak-dev <squeak-dev-bounces@lists.squeakfoundation.org> on behalf of Levente Uzonyi <leves@caesar.elte.hu>

<br>> Sent: Friday, May 7, 2021 22:01:18

<br>> To: The general-purpose Squeak developers list

<br>> Subject: Re: [squeak-dev] [ENH] isSeparator

<br>> Hi Christoph,

<br>> 

<br>> There was a discussion on this subject before:

<br>> http://forum.world.st/The-Trunk-Collections-topa-806-mcz-td5084658.html

<br>> The main concerns are:

<br>> - definition: What is a separator?

<br>> - consistency: CharacterSet separators would differ from the rest with your change set.

<br>> - performance: I haven't measured it, but I wouldn't be surprised if #isSeparator became an order of magnitude slower with that implementation.
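To get a feel for the performance question, here is a Python sketch (not Squeak code) of hard-coding the relevant code points. ASCII characters take a fast path of simple integer comparisons. Only non-ASCII characters fall back to a lookup in a small fixed set. The table is assumed to be the Unicode White_Space property as listed in PropList.txt, and the function name is hypothetical. Nothing here has been benchmarked.

```python
# Assumed table: non-ASCII code points with the Unicode White_Space property,
# hard-coded so no character-database lookup is needed at run time.
NON_ASCII_WHITESPACE = frozenset(
    [0x85, 0xA0, 0x1680, *range(0x2000, 0x200B),
     0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_whitespace_fast(cp: int) -> bool:
    """Hypothetical whitespace test on a code point with an ASCII fast path."""
    if cp < 0x80:
        # Fast path: space, or TAB/LF/VT/FF/CR (0x09..0x0D). Note that VT is
        # White_Space in Unicode but not part of the classic separator set.
        return cp == 0x20 or 0x09 <= cp <= 0x0D
    return cp in NON_ASCII_WHITESPACE


print([hex(cp) for cp in (0x20, 0x41, 0xA0, 0x200B, 0x3000)
       if is_whitespace_fast(cp)])
```

Because most source text is ASCII, the common case costs the same few comparisons as the current implementation. The set lookup only runs for non-ASCII characters.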

<br>> 


<br>> Levente

<br>> 

<br>> On Thu, 6 May 2021, christoph.thiede@student.hpi.uni-potsdam.de wrote:

<br>> 

<br>> > Hi all,

<br>> >

<br>> > here is a tiny changeset for you: isSeparator.cs adds proper encoding-aware support for testing separator characters. Unlike the former implementation, it also correctly identifies non-ASCII characters such as the no-break space (U+00A0).

<br>> >

<br>> > Please review and merge! :-)

<br>> >

<br>> > Best,

<br>> > Christoph

<br>> >

<br>> > ["isSeparator.cs.gz"]

<br>> 


<br>><br></leves@caesar.elte.hu></squeak-dev-bounces@lists.squeakfoundation.org></div></blockquote></div>