<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


> I personally see little value in having 63 ways to write a single digit in my Smalltalk method.<br>


<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


But if we don't support this, we deviate from what NumberParser already does. Wouldn't that be inconsistent?<span id="OutlookSignature">


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


<br>


<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


Christoph</div>


</span><br>


<br>


<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


From: Levente Uzonyi<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


Sent: Friday, 16 August, 13:41<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


Subject: Re: [squeak-dev] The Inbox: Collections-ct.851.mcz<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


To: The general-purpose Squeak developers list<br>


<br>


<br>


</div>


<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">


On Fri, 16 Aug 2019, Jakob Reschke wrote:<br>
<br>
> On Fri, 16 Aug 2019 at 01:24, Levente Uzonyi wrote:<br>
> > On Thu, 15 Aug 2019, Thiede, Christoph wrote:<br>
> > > In my eyes it is a nice side effect to support other kinds of Unicode values - NumberParser does the same.<br>
> ><br>
> > IMO, it opens a can of worms:<br>
> > - WideStrings use 4x as much memory as ByteStrings, and they lack the VM support ByteStrings have, so many operations are significantly slower with them.<br>
> > - WideStrings spread like the plague:<br>
> > &nbsp;&nbsp;&nbsp;&nbsp;- Wrote a WideString into a stream? Your stream's buffer is now a WideString.<br>
> > &nbsp;&nbsp;&nbsp;&nbsp;- Did some operation with a WideString, e.g. #,? The result is very likely a WideString.<br>
> > - Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9 in there... Oh, wait, it's a WideString with a "Mathematical sans-serif digit nine".<br>
><br>
> Looks like the usual can of worms you get when you want to support international text. And if your regex wants both to be applied to Unicode text and to find strings with any kind of number in it, then it is incomplete. :-)<br>
<br>
I guess you missed my point. You do not want to match Unicode digits when you write [0-9], but the Unicode character may visually appear as a regular digit, making it harder to debug your code.<br>
<br>
> In general, treating the Unicode digits as digits should actually alleviate this debugging confusion where you wonder why a digit was not processed as such, shouldn't it?<br>
<br>
It depends on how you process those numbers.<br>
<br>
> Question is: do the Smalltalk writers expect that their string, which incidentally contains '... {', (Mathematical sans-serif digit one), '} ...' (could be in part supplied by the user?), will have that sequence replaced by the first formatting argument or do they not expect it? Also: if user input is sanitized to escape format sequences before applying further formatting on the extended text later (imaginary scenario), this sanitization must now also support such Unicode cases.<br>
<br>
I personally see little value in having 63 ways to write a single digit in my Smalltalk method.<br>
<br>
Levente<br>


<br>


</div>
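<div dir="auto" style="direction: ltr; margin: 0; padding: 0; font-family: sans-serif; font-size: 11pt; color: black; ">
The regex pitfall from the quoted list can be reproduced outside Squeak. The following is only a minimal sketch in Python, not Squeak's Regex package or NumberParser; the variable names are illustrative assumptions. It shows that the ASCII range [0-9] does not match "Mathematical sans-serif digit nine" (U+1D7EB), although the Unicode character database classifies that character as a decimal digit with value 9.<br>
<pre>
```python
import re
import unicodedata

# U+1D7EB is MATHEMATICAL SANS-SERIF DIGIT NINE, the character from the example.
SANS_SERIF_NINE = "\U0001D7EB"
text = "abc" + SANS_SERIF_NINE + "def"

# The ASCII range [0-9] does not cover the character, so nothing is found.
ascii_match = re.search(r"[0-9]", text)

# On str patterns, \d matches any Unicode decimal digit (category Nd).
unicode_match = re.search(r"\d", text)

# The character database reports it as a decimal digit with value 9.
category = unicodedata.category(SANS_SERIF_NINE)
value = unicodedata.digit(SANS_SERIF_NINE)

print(ascii_match is None, unicode_match is not None, category, value)
```
</pre>
Whether a parser should behave like the first pattern or the second is exactly the design question discussed above.<br>
</div>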


</body>


</html>