[squeak-dev] The Inbox: Collections-ct.851.mcz

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Fri Aug 16 12:38:08 UTC 2019


> I personally see little value in having 63 ways to write a single digit in my Smalltalk method.

But if we don't support this, we break with the convention that NumberParser already follows. Wouldn't that be inconsistent?


Christoph



From: Levente Uzonyi
Sent: Friday, 16 August, 13:41
Subject: Re: [squeak-dev] The Inbox: Collections-ct.851.mcz
To: The general-purpose Squeak developers list


On Fri, 16 Aug 2019, Jakob Reschke wrote:

> On Fri, 16 Aug 2019 at 01:24, Levente Uzonyi wrote:
> > On Thu, 15 Aug 2019, Thiede, Christoph wrote:
> > > In my eyes it is a nice side effect to support other kinds of Unicode values - NumberParser does the same.
> >
> > IMO, it opens a can of worms:
> > - WideStrings use 4x as much memory as ByteStrings, and they lack the VM support ByteStrings have, so many operations are significantly slower with them.
> > - WideStrings spread like plague:
> >         - Wrote a WideString into a stream? your stream's buffer is now a WideString.
> >         - Did some operation with a WideString, e.g. #,? The result is very likely a WideString.
> > - Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9 in there... Oh, wait, it's a WideString with a "Mathematical sans-serif digit nine".
>
> Looks like the usual can of worms you get when you want to support international text. And if your regex wants both to be applied to unicode text and to find strings with any kind of number in it, then it is incomplete. :-)

I guess you missed my point. You do not want to match unicode digits when you write [0-9], but the unicode character may visually appear as a regular digit, making it harder to debug your code.

> In general, treating the unicode digits as digits should actually alleviate this debugging confusion where you wonder why a digit was not processed as such, shouldn't it?

It depends on how you process those numbers.

> Question is: do the Smalltalk writers expect that their string, which incidentally contains '... {', (Mathematical sans-serif digit one), '} ...' (could be in part supplied by the user?), will have that sequence replaced by the first formatting argument or do they not expect it? Also: if user input is sanitized to escape format sequences before applying further formatting on the extended text later (imaginary scenario), this sanitization must now also support such unicode cases.

I personally see little value in having 63 ways to write a single digit in my Smalltalk method.

Levente
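
The regex example discussed above can be reproduced outside Squeak. The following is only a minimal sketch in Python, which stands in for Smalltalk here; the variable names are illustrative and do not come from the thread or from any Squeak class. It assumes Python 3 string patterns, where `\d` matches any Unicode decimal digit while `[0-9]` matches ASCII digits only.

```python
import re
import unicodedata

# U+1D7EB is MATHEMATICAL SANS-SERIF DIGIT NINE; it renders much like "9".
fancy_nine = "\U0001D7EB"
text = "abc" + fancy_nine + "def"

# The pattern from the thread: an explicit ASCII digit range.
ascii_only = re.compile(r".*[0-9].*")
# A Unicode-aware digit class (Python 3 str patterns treat \d this way).
unicode_aware = re.compile(r".*\d.*")

# The ASCII range does not see the look-alike character as a digit.
matches_ascii_class = ascii_only.fullmatch(text) is not None
# The Unicode digit class does.
matches_unicode_class = unicode_aware.fullmatch(text) is not None

# Unicode itself assigns the character the digit value nine.
digit_value = unicodedata.digit(fancy_nine)

print(unicodedata.name(fancy_nine))
print(matches_ascii_class, matches_unicode_class, digit_value)
```

Whether a parser should behave like the first pattern or the second is exactly the question under debate in this thread; the sketch only shows that the two notions of "digit" diverge for such characters.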
