<div dir="ltr"><div dir="ltr">On Fri, 16 Aug 2019 at 01:24, Levente Uzonyi <<a href="mailto:leves@caesar.elte.hu">leves@caesar.elte.hu</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, 15 Aug 2019, Thiede, Christoph wrote:<br>> In my eyes it is a nice side effect to support other kinds of Unicode values - NumberParser does the same.<br>

<br>

IMO, it opens a can of worms:<br>

- WideStrings use 4x as much memory as ByteStrings, and they lack the VM support ByteStrings have, so many operations are significantly slower with them.<br>

- WideStrings spread like the plague:<br>

        - Wrote a WideString into a stream? Your stream's buffer is now a WideString.<br>

        - Did some operation with a WideString, e.g. #,? The result is very likely a WideString.<br>

- Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9 in there... Oh, wait, it's a WideString with a "Mathematical sans-serif digit nine".<br></blockquote><div><br></div><div>That looks like the usual can of worms you get when you want to support international text. And if your regex is meant both to be applied to Unicode text and to find strings containing any kind of digit, then it is incomplete. :-) In general, treating Unicode digits as digits should actually alleviate the debugging confusion where you wonder why a digit was not processed as such, shouldn't it?</div><div><br></div><div>The question is: do Smalltalk writers expect that their string, which incidentally contains '... {', (Mathematical sans-serif digit one), '} ...' (possibly supplied in part by the user), will have that sequence replaced by the first formatting argument, or do they not expect it? Also: if user input is sanitized to escape format sequences before further formatting is applied to the extended text later (an imaginary scenario), that sanitization must now also handle such Unicode cases.<br></div></div></div>
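<div dir="ltr">To make the quoted regex example concrete, here is a minimal sketch. It is written in Python rather than Squeak, purely to illustrate the general issue; it says nothing about how Squeak's regex package or its String classes actually behave. The names WIDE_NINE, text, ascii_match, unicode_match and digit_value are made up for this example.</div>

```python
import re
import unicodedata

# U+1D7EB is MATHEMATICAL SANS-SERIF DIGIT NINE, the character from the quoted example.
WIDE_NINE = "\U0001D7EB"
text = "abc" + WIDE_NINE + "def"

# An ASCII-only character class does not see the wide digit.
ascii_match = re.fullmatch(r".*[0-9].*", text) is not None

# For str patterns in Python 3, \d matches any Unicode decimal digit (category Nd).
unicode_match = re.fullmatch(r".*\d.*", text) is not None

# The Unicode database still assigns the character the digit value 9.
digit_value = unicodedata.digit(WIDE_NINE)

print(ascii_match, unicode_match, digit_value)
```

<div dir="ltr">Which of the two patterns is "right" depends on what the author of the regex intended, which is the same question as the one about format sequences above.</div>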