[squeak-dev] The Inbox: Collections-ct.851.mcz

Jakob Reschke forums.jakob at resfarm.de
Fri Aug 16 10:26:09 UTC 2019


On Fri, 16 Aug 2019 at 01:24, Levente Uzonyi <leves at caesar.elte.hu> wrote:

> On Thu, 15 Aug 2019, Thiede, Christoph wrote:
> > In my eyes it is a nice side effect to support other kinds of Unicode
> > values - NumberParser does the same.
>
> IMO, it opens a can of worms:
> - WideStrings use 4x as much memory as ByteStrings, and they lack the VM
> support ByteStrings have, so many operations are significantly slower with
> them.
> - WideStrings spread like plague:
>         - Wrote a WideString into a stream? your stream's buffer is now a
> WideString.
>         - Did some operation with a WideString, e.g. #,? The result is
> very likely a WideString.
> - Why doesn't this string match my regex '.*[0-9].*'? There's clearly a 9
> in there... Oh, wait, it's a WideString with a "Mathematical sans-serif
> digit nine".
>

This looks like the usual can of worms you get when you want to support
international text. And if your regex is meant both to be applied to Unicode
text and to find strings containing any kind of digit, then '[0-9]' is
incomplete. :-) In general, treating the Unicode digits as digits should
actually alleviate this debugging confusion, where you wonder why a digit
was not processed as such, shouldn't it?
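
To make the regex point concrete, here is a minimal sketch in Python rather than Squeak (Squeak's regex classes may behave differently; this only illustrates the general distinction). It assumes Python's `re` module, where `\d` on `str` patterns matches any Unicode decimal digit, while `[0-9]` matches ASCII digits only. The helper names are made up for this example.

```python
import re
import unicodedata

# U+1D7EB is MATHEMATICAL SANS-SERIF DIGIT NINE (written as an escape so the
# source stays ASCII-only).
SANS_SERIF_NINE = "\U0001D7EB"
text = "abc" + SANS_SERIF_NINE + "def"

# The pattern from the thread: only ASCII digits count.
ascii_digit_pattern = re.compile(r".*[0-9].*")
# A Unicode-aware variant: \d matches any character in category Nd.
unicode_digit_pattern = re.compile(r".*\d.*")


def contains_ascii_digit(s):
    """True if s contains one of the ten ASCII digits."""
    return ascii_digit_pattern.fullmatch(s) is not None


def contains_any_decimal_digit(s):
    """True if s contains any Unicode decimal digit."""
    return unicode_digit_pattern.fullmatch(s) is not None


print(unicodedata.name(SANS_SERIF_NINE))
print(contains_ascii_digit(text))        # the "why doesn't it match?" case
print(contains_any_decimal_digit(text))  # matches once Unicode digits count
```

Whichever pattern is correct depends on what the author meant by "digit"; the sketch only shows that the two readings give different answers for the same string.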

The question is: do Smalltalk programmers expect that their string, which
incidentally contains '... {', (Mathematical sans-serif digit one), '} ...'
(possibly supplied in part by the user), will have that sequence replaced
by the first formatting argument, or do they not? Also: if user
input is sanitized to escape format sequences before further formatting is
applied to the extended text later (an imaginary scenario), this
sanitization must now also handle such Unicode cases.
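
The imaginary sanitization scenario can be sketched as follows, again in Python and not as Squeak's actual implementation. Everything here is hypothetical: `format_braces` is a toy formatter with 1-based `{n}` placeholders and backslash escaping, and `escape_ascii_placeholders` is a deliberately simplified sanitizer that was written with only ASCII digits in mind.

```python
import re
import unicodedata


def digit_value(ch, unicode_digits):
    """Return the decimal value of ch, or None if it does not count as a digit."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if unicode_digits:
        # None is returned for characters that are not decimal digits.
        return unicodedata.decimal(ch, None)
    return None


def format_braces(template, args, unicode_digits=False):
    """Toy formatter: replace {n} (1-based) with args[n-1]; backslash escapes."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            out.append(template[i + 1])  # emit the escaped character literally
            i += 2
            continue
        if ch == "{":
            close = template.find("}", i + 1)
            if close > i + 1:
                values = [digit_value(c, unicode_digits)
                          for c in template[i + 1:close]]
                if all(v is not None for v in values):
                    index = 0
                    for v in values:
                        index = index * 10 + v
                    if 1 <= index <= len(args):
                        out.append(str(args[index - 1]))
                        i = close + 1
                        continue
        out.append(ch)
        i += 1
    return "".join(out)


def escape_ascii_placeholders(user_text):
    """Hypothetical sanitizer: escapes {n} only when n is made of ASCII digits."""
    return re.sub(r"\{([0-9]+)\}", r"\\{\1}", user_text)


# "{" + MATHEMATICAL SANS-SERIF DIGIT ONE (U+1D7E3) + "}"
user_input = "price {\U0001D7E3}"
template = "Hello {1}, you wrote: " + escape_ascii_placeholders(user_input)

strict = format_braces(template, ["Alice"], unicode_digits=False)
lenient = format_braces(template, ["Alice"], unicode_digits=True)
print(strict == "Hello Alice, you wrote: " + user_input)  # user text untouched
print(lenient == "Hello Alice, you wrote: price Alice")   # user text expanded
```

With the ASCII-only formatter the user's text passes through unchanged; once the formatter accepts Unicode digits, the same sanitized input is expanded with the first argument. A sanitizer that escapes every `{` regardless of what follows would presumably not have this problem, but that is a design choice outside what the thread discusses.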