[squeak-dev] String >> utf8Encoded, ByteArray >> utf8Decoded

Eliot Miranda eliot.miranda at gmail.com
Sun Jan 28 18:52:04 UTC 2018


On Sun, Jan 28, 2018 at 10:40 AM, Tobias Pape <Das.Linux at gmx.de> wrote:

> Hi all.
>
> First, I think Tony's idea is good in terms of usability.
>

+1


>
> Second:
>
> > On 28.01.2018, at 19:06, tim Rowledge <tim at rowledge.org> wrote:
> >
> > What would be so much better is a proper UTF8String class.
> >
> > One of the problems, of course, is that doing almost anything to a
> > UTF-encoded pseudostring requires complex faffing around to decode some
> > or all of it. This makes them pretty much useless for anything outside
> > passing to external libraries, at least so far as I have found. However,
> > that turns out to be quite an important thing, and right now we have a
> > horrible mess.
>
> I think our ByteString/WideString is already pretty good (we have complete
> Unicode coverage and whatnot). If we have to improve, let's first have a
> look at the conceptual things.
> Please please have a look at:
>
>         https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
>
> When we have an improved String, it should
>
>  - report size in terms of extended grapheme clusters.
>  - Never Ever Expose UTF-8 bytes to users
>  - Always have UTF-8 encodings result in ByteArrays.
>    (because, well, a UTF-8-encoded thing is no longer a string, it's just
>    encoded byte data)
>  - do normalization correctly…
>
> that's my 2ct
> Best regards
>         -Tobias
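The distinctions in Tobias's list (bytes vs. code points vs. grapheme clusters, and normalization) can be made concrete with a small sketch. It is written in Python rather than Squeak, purely for illustration. The helper names `utf8_encoded` and `approx_cluster_count` are hypothetical, and the cluster count is only a rough approximation: it skips combining marks and does not implement the full extended grapheme cluster rules of UAX #29 (it gets ZWJ emoji sequences and Hangul jamo wrong, for example).

```python
import unicodedata


def utf8_encoded(text: str) -> bytes:
    """Encoding yields a byte sequence, not another string."""
    return text.encode("utf-8")


def approx_cluster_count(text: str) -> int:
    """Approximate the user-perceived character count.

    ASSUMPTION: counts only code points whose canonical combining class is 0.
    This is a stand-in for real extended grapheme cluster segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

print(len(utf8_encoded(decomposed)))     # 3 bytes
print(len(decomposed))                   # 2 code points
print(approx_cluster_count(decomposed))  # 1 perceived character

# The two spellings differ code point by code point, but agree after NFC.
print(decomposed == composed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The same visible character gives three different answers to "what is its size?", which is why the choice of unit for #size matters.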
>
> >
> > We do, however, have a sorta-kinda model of a way to handle it in
> > FilePath, which actually has nothing much to do with file paths at all.
> > FilePath is a bit more general than a UTF-* string and maybe that is
> > still a valuable option.
> >
> > I see two basic options for making improvements:
> >
> > a) a simplistic UTF8String that is a byte array of the UTF-8 bytes and
> > does nothing much except exist as an encoded string to pass to
> > primitives. #size returns the number of bytes, #at: & #at:put: are not
> > for general consumption, and to do any sort of editing you have to
> > convert it to a real String.
> >
> > b) something like FilePath, with both the original and encoded version
> > kept, automagic conversions and some interesting hand-waving to deal
> > with #size (is it the number of characters, or the number of bytes?) etc.
> >
> > tim
> > --
> > tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> > Useful random insult:- His future is behind schedule.
> >
> >
> >
>
>
>


-- 
_,,,^..^,,,_
best, Eliot