UTF8 as default text encoding? (was: Re: m17n ready to go)

Hannes Hirzel hirzel at spw.unizh.ch
Fri Jul 30 14:21:48 UTC 2004


Ned Konz wrote:
> On Thursday 29 July 2004 2:14 pm, Yoshiki Ohshima wrote:
>
>>>>Again, the default assumption is that the String will hold text --
>>>>even though there's nothing in it yet! It seems to me that the default
>>>>converter for this stream should be the Latin1TextConverter. If a
>>>>particular user of a String has a need for or knowledge of a
>>>>particular encoding, they can change the converter.
>>
>>  No.  If the default is Latin1TextConverter, there would be more
>>problems.
>
>
> Like what? If everyone who wants text is specifying the type (like you
> suggest below), there shouldn't be any problems.
>
>
>>>>However, I don't think it's right to introduce new and incompatible
>>>>character conversion semantics on the existing file API.
>>
>>  The rule of thumb is that if you open a file, you should think about
>>whether it is text or binary, and if it is text, you should think about
>>how it should be interpreted.
>
>
> Sure. And the authors of the code that was broken had done that when
> they wrote it.
>

This boils down to the question of whether Latin1 or UTF8 should be the
default text encoding. Looking backwards, for compatibility with existing
code and files, Latin1 is probably the choice; looking forward, UTF8 is
surely to be preferred.
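To make the trade-off concrete, here is a small illustration. It is
written in Python rather than Squeak Smalltalk, so none of its names
correspond to Squeak's TextConverter classes; it only sketches how the
same text is stored under each candidate default and what happens when
data written with one encoding is read with the other.

```python
# Illustration only, in Python rather than Squeak Smalltalk:
# how one piece of text is stored under each candidate default encoding.

ascii_text = "plain ASCII"
accented_text = "Zürich"

# Pure ASCII text produces identical bytes under both encodings,
# so ASCII-only files are unaffected by the choice of default.
assert ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

latin1_bytes = accented_text.encode("latin-1")  # one byte per character
utf8_bytes = accented_text.encode("utf-8")      # "ü" takes two bytes

print(len(latin1_bytes), len(utf8_bytes))  # 6 7

# Reading UTF-8 bytes with a Latin-1 default silently yields mojibake ...
print(utf8_bytes.decode("latin-1"))  # ZÃ¼rich

# ... while reading Latin-1 bytes with a UTF-8 default fails outright,
# because the lone 0xFC byte is not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)
```

ASCII-only data reads the same either way; the choice of default only
matters for characters above 127, which is exactly where existing Latin1
data or new UTF8 data would be misread under the other default.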

I personally prefer UTF8.
But perhaps this decision might be postponed?

Hannes



More information about the Squeak-dev mailing list