[squeak-dev] leadingChar proposal

Philippe Marschall philippe.marschall at gmail.com
Fri Aug 28 06:08:16 UTC 2009


2009/8/28 Andreas Raab <andreas.raab at gmx.de>:
> Folks -
>
> I think it's time to do something about the leadingChar in Characters that
> has been on the TODO list for a while. I have been looking over this stuff
> for some time now, fixing things here and there and laying some of the
> ground work for the things to come.
>
> Here is the good news: Squeak doesn't need the leadingChar any longer. If
> you are running an updated trunk image you can run entirely without the
> leadingChar being used, and I've done this for about a week now with no ill
> side effects (disclaimer: I haven't been using very much of m17n support
> stuff so there may still be breakage but it means it won't explode in your
> face straightaway). If you would like to try yourself, all you need to do is
> to hack Character>>setValue: to say, e.g.,
>
>        value := newValue bitClear: 16r3FC00000.
>
> and you're good (and won't ever see a leadingChar). However, the removal of
> the leading char could be used to do a couple of other things that I would
> like to discuss and solicit feedback in particular from the folks who care
> about the leadingChar.
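
To make the effect of that bitClear: concrete, here is a minimal sketch of the arithmetic in Python rather than Smalltalk. It assumes the layout implied by the mask 16r3FC00000: the leadingChar occupies the 8 bits starting at bit 22, with the character code in the 22 bits below it. The function names and the example tag value 5 are hypothetical, chosen only for illustration.

```python
# Sketch only: models the integer layout implied by the mask 16r3FC00000.
# Assumption: leadingChar = 8 bits starting at bit 22, code = low 22 bits.
# All names below are hypothetical, not Squeak API.

LEADING_CHAR_SHIFT = 22
LEADING_CHAR_MASK = 0x3FC00000  # written 16r3FC00000 in Smalltalk


def compose(leading_char, code):
    """Pack a leadingChar (0..255) and a character code into one integer."""
    return (leading_char << LEADING_CHAR_SHIFT) | code


def leading_char_of(value):
    """Extract the leadingChar bits from a packed value."""
    return (value & LEADING_CHAR_MASK) >> LEADING_CHAR_SHIFT


def strip_leading_char(value):
    """Same effect as `newValue bitClear: 16r3FC00000`."""
    return value & ~LEADING_CHAR_MASK


tagged = compose(5, 353)            # 5 is an arbitrary example tag
plain = strip_leading_char(tagged)  # tag bits cleared, code bits untouched
```

Under this assumed layout, clearing the mask leaves the low 22 bits alone, so a value that never carried a tag passes through unchanged.
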
>
> The main insight is that although we *can* run without the leadingChar, it
> doesn't mean we *have* to. As it stands, the leading char is used for two
> purposes: Character set selection (EncodedCharSet) and (parts of) language
> support. There is a significant amount of confusion between the two with
> Latin1/Latin2Environment subclasses of LanguageEnvironment (although these
> are character encodings, not languages).
>
> What I propose is to redefine "leadingChar = 0", which currently means
> "Latin1 encoding, language neutral", as "Unicode encoding, language
> neutral". The effect is that "Character value: 353" and "Unicode value:
> 353" become the same if the environment is considered language neutral,
> which by default it would be.
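
A small Python sketch of why 353 is a useful example here. It again assumes the tag sits above bit 22, as the 16r3FC00000 mask suggests; the variable names are made up for illustration. With a tag of 0 the stored value is just the code itself, and 353 lies outside the Latin-1 range, so it only names a character once tag 0 is read as Unicode.

```python
# Sketch only: assumes the leadingChar sits above bit 22 (inferred from the
# 16r3FC00000 mask). Variable names are hypothetical.
import unicodedata

LEADING_CHAR_SHIFT = 22

code = 353
value_with_tag_zero = (0 << LEADING_CHAR_SHIFT) | code

# Latin-1 only covers 0..255, so 353 has no meaning under the old definition.
fits_latin1 = code <= 0xFF

# Read as a Unicode code point, the same integer is U+0161.
unicode_name = unicodedata.name(chr(value_with_tag_zero))
```
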
>
> All environments except those which care about the connotations of the
> language tag should be able to work with this definition without any change
> whatsoever. The only things that change are that the default
> LanguageEnvironment is Unicode based, using leadingChar=0, and that most of
> the subclasses go away (being replaced by the default LanguageEnvironment).
> Those that we care about, or that need a transition plan (i.e., the CJK
> languages), keep using the language tag for the time being.
>
> That means that *if* you set your language environment to be one of the CJK
> languages you get a language tag in your strings, but by default the
> language neutral environment will produce "plain Unicode". Which should make
> the server/seaside/aida people a lot more happy when dealing with this
> stuff.
>
> For the CJK languages (or other languages requiring support that has been so
> far expressed via the language tag) we can use this opportunity and phase the
> use of the language tag out in favor of using text attributes (which would
> have to be written first).
>
> The main advantage of the proposal is that the people who would like to use
> plain Unicode get to use it, and the people who care about the language tag
> and its consequences can still use that as well.
>
> How does that sound?

Like good news.

Cheers
Philippe


