[squeak-dev] Character variants / leadingChar / Han unification

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Jan 28 02:04:39 UTC 2017

All these ideas had been floating around for at least two years before that,
driven, I think, by the web folks (Seaside, etc.).
Promoting Unicode and using leadingChar = 0 for Unicode were suggested
several times.
I suggested it at least once myself.

It's just that Andreas's analysis and synthesis were brilliant!
Since he had committed a bunch of improvements in this area, I think he knew
exactly what he was talking about.

Effective replacement happened a bit later in Multilingual-nice.91 on 28
February 2010.

2017-01-27 17:05 GMT+01:00 Bert Freudenberg <bert at freudenbergs.de>:

> Thanks for the historic account, Chris!
> So we didn't replace the leadingChar mechanism, we just redefined
> "leadingChar = 0" to mean "unicode" rather than "latin1".
> The mechanism itself is still in place. It's a hack, admittedly, but as
> long as we're passing plain strings around we have no other way of
> retaining language information.
> A better way may be to support Unicode variation selectors. Then again, I
> don't know too much about that. Any native speaker to help us out?
> - Bert -
> On Thu, Jan 26, 2017 at 11:36 PM, Chris Cunningham <
> cunningham.cb at gmail.com> wrote:
>> So, back in 2009, Andreas proposed:
>> ---------------------------
>> What I would propose to do here is to define that "leadingChar = 0"
>> which currently means "Latin1 encoding, language neutral" is being
>> redefined to "Unicode encoding, language neutral". What this does is that
>> "Character value: 353" and "Unicode value: 353" become the same, if the
>> environment is considered language neutral which by default it would be.
>> ---------------------
>> In 2010, he pushed this into Squeak Trunk.
>> Then, in 2011, there was a conversation where Andreas stated:
>> -------------------
>> On 1/8/2011 2:16 AM, Sean P. DeNigris wrote:
>> #leadingChar
>> "In Squeak Character encoding, bits above 16r3FFFFF don't encode the
>> character, but hold information about the language environment and the
>> encoding which should be used to interpret the charCode. The background of
>> which is Han unification (http://en.wikipedia.org/wiki/Han_unification)."
>> How's that as a method comment?  Is it really "In Squeak... encoding..."
>> or does this apply to unicode in general?
>> It is Squeak specific. Unicode does not have a leading char.
>> Cheers,
>>   - Andreas
>> ---------------------
>> Maybe this later email was the one that you were interested in?
>> I can't find any mention in the commit list or other discussions where
>> the leadingChar was dropped, but I'm not an expert in this space (just
>> interested).
>> Thanks,
>> cbc
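
For anyone following along, here is a rough sketch of the bit layout described in the method comment quoted above. It is written in Python rather than Smalltalk purely for illustration. It assumes the character code occupies the bits covered by 16r3FFFFF (the low 22 bits) and that leadingChar sits directly above them. The function names and the example tag value 5 are made up for this sketch and are not Squeak's actual API.

```python
# Sketch only: models "bits above 16r3FFFFF hold the leadingChar".
# All names below are hypothetical, not Squeak selectors.

CHAR_CODE_MASK = 0x3FFFFF   # 16r3FFFFF, the bits that encode the character
LEADING_CHAR_SHIFT = 22     # assumed: leadingChar starts right above the mask


def compose(leading_char: int, char_code: int) -> int:
    """Pack a language tag and a character code into one integer value."""
    if not 0 <= char_code <= CHAR_CODE_MASK:
        raise ValueError("char_code does not fit below 16r3FFFFF")
    if leading_char < 0:
        raise ValueError("leading_char must be non-negative")
    return (leading_char << LEADING_CHAR_SHIFT) | char_code


def leading_char_of(value: int) -> int:
    """Recover the language tag stored above the character bits."""
    return value >> LEADING_CHAR_SHIFT


def char_code_of(value: int) -> int:
    """Recover the character code stored in the low bits."""
    return value & CHAR_CODE_MASK


# With leadingChar = 0 the packed value is simply the code point, which is
# why "Character value: 353" and "Unicode value: 353" can coincide.
neutral = compose(0, 353)

# A non-zero tag (5 is an arbitrary example) keeps the same code point but
# marks the character with a language environment.
tagged = compose(5, 0x4E00)
```

The point of the sketch is that the tag travels inside the character value itself, so plain strings can carry language information without any extra structure.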