[squeak-dev] Character variants / leadingChar / Han unification

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Jan 28 02:04:39 UTC 2017


All these ideas had been floating around for at least two years before that,
I think at the urging of the web guys (Seaside, etc.).
Promoting Unicode and using leadingChar = 0 for Unicode were suggested
several times.
I did so at least once:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-March/135062.html

It's just that Andreas's analysis and synthesis were brilliant!
Since he had committed a bunch of improvements in this area, I think he knew
exactly what he was talking about.

Effective replacement happened a bit later in Multilingual-nice.91 on 28
February 2010.

2017-01-27 17:05 GMT+01:00 Bert Freudenberg <bert at freudenbergs.de>:

> Thanks for the historic account, Chris!
>
> So we didn't replace the leadingChar mechanism, we just redefined
> "leadingChar = 0" to mean "unicode" rather than "latin1".
>
> The mechanism itself is still in place. It's a hack, admittedly, but as
> long as we're passing plain strings around we have no other way of
> retaining language information.
>
> A better way may be to support Unicode variation selectors. Then again, I
> don't know too much about that. Any native speaker to help us out?
>
> - Bert -
>
> On Thu, Jan 26, 2017 at 11:36 PM, Chris Cunningham <
> cunningham.cb at gmail.com> wrote:
>
>> So, back in 2009, Andreas proposed:
>>
>> ---------------------------
>> What I would propose to do here is to define that "leadingChar = 0"
>> which currently means "Latin1 encoding, language neutral" is being
>> redefined to "Unicode encoding, language neutral". What this does is that
>> "Character value: 353" and "Unicode value: 353" become the same, if the
>> environment is considered language neutral which by default it would be.
>> ---------------------
>>
>> In 2010, he pushed this into Squeak Trunk.
>>
>> Then, in 2011, there was a conversation where Andreas stated:
>>
>> -------------------
>> On 1/8/2011 2:16 AM, Sean P. DeNigris wrote:
>> #leadingChar
>> "In Squeak Character encoding, bits above 16r3FFFFF don't encode the
>> character, but hold information about the language environment and the
>> encoding which should be used to interpret the charCode. The background of
>> which is Han unification (http://en.wikipedia.org/wiki/Han_unification)."
>>
>> How's that as a method comment?  Is it really "In Squeak... encoding..."
>> or does this apply to unicode in general?
>>
>> It is Squeak specific. Unicode does not have a leading char.
>>
>> Cheers,
>>   - Andreas
>> ---------------------
>>
>> Maybe this later email was the one that you were interested in?
>>
>> I can't find any mention in the commit list or other discussions where
>> the leadingChar was dropped, but I'm not an expert in this space (just
>> interested).
>>
>> Thanks,
>> cbc
>>
>>
>
>
>
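
For readers who want the arithmetic spelled out: the method comment quoted
above says that bits above 16r3FFFFF carry the leadingChar and the bits below
carry the charCode. Here is a minimal sketch of that split. It is written in
Python purely as an illustration, not as Squeak source; the names split_value
and join_value are made up for this example, and the leadingChar value 5 is an
arbitrary hypothetical language tag.

```python
# Illustrative model only; this is not code from the Squeak image.
# 16r3FFFFF in Smalltalk notation is 0x3FFFFF, i.e. the low 22 bits.
CHAR_CODE_BITS = 22
CHAR_CODE_MASK = (1 << CHAR_CODE_BITS) - 1  # 0x3FFFFF


def split_value(value):
    """Split a raw character value into (leading_char, char_code)."""
    return value >> CHAR_CODE_BITS, value & CHAR_CODE_MASK


def join_value(leading_char, char_code):
    """Combine a language tag and a code point into one raw value."""
    if not 0 <= char_code <= CHAR_CODE_MASK:
        raise ValueError("char_code does not fit in the low 22 bits")
    if leading_char < 0:
        raise ValueError("leading_char must be non-negative")
    return (leading_char << CHAR_CODE_BITS) | char_code


# With leading_char = 0 the raw value is just the code point, which is
# why redefining "leadingChar = 0" as Unicode makes the two coincide.
neutral = join_value(0, 353)
tagged = join_value(5, 353)  # 5 is a hypothetical language tag
print(neutral, split_value(neutral))
print(tagged, split_value(tagged))
```

Under this model, a language-neutral character keeps its plain code point as
its value, while a tagged character carries the same charCode plus the extra
high bits that Unicode itself does not define.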