[squeak-dev] Character variants / leadingChar / Han unification

Bert Freudenberg bert at freudenbergs.de
Fri Jan 27 16:05:44 UTC 2017


Thanks for the historic account, Chris!

So we didn't replace the leadingChar mechanism, we just redefined
"leadingChar = 0" to mean "Unicode" rather than "Latin-1".

The mechanism itself is still in place. It's a hack, admittedly, but as
long as we're passing plain strings around we have no other way of
retaining language information.
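
To make the mechanism concrete, here is a rough sketch of the bit layout
described in the method comment quoted below: the bits up to 16r3FFFFF hold
the character code, and everything above them holds the leadingChar tag. It
is written in Python purely for illustration. The helper names and the tag
value 5 are made up, and this is not Squeak's actual implementation.

```python
# Illustrative sketch only; the names below are hypothetical, not Squeak's API.
CHAR_CODE_MASK = 0x3FFFFF   # 16r3FFFFF: the low 22 bits carry the character code
LEADING_CHAR_SHIFT = 22     # everything above the mask is the leadingChar tag


def split_value(value):
    """Return (leading_char, char_code) for a packed character value."""
    return value >> LEADING_CHAR_SHIFT, value & CHAR_CODE_MASK


def join_value(leading_char, char_code):
    """Pack a leadingChar tag and a character code into one integer."""
    if leading_char < 0:
        raise ValueError("leadingChar must be non-negative")
    if not 0 <= char_code <= CHAR_CODE_MASK:
        raise ValueError("character code does not fit below 16r3FFFFF")
    return (leading_char << LEADING_CHAR_SHIFT) | char_code


# With leadingChar = 0 the packed value is just the code point, so 353
# (U+0161) is the same number whether read as a character value or as Unicode.
assert join_value(0, 353) == 353
assert split_value(353) == (0, 353)

# A non-zero tag (5 is an arbitrary example, not a real language id) keeps
# the same character code but yields a different packed value.
tagged = join_value(5, 0x76F4)  # U+76F4, a unified CJK ideograph
assert split_value(tagged) == (5, 0x76F4)
assert tagged != 0x76F4
```

The last two assertions show why the tag survives in plain strings: the
language information travels inside each character's integer value.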

A better way may be to support Unicode variation selectors. Then again, I
don't know too much about that. Any native speaker of a CJK language to
help us out?

- Bert -

On Thu, Jan 26, 2017 at 11:36 PM, Chris Cunningham <cunningham.cb at gmail.com>
wrote:

> So, back in 2009, Andreas proposed:
>
> ---------------------------
> What I would propose to do here is to define that "leadingChar = 0" which
> currently means "Latin1 encoding, language neutral" is being redefined to
> "Unicode encoding, language neutral". What this does is that "Character
> value: 353" and "Unicode value: 353" become the same, if the environment is
> considered language neutral which by default it would be.
> ---------------------
>
> In 2010, he pushed this into Squeak Trunk.
>
> Then, in 2011, there was a conversation where Andreas stated:
>
> -------------------
> On 1/8/2011 2:16 AM, Sean P. DeNigris wrote:
> > #leadingChar
> > "In Squeak Character encoding, bits above 16r3FFFFF don't encode the
> > character, but hold information about the language environment and the
> > encoding which should be used to interpret the charCode. The background of
> > which is Han unification (http://en.wikipedia.org/wiki/Han_unification)."
> >
> > How's that as a method comment?  Is it really "In Squeak... encoding..." or
> > does this apply to unicode in general?
>
> It is Squeak specific. Unicode does not have a leading char.
>
> Cheers,
>   - Andreas
> ---------------------
>
> Maybe this later email was the one that you were interested in?
>
> I can't find any mention in the commit list or other discussions where the
> leadingChar was dropped, but I'm not an expert in this space (just
> interested).
>
> Thanks,
> cbc
>
>