[squeak-dev] Re: The future of Squeak & Pharo (was Re: [Pharo-project] [ANN] Pharo MIT license clean)

Philippe Marschall philippe.marschall at gmail.com
Mon Jun 29 19:08:55 UTC 2009


2009/6/29 Yoshiki Ohshima <yoshiki at vpri.org>:
> At Mon, 29 Jun 2009 00:38:16 -0700,
> Andreas Raab wrote:
>>
>> Yoshiki Ohshima wrote:
>> >   Yeah, but insisting on using #= for this seems to be the wrong goal.
>> > Defining seasideEqual: and using it elsewhere would be better.
>>
>> You'd have to change too many places for this to be feasible. For
>> example, consider Set and Dictionary operations which may have to be
>> changed in this process.
>
>  Right.  And Philippe can strip the leading char when he wants.

That's what we try to do, and it's such a pain. We'd have to do it for
every string that we get from anywhere. At that point string handling
in Smalltalk is harder than in C and has similar pitfalls.
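
To make the problem concrete, here is a minimal Python sketch (not Squeak
code). It assumes a character value packs a language tag, the "leading
char", into the bits above a 22-bit code point; that layout, the tag value
5 and all function names are illustrative assumptions, not the actual
Squeak implementation.

```python
# Illustrative model only: a language tag ("leading char") is assumed to be
# packed into the bits above a 22-bit Unicode code point.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1


def tagged(text, leading_char):
    """Return the tagged character values for text (hypothetical helper)."""
    return [(leading_char << CODE_BITS) | ord(ch) for ch in text]


def strip_leading_char(values):
    """Drop the language tag, keeping only the Unicode code points."""
    return [v & CODE_MASK for v in values]


def to_utf8(values):
    """Encode to UTF-8; the tag has no place in the byte stream, so it is dropped."""
    return "".join(chr(v) for v in strip_leading_char(values)).encode("utf-8")


def from_utf8(data, leading_char=0):
    """Decode UTF-8; the tag has to be supplied from context (default 0)."""
    return tagged(data.decode("utf-8"), leading_char)


untagged = tagged("abc", 0)
tagged_5 = tagged("abc", 5)  # same code points, arbitrary non-zero tag

# Plain equality fails although the text is the same.
print(untagged == tagged_5)
# Equality only holds after stripping the tag on both sides.
print(strip_leading_char(untagged) == strip_leading_char(tagged_5))
# A round trip through UTF-8 does not give back the tagged string.
print(from_utf8(to_utf8(tagged_5)) == tagged_5)
```

In this model every comparison, Set lookup or Dictionary key involving a
string from outside needs the strip step first, which is the burden
described above.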

>> >> And we have to map characters to bytes and bytes to characters.
>> >
>> >   Everybody does.  Not sure why it is relevant.
>>
>> Because a simple conversion like squeakToUtf8/utf8ToSqueak is not
>> lossless anymore. The problem is that Unicode is Unicode is Unicode ;-)
>> You can either use it and live with its shortcomings or you can write
>> something completely different. But as soon as one tries to redefine it
>> partially it leads to problems since there are too many surrounding
>> assumptions.
>
>  From another POV, the communication with outside world was
> secondary.

No kidding, I noticed.

> The .changes file is in UTF-8 but always carries the additional
> information, so it is lossless.
>
>> >> I agree but you don't know how often I have gotten this answer when I
>> >> suggested to simply drop #leadingChar.
>> >
>> >   Maybe you get the answer often because it makes sense?  I wrote this
>> > a few times over the last few years, but if the goal is to make a
>> > comprehensible personal computer environment, some information is
>> > needed beyond what Unicode code points provide.
>>
>> Yes, but it is not clear whether that information belongs in the
>> string itself or in its surrounding context. The argument that a
>> language should be part of the string itself can certainly be made but
>> it neglects the necessity of interacting with the outside world. And in
>> the outside world (meaning Unicode) the language tag isn't part of the
>> string but rather part of the context. Consequently, I think that Squeak
>> should follow similar semantics (imperfect as that may be for some uses)
>> and have language information associated with class Text (i.e., a text
>> attribute) not with class Character (leadingChar).
>
>  I should say I've been agreeing with it for a while, but it would also
> be a huge destabilizer for the existing system.  Now Pharo people may be
> able to leap the gap, if they like.

I totally see the need to know the language of a text in order to render
it. But the character is the wrong place to store it, seriously.

Cheers
Philippe


