multilingual Squeak

Wed Mar 17 10:38:46 UTC 1999

I think your example shows that the question of equality for
strings that do not share a common encoding is often application
specific.  A generic solution that would cover many cases would be for
two strings of different encodings to negotiate a common encoding
and then compare the results of converting each to that encoding.
The negotiation algorithms would be placed in the encoding class
hierarchy, with the default common encoding probably being Unicode.
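
The negotiate-then-compare idea can be sketched roughly as follows. This is
only an illustration in Python, not Squeak code: the EncodedString class and
its negotiate, convert_to and same_text methods are hypothetical names, the
negotiation rule is the simplest one possible (same encoding, otherwise fall
back to Unicode as UTF-8), and it relies on Python's built-in gb2312 and
euc_jp codecs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical string object: raw bytes plus the name of their encoding."""
    data: bytes
    encoding: str

    def negotiate(self, other: "EncodedString") -> str:
        # Assumed policy: keep a shared encoding, otherwise default to Unicode
        # (UTF-8 here). A real hierarchy could let each encoding class decide.
        if self.encoding == other.encoding:
            return self.encoding
        return "utf-8"

    def convert_to(self, encoding: str) -> bytes:
        # Transcode through Python's Unicode str unless already in that encoding.
        if encoding == self.encoding:
            return self.data
        return self.data.decode(self.encoding).encode(encoding)

    def same_text(self, other: "EncodedString") -> bool:
        # Compare both operands after converting them to the negotiated encoding.
        common = self.negotiate(other)
        return self.convert_to(common) == other.convert_to(common)


# The first Han character in Unicode, U+4E00, in two legacy encodings:
g = EncodedString(b"\xd2\xbb", "gb2312")  # GB 2312 row 50, cell 27
j = EncodedString(b"\xb0\xec", "euc_jp")  # JIS X 0208 row 16, cell 76

print(g == j)          # strict comparison of bytes and encoding: False
print(g.same_text(j))  # comparison after negotiation: True
```

Keeping the strict comparison (==) separate from same_text leaves the
application free to choose which notion of equality it wants.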

Marcel

> From: "Michael S. Klein" <mklein at alumni.caltech.edu>
>
> What about comparing two strings?
>
> If you have two strings, each with one character that is the same code
> point in Unicode, but the strings are in different encodings, are the
> strings equal or not?
>
> In other words, for example (from the first Han character in Unicode) 
>
> is U+4e00  =  G(5027) ?
> is G(5027)  = J(1676) ?
>
> U means Unicode 2.0
> G means GB 2312-80
> J means JIS X 0208-1990
>
> For the Eurocentric amongst us (most of the list),
>
> is a Unicode $r   the same as an ASCII $r  ?
> is an English 'r' the same as a French 'r' ?
>
> Still, representation is a start, independent of answering the
> above questions.  This way you could have an object that was a
> Unicode $r ~= ASCII $r  (even though Unicode says they are the same,
> semantically).
>
> In defence of Unicode, it preserves round trip transcoding.
> It would be my standard of choice.
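
The round-trip property mentioned above can be spot-checked for individual
byte sequences. The sketch below is in Python and uses its built-in codecs;
the round_trips helper is a hypothetical name, and passing for a few inputs
shows only that those inputs survive, not that a whole character set does.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Return True if decoding to Unicode and re-encoding reproduces the bytes."""
    return data.decode(encoding).encode(encoding) == data


# U+4E00 as encoded in GB 2312 and in JIS X 0208 (EUC-JP form):
print(round_trips(b"\xd2\xbb", "gb2312"))
print(round_trips(b"\xb0\xec", "euc_jp"))
```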