[squeak-dev] Re: m17n simplification questions

Andreas Raab andreas.raab at gmx.de
Thu Sep 3 04:58:04 UTC 2009


Yoshiki Ohshima wrote:
>> * HandMorph's CompositionManager: There is ImmAbstractPlatform, ImmWin32 
>> and ImmX11. Are these still in use and functional? Should we continue to 
>> support them?
> 
>   Yes, and yes.  I probably should put the plugin code up somewhere.
> The Unix VM supports it (or used to; I haven't tried it with the latest).

Thanks. It would be good if we could have the plugins on squeakvm.org. 
That makes it easier to verify that this code is present and up-to-date.

>> * LanguageEnvironment converters: Is there any reason to assume that we 
>> will ever need to support any encodings other than UTF8/Unicode for the 
>> VM/image interface? Should we just get rid of all of these different 
>> converter methods and use the UTF8/Unicode conversions directly, i.e., 
>> instead of:
>>
>>    converter := LanguageEnvironment defaultFileNameConverter.
>>    squeakPathName := vmPathString convertFromWithConverter: converter.
>>
>> the code becomes:
>>
>>    squeakPathName := vmPathString utf8ToSqueak.
> 
>   For file names, in general, it is ok by now.
> 
> The complication is reading the file names in a zip file.  The name
> interpretation has to be special.  The zip files that are being created,
> and that have been created, use Shift-JIS for the archive members' names
> (I wonder whether it is still 8859-1 in Western Europe?).  The
> #defaultSystemConverter variant should stay for this purpose.

Correct me if I'm wrong, but #defaultSystemConverter doesn't seem to
be used for this. From what I can see in a trunk image, the only
reference is in ZipArchiveMember>>refreshLocalFileHeaderTo:, which uses
asVmPathName; on a current UTF-8-enabled VM that would always use UTF-8
anyway. Is this currently broken?

>> * Converter classes: If the answer to the previous question is that we 
>> use UTF8/Unicode consistently, is there any reason whatsoever to keep 
>> the clipboard or keyboard interpreter classes? (We're talking about a *lot*
>> of classes here: the keyboard interpreter has 15 subclasses, the clipboard
>> interpreter 12, etc.)
> 
>   The only reason would be to manage the language tag for some CJK languages.

Could we fold this into the UTF8 converter? I.e., if the environment is 
not language-neutral, insert the appropriate language tag?

>> * EncodedCharSet: Are any encodings other than Unicode currently in use? 
>> Do we need to explicitly support domestic CJK encodings given that we 
>> have Unicode + language tag?
> 
>   - Because Unicode doesn't offer round-trip conversion to/from some
>     of these encodings, one stance that Squeak's m17n alludes to, and
>     that some other systems such as Ruby m17n and Gauche Scheme's
>     mechanism try to take, is to allow non-Unicode encoded chars to be
>     stored in a manner similar to what we did with the language tag, and
>     to keep the input and output of these strings consistent.  I would
>     kind of like to keep the ability.
> 
>   - There are even Etoys projects created in the old days that use
>     JIS X 0208.  If we want to be able to load them into a possible
>     Etoys on mainstream Squeak in the future, we would probably rather
>     keep these encodings.

Fair enough. I'll leave it alone.

Thanks for the help!

Cheers,
   - Andreas