Unicode support (File names was Re: Warning: Large Babeltranslation)

Yoshiki Ohshima Yoshiki.Ohshima at acm.org
Mon Nov 17 18:48:32 UTC 2003


  Lex,

> iconv supports all four of these formats: EUC-jp, EUC-kr, Shift-JIS, and
> VISCII.  So if the maintainer uses  iconv they will feel great.  That
> web page, again, is:
> 
> 	http://www.gnu.org/software/libiconv/
> 
> I finally understand your point now about code duplication in the
> various VM's, but that can be fixed by using C libraries such
> as iconv.

  The question is not whether iconv supports those encodings.  It is
who carries the burden of implementation and testing.  I don't think
the maintainers will feel great if they can't tell whether it is
*really* working.  I'd rather let someone who knows the subject and
cares about it do the language-specific implementation and testing.

  Of course, if we start depending on a third-party library at this
deep a level, portability will be affected.  VMMaker would have to
specify the iconv version and configure options, the conversion tables
may disagree with the ones the OS has, the platform may happen to
already have a data structure called iconv_t, etc.

  Another important point is that we'll need in-image conversion
anyway.  Again, since we don't want to use UTF-8 for the internal
representation, the internal string has to be converted before it is
passed to primitives.  (So, what kind of data structure do you imagine
using as the internal representation?)
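
  To make the conversion step concrete, here is a minimal sketch in
Python rather than Squeak.  It assumes, purely for illustration, an
internal form that stores one fixed-width slot per code point; the
names to_internal and to_external are hypothetical and are not part of
any Squeak or VM interface.

```python
from array import array


def to_internal(text):
    """Hypothetical internal form: one fixed-width slot per code point."""
    return array("L", (ord(ch) for ch in text))


def to_external(internal, encoding="utf-8"):
    """Convert the internal form to bytes just before a primitive call."""
    return "".join(chr(cp) for cp in internal).encode(encoding)


# Assumed example: a file name on its way to a file primitive.
internal = to_internal("日本語.txt")
external = to_external(internal)
```

  The point of the sketch is only that the conversion happens at the
boundary: code inside the image indexes code points directly, and the
byte encoding is chosen at the moment the string leaves the image.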

  Also, if you write a program that accesses a web server (hehe, you
did, actually), what the server returns can be in any encoding.  You
need to convert the response from the server to the internal
representation before rendering it.
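
  As a rough illustration of that step (again in Python, not Squeak),
the sketch below decodes a response body using whatever charset the
server declared and falls back to a default when the declaration is
missing or wrong.  The function name decode_response and the choice of
Latin-1 as the fallback are assumptions made for the example.

```python
def decode_response(body, charset=None, fallback="latin-1"):
    """Decode bytes from a server into a str (the internal form here)."""
    for name in (charset, fallback):
        if name is None:
            continue
        try:
            return body.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes that do not fit it: try the next.
            continue
    # Last resort: Latin-1 maps every byte, so this cannot fail.
    return body.decode("latin-1", errors="replace")


# Assumed example: the same text arriving in two different encodings.
from_euc = decode_response("日本語".encode("euc_jp"), "euc-jp")
from_sjis = decode_response("日本語".encode("shift_jis"), "shift_jis")
```

  Both calls yield the same internal string, which is what the
rendering code would then work with regardless of what the server sent.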

  You can imagine having an interface to libiconv, but that'll
complicate the system architecture much more.  (I'm not opposed to
having optional primitives for this purpose...)

> The rest of the stuff I just don't get.  I'll stop now instead
> of speculating; maybe seeing the generality of iconv
> is enough to rest the case.

  Well, don't worry about it.  Your code won't be affected by the m17n
stuff too much.  The ASCII world in Squeak will more or less stay the
same.

-- Yoshiki


