[Q] WAListener and WAFileLibrary problem

Philippe Marschall philippe.marschall at gmail.com
Sun Jan 6 16:28:49 UTC 2008


So how do you know whether the UTF-8 byte sequence 0xE4 0xB8 0x8E
(U+4E0E) is generic Chinese, traditional Chinese, simplified Chinese,
Japanese or Korean?
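
To make the question concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the Squeak code under discussion). It decodes the three bytes and prints what Unicode records for the result: a code point and a character name, with no language attached.

```python
import unicodedata

# The UTF-8 byte sequence from the question above.
raw = bytes([0xE4, 0xB8, 0x8E])

char = raw.decode("utf-8")
code_point = ord(char)
name = unicodedata.name(char)

print(hex(code_point))  # 0x4e0e
print(name)             # CJK UNIFIED IDEOGRAPH-4E0E
```

The name says only that this is a unified CJK ideograph; nothing in the bytes says which language the text is in.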

Cheers
Philippe

2008/1/5, chunsj at embian.com <chunsj at embian.com>:
> I am not familiar with Han unification. When I say Korean, I mean only Hangul, the Korean
> alphabet, and not Hanja, the Chinese characters. Hangul does have a
> dedicated region.
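
The dedicated region mentioned here can be checked by code point range. The following Python sketch is an illustration only; the helper name `is_hangul_syllable` is hypothetical, and it covers just the precomposed Hangul Syllables range (U+AC00 to U+D7A3), not the separate Hangul Jamo blocks.

```python
# Precomposed Hangul syllables occupy U+AC00 through U+D7A3.
HANGUL_SYLLABLES = range(0xAC00, 0xD7A3 + 1)

def is_hangul_syllable(char: str) -> bool:
    """Return True if char is a precomposed Hangul syllable (hypothetical helper)."""
    return ord(char) in HANGUL_SYLLABLES

print(is_hangul_syllable("\ud55c"))  # True: U+D55C is a Hangul syllable
print(is_hangul_syllable("\u4e0e"))  # False: U+4E0E is a unified CJK ideograph
```

A range check like this works for Hangul, but it cannot classify unified Han ideographs such as U+4E0E, which is the case the question above is about.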
>
> ----- Original Message -----
>    From: Philippe Marschall <philippe.marschall at gmail.com>
>    To: The general-purpose Squeak developers list <squeak-dev at lists.squeakfoundation.org>
>    Sent: 08-01-05 20:38:40
>    Subject: Re: Re: [Q] WAListener and WAFileLibrary problem
>
>   2008/1/5, chunsj at embian.com <chunsj at embian.com>:
> > Ah, I've changed/added support for UnicodeEnvironment so that UTF-8
> > encoded byte arrays can be converted to/from Squeak's internal encoding.
> > With this, I can read UTF-8 encoded text (which can include Korean or
> > other languages encoded as UTF-8) from within the Squeak environment,
> > for example in the file list.
> >
> > A language tag is not required because Unicode already has regions for
> > Korean, Japanese, Chinese and any other language supported by Unicode.
> > So we can determine from the byte sequence which language region
> > it falls into.
>
> Uhm no. Unicode does Han-Unification. So for some byte sequences there
> is no way of telling whether they're Chinese, Japanese or Korean.
>
> Cheers
> Philippe
>
> > Anyway, I'm currently looking for a way to determine the content-type of a WAResponse,
> > so that UTF8Stream is not used if it's not text/html.
> >
> > Thank you.
> >
> > ----- Original Message -----
> >    From: Philippe Marschall <philippe.marschall at gmail.com>
> >    To: The general-purpose Squeak developers list <squeak-dev at lists.squeakfoundation.org>
> >    Sent: 08-01-05 15:14:48
> >    Subject: Re: [Q] WAListener and WAFileLibrary problem
> >
> >   2008/1/5, chunsj at embian.com <chunsj at embian.com>:
> > > I've found the main cause of the image corruption: WAListenerEncoded
> > > uses UTF8Stream *unconditionally*; as you said, it does not decide based on the MIME
> > > type.
> > >
> > > But I cannot understand why Korean as UTF8 should not work.
> >
> > Because WAListenerEncoded gives you Strings in Squeak encoding
> > but also expects Strings from you to be in Squeak encoding. If you
> > pass to it Strings that are already in UTF8 they get converted twice
> > to UTF8.
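
The double conversion described above can be sketched in Python. This is an illustration under an assumption, not the WAListenerEncoded code: it assumes the bytes of an already encoded string are each treated as one character (modelled here with Latin-1) before being encoded again.

```python
text = "\ud55c\uae00"  # two Hangul syllables

# Correct: encode once. Each of these syllables takes 3 bytes in UTF-8.
once = text.encode("utf-8")

# The mistake: treat every UTF-8 byte as a character and encode again.
twice = once.decode("latin-1").encode("utf-8")

print(len(once))   # 6
print(len(twice))  # 12
```

Every byte of the first encoding is 0x80 or above, so the second pass turns each one into two bytes and the text no longer decodes to the original characters.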
> >
> > > My image is customized by me
> > > so that it supports Korean and others (Japanese and Chinese, but with no fonts for these two).
> > > A WideString for Korean can be flawlessly converted to/from a UTF8 encoded byte string.
> >
> > No, not at all. UTF8 has no concept of language tags.
> >
> > Cheers
> > Philippe
> >
> > > Is this
> > > the work to be done by WAListenerEncoded?
> > >
> > > Thank you for your help. Now I'm trying to find the content-type of WAResponse before using
> > > UTF8Stream.
> > >
> > > ----- Original Message -----
> > >    From: Philippe Marschall <philippe.marschall at gmail.com>
> > >    To: The general-purpose Squeak developers list <squeak-dev at lists.squeakfoundation.org>
> > >    Sent: 08-01-04 20:40:57
> > >    Subject: Re: [Q] WAListener and WAFileLibrary problem
> > >
> > >   2008/1/3, chunsj at embian.com <chunsj at embian.com>:
> > > > Hi,
> > > >
> > > > I've managed to find and modify WAListenerEncoded so that it can process
> > > > multibyte languages - I've only tested it with Korean as UTF-8. During testing
> > > > I found the following problem.
> > >
> > > Korean as UTF-8 should not work on WAListenerEncoded. If it does then
> > > it's a bug in WAListenerEncoded. The reason for this is that Korean as
> > > UTF-8 violates the contract between the server adapter and you. The
> > > *Encoded* adapters give you Strings in Squeak encoding (well not quite
> > > in the case of CJK because that is not possible since Unicode does not
> > > have the concept of language tags) but in turn expect Strings in
> > > Squeak encoding. In the case of Korean this means WideStrings. UTF-8
> > > Strings are ByteStrings and should therefore not work.
> > >
> > > > When I use WAListener/WAListenerEncoded I cannot get image files registered
> > > > in a FileLibrary correctly. I can get CSS files or script files correctly, but I cannot get
> > > > image files.
> > >
> > > I don't think WAListenerEncoded can ever work for binary files. The
> > > problem is that due to its streaming nature WAListenerEncoded,
> > > compared to WAKomEncoded, can never look at the response. This means it
> > > can never decide whether it should do encoding (based on the mimetype),
> > > so it always does it. In the case of binary content this is clearly
> > > wrong. Your best option (as always) is to serve static files (images,
> > > CSS, JavaScript) with Apache or something similar.
> > >
> > > > It seems that when I use WAListener, the server sends an image file of
> > > > 16135 bytes, but the original file size is 10819 bytes, and this might be the source
> > > > of the problem. I cannot open the wrong-sized file even if I truncate
> > > > it to the original size.
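
One possible reading of the size difference, offered as an assumption rather than a diagnosis: if the binary response was run through a text-to-UTF-8 conversion, every byte of 0x80 or above would become two bytes. The Python sketch below uses a hypothetical helper, `encode_as_if_text`, and the 8-byte PNG signature as sample data; it is not taken from the image in question.

```python
def encode_as_if_text(data: bytes) -> bytes:
    """Treat every byte as a Latin-1 character and encode it as UTF-8 (hypothetical helper)."""
    return data.decode("latin-1").encode("utf-8")

# Sample binary data: the 8-byte PNG file signature.
sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
mangled = encode_as_if_text(sample)

# Each byte >= 0x80 grows to two bytes; bytes below 0x80 are unchanged.
high_bytes = sum(1 for b in sample if b >= 0x80)
print(len(sample), len(mangled), high_bytes)  # 8 9 1

# Truncating to the original length does not restore the original data.
print(mangled[:len(sample)] == sample)  # False

# Sizes reported in the message above.
extra = 16135 - 10819
print(extra)  # 5316
```

Under this assumption, the 5316 extra bytes would correspond to 5316 bytes of 0x80 or above in the original file. It would also explain why cutting the file back to 10819 bytes does not help: the extra bytes are spread through the data rather than appended at the end.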
> > >
> > > WAListener should not do any encoding at all so images should work.
> > > But then again we don't know what code you changed so we can't really
> > > help you. It would help if you send us the image so we can test.
> > >
> > > Cheers
> > > Philippe
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
>
>
>
>
>
>



More information about the Squeak-dev mailing list