I didn't know about the Han unification part. In fact, when I say Korean I don't include Hanja (the Chinese characters); I mean only Hangul, the Korean alphabet, and Hangul does have its own dedicated region in Unicode.
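The distinction above can be checked directly against the Unicode block layout. The Python sketch below is for illustration only, since the thread itself is about Squeak. It classifies a character using a few Unicode block ranges. Precomposed Hangul syllables have their own block (U+AC00..U+D7A3). Hanja share the CJK Unified Ideographs block (U+4E00..U+9FFF) with Chinese hanzi and Japanese kanji. The function name `guess_language` and the selection of blocks are assumptions. Real detection would need more ranges, such as the CJK extension blocks, and it still could not tell which language a unified Han character belongs to.

```python
# Hypothetical helper: guess a language from the Unicode block of one character.
# Only a few blocks are covered; this is an illustration, not a detector.
HANGUL_SYLLABLES = range(0xAC00, 0xD7A4)    # U+AC00..U+D7A3
HANGUL_JAMO = range(0x1100, 0x1200)         # U+1100..U+11FF
HANGUL_COMPAT_JAMO = range(0x3130, 0x3190)  # U+3130..U+318F
CJK_UNIFIED = range(0x4E00, 0xA000)         # U+4E00..U+9FFF


def guess_language(ch: str) -> str:
    cp = ord(ch)
    if cp in HANGUL_SYLLABLES or cp in HANGUL_JAMO or cp in HANGUL_COMPAT_JAMO:
        return "ko"         # Hangul is, in practice, only used for Korean
    if cp in CJK_UNIFIED:
        return "zh/ja/ko?"  # unified Han ideograph: the code point alone cannot say
    return "unknown"


for ch in "한漢":
    print(ch, hex(ord(ch)), ch.encode("utf-8").hex(), guess_language(ch))
# 한 0xd55c ed959c ko
# 漢 0x6f22 e6bca2 zh/ja/ko?
```

The UTF-8 bytes for 漢 (`e6bca2`) are the same whether the surrounding text is Chinese, Japanese, or Korean. That is why byte sequences alone cannot carry a language tag.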
----- Original Message -----
From: Philippe Marschall <philippe.marschall@gmail.com>
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Sent: 08-01-05 20:38:40
Subject: Re: Re: [Q] WAListener and WAFileLibrary problem
2008/1/5, chunsj@embian.com <chunsj@embian.com>:
Ah, I've changed/added support in UnicodeEnvironment so that a UTF-8 encoded byte array can be converted to and from Squeak's internal encoding. With this, I can read UTF-8 encoded text (which can include Korean or other languages encoded as UTF-8) from within the Squeak environment, for example in the file list.
A language tag is not required because Unicode already has regions for Korean, Japanese, Chinese, and any other language it supports. So we can determine from the byte sequence which language region it belongs to.
Uhm, no. Unicode does Han unification, so for some byte sequences there is no way of telling whether they're Chinese, Japanese, or Korean.
Cheers Philippe
Anyway, I'm currently looking for a way to determine the content type of a WAResponse, so that UTF8Stream is not used when the content type is not text/html.
Thank you.
----- Original Message -----
From: Philippe Marschall <philippe.marschall@gmail.com>
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Sent: 08-01-05 15:14:48
Subject: Re: [Q] WAListener and WAFileLibrary problem
2008/1/5, chunsj@embian.com <chunsj@embian.com>:
I've found the main cause of the image corruption: WAListenerEncoded uses UTF8Stream *unconditionally*. As you said, it does not decide based on the MIME type.
But I cannot understand why Korean as UTF8 should not work.
Because WAListenerEncoded gives you Strings in Squeak encoding but also expects the Strings you give it to be in Squeak encoding. If you pass it Strings that are already in UTF-8, they get converted to UTF-8 twice.
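The double conversion is easy to reproduce outside Squeak. The Python sketch below assumes a Squeak ByteString behaves like a Latin-1 string, with one character per byte and code points 0..255. It compares handing the adapter real characters with handing it characters that already hold UTF-8 bytes.

```python
text = "한글"  # what the application means to send

# Correct: the adapter receives characters and encodes them once.
once = text.encode("utf-8")

# Wrong: the application has already encoded to UTF-8 and stored the bytes
# one per character (like a ByteString) ...
already_utf8 = once.decode("latin-1")
# ... so the adapter's own UTF-8 encoding is applied a second time.
twice = already_utf8.encode("utf-8")

print(len(once), len(twice))  # 6 12

# A browser decodes only once, so it sees the inner UTF-8 bytes as Latin-1
# characters (mojibake); a second decode would be needed to get the text back.
assert twice.decode("utf-8") != text
assert twice.decode("utf-8").encode("latin-1").decode("utf-8") == text
```

Every byte of the Korean UTF-8 sequence is at or above 0x80, so each one grows to two bytes on the second pass. That is why the output doubles in size here.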
I've customized my image so that it supports Korean and other languages (Japanese and Chinese too, but I have no fonts for those two). A WideString containing Korean can be flawlessly converted to and from a UTF-8 encoded byte string.
No, not at all. UTF8 has no concept of language tags.
Cheers Philippe
Isn't this the conversion that WAListenerEncoded is supposed to do?
Thank you for your help. Now I'm trying to find the content type of the WAResponse before using UTF8Stream.
----- Original Message -----
From: Philippe Marschall <philippe.marschall@gmail.com>
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Sent: 08-01-04 20:40:57
Subject: Re: [Q] WAListener and WAFileLibrary problem
2008/1/3, chunsj@embian.com <chunsj@embian.com>:
Hi,
I've managed to find and modify WAListenerEncoded so that it can process multibyte languages (I've only tested it with Korean as UTF-8). During testing I found the following problem.
Korean as UTF-8 should not work with WAListenerEncoded. If it does, then it's a bug in WAListenerEncoded. The reason is that Korean as UTF-8 violates the contract between the server adapter and you. The *Encoded* adapters give you Strings in Squeak encoding (well, not quite in the case of CJK, because that is not possible since Unicode does not have the concept of language tags), but in turn they expect Strings in Squeak encoding. In the case of Korean this means WideStrings. UTF-8 Strings are ByteStrings and should therefore not work.
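The contract can be illustrated with a Python analogy. This is not Seaside code. A Python `str` stands in for a Squeak String: code points above 0xFF play the role of a WideString, and a string limited to 0..255 plays the role of a ByteString. The helper `looks_already_utf8` is hypothetical. It is a heuristic for spotting the mistake described above, which is handing an *Encoded* adapter a ByteString that already holds UTF-8 bytes instead of decoded characters.

```python
def looks_already_utf8(s: str) -> bool:
    """Hypothetical heuristic: True if a string that should hold decoded
    characters instead looks like raw UTF-8 bytes stored one per character."""
    if any(ord(c) > 0xFF for c in s):
        return False  # contains "wide" characters, so it is already decoded
    raw = s.encode("latin-1")  # one byte per character, 0..255
    if all(b < 0x80 for b in raw):
        return False  # plain ASCII looks the same in both encodings
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so probably genuine Latin-1 text
    return True


print(looks_already_utf8("한글"))                                    # False
print(looks_already_utf8("한글".encode("utf-8").decode("latin-1")))  # True
```

It is only a heuristic. Some Latin-1 text happens to form valid UTF-8 sequences and would be flagged as well.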
When I use WAListener/WAListenerEncoded, image files registered in a FileLibrary are not served correctly. CSS and script files come through fine, but image files do not.
I don't think WAListenerEncoded can ever work for binary files. The problem is that, because of its streaming nature, WAListenerEncoded (unlike WAKomEncoded) can never look at the response. This means it can never decide whether it should do encoding (based on the MIME type), so it always does it. In the case of binary content this is clearly wrong. Your best option (as always) is to serve static files (images, CSS, JavaScript) with Apache or something similar.
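The decision WAListenerEncoded cannot make amounts to a check on the Content-Type before the body is written. The Python sketch below assumes that the whole response, including its body and Content-Type, is available up front, as it would be in a non-streaming adapter. `is_textual`, `encode_body`, and the list of textual types are hypothetical choices for illustration. They are not Seaside's API and not a description of what WAKomEncoded actually does.

```python
# Hypothetical list of MIME types whose bodies are treated as text.
TEXTUAL_TYPES = ("text/", "application/javascript", "application/json",
                 "application/xml")


def is_textual(content_type: str) -> bool:
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith(TEXTUAL_TYPES)


def encode_body(body, content_type: str) -> bytes:
    if is_textual(content_type):
        return body.encode("utf-8")  # characters -> UTF-8 bytes
    return body                      # binary body (image/png etc.): pass through untouched


print(encode_body("한", "text/html; charset=utf-8"))  # b'\xed\x95\x9c'
print(encode_body(b"\x89PNG", "image/png"))            # b'\x89PNG'
```

A streaming adapter starts writing before it has seen the Content-Type, so it has no point at which to run this check. That is the limitation described above.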
It seems that when I use WAListener, the server sends the image file as 16135 bytes, but the original file is 10819 bytes, and this might be the source of the problem. I cannot open the file even after truncating it to the original size.
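One plausible explanation for the size difference assumes that the modified listener pushed the file through a UTF-8 encoder, treating each byte as a Latin-1 character. Under that assumption, every byte at or above 0x80 becomes two bytes. The file would then grow by 16135 - 10819 = 5316 bytes, meaning about 49% of its bytes were at or above 0x80. That proportion is consistent with compressed image data, but it does not prove the cause. The Python sketch below uses the standard 8-byte PNG signature as sample data. The helper name `utf8_inflate` is hypothetical.

```python
def utf8_inflate(data: bytes) -> bytes:
    """Treat each byte as a Latin-1 character and UTF-8-encode the result,
    roughly what an unconditional UTF-8 stream does to binary data."""
    return data.decode("latin-1").encode("utf-8")


png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file
sent = utf8_inflate(png_signature)

print(sent)                           # b'\xc2\x89PNG\r\n\x1a\n'
print(len(png_signature), len(sent))  # 8 9

# Growth equals the number of bytes >= 0x80, because each becomes two bytes.
high = sum(1 for b in png_signature if b >= 0x80)
assert len(sent) == len(png_signature) + high

# Truncating to the original length does not restore the file, because every
# byte after the first high byte has shifted and the signature is broken ...
assert sent[:len(png_signature)] != png_signature
# ... whereas decoding once more does undo the damage.
assert sent.decode("utf-8").encode("latin-1") == png_signature
```

This would also explain why cutting the file back to 10819 bytes does not make it open: the corruption starts at the first byte at or above 0x80, not at the end of the file.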
WAListener should not do any encoding at all, so images should work. But then again, we don't know what code you changed, so we can't really help you. It would help if you sent us the image so we can test.
Cheers Philippe
squeak-dev@lists.squeakfoundation.org