[etoys-dev] OT: Unicode vs ISO-2022-JP (was Re: Fonts)

Tue Aug 4 18:49:42 EDT 2009

On Mon, Aug 3, 2009 at 9:25 PM, Yoshiki Ohshima <yoshiki at vpri.org> wrote:
> At Mon, 3 Aug 2009 19:20:45 -0700,
> Edward Cherlin wrote:
>>
>> >  Mine is rendering and retransmitting Japanese mixed with Hangul
>> > correctly in ISO-2022-JP-2 (defined in RFC 1554 and supports mixed
>> > Japanese and Chinese text nicely).  As Bert wrote, if you are reading
>> > it through the forums gateway, that may be the problem.
>>
>> Can you send to this list in Unicode? A lot of software doesn't
>> support ISO-2022-JP correctly in any form. It is a large and complex
>> standard, almost never implemented in full.
>
>  Less software supports ISO-2022-JP-2, for various reasons, but it is
> still a member of the ISO-2022 family, which a reasonable emailer
> should support.

That's like saying a Python 2.6 interpreter should be happy with
Python 3.0. Nobody supports all of ISO-2022. And as far as I know,
nobody outside Japan supports this particular insular national
non-standard.

http://unicode.org/faq/han_cjk.html
Q: I have heard there are problems in Japanese and other East Asian
mapping tables. Where can I find information about these problems?

"Sometimes the standard is ill defined, and each vendor has a choice
in how to implement the Unicode mapping table...Implementations of ISO
2022 encodings like ISO-2022-JP differ not only in the mapping tables
for the sub-encodings but also in the supported sets of escape
sequences and their invocation pattern."
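
The vendor-table problem the FAQ describes is best known from Shift_JIS.
The sketch below uses Python's standard `shift_jis` and `cp932` codecs
(Python is used only for illustration; the FAQ does not mention it) and
assumes that the first follows the JIS mapping table and the second
follows Microsoft's. The same two bytes, JIS X 0208 row 1, cell 33,
come out as two different Unicode characters.

```python
import unicodedata

# JIS X 0208 row 1, cell 33 (JIS code 0x2141), written in Shift_JIS form.
raw = b"\x81\x60"

# Assumption: shift_jis follows the JIS/Unicode Consortium table,
# cp932 follows Microsoft's table for the same byte values.
jis_view = raw.decode("shift_jis")
ms_view = raw.decode("cp932")

for label, ch in (("shift_jis", jis_view), ("cp932", ms_view)):
    print(f"{label:10} U+{ord(ch):04X} {unicodedata.name(ch)}")

# Same bytes, different characters: text decoded with one table is not
# guaranteed to encode back to the same bytes through the other.
print("tables agree:", jis_view == ms_view)
```

On a standard CPython build this should report WAVE DASH for
`shift_jis` and FULLWIDTH TILDE for `cp932`.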

> And, because the vast majority (virtually all) of
> email traffic in Japan uses ISO-2022-JP
> (which is the standard anyway),

Not any more.

> so I'd send emails with some Japanese
> characters in ISO-2022-JP.

Regardless of whether the rest of us can read them? And regardless of
the fact that Unicode UTF-8 is the international standard?

> Anybody who wants to communicate in email
> in Japanese should use an emailer that supports ISO-2022-JP.

For shame. Anybody who wants to communicate in email in more than one
language should use Unicode UTF-8.

> And
> anybody who wants to talk about the glyph differences in Japanese and
> Korean (if "plain text" is preferred) would get better results with an
> emailer that supports ISO-2022-JP-2.

Nonsense. You can't claim that you can reliably see glyph differences
in plain text, when you don't know what fonts the recipient uses. When
you want to discuss glyphs, send pictures.
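
The font point can be illustrated with Python's `iso2022_jp_2` codec,
used here only as an example decoder. In ISO-2022-JP-2 the same Han
character can be sent under a Japanese designation (JIS X 0208,
`ESC $ B`) or a Chinese one (GB 2312, `ESC $ A`). Both decode to one
Unicode code point, so nothing in the decoded text records which form
was meant; a client that kept track of the designation could choose a
font accordingly, but otherwise the glyph shown depends on the
recipient's font. The character 直 is an arbitrary example assumed to
be present in both sets.

```python
# Assumption: 直 is an arbitrary example present in JIS X 0208 and GB 2312.
ch = "直"

# Japanese form: Python's iso2022_jp_2 encoder designates JIS X 0208 (ESC $ B).
jis_bytes = ch.encode("iso2022_jp_2")

# Chinese form: GB 2312 bytes with the high bit cleared (the 7-bit
# ISO-2022 form), wrapped in ESC $ A ... ESC ( B.
gb_7bit = bytes(b & 0x7F for b in ch.encode("gb2312"))
gb_bytes = b"\x1b$A" + gb_7bit + b"\x1b(B"

print("wire forms differ:", jis_bytes != gb_bytes)

# Both decode to the identical Unicode character; the national
# designation that distinguished them is not part of the result.
print("same after decoding:",
      jis_bytes.decode("iso2022_jp_2") == gb_bytes.decode("iso2022_jp_2") == ch)
```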

All right, now I claim that *you* are an unwitting member of the
cultural conspiracy of Japanese Supremacists, for insisting on
national standards over the international standard in an international
discussion. You have been misled by the Japanese Ultranationalist echo
chamber.

>> How widely is ISO-2022-JP-2 implemented? I have never heard of it
>> before. Certainly Firefox does not support it separately from
>> ISO-2022-JP.
>
>  There doesn't have to be a separate entry for ISO-2022-JP and
> ISO-2022-JP-2.

Nonsense. They are _different_.

> If you save my email as it is to a file, open the file
> with Firefox by specifying file://... and change the encoding to
> ISO-2022-JP, Firefox certainly displays it correctly.

Absolutely not. I tried it. And it is completely foolish of you to
claim such a thing without evidence.
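
That ISO-2022-JP and ISO-2022-JP-2 are different encodings can be shown
with Python's standard codecs. This is Python, not Firefox; a lenient
browser decoder may behave differently. The mixed Japanese/Hangul
sample string is made up for illustration. Plain ISO-2022-JP has no
Korean character set, so it cannot encode the Hangul at all.
ISO-2022-JP-2 switches to KS C 5601 (KS X 1001) with an extra escape
sequence, which a strict ISO-2022-JP decoder does not recognize. UTF-8
needs no switching.

```python
# Hypothetical sample text mixing Japanese and Hangul.
mixed = "日本語と한국어"

# ISO-2022-JP-2 switches character sets with escape sequences:
# ESC $ B for the JIS X 0208 part, another designation for the Hangul.
jp2_bytes = mixed.encode("iso2022_jp_2")

# Plain ISO-2022-JP has no Korean set, so the encoder refuses.
try:
    mixed.encode("iso2022_jp")
    plain_jp_can_encode = True
except UnicodeEncodeError:
    plain_jp_can_encode = False

# Reading the JP-2 bytes as plain ISO-2022-JP: the unknown escape is
# replaced and the Hangul bytes after it come out as wrong text.
misread = jp2_bytes.decode("iso2022_jp", errors="replace")

# UTF-8 carries both scripts with no character-set switching.
utf8_bytes = mixed.encode("utf-8")

print("ISO-2022-JP can encode it:", plain_jp_can_encode)
print("ISO-2022-JP-2 round-trips:", jp2_bytes.decode("iso2022_jp_2") == mixed)
print("JP-2 bytes read as JP intact:", misread == mixed)
print("UTF-8 round-trips:", utf8_bytes.decode("utf-8") == mixed)
```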

>> > But you know that there is a discrepancy between
>> > Unicode's claims and its practice.  Take the round-trip conversion
>> > guarantee: the Unicode Consortium cannot provide a standard mapping
>> > table, so the claim is false.

I have sources that say otherwise. What is your source?

http://www.unicode.org/faq/han_cjk.html
Q: I have heard that UTF-8 does not support some Japanese characters.
Is this correct?

A: There is a lot of misinformation floating around about the support
of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard
supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X
0221, or JIS X 0213, for example, and many more. This is true no
matter which encoding form of Unicode is used: UTF-8, UTF-16, or
UTF-32. Unicode supports over 70,000 CJK characters right now, and
work is underway to encode further additions.

http://en.wikipedia.org/wiki/JIS_X_0208#ISO.2FIEC_10646_and_Unicode
ISO/IEC 10646 and Unicode

The kanji set of JIS X 0208 is among the original source standards for
the Han unification in ISO/IEC 10646 (UCS) and Unicode. Every kanji in
JIS X 0208 corresponds to its own code point in UCS/Unicode’s Basic
Multilingual Plane (BMP).

The non-kanji in JIS X 0208 also correspond to their own code points
in the BMP.
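
The kanji part of that statement can be spot-checked. This sketch
assumes Python's `euc_jp` codec as a stand-in for a JIS X 0208 mapping
table (EUC-JP writes JIS row r, cell c as the bytes 0xA0+r, 0xA0+c).
It walks the kanji rows 16 to 84 and checks that each assigned cell
decodes to one BMP character, that no two cells share a character, and
that each character encodes back to the same bytes. It tests Python's
table only, not every vendor's.

```python
# Assumption: Python's euc_jp codec is used as the JIS X 0208 mapping table.
seen = {}        # Unicode character -> EUC-JP bytes it came from
problems = []    # cells that break the one-to-one / BMP / round-trip claim

for row in range(16, 85):        # rows 16..84 hold the kanji
    for cell in range(1, 95):    # each row has cells 1..94
        raw = bytes([0xA0 + row, 0xA0 + cell])
        try:
            ch = raw.decode("euc_jp")
        except UnicodeDecodeError:
            continue             # unassigned cell (e.g. the tail of row 47)
        one_bmp_char = len(ch) == 1 and ord(ch) <= 0xFFFF
        round_trips = ch.encode("euc_jp") == raw
        if not one_bmp_char or not round_trips or ch in seen:
            problems.append(raw)
        seen[ch] = raw

print(len(seen), "kanji checked,", len(problems), "problems")
```

A non-empty `problems` list would point at the cells that break the
claim.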

>> The round-trip conversion guarantee does not include all prior
>> standards. There is a list. You would have to provide specifics (which
>> we could better discuss offline) for me to comment on the details.
>
>  Hmm.  JIS X 0208 was the national standard and predates Unicode.

National, yes; standard, not exactly.

http://en.wikipedia.org/wiki/JIS_X_0208
"Even though there are provisions in the JIS X 0208:1997 standard
concerning compatibility, at the present time, it is generally
considered that this standard neither certifies compatibility nor is
it an official manufacturing standard that amounts to a declaration of
self-compatibility.[1] Consequently, de facto, JIS X 0208-“compatible”
products are not considered to exist. Terminology such as “conformant”
(準拠) and “corresponding” (対応) is included in JIS X 0208, but the
semantics of these terms vary from person to person."

http://ja.wikipedia.org/wiki/JIS_X_0208
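"In the past, although the JIS X 0208:1997 standard document contains
provisions on conformance, this standard was considered not to be a
product standard subject to conformity certification or
self-declaration of conformity.[1] As of 2009, however, the Ministry of
Economy, Trade and Industry and the Japanese Industrial Standards
Committee have stated explicitly that 'the designated-product system,
under which the state limited the goods eligible for the JIS mark
labeling system, has been abolished, and products for which a
certifiable JIS product standard exists are now eligible'.[2]
JIS X 0208:1997, which does contain conformance provisions, is
therefore considered to be subject to conformity certification or
self-declaration of conformity as well."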

>> >  But anyway, the discussion here is whether you can tell the

CJK

>> > languages supported by a font by looking at its name or not.  And
>> > answer is no.

That turns out not to be true for Chinese, Japanese, and Korean, which
is what I was talking about. Actually, the answer is Yes on Linux and
Mac, and No only on Windows.

>> True for Windows. I blame Microsoft.
>
>  "Deja-Vu" is French;

As a Unicode font with wide alphabetic coverage but no CJK, it is out
of scope for this discussion.

> the answer to the original question is no and

Answered above.

> Blaming Microsoft doesn't help there.

Why ever not? They are the ones messing us up, both with their shoddy
development practices and their illegal monopoly. The sooner Linux
takes over, the better.

> -- Yoshiki

-- 
Silent Thunder (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) is my name
And Children are my nation.
The Cosmos is my dwelling place, The Truth my destination.
http://earthtreasury.org/worknet (Edward Mokurai Cherlin)