[BUG] Unicode in multilingualized Squeak

Mon Mar 24 20:51:14 UTC 2003

Dear Yoshiki,

In Unicode class>>leadingCharFor:kanjiClass:
we have the following code fragment:

  unicode >= 162E80 ifTrue: [
   ^ self leadingCharForKanjiClass: kanjiClass
  ].

This statement compiles, but at run time it causes the error
"162 does not understand: E80". The reason is that 162E80 is
parsed as 162 E80, where E80 is a unary message that is sent
to an instance of SmallInteger.

I think the statement should read:

  unicode >= 16r2E80 ifTrue: [
   ^ self leadingCharForKanjiClass: kanjiClass
  ].

That makes sense, because 16r2E80 (hexadecimal 2E80) is the first
code point of the CJK part of Unicode.
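
As a quick check of the radix notation, the following expressions
can be evaluated one at a time in a workspace (print it). This is
only an illustrative sketch, not part of the change set; the
comments show the results I would expect.

```smalltalk
"A radix literal has the form <base>r<digits>,
 so 16r2E80 is hexadecimal 2E80."
16r2E80 printString.        "'11904'"
16r2E80 = 11904.            "true"

"SmallInteger implements no unary message E80, which is why
 the mistyped literal fails only when the code is executed."
162 respondsTo: #E80.       "false"
```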

If my assumption is right, this is an example of what a
programmer has to fear most: a typo that results in a strange but
valid statement.

I also have reason to believe that
UTF8TextConverter>>nextFromStream:
should be revised. Currently no value is assigned to the
temporary variable codesets when "Smalltalk primaryLanguage"
is not one of Japanese, Korean, ContinentalChinese.
Perhaps it is a good idea to add

  codesets isNil ifTrue: [^Unicode value: unicode]

but this proposal is highly speculative.
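
The reasoning behind such a guard can be sketched in a workspace:
a temporary variable that is never assigned holds nil, so an isNil
test can select a fallback. The variable result and the two
strings are placeholders for illustration only; they are not code
from UTF8TextConverter.

```smalltalk
| codesets result |
"codesets is deliberately left unassigned, as happens when
 none of the language branches matches."
result := codesets isNil
    ifTrue: ['fall back to plain Unicode']
    ifFalse: ['look up the leading char in codesets'].
result    "'fall back to plain Unicode'"
```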

A change set is attached. What do you think about it? Can you
approve my changes, or did I miss something important?

-- Boris 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fixes2.1.cs
Type: application/octet-stream
Size: 4353 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20030324/7f598d08/fixes2.1.obj