[UPDATES] 3.8 gamma and 1 update

Hannes Hirzel hirzel at spw.unizh.ch
Fri Dec 3 00:10:37 UTC 2004


Yoshiki,

Thank you for your long and detailed answer.

Yoshiki Ohshima wrote:
>   Hannes,
>
>>>A code snippet like
>>
>>s _ FileDirectory default readOnlyFileNamed: 'utf8test.txt'.
>>s  converter: UTF8TextConverter new.
>>s contents.
>>
>>doesn't  seem to work.
>
>
>   Can you send me the utf8test.txt file?
>


Here is the complete test case. Writing the test file is fine. Reading it
back gives a different MultiString.




"================================================"
" A: Writing a UTF-8 file                       "
"    The following code works fine on 3.8g-6485  "
"================================================"


(FileDirectory default directoryExists: 'utf8-test')
ifFalse:
[FileDirectory default createDirectory: 'utf8-test'.].


myTestString := 'abc ', 945 asCharacter asString,  "Greek alpha"
                        946 asCharacter asString,  "Greek beta"
                        947 asCharacter asString.  "Greek gamma"

myTestString inspect.

fileName := 'utf8-test',(FileDirectory pathNameDelimiter) asString,
'test1.txt'.
file := FileStream fileNamed: fileName.
file reset.
file converter: UTF8TextConverter new.
file nextPutAll: myTestString.
file close.
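For reference, this is what the file written by (A) should contain if the converter produces standard UTF-8: code points from 128 to 2047 take two bytes, 192 + ⌊c/64⌋ followed by 128 + (c mod 64). So 'abc ' gives four ASCII bytes and each Greek letter gives two, ten bytes in all. Worked out for alpha:

$$
945 = 14 \cdot 64 + 49 \;\Rightarrow\; 192 + 14 = 206 = \mathtt{16rCE}, \qquad 128 + 49 = 177 = \mathtt{16rB1}
$$

Beta (946) and gamma (947) give 16rCE 16rB2 and 16rCE 16rB3 in the same way, so a hex dump of test1.txt showing 61 62 63 20 CE B1 CE B2 CE B3 would confirm that the writing side is correct.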



"===================================================================="
" B: Reading back the test string does not give the correct result "
"===================================================================="

fileName := 'utf8-test',(FileDirectory pathNameDelimiter) asString,
'test1.txt'.
s := FileDirectory default readOnlyFileNamed: fileName.
s converter: UTF8TextConverter new.
myTestStringReadBack := s contents.

myTestStringReadBack inspect.
myTestStringReadBack = myTestString     "evaluates to      false"
                                        "but should give true"

"This is independent of any font question."


>
>>It is difficult to check as well as the default
>>font does not show simple Unicode chars like Greek letters.
>
>
>   I thought having font only doesn't make too much sense, but I might
> be wrong.

For Greek and IPA, just having the font as is (no diacritics or further
rendering) would already be useful in quite a few cases.



>
>>MultistringCharacters show up with their internal encoding as value
>>(like 1069548123).
>
>
>   I'm not sure what are MultistringCharacters, but if it means that
> you inspect a MultiString and select one of slots, showing the value
> is the expected behavior.  Strings do that.
>

Sorry, I meant the Characters in a MultiString (which are actually
SmallIntegers).



>
>>The whole thing doesn't look like it is gamma...

....

>   At least, the Japanese stuff works if you load one "japanese package
> (sar)".  A little support code for other languages should make it
> work for the language.


That is in fact a huge achievement. It means that Latin-1-oriented and
Japanese-using Squeakers can now share a common code base. And I
agree that this _is_ a milestone which deserves its own version number.


>   Also, the Latin-1 part works ok.  It doesn't make too much sense to
> revert the change.

That's fine. Actually, I wasn't thinking of reverting. On the contrary, I
was thinking of adding further fixes so that some straightforward
non-Latin-1 strings work.

You added the symbol font with fix 6477, so Squeak no longer crashes
when evaluating snippet (A). This means UTF-8 strings can now be
written, as in example (A) above.

But of course I would like to read them back as well ;-)

I understand that you use a more elaborate encoding (a tag for the
encoding plus the code). It seems that each character (apart from ASCII)
might have its own encoding.
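If I read the internal value quoted earlier correctly, it splits into exactly such a tag and a code. This assumes (I have not checked it against the 3.8 sources) that a Character keeps the encoding tag in the bits from 2^22 upward and the code in the low 22 bits:

$$
1069548123 = 255 \cdot 2^{22} + 603, \qquad 2^{22} = 4194304
$$

That would be tag 255 with code 603 (16r25B, the IPA letter ɛ). If characters produced by UTF8TextConverter carry a non-zero tag while 945 asCharacter carries tag 0, the = in snippet (B) could answer false even though the code points agree.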

Perhaps my understanding was wrong when I wrote code snippet (B).

>
>>Is the 3.8 image intended to be useful as is (and if so how) or is the
>>idea that people should just switch to 3.9 anyhow?

I understand that it works for Japanese (with a non-UTF-8 encoding), and
that is justification enough for a 3.8 release, given that the Latin-1
functionality still works (which in fact means quite a few fixes that
matter to many people).



> It is somewhat useful pretty as it is for certain languages.
>
> You have to be aware that "supporting Unicode" is very different
> concept from "supporting language X".

I am aware of this. However, what I would like to see is the first 2000
Unicode characters (or even just the IPA and Greek characters; I have the
fonts from an earlier m17n package but cannot load them) in a default
font (even if only in one point size) with no additional rendering at
all. This would be useful for my work.


> Supporting a language doesn't
> come for free.  In any system, not only in the m17n Squeak, this is
> the case and many people are spending time to build the upper layer on
> the top of the foundation.

Where is the current forum for exchanging work on these activities?

In any case, even if I still have to wait a while, thank you for your
great contribution and patience.

Regards

Hannes



More information about the Squeak-dev mailing list