Hi, Janko,
I also wrote a somewhat better UTF8 conversion, but it is only 25-80% faster than the existing one in UTF8TextConverter.
Good!
Here are results in VW, Squeak with old UTF8 converter and a new one:
             VW    old    new
english      30    313    248   ByteString, pure ASCII
french       32    323    251   ByteString, ISO8859-1 (Latin 1)
slovenian    48    578    480   TwoByteString, Latin 2
russian     112   1306    720   TwoByteString, Cyrillic
chinese     107   1544   3825   TwoByteString
Notice the exceptional 10x VW performance compared to Squeak, and they do all encodings in plain Smalltalk! No primitives! So how come Squeak is so slow here?
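To make the comparison concrete, here is a small sketch that works out the ratios from the table above. It assumes the figures are elapsed times (lower is better), which the thread does not state explicitly; the names `timings` and `ratios` are made up for this illustration.

```python
# Figures copied from the table above: (VW, Squeak old, Squeak new).
# Assumption: these are elapsed times, so a smaller number means faster.
timings = {
    "english":   (30, 313, 248),
    "french":    (32, 323, 251),
    "slovenian": (48, 578, 480),
    "russian":   (112, 1306, 720),
    "chinese":   (107, 1544, 3825),
}

def ratios(vw, old, new):
    """Return (old Squeak time / VW time, old Squeak time / new Squeak time)."""
    return old / vw, old / new

for language, row in timings.items():
    vw_factor, speedup = ratios(*row)
    print(f"{language:10s} old/VW = {vw_factor:5.1f}   old/new = {speedup:4.2f}")
```

Under that assumption, an old/new value above 1 means the new converter is faster, and a value below 1 (as for the chinese row) means it is slower.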
Is it true that you traded performance on Chinese for performance on the other languages?
BTW, I can't see the difference between this and your "With corrected table of results:".
- UTF8TextConverter wasn't written with performance in mind (as you can tell ^^;)
- This kind of tight loop gives a factor of 3-5 performance difference between VW and Squeak, plus,
- the immediate representation for characters must be helping a lot.
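For readers who have not looked at such a converter, the "tight loop" in question is roughly a per-character branch on the code point followed by a few shifts and masks. The sketch below is only an illustration in Python of the standard UTF-8 byte layout; it is not the code of UTF8TextConverter or of the VW encoder, and the function name `utf8_encode` is made up here.

```python
def utf8_encode(text):
    """Encode a string to UTF-8 one character at a time (illustrative sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # 1 byte: plain ASCII
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes: covers Latin 1/Latin 2 accents and Cyrillic
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # 3 bytes: covers most CJK characters
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            # 4 bytes: code points beyond the Basic Multilingual Plane
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

The work done per character is tiny, so the cost of the loop itself (dispatch, character access, allocation) tends to dominate, which is where differences between VMs would show up.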
For the OLPC, I think I will end up writing primitives for Squeak. One could say that I should link the iconv library, but I'm not sure whether that is a good idea...
-- Yoshiki