Re: Unicode strings, benchmarks

12 Jun 2007


      Hi, Janko,
...
I also wrote a somewhat better UTF8 conversion, but it is only 25-80%
faster than the existing one in UTF8TextConverter.
Good!
...
Here are the results in VW, and in Squeak with the old UTF8 converter and the new one:
            VW    old    new
english     30    313    248   ByteString,    pure ASCII
french      32    323    251   ByteString,    ISO8859-1 (Latin 1)
slovenian   48    578    480   TwoByteString, Latin 2
russian    112   1306    720   TwoByteString, Cyrillic
chinese    107   1544   3825   TwoByteString
Notice that VW is an exceptional 10x faster than Squeak, and it does
all the encodings in plain Smalltalk! No primitives! So how come
Squeak is so slow here?
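
The ratios behind the "10x" and "25-80%" figures can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the table as given, and the names `results` and `ratios` are made up for illustration.

```python
# Timings copied from the table: (VW, Squeak old converter, Squeak new converter).
results = {
    "english":   (30, 313, 248),
    "french":    (32, 323, 251),
    "slovenian": (48, 578, 480),
    "russian":   (112, 1306, 720),
    "chinese":   (107, 1544, 3825),
}

def ratios(vw, old, new):
    """Return (old/VW, old/new): how much slower the old Squeak converter
    is than VW, and how much faster the new converter is than the old one."""
    return old / vw, old / new

for language, row in results.items():
    vs_vw, vs_old = ratios(*row)
    print(f"{language:10s} old/VW = {vs_vw:5.1f}x   old/new = {vs_old:4.2f}x")
```

An old/new value below 1 means the new converter is slower than the old one, which happens only in the Chinese row.
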
Is it true that you traded performance on Chinese for performance
on the other languages?
BTW, I can't see the difference between this and your "With
corrected table of results:".
  - UTF8TextConverter wasn't written with performance in mind (as you
    can tell^^;)
  - This kind of tight loop gives a factor of 3-5 performance difference
    between VW and Squeak, plus,
  - immediate representation for characters must be helping a lot.
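
To make the "tight loop" concrete, here is a minimal Python sketch of a per-character UTF-8 encoder. It is not the code of UTF8TextConverter or of the VW encoders. The function name `utf8_encode` is made up for illustration, and the sketch only shows the general shape of such a loop: one branch on the code point range and a few shifts and masks per character.

```python
def utf8_encode(text):
    """Encode a string to UTF-8 one code point at a time.

    Illustrative only: no validation of surrogate code points is done.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:            # 1 byte: ASCII
            out.append(cp)
        elif cp < 0x800:         # 2 bytes: e.g. Latin 1, Latin 2, Cyrillic letters
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:       # 3 bytes: e.g. most Chinese characters
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                    # 4 bytes: code points above U+FFFF
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

The work per character is very small, so in a loop like this the cost of fetching each character and dispatching on it tends to matter as much as the encoding arithmetic itself.
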
For the OLPC, I think I will end up writing primitives for
Squeak.  One could say that I should link the iconv library, but I'm
not sure if that is a good idea or not...
-- Yoshiki