Unicode strings, benchmarks

Mon Jun 11 22:10:40 UTC 2007

  Hi, Janko,

> I also did a somewhat better UTF8 conversion, but it is only 25-80% faster
> than the existing one in UTF8TextConverter.

  Good!

> Here are results in VW, Squeak with old UTF8 converter and a new one:
> 
>              VW    old    new   string class, encoding
> english      30    313    248   ByteString, pure ASCII
> french       32    323    251   ByteString, ISO8859-1 (Latin 1)
> slovenian    48    578    480   TwoByteString, Latin 2
> russian     112   1306    720   TwoByteString, Cyrillic
> chinese     107   1544   3825   TwoByteString
> 
> Notice VW's exceptional performance, about 10x that of Squeak, and they
> do all encodings in plain Smalltalk! No primitives! So how come Squeak
> is so slow here?

  Is it true that you traded away performance for Chinese to gain
speed in the other languages?

  BTW, I can't see any difference between this table and the one in
your "With corrected table of results:" post.

  - UTF8TextConverter wasn't written with performance in mind (as you
    can tell^^;)
  - This kind of tight loop typically shows a 3-5x performance
    difference between VW and Squeak, plus,
  - VW's immediate representation of characters must be helping a lot.

  For the OLPC, I think I will end up writing primitives for Squeak.
One could say that I should use the iconv library, but I'm not sure
whether that is a good idea or not...

-- Yoshiki