Unicode strings, benchmarks
yoshiki at squeakland.org
Mon Jun 11 22:10:40 UTC 2007
> I also did a slightly better UTF8 conversion, but it is only 25-80% faster than
> the existing one in UTF8TextConverter.
> Here are the results in VW, and in Squeak with the old UTF8 converter and the new one:
>              VW    old    new
> english      30    313    248   ByteString, pure ASCII
> french       32    323    251   ByteString, ISO8859-1 (Latin 1)
> slovenian    48    578    480   TwoByteString, Latin 2
> russian     112   1306    720   TwoByteString, Cyrillic
> chinese     107   1544   3825   TwoByteString
> Notice the exceptional 10x VW performance compared to Squeak, and they
> do all encodings in plain Smalltalk! No primitives! So how come
> Squeak is so slow here?
Is it true that you traded performance on Chinese for performance
on the other languages?
BTW, I can't see the difference between this and your "With
corrected table of results:".
- UTF8TextConverter wasn't written with performance in mind (as you
- This kind of tight loop typically shows a 3-5x performance difference
between VW and Squeak, plus,
- the immediate representation for characters must be helping a lot.
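
To make "this kind of tight loop" concrete, here is a minimal sketch of a byte-at-a-time UTF-8 decoder in Python. It is not the UTF8TextConverter code; the name decode_utf8 is made up for illustration, and it skips checks a real converter would need (overlong forms, surrogates). The point is only that every input byte goes through a few tests, shifts and masks, and every decoded code point becomes a character, so the per-iteration cost of the VM dominates.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 one byte at a time (illustrative sketch only).

    Assumption: no validation of overlong encodings or surrogates.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                # 1-byte sequence: plain ASCII
            code, extra = b, 0
        elif b >> 5 == 0b110:       # lead byte of a 2-byte sequence
            code, extra = b & 0x1F, 1
        elif b >> 4 == 0b1110:      # lead byte of a 3-byte sequence
            code, extra = b & 0x0F, 2
        elif b >> 3 == 0b11110:     # lead byte of a 4-byte sequence
            code, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, extra + 1):
            c = data[i + j]
            if c >> 6 != 0b10:      # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            code = (code << 6) | (c & 0x3F)
        out.append(chr(code))       # one character object per code point
        i += extra + 1
    return "".join(out)
```

Calling decode_utf8("русский".encode("utf-8")) gives back the original string. ASCII text takes the first branch only, while Cyrillic and Chinese text run the inner loop once or twice per character, which is roughly the pattern in the table above.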
For the OLPC, I think I will end up writing primitives for
Squeak. One could say that I should link to the iconv library, but I'm
not sure whether that is a good idea or not...