Unicode strings, benchmarks
janko.mivsek at eranova.si
Mon Jun 11 22:28:24 UTC 2007
Yoshiki Ohshima wrote:
>> I also did a slightly better UTF8 conversion, but it is only 25-80% faster than
>> the existing one in UTF8TextConverter.
>> Here are the results for VW, and for Squeak with the old UTF8 converter and the new one:
>>              VW    old    new
>> english      30    313    248   ByteString, pure ASCII
>> french       32    323    251   ByteString, ISO8859-1 (Latin 1)
>> slovenian    48    578    480   TwoByteString, Latin 2
>> russian     112   1306    720   TwoByteString, Cyrillic
>> chinese     107   1544   3825   TwoByteString
>> Notice the exceptional 10x performance of VW compared to Squeak, and they
>> do all encodings in plain Smalltalk! No primitives! So how come
>> Squeak is so slow here?
> Is it true that you traded performance on Chinese
> for performance on the other languages?
Definitely not, and I just don't understand why Chinese is so slow. I
hope you'll be able to look at that code to see what's wrong. And
Chinese is close to Japanese, right? I learned a bit of Chinese 20 years
ago, but that was not much help - I forgot too much :)
I'll prepare and publish the code and the benchmark tomorrow.
> BTW, I can't see the difference between this and your "With
> corrected table of results:".
The "corrected" should be "with corrected layout", just that. Sorry for
> - UTF8TextConverter wasn't written with performance in mind (as you
> can tell^^;)
> - This kind of tight loop gives a factor of 3-5 performance difference
> between VW and Squeak, plus,
> - immediate representation for characters must be helping a lot.
> For the OLPC, I think I will end up writing primitives for
> Squeak. One could say that I should link the iconv library, but I am not
> sure if that is a good idea or not...
> -- Yoshiki
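For readers following the thread, here is a minimal sketch of the kind of tight per-byte decoding loop being discussed. It is written in Python rather than Smalltalk, the name decode_utf8 is hypothetical, and it is not the code of UTF8TextConverter or of the new converter. It only illustrates why the rows in the table exercise different paths: ASCII takes the 1-byte branch, the accented Latin and Cyrillic letters take the 2-byte branch, and most Chinese characters take the 3-byte branch.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 one code point at a time (hypothetical helper).

    Simplified on purpose: it rejects bad lead and continuation bytes,
    but does not reject overlong forms or surrogate code points.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                # 0xxxxxxx: 1 byte, ASCII
            cp, extra = b, 0
        elif b >> 5 == 0b110:       # 110xxxxx: 2-byte sequence
            cp, extra = b & 0x1F, 1
        elif b >> 4 == 0b1110:      # 1110xxxx: 3-byte sequence
            cp, extra = b & 0x0F, 2
        elif b >> 3 == 0b11110:     # 11110xxx: 4-byte sequence
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:    # continuation bytes are 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp))
        i += extra + 1
    return "".join(out)
```

A loop like this could be timed per language sample with the standard timeit module and compared against the built-in bytes.decode("utf-8"), which plays roughly the role a primitive would play in Squeak.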
Smalltalk Web Application Server