Unicode strings, benchmarks
Janko Mivšek
janko.mivsek at eranova.si
Mon Jun 11 21:41:07 UTC 2007
Hi Squeakers,
I have already extended String with TwoByteString and added "scaling":
a string is automatically converted to a wider string when a wider
character is put into it. So far so good, and this already works in
Aida/Web.
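
As an illustration only, the auto-widening idea might be sketched as follows in Python. This is not the TwoByteString code from the post: the class name WideningString, the bytes_per_char property and the three storage levels are assumptions made for the example.

```python
from array import array

# (array typecode, largest code point that fits), narrowest to widest.
# "L" is at least 4 bytes wide, which is enough for any Unicode code point.
_LEVELS = [("B", 0xFF), ("H", 0xFFFF), ("L", 0x10FFFF)]


class WideningString:
    """Hypothetical string that stores code points in the narrowest
    array that fits and widens itself when a wider character is put in."""

    def __init__(self, text=""):
        self._level = 0
        self._data = array("B")
        for ch in text:
            self.append(ch)

    def _widen_for(self, code):
        # Step up until the current level can hold the code point,
        # copying the existing contents into the wider array.
        while code > _LEVELS[self._level][1]:
            self._level += 1
            self._data = array(_LEVELS[self._level][0], self._data.tolist())

    def append(self, ch):
        code = ord(ch)
        self._widen_for(code)
        self._data.append(code)

    def __setitem__(self, index, ch):
        code = ord(ch)
        self._widen_for(code)
        self._data[index] = code

    def __getitem__(self, index):
        return chr(self._data[index])

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return "".join(chr(code) for code in self._data)

    @property
    def bytes_per_char(self):
        # Logical width of the current level (1, 2 or 4 bytes).
        return (1, 2, 4)[self._level]
```

In this sketch a Latin 1 character such as "é" stays in the one-byte level, while a Latin 2 character such as "š" forces the switch to two bytes per character.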
I also wrote a somewhat better UTF8 conversion, but it is only 25-80%
faster than the existing one in UTF8TextConverter. To prepare for even
better results, I made a benchmark which measures conversion time for
English, French, Slovenian, Russian and Chinese texts, each 2500
characters long. It measures 100 conversions, which adds up to 250K
characters of text per language.
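
The shape of such a benchmark might look roughly like this in Python. It is a sketch under assumptions, not the benchmark from the post: the function name bench_utf8 is made up, the sample texts are single repeated characters standing in for the real texts, and it assumes the encoding direction (string to UTF8 bytes) is the one being timed.

```python
import time

TEXT_LENGTH = 2500   # characters per sample text, as in the post
REPEATS = 100        # conversions per measurement, as in the post

# Stand-in texts: one repeated character per language, NOT the real samples.
SAMPLES = {
    "english": "a" * TEXT_LENGTH,         # pure ASCII
    "french": "\u00e9" * TEXT_LENGTH,     # Latin 1
    "slovenian": "\u010d" * TEXT_LENGTH,  # Latin 2
    "russian": "\u044f" * TEXT_LENGTH,    # Cyrillic
    "chinese": "\u4e2d" * TEXT_LENGTH,    # CJK
}


def bench_utf8(text, repeats=REPEATS):
    """Encode `text` to UTF-8 `repeats` times.

    Returns (elapsed milliseconds, total characters converted)."""
    start = time.perf_counter()
    for _ in range(repeats):
        text.encode("utf-8")
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, len(text) * repeats


for name, text in SAMPLES.items():
    elapsed_ms, chars = bench_utf8(text)
    print(f"{name:10s} {elapsed_ms:8.2f} ms for {chars} characters")
```

Each measurement covers 2500 x 100 = 250000 characters. Timings from this sketch are not comparable with the Smalltalk numbers in the post.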
Here are the results for VW, and for Squeak with the old UTF8 converter
and with the new one:
             VW   Squeak old   Squeak new
english      30          313          248   ByteString, pure ASCII
french       32          323          251   ByteString, ISO8859-1 (Latin 1)
slovenian    48          578          480   TwoByteString, Latin 2
russian     112         1306          720   TwoByteString, Cyrillic
chinese     107         1544         3825   TwoByteString
Notice VW's exceptional performance, about 10x faster than Squeak, and
they do all encodings in plain Smalltalk! No primitives! So how come
Squeak is so slow here?
The above benchmark was run on Squeak 3.9 on SUSE Linux 10.1, P3.2GHz.
Best regards
Janko
--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si