Hi Squeakers,
I have already extended String with TwoByteString and added "scaling": a string is automatically converted to a wider string when a wider character is put into it. So far so good, and this already works in Aida/Web.
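For readers who do not have the code at hand, the widening idea can be sketched roughly as follows. This is only an illustration in Python, not the actual Squeak implementation: the class name WideningString and its put method are hypothetical, and the sketch stops at two bytes per character, as the TwoByteString above does.

```python
from array import array


class WideningString:
    """Hypothetical stand-in for a byte string that widens to two bytes per character."""

    def __init__(self, text=""):
        codes = [ord(ch) for ch in text]
        # Start with one byte per character when every code point fits in a byte.
        typecode = "B" if all(c <= 0xFF for c in codes) else "H"
        self._codes = array(typecode, codes)

    @property
    def bytes_per_char(self):
        return self._codes.itemsize

    def put(self, index, ch):
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("this sketch stops at two bytes per character")
        if code > 0xFF and self._codes.typecode == "B":
            # Widen: copy every existing code into 16-bit slots before storing.
            self._codes = array("H", self._codes.tolist())
        self._codes[index] = code

    def __str__(self):
        return "".join(chr(c) for c in self._codes)


s = WideningString("abc")
print(s.bytes_per_char)   # narrow storage while all characters fit in a byte
s.put(1, "\u010d")        # a Latin 2 character (c with caron) forces widening
print(s.bytes_per_char, str(s))
```

The point of the design is that pure ASCII and Latin 1 text keeps paying for only one byte per character, and the cost of the copy is paid once, at the first wide character.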
I also improved the UTF8 conversion a bit, but it is only 25-80% faster than the existing one in UTF8TextConverter. To prepare for even better results, I made a benchmark which measures the conversion time for English, French, Slovenian, Russian and Chinese texts, each 2500 characters long. It measures 100 conversions, which adds up to 250K characters of text per language.
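The shape of such a benchmark can be sketched like this. It is a Python illustration only, not the Squeak benchmark itself: the sample phrases are made up (the real texts are not reproduced here), the helper names make_sample and bench are hypothetical, and Python's built-in encoder stands in for the converter under test.

```python
import time


def make_sample(phrase, length=2500):
    """Repeat a phrase until the text is exactly `length` characters long."""
    repeats = length // len(phrase) + 1
    return (phrase * repeats)[:length]


def bench(text, runs=100):
    """Return (elapsed milliseconds, total characters converted)."""
    start = time.perf_counter()
    for _ in range(runs):
        text.encode("utf-8")
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, len(text) * runs


# Placeholder texts, one per script; each is 2500 characters long.
samples = {
    "english": make_sample("The quick brown fox. "),
    "russian": make_sample("Быстрая лиса. "),
    "chinese": make_sample("敏捷的狐狸。"),
}

for name, text in samples.items():
    ms, total = bench(text)
    print(f"{name:10s} {total} chars in {ms:.3f} ms")
```

Using texts of the same character length for every language keeps the totals comparable, while the number of UTF8 bytes produced still differs by script.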
Here are the results for VW, and for Squeak with the old UTF8 converter and the new one:
             VW    old    new
english      30    313    248   ByteString, pure ASCII
french       32    323    251   ByteString, ISO8859-1 (Latin 1)
slovenian    48    578    480   TwoByteString, Latin 2
russian     112   1306    720   TwoByteString, Cyrillic
chinese     107   1544   3825   TwoByteString
Notice the exceptional 10x performance of VW compared to Squeak, and they do all encodings in plain Smalltalk! No primitives! So how come Squeak is so slow here?
The above benchmark was run on Squeak 3.9 on Suse Linux 10.1, P3.2GHz.
Best regards,
Janko