UTF8 Squeak
Janko Mivšek
janko.mivsek at eranova.si
Mon Jun 11 22:06:24 UTC 2007
Hi Javier,
Javier Diaz-Reinoso wrote:
> About two months ago, the OpenMCL mailing list had this UTF-16 vs. UTF-32
> discussion:
>> how many angels can dance on a unicode character?
>> http://thread.gmane.org/gmane.lisp.openmcl.devel/1756/focus=1763
>>
>
> Gary Byers (the OpenMCL developer) finished with this conclusion:
>> If these numbers are roughly accurate and if the sketch of what
>> a displaced SIMPLE-STRING object would look like is realistic,
>> then I'd say that using UTF-16 to represent arbitrary Unicode
>> characters in a realistic way costs about as much memory-wise
>> as using UTF-32 does, is somewhat slower in the simplest cases
>> and much slower in general, has very complex boundary
>> cases once we step outside the BMP, and just generally doesn't
>> seem to have many socially-redeeming qualities that I can see.
> Perhaps Squeak is different (no alignment?), but if I doIt:
> (ByteString allInstances collect: [:s | s size]) sum asFloat
> in a basic 3.8.1 image, I obtain:
>
> 1.943098e6 (63672 strings at 30.5 bytes on average)
>
> so is all of this talk about only 4 MB extra (in that image, Squeak
> takes 26.8 MB at startup)?
Consider the image as a database where you store your application's
strings. In that case, space-efficient but still manipulable strings
really matter. For instance, I run a 380 MB VW image full of
TwoByteStrings, and this image would probably grow to about 760 MB with
only FourByteStrings...
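Both estimates in this exchange are simple per-character arithmetic, which the following minimal sketch reproduces. It assumes a fixed cost per character (1 byte for ByteString, 2 for UTF-16/TwoByteString, 4 for UTF-32/FourByteString) and ignores object headers, alignment padding, and UTF-16 surrogate pairs. The helper names are hypothetical, and reading Javier's "4 MB" as the UTF-16 vs. UTF-32 difference is an assumption; the ByteString vs. UTF-32 figure is printed alongside for comparison.

```python
# Hypothetical helpers for the back-of-the-envelope estimates in this thread.
# Simplification: fixed bytes per character; object headers, alignment
# padding, and UTF-16 surrogate pairs are ignored.

BYTES_PER_CHAR = {"byte": 1, "utf16": 2, "utf32": 4}


def payload_bytes(total_chars: float, encoding: str) -> float:
    """Bytes needed for the character data alone."""
    return total_chars * BYTES_PER_CHAR[encoding]


def extra_bytes(total_chars: float, from_enc: str, to_enc: str) -> float:
    """Additional bytes when every string moves from one encoding to another."""
    return payload_bytes(total_chars, to_enc) - payload_bytes(total_chars, from_enc)


# Javier's Squeak 3.8.1 figures: 1.943098e6 characters in 63672 ByteStrings.
squeak_chars = 1.943098e6
squeak_strings = 63672
print(f"average length: {squeak_chars / squeak_strings:.1f} chars")
print(f"UTF-16 -> UTF-32 extra: {extra_bytes(squeak_chars, 'utf16', 'utf32') / 1e6:.1f} MB")
print(f"ByteString -> UTF-32 extra: {extra_bytes(squeak_chars, 'byte', 'utf32') / 1e6:.1f} MB")

# Janko's VW image: 380 MB, assumed here to be almost entirely TwoByteString data.
vw_image_mb = 380
vw_as_utf32 = vw_image_mb * BYTES_PER_CHAR["utf32"] // BYTES_PER_CHAR["utf16"]
print(f"as FourByteStrings: about {vw_as_utf32} MB")
```

Run as-is, it prints an average of 30.5 characters per string, 3.9 MB extra for UTF-16 to UTF-32, 5.8 MB extra for ByteString to UTF-32, and about 760 MB for the VW image. The first two figures are small relative to a 26.8 MB image, while the doubling only becomes significant when strings dominate the image, as in Janko's case.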
Best regards
JAnko
--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si