[squeak-dev] Re: 30 bit unboxed floats

Igor Stasenko siguctua at gmail.com
Mon Oct 18 21:39:02 UTC 2010


On 19 October 2010 00:24, Andreas Raab <andreas.raab at gmx.de> wrote:
> On 10/18/2010 2:09 PM, Igor Stasenko wrote:
>>
>> My own opinion: it should be floats. Because characters are in fact
>> nothing more than integers identifying a code point in
>> some encoding. And in general, this means that, with proper design,
>> one could avoid using encapsulated integers (which Characters are) and
>> instead use integer values directly.
>
> Uhm ... Integer>>isLowercase? I don't think so. The only realistic
> alternative in my opinion is to use Strings exclusively (i.e., Characters
> being strings of length 1; potentially even a 'special' subclass of String).
> But I'm not a particularly big fan of that design either.
>

No, I was talking about conversion layers, like reading
files/streams. They should
always be binary, so that things like the UTF-8 encoder/decoder deal with
integer values.
That way, one could really avoid dealing with individual characters,
and only the topmost layers would have to produce strings as output.
We might need some additional primitives to speed things up:
in addition to the one converting ByteArray -> ByteString,
also provide a primitive for converting Array -> WideString,
or WordArray -> WideString.
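
As a rough illustration of such a layering (a sketch in Python rather than Smalltalk, not taken from any existing Squeak code): the decoder below works purely on integers, and only the last step turns code points into a string. The names decode_utf8 and to_string are hypothetical; to_string merely stands in for the proposed Array -> WideString primitive. The decoder is simplified and does not reject overlong 3- and 4-byte forms or surrogate code points.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points.

    Simplified sketch: checks lead and continuation bytes, but does not
    reject overlong 3/4-byte forms or surrogates.
    """
    code_points = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                  # 1-byte sequence (ASCII)
            cp, extra = lead, 0
        elif 0xC2 <= lead <= 0xDF:       # 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif 0xE0 <= lead <= 0xEF:       # 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif 0xF0 <= lead <= 0xF4:       # 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte {lead:#x} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        code_points.append(cp)
        i += extra + 1
    return code_points


def to_string(code_points) -> str:
    """Hypothetical stand-in for an Array -> WideString primitive."""
    return "".join(map(chr, code_points))
```

Everything below to_string sees only integers, so no per-character object is needed until the topmost layer asks for a string.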
Again, when rendering text, you turn characters into
integers to look up a glyph in some font.
So that, too, could be done in a more straightforward way that avoids
boxing/unboxing overhead.
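
A minimal sketch of the glyph look-up case, again in Python and under made-up assumptions: GLYPH_INDEX is a hypothetical font table keyed directly by integer code point, and MISSING_GLYPH is an assumed fallback index. No real font format or Squeak class is implied.

```python
# Hypothetical font table: integer code point -> glyph index.
GLYPH_INDEX = {0x41: 1, 0x42: 2, 0x20AC: 3}
MISSING_GLYPH = 0  # assumed index of the "missing glyph" box


def glyphs_for(code_points):
    """Map integer code points to glyph indices without creating
    any per-character objects along the way."""
    return [GLYPH_INDEX.get(cp, MISSING_GLYPH) for cp in code_points]
```

The renderer is fed code points straight from the decoding layer, so there is no character object to box or unbox between the two.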

What I meant to say is that there is definitely much more room for improvement
in handling Characters in a way that minimizes the impact of
boxing/unboxing, compared to what could be done
with boxed floats.
That is simply because of the way they are used.

> Cheers,
>  - Andreas
>
>> So there are many places where the impact of boxing characters on
>> performance could be minimized, because one could manipulate
>> (sub)strings instead of individual characters.
>>
>> In contrast, floats have much less potential for optimization like
>> that. Of course, we have FloatArrays and the like,
>> but once you write code to evaluate some formula, it is going to
>> deal with single floating-point values.
>> This is something you never do with characters: writing formulae... :)
>>
>>
>> On 18 October 2010 23:49, Colin Putney<colin at wiresong.com>  wrote:
>>>
>>> On Mon, Oct 18, 2010 at 1:38 PM, Eliot Miranda<eliot.miranda at gmail.com>
>>>  wrote:
>>>
>>>> I think immediate characters are much more generally useful, especially
>>>> considering unicode.  The current implementation of only codes 0 to 255
>>>> being == is error-prone.
>>>
>>> I enthusiastically agree. I think string handling in Squeak is pretty
>>> weak, and having efficient Characters would provide a solid basis for
>>> improving it.
>>>
>>> Colin
>>>
>>>
>>
>>
>>
>
>
>



-- 
Best regards,
Igor Stasenko AKA sig.


