[squeak-dev] #nextChunk speedup, the future of multibyte streams

Sat Jan 30 08:14:51 UTC 2010

On 30 January 2010 09:15, Bert Freudenberg <bert at freudenbergs.de> wrote:
> On 29.01.2010, at 20:07, Chris Cunningham wrote:
>>
>> On Fri, Jan 29, 2010 at 6:09 PM, Levente Uzonyi <leves at elte.hu> wrote:
>>> - it assumes that ! is encoded as byte 33 and whenever byte 33 occurs in
>>>  the encoded stream that byte is an encoded ! character
>>
>> The "whenever byte 33 occurs in the encoded stream that byte is an
>> encoded ! character" part of this seems suspect to me.  Are you
>> checking the bytes for byte 33, or are you still checking characters,
>> and one of the characters is byte 33, then you assume it is ! ?  If
>> you are just scanning bytes, I would assume that some UTF-8 characters
>> could have a byte 33 encoded in them.
>
> Wrong.
>
>> Although I'm not a UTF-8 expert.
>
> Obviously ;) See
>
> http://en.wikipedia.org/wiki/UTF-8#Description
>
Either way, the presence of the ! character should be tested after
decoding the UTF-8 data.
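
To make both points concrete, here is a minimal sketch in Python (not
Squeak code; all function names are hypothetical and chosen only for
illustration). It checks that no multibyte UTF-8 sequence contains byte
33, which is the property the Wikipedia page describes, and it shows why
testing after decoding is still the safer general rule: the same byte
scan would be wrong for another encoding such as UTF-16.

```python
def multibyte_sequence_has_byte_33(limit: int = 0x110000) -> bool:
    """Return True if any multibyte UTF-8 sequence below `limit` contains byte 33."""
    for code_point in range(limit):
        # Surrogates (U+D800..U+DFFF) are not encodable in UTF-8; skip them.
        if 0xD800 <= code_point <= 0xDFFF:
            continue
        encoded = chr(code_point).encode("utf-8")
        # Every byte of a multibyte sequence has its high bit set (>= 0x80),
        # so byte 33 (0x21) should never show up here.
        if len(encoded) > 1 and 33 in encoded:
            return True
    return False


def find_bang_in_bytes(data: bytes) -> int:
    """Byte offset of the first byte 33 in the encoded data, or -1."""
    return data.find(b"!")


def find_bang_after_decoding(data: bytes) -> int:
    """Character offset of the first '!' after decoding as UTF-8, or -1."""
    return data.decode("utf-8").find("!")


sample = "héllo!".encode("utf-8")  # 'é' takes two bytes in UTF-8

# U+0121 encodes as the bytes 0x21 0x01 in UTF-16-LE, so a raw scan for
# byte 33 would report a '!' that is not there.
utf16_false_positive = 33 in "\u0121".encode("utf-16-le")
```

In this sketch both searches find the same character in the UTF-8
sample, only at different offsets (byte offset versus character offset),
while the UTF-16 case shows the byte scan depends on the encoding in
use.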

> - Bert -


-- 
Best regards,
Igor Stasenko AKA sig.