[squeak-dev] #nextChunk speedup, the future of multibyte streams

Levente Uzonyi leves at elte.hu
Sun Jan 31 09:27:48 UTC 2010


On Sat, 30 Jan 2010, Igor Stasenko wrote:

> On 30 January 2010 09:15, Bert Freudenberg <bert at freudenbergs.de> wrote:
>> On 29.01.2010, at 20:07, Chris Cunningham wrote:
>>>
>>> On Fri, Jan 29, 2010 at 6:09 PM, Levente Uzonyi <leves at elte.hu> wrote:
>>>> - it assumes that ! is encoded as byte 33 and whenever byte 33 occurs in
>>>>  the encoded stream that byte is an encoded ! character
>>>
>>> The "whenever byte 33 occurs in the encoded stream that byte is an
>>> encoded ! character" part of this seems suspect to me.  Are you
>>> checking the bytes for byte 33, or are you still checking characters,
>>> and one of the characters is byte 33, then you assume it is ! ?  If
>>> you are just scanning bytes, I would assume that some UTF-8 characters
>>> could have a byte 33 encoded in them.
>>
>> Wrong.
>>
>>> Although I'm not a UTF-8 expert.
>>
>> Obviously ;) See
>>
>> http://en.wikipedia.org/wiki/UTF-8#Description
>>
> Either way, the presence of the ! character should be tested after
> decoding UTF-8 data.

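Why? UTF-8 is ASCII compatible: bytes 0 to 127 only ever encode the
corresponding ASCII characters, and every byte of a multibyte sequence is
128 or above, so byte 33 in the stream is always an encoded !.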
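A minimal Python sketch of why scanning raw bytes for 33 is safe. The function names are hypothetical, the real #nextChunk is Smalltalk code, and this sketch ignores the chunk format's doubled !! escape. It splits UTF-8 bytes on b"!" before decoding and compares the result with decoding first and splitting afterwards. It also checks every non-ASCII code point to confirm that none of them encodes to a byte below 0x80.

```python
def split_chunks_bytes(data: bytes) -> list[str]:
    """Split raw UTF-8 bytes on byte 33 ('!') first, then decode each chunk."""
    return [part.decode("utf-8") for part in data.split(b"!")]


def split_chunks_chars(data: bytes) -> list[str]:
    """Reference version: decode the whole buffer first, then split on '!'."""
    return data.decode("utf-8").split("!")


def multibyte_sequences_avoid_ascii() -> bool:
    """Exhaustively check that no non-ASCII code point encodes to a byte < 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates cannot be encoded in UTF-8
            continue
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


sample = "Object subclass: #Fő!'Grüße, 世界 😀'!"
raw = sample.encode("utf-8")
print(split_chunks_bytes(raw) == split_chunks_chars(raw))  # True
print(multibyte_sequences_avoid_ascii())  # True
```

Because every byte of a multibyte sequence has its high bit set, a byte-level split on 33 can never cut a character in half. So each chunk decodes cleanly, and the result matches the decode-first approach.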


Levente

>
>> - Bert -
> -- 
> Best regards,
> Igor Stasenko AKA sig.

