[squeak-dev] #nextChunk speedup, the future of multibyte streams

Sat Jan 30 07:15:16 UTC 2010

On 29.01.2010, at 20:07, Chris Cunningham wrote:
> 
> On Fri, Jan 29, 2010 at 6:09 PM, Levente Uzonyi <leves at elte.hu> wrote:
>> - it assumes that ! is encoded as byte 33 and whenever byte 33 occurs in
>>  the encoded stream that byte is an encoded ! character
> 
> The "whenever byte 33 occurs in the encoded stream that byte is an
> encoded ! character" part of this seems suspect to me.  Are you
> scanning the raw bytes for byte 33, or are you still decoding
> characters and treating a character with value 33 as ! ?  If
> you are just scanning bytes, I would assume that some multi-byte
> UTF-8 characters could have a byte 33 somewhere in their encoding.

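Wrong. In UTF-8, every byte of a multi-byte sequence has its high bit
set (values 128 to 255), so byte 33 can only ever be an ASCII !.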

> Although I'm not a UTF-8 expert.

Obviously ;) See

http://en.wikipedia.org/wiki/UTF-8#Description
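
For anyone who would rather check than read, here is a rough sketch in
Python (not Squeak, and not the #nextChunk code under discussion). It
encodes every non-ASCII code point and collects the byte values that
show up. The function name is made up for this illustration.

```python
def non_ascii_utf8_byte_values():
    """Collect every byte value used when encoding non-ASCII code points."""
    seen = set()
    for code_point in range(0x80, 0x110000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates are not valid in UTF-8, skip them
        seen.update(chr(code_point).encode("utf-8"))
    return seen


byte_values = non_ascii_utf8_byte_values()

# Byte 33 never appears inside a multi-byte sequence.
print(33 in byte_values)          # False
# Every lead and continuation byte has the high bit set.
print(min(byte_values) >= 0x80)   # True

# So scanning the raw bytes for 33 finds exactly the ! characters.
encoded = "h\u00e9!".encode("utf-8")   # b'h\xc3\xa9!'
print(encoded.index(33))          # 3
```

This only covers UTF-8. Other encodings (UTF-16, for example) can
contain a byte 33 inside a longer unit, so the assumption has to be
checked per encoding.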

- Bert -