[squeak-dev] #nextChunk speedup, the future of multibyte streams

Colin Putney cputney at wiresong.ca
Sun Jan 31 19:33:54 UTC 2010


On 2010-01-31, at 10:54 AM, Igor Stasenko wrote:

>> Why? UTF-8 is ASCII compatible.
>> 
> 
> Well, UTF-8 is an octet stream (bytes), not characters, whereas we are
> seeking the '!' character, not a byte.
> Logically, the data flow should be the following:
> <primitive> -> ByteArray -> utf8 reader -> character stream -> '!'
> 
> Sure, due to the nature of the UTF-8 encoding you could take a shortcut,
> but because of such hacks you won't be able to switch to a different
> encoding without pain:
> 
> <primitive> -> ByteArray -> <XYZ> reader -> character stream -> '!'

+1

Bytes and characters are not the same thing.

Colin
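A minimal sketch of the two data flows discussed above. It is written in Python rather than Squeak Smalltalk purely for illustration. The names decode_stream, read_chunks and byte_shortcut_chunks are hypothetical and are not Squeak's #nextChunk. The sketch also assumes plain '!' terminators and ignores the doubled '!!' escape that real chunk files use for a literal '!'.

```python
import codecs
from typing import Iterator, List


def decode_stream(data: bytes, encoding: str) -> Iterator[str]:
    """Layered flow: bytes -> incremental decoder -> characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        # Feed one byte at a time, as a file stream would; the decoder
        # buffers partial multibyte sequences until a character is complete.
        yield from decoder.decode(data[i:i + 1])
    yield from decoder.decode(b"", final=True)


def read_chunks(data: bytes, encoding: str) -> List[str]:
    """Scan decoded *characters* for '!' (hypothetical chunk reader)."""
    chunks, buf = [], []
    for ch in decode_stream(data, encoding):
        if ch == "!":
            chunks.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    return chunks  # any unterminated text after the last '!' is dropped


def byte_shortcut_chunks(data: bytes, encoding: str) -> List[str]:
    """Shortcut: scan raw *bytes* for 0x21, then decode each piece."""
    # Dropping the last piece mirrors read_chunks (unterminated tail).
    return [piece.decode(encoding) for piece in data.split(b"!")[:-1]]


if __name__ == "__main__":
    text = "Größe := 1!ġ := 2!"
    for enc in ("utf-8", "utf-16-le"):
        data = text.encode(enc)
        print(enc, "layered:", read_chunks(data, enc))
        try:
            print(enc, "shortcut:", byte_shortcut_chunks(data, enc))
        except UnicodeDecodeError as err:
            print(enc, "shortcut failed:", err.reason)
```

With UTF-8 both functions agree, because every byte of a multibyte UTF-8 sequence is 0x80 or above, so byte 0x21 can only ever be '!'. With UTF-16-LE the byte scan splits inside characters ('!' is 0x21 0x00 and 'ġ', U+0121, is 0x21 0x01), and decoding the fragments fails. The layered reader keeps working with only the codec name changed, which is the point about swapping in a different <XYZ> reader.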
