UTF8 Squeak
Andreas Raab
andreas.raab at gmx.de
Fri Jun 8 07:45:50 UTC 2007
Lukas Renggli wrote:
>> But is that a property of 1) Seaside or 2) Squeak or 3) UTF-8? If the
>> first, just fix it ;-) If the second, what conversions are slow? If the
>> third, why not speed it up by a primitive? (UTF-8 translation isn't that
>> hard)
>
> I would if I knew how to do it.
I'll see if I can find some time on the weekend to look at this.
>> > What most people do is to work with
>> > (Squeak 2.7 or) ByteStrings that they treat like ByteArrays. The data
>> > is received, stored, and sent exactly the way it comes from the
>> > socket. Byte identical strings are sent back as they were received.
>>
>> I assume you mean Seaside 2.7 above not Squeak 2.7.
>
> I am talking about Squeak 3.7. There are many Seaside users that will
> stick with Squeak 3.7 forever.
Yes, using Squeak ->3<-.7 can make good sense for people who don't care
about using m17n internally (definitely more than using Squeak ->2<-.7
as you wrote initially).
>> How about trying to improve the speed of conversions? You seem to imply
>> that this is the major issue here, so if the conversions were
>> blindingly fast (which I think they easily could be by writing one or two
>> primitives) this should improve matters.
>
> Are you talking about escaping? In Seaside 2.8 the escaping is already
> 2 times faster than in Seaside 2.7. Character encoding is another
> story.
I'm talking about UTF-8 conversions. A simple thing to do would be (for
example) to have a lookup table for everything covered by 1- and 2-byte
encodings, i.e. code points up to U+07FF (which is practically everything
written in Latin script, plus Greek and Cyrillic).
Something like this:
nextFromStream: stream
    "Read a UTF-8 encoded character from the stream"
    | value1 value2 |
    value1 := utf8Table at: stream nextByte.
    value1 isCharacter ifTrue: [^value1].
    value1 isArray ifTrue: [
        value2 := value1 at: stream nextByte.
        value2 isCharacter ifTrue: [^value2]].
    "... put the slow code here ..."
(Note that the lookup table can include the required language tags etc.,
which makes any further conversion unnecessary.) Beyond that, a primitive
would go a very long way here.
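
To make the two-level table idea concrete, here is a rough, runnable sketch in Python rather than Squeak. It is only an illustration under assumptions: the names build_utf8_table, next_from_stream and _decode_slow are made up for this sketch, the stream is assumed to be a binary file-like object, and the slow path simply hands the longer sequences to Python's built-in decoder. It does not model language tags.

```python
import io


def build_utf8_table():
    """Build a two-level table covering every 1- and 2-byte UTF-8 sequence."""
    table = [None] * 256
    # 0x00-0x7F: single-byte sequences map straight to a character.
    for byte in range(0x80):
        table[byte] = chr(byte)
    # 0xC2-0xDF: lead bytes of 2-byte sequences map to a sub-table indexed
    # by the continuation byte. 0xC0/0xC1 stay None (overlong encodings).
    for lead in range(0xC2, 0xE0):
        sub = [None] * 256
        for cont in range(0x80, 0xC0):
            sub[cont] = chr(((lead & 0x1F) << 6) | (cont & 0x3F))
        table[lead] = sub
    return table


UTF8_TABLE = build_utf8_table()


def _decode_slow(stream, first):
    """Slow path (hypothetical helper): 3- and 4-byte sequences."""
    lead = first[0]
    if 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte: 0x%02X" % lead)
    data = first + stream.read(length - 1)
    # bytes.decode raises UnicodeDecodeError on malformed or truncated input.
    return data.decode("utf-8")


def next_from_stream(stream):
    """Read one UTF-8 encoded character; return None at end of stream."""
    first = stream.read(1)
    if not first:
        return None
    entry = UTF8_TABLE[first[0]]
    if isinstance(entry, str):        # 1-byte sequence: one table lookup
        return entry
    if isinstance(entry, list):       # 2-byte sequence: two table lookups
        second = stream.read(1)
        char = entry[second[0]] if second else None
        if char is None:
            raise ValueError("malformed 2-byte UTF-8 sequence")
        return char
    return _decode_slow(stream, first)


stream = io.BytesIO("aé€".encode("utf-8"))
print([next_from_stream(stream) for _ in range(3)])
```

The fast path never does any bit fiddling at read time; all of that is paid once when the table is built, which is the point of the suggestion above.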
Cheers,
- Andreas