UTF8 Squeak

Andreas Raab andreas.raab at gmx.de
Fri Jun 8 07:45:50 UTC 2007


Lukas Renggli wrote:
>> But is that a property of 1) Seaside or 2) Squeak or 3) UTF-8? If the
>> first, just fix it ;-) If the second, what conversions are slow? If the
>> third, why not speed it up by a primitive? (UTF-8 translation isn't that
>> hard)
> 
> I would if I knew how to do it.

I'll see if I can find some time on the weekend to look at this.

>> > What most people do is to work with
>> > (Squeak 2.7 or) ByteStrings that they treat like ByteArrays. The data
>> > is received, stored, and sent exactly the way it comes from the
>> > socket. Byte identical strings are sent back as they were received.
>>
>> I assume you mean Seaside 2.7 above not Squeak 2.7.
> 
> I am talking about Squeak 3.7. There are many Seaside users that will
> stick with Squeak 3.7 forever.

Yes, using Squeak ->3<-.7 can make good sense for people who don't care 
about using m17n internally (definitely more than using Squeak ->2<-.7 
as you wrote initially).

>> How about trying to improve the speed of conversions? You seem to imply
>> that this is the major issue here, so if the conversions were
>> blindingly fast (which I think they easily could be by writing one or two
>> primitives) this should improve matters.
> 
> Are you talking about escaping? In Seaside 2.8 the escaping is already
> 2 times faster than in Seaside 2.7. Character encoding is another
> story.

I'm talking about UTF-8 conversions. A simple thing to do would be (for 
example) to have a lookup table for everything covered by 2-byte 
encodings (code points up to U+07FF, which is practically everything in 
the western hemisphere). Something like this:

nextFromStream: stream
	"Read a UTF-8 encoded character from the stream"
	| value1 value2 |
	"Assumes both table levels are indexed by byte value + 1 (Arrays are 1-based)"
	value1 := utf8Table at: stream nextByte + 1.
	value1 isCharacter ifTrue:[^value1].
	value1 isArray ifTrue:[
		value2 := value1 at: stream nextByte + 1.
		value2 isCharacter ifTrue:[^value2]].
	"... put the slow code here ..."

(Note that the lookup table can include the required language tags etc. 
to make any further conversion unnecessary.) Beyond that, a primitive 
would go a very long way here.
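
For illustration, such a table could be built roughly as follows. This is 
only a sketch under some assumptions: buildUtf8Table is a made-up name, 
both table levels are indexed by byte value + 1 (Arrays are 1-based), 
Character value: is assumed to accept code points above 255, and language 
tags are left out entirely.

```smalltalk
buildUtf8Table
	"Hypothetical sketch: answer a 256-element Array indexed by (byte value + 1).
	ASCII bytes map directly to Characters. Lead bytes of 2-byte sequences
	(16rC2 to 16rDF) map to a sub-table indexed by (second byte + 1).
	All other entries stay nil and fall through to the slow path."
	| table sub |
	table := Array new: 256.
	"1-byte encodings: plain ASCII"
	0 to: 16r7F do: [:b |
		table at: b + 1 put: (Character value: b)].
	"2-byte encodings: 110xxxxx 10yyyyyy decodes to code point xxxxxyyyyyy"
	16rC2 to: 16rDF do: [:lead |
		sub := Array new: 256.
		16r80 to: 16rBF do: [:trail |
			sub at: trail + 1 put: (Character value:
				((lead bitAnd: 16r1F) bitShift: 6) + (trail bitAnd: 16r3F))].
		table at: lead + 1 put: sub].
	^table
```

The lead bytes 16rC0 and 16rC1 are deliberately left nil since they could 
only start overlong encodings. Malformed second bytes also hit a nil entry, 
so they end up in the slow code, which can then do the error handling.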

Cheers,
   - Andreas


