[squeak-dev] The Trunk: Collections-ar.172.mcz

commits at source.squeak.org
Fri Oct 23 16:56:38 UTC 2009


Andreas Raab uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ar.172.mcz

==================== Summary ====================

Name: Collections-ar.172
Author: ar
Time: 23 October 2009, 9:56:15 am
UUID: 16488c43-6599-cf45-8502-bce910fa280a
Ancestors: Collections-nice.171

Ensure leading char gets inserted properly in utf8ToSqueak.

=============== Diff against Collections-nice.171 ===============

Item was changed:
  ----- Method: ByteString>>utf8ToSqueak (in category 'converting') -----
  utf8ToSqueak
  	"Convert the given string from UTF-8 using the fast path if converting to Latin-1"
  	| outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  	Latin1ToUtf8Map ifNil:[^super utf8ToSqueak]. "installation guard"
  	lastIndex := 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0 ifTrue:[^self].
  	outStream := (String new: self size) writeStream.
  	[outStream next: nextIndex-lastIndex putAll: self startingAt: lastIndex.
  	byte1 := self byteAt: nextIndex.
  	(byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  	(byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  			+ (byte3 bitAnd: 63)].
  	(byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte4 := self byteAt: (nextIndex := nextIndex+1).
  		(byte4 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  						((byte2 bitAnd: 63) bitShift: 12) + 
  						((byte3 bitAnd: 63) bitShift: 6) +
  						(byte4 bitAnd: 63)].
  	unicode ifNil:[^self]. "invalid UTF-8; presume Latin-1"
+ 	outStream nextPut: (Unicode value: unicode).
- 	outStream nextPut: (Character value: unicode).
  	lastIndex := nextIndex + 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0] whileFalse.
  	outStream next: self size-lastIndex+1 putAll: self startingAt: lastIndex.
  	^outStream contents
  !
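
For readers following the bit arithmetic in the method above, here is a small workspace sketch that applies the same masks and shifts to two hand-picked sequences. The variable names (twoByte, threeByte) and the sample bytes are illustrative assumptions, not part of the commit. The sketch covers only the code point computation; the change itself lies in wrapping the result with Unicode value: rather than Character value:, which, per the commit summary, is what inserts the leading char.

```smalltalk
"Workspace sketch (illustrative only, not part of Collections-ar.172)."
| byte1 byte2 byte3 twoByte threeByte |

"Two-byte sequence C3 A9: lead byte 110xxxxx, continuation 10xxxxxx."
byte1 := 16rC3.
byte2 := 16rA9.
twoByte := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63).
"twoByte = 233, i.e. U+00E9"

"Three-byte sequence E2 82 AC: lead byte 1110xxxx, two continuation bytes."
byte1 := 16rE2.
byte2 := 16r82.
byte3 := 16rAC.
threeByte := ((byte1 bitAnd: 15) bitShift: 12)
	+ ((byte2 bitAnd: 63) bitShift: 6)
	+ (byte3 bitAnd: 63).
"threeByte = 8364, i.e. U+20AC"
```

The first result fits in Latin-1, while the second does not, so it is the second kind of value for which the choice between Character value: and Unicode value: should matter.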



