[Pkg] The Trunk: Collections-ar.172.mcz
commits at source.squeak.org
Fri Oct 23 16:56:38 UTC 2009
Andreas Raab uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ar.172.mcz
==================== Summary ====================
Name: Collections-ar.172
Author: ar
Time: 23 October 2009, 9:56:15 am
UUID: 16488c43-6599-cf45-8502-bce910fa280a
Ancestors: Collections-nice.171
Ensure leading char gets inserted properly in utf8ToSqueak.
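For readers unfamiliar with the term: in Squeak images of this era, a Character's value can carry a "leading char" (a language tag held in the high bits) in addition to the Unicode code point. The workspace sketch below is an illustration only and not part of the commit. It assumes that Character>>leadingChar and Unicode class>>value: behave as they did in the 2009 trunk; the code point 16r3042 is an arbitrary example.

```smalltalk
"Illustration only; assumes a 2009-era trunk image."
| plain tagged |
plain := Character value: 16r3042.	"raw code point, no language tag"
tagged := Unicode value: 16r3042.	"tag taken from the current language environment"
plain leadingChar.	"0, since no tag was supplied"
tagged leadingChar.	"depends on the image's language environment; may also be 0"
```

Evaluating the last two lines with "print it" shows whether the two constructors differ in a given image.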
=============== Diff against Collections-nice.171 ===============
Item was changed:
----- Method: ByteString>>utf8ToSqueak (in category 'converting') -----
  utf8ToSqueak
  	"Convert the given string from UTF-8 using the fast path if converting to Latin-1"
  	| outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  	Latin1ToUtf8Map ifNil:[^super utf8ToSqueak]. "installation guard"
  	lastIndex := 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0 ifTrue:[^self].
  	outStream := (String new: self size) writeStream.
  	[outStream next: nextIndex-lastIndex putAll: self startingAt: lastIndex.
  	byte1 := self byteAt: nextIndex.
  	(byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  	(byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  			+ (byte3 bitAnd: 63)].
  	(byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
  		byte2 := self byteAt: (nextIndex := nextIndex+1).
  		(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte3 := self byteAt: (nextIndex := nextIndex+1).
  		(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		byte4 := self byteAt: (nextIndex := nextIndex+1).
  		(byte4 bitAnd: 16rC0) = 16r80 ifFalse:[^self]. "invalid UTF-8; presume Latin-1"
  		unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  			((byte2 bitAnd: 63) bitShift: 12) +
  			((byte3 bitAnd: 63) bitShift: 6) +
  			(byte4 bitAnd: 63)].
  	unicode ifNil:[^self]. "invalid UTF-8; presume Latin-1"
+ 	outStream nextPut: (Unicode value: unicode).
- 	outStream nextPut: (Character value: unicode).
  	lastIndex := nextIndex + 1.
  	nextIndex := ByteString findFirstInString: self inSet: Latin1ToUtf8Map startingAt: lastIndex.
  	nextIndex = 0] whileFalse.
  	outStream next: self size-lastIndex+1 putAll: self startingAt: lastIndex.
  	^outStream contents
!
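The mask-and-shift arithmetic in the "two bytes" branch can be checked by hand. The workspace sketch below is an illustration, not part of the commit; the input bytes 16rC3 16rA9 (the UTF-8 encoding of U+00E9) are chosen here as an example, and the masks are copied from the method above.

```smalltalk
"Workspace sketch: decode the two-byte UTF-8 sequence C3 A9 by hand."
| byte1 byte2 unicode |
byte1 := 16rC3.	"lead byte, bit pattern 110xxxxx"
byte2 := 16rA9.	"continuation byte, bit pattern 10xxxxxx"
(byte1 bitAnd: 16rE0) = 192 ifFalse: [self error: 'not a two-byte lead'].
(byte2 bitAnd: 16rC0) = 16r80 ifFalse: [self error: 'not a continuation byte'].
"Keep the low 5 bits of the lead byte and the low 6 bits of the continuation byte."
unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63).
unicode	"233, i.e. 16rE9"
```

The three- and four-byte branches follow the same pattern, with 4 and 3 payload bits in the lead byte and 6 bits from each continuation byte.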