[squeak-dev] The Trunk: Multilingual-tonyg.236.mcz

commits at source.squeak.org
Sun Feb 4 20:21:59 UTC 2018


David T. Lewis uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-tonyg.236.mcz

==================== Summary ====================

Name: Multilingual-tonyg.236
Author: tonyg
Time: 31 January 2018, 11:19:22.844612 pm
UUID: 62b136a5-9964-42d9-9397-2a6aa303f339
Ancestors: Multilingual-tonyg.235

Properly report short sequences as InvalidUTF8 rather than out-of-bounds subscript. Fixes a failing UTF8EdgeCaseTest>>testSequencesWithLastContinuationByteMissing.

=============== Diff against Multilingual-tonyg.235 ===============

Item was changed:
  ----- Method: UTF8TextConverter class>>decodeByteString: (in category 'conversion') -----
  decodeByteString: aByteString
  	"Convert the given string from UTF-8 using the fast path if converting to Latin-1"
  
+ 	| outStream lastIndex nextIndex limit byte1 byte2 byte3 byte4 unicode |
- 	| outStream lastIndex nextIndex byte1 byte2 byte3 byte4 unicode |
  	lastIndex := 1.
  	(nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0
  		ifTrue: [ ^aByteString ].
+ 	limit := aByteString size.
+ 	outStream := (String new: limit) writeStream.
- 	outStream := (String new: aByteString size) writeStream.
  	[
  		outStream next: nextIndex - lastIndex putAll: aByteString startingAt: lastIndex.
  		byte1 := aByteString byteAt: nextIndex.
  		(byte1 bitAnd: 16rE0) = 192 ifTrue: [ "two bytes"
+ 			nextIndex < limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  			byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[	^self errorMalformedInput: aByteString ].
  			unicode := ((byte1 bitAnd: 31) bitShift: 6) + (byte2 bitAnd: 63)].
  		(byte1 bitAnd: 16rF0) = 224 ifTrue: [ "three bytes"
+ 			(nextIndex + 2) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  			byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  			byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  			unicode := ((byte1 bitAnd: 15) bitShift: 12) + ((byte2 bitAnd: 63) bitShift: 6)
  				+ (byte3 bitAnd: 63)].
  		(byte1 bitAnd: 16rF8) = 240 ifTrue: [ "four bytes"
+ 			(nextIndex + 3) <= limit ifFalse: [ ^ self errorMalformedInput: aByteString ].
  			byte2 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte2 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  			byte3 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte3 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  			byte4 := aByteString byteAt: (nextIndex := nextIndex + 1).
  			(byte4 bitAnd: 16rC0) = 16r80 ifFalse:[ ^self errorMalformedInput: aByteString ].
  			unicode := ((byte1 bitAnd: 16r7) bitShift: 18) +
  							((byte2 bitAnd: 63) bitShift: 12) + 
  							((byte3 bitAnd: 63) bitShift: 6) +
  							(byte4 bitAnd: 63)].
  		unicode ifNil: [ ^self errorMalformedInput: aByteString ].
  		unicode = 16rFEFF ifFalse: [ "Skip byte order mark"
  			outStream nextPut: (Unicode value: unicode) ].
  		lastIndex := nextIndex + 1.
  		(nextIndex := ByteString findFirstInString: aByteString inSet: latin1Map startingAt: lastIndex) = 0 ] whileFalse.
  	^outStream 
  		next: aByteString size - lastIndex + 1 putAll: aByteString startingAt: lastIndex;
  		contents
  !
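
The behaviour described in the summary can be tried out with a short workspace sketch. It is only an illustration, not part of the commit: the input bytes are chosen by hand, and it assumes that errorMalformedInput: signals InvalidUTF8, as the commit message indicates. The byte 226 (16rE2) announces a three-byte sequence, but only one continuation byte follows, so the new check "(nextIndex + 2) <= limit" should reject the input before any byteAt: call reads past the end of the string.

```smalltalk
"Workspace sketch, not part of the commit.
 226 (16rE2) is a three-byte lead byte; 130 (16r82) is a valid
 continuation byte. The last continuation byte is missing."
| truncated result |
truncated := #[226 130] asString.
result := [ UTF8TextConverter decodeByteString: truncated ]
	on: InvalidUTF8
	do: [ :ex | ex return: #invalid ].
"Assumption: with the bounds checks from this version loaded, the
 handler runs and result is #invalid. Before the change, the decoder
 would index past the end of the string and fail with a subscript
 error, which this handler does not catch."
result
```

Catching only InvalidUTF8, rather than Error, keeps the two failure modes apart: if the handler does not fire, the out-of-bounds error surfaces and shows that the old code is still in the image.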


