[squeak-dev] The Trunk: Multilingual-ul.208.mcz

commits at source.squeak.org commits at source.squeak.org
Sun May 3 00:16:00 UTC 2015


Levente Uzonyi uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-ul.208.mcz

==================== Summary ====================

Name: Multilingual-ul.208
Author: ul
Time: 1 May 2015, 3:25:18.828 pm
UUID: 82d19dac-c602-4c0d-bc9a-7858e3a3c283
Ancestors: Multilingual-ul.206

Improved Unicode caseMappings:
- Don't overwrite an existing mapping, because that leads to problems (like (Unicode toUppercaseCode: $k asciiValue) = 8490)
- Use PluggableDictionary class >> #integerDictionary for better lookup performance (~+16%), and compaction resistance (done at every release).
- Compact the dictionaries before saving.
- Save the new dictionaries atomically.

=============== Diff against Multilingual-ul.206 ===============

Item was changed:
  ----- Method: Unicode class>>initializeCaseMappings (in category 'casing') -----
  initializeCaseMappings
  	"Unicode initializeCaseMappings"
+ 	
+ 	UIManager default informUserDuring: [ :bar |
- 	ToCasefold := IdentityDictionary new.
- 	ToUpper := IdentityDictionary new.
- 	ToLower := IdentityDictionary new.
- 	UIManager default informUserDuring: [:bar|
  		| stream |
  		bar value: 'Downloading Unicode data'.
  		stream := HTTPClient httpGet: 'http://www.unicode.org/Public/UNIDATA/CaseFolding.txt'.
  		(stream isKindOf: RWBinaryOrTextStream) ifFalse:[^self error: 'Download failed'].
  		stream reset.
  		bar value: 'Updating Case Mappings'.
+ 		self parseCaseMappingFrom: stream ].!
- 		self parseCaseMappingFrom: stream.
- 	].!

Item was changed:
  ----- Method: Unicode class>>parseCaseMappingFrom: (in category 'casing') -----
  parseCaseMappingFrom: stream
  	"Parse the Unicode casing mappings from the given stream.
  	Handle only the simple mappings"
  	"
  		Unicode initializeCaseMappings.
  	"
  
+ 	| newToCasefold newToUpper newToLower casefoldKeys |
+ 	newToCasefold := PluggableDictionary integerDictionary.
+ 	newToUpper := PluggableDictionary integerDictionary.
+ 	newToLower := PluggableDictionary integerDictionary.
- 	ToCasefold := IdentityDictionary new: 2048.
- 	ToUpper := IdentityDictionary new: 2048.
- 	ToLower := IdentityDictionary new: 2048.
  
+ 	"Filter the mappings (Simple and Common) to newToCasefold."
+ 	stream contents linesDo: [ :line |
+ 		| data fields sourceCode destinationCode |
+ 		data := line copyUpTo: $#.
+ 		fields := data findTokens: '; '.
+ 		(fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue:[
+ 			sourceCode := Integer readFrom: (fields at: 1) base: 16.
+ 			destinationCode := Integer readFrom: (fields at: 3) base: 16.
+ 			newToCasefold at: sourceCode put: destinationCode ] ].
- 	[stream atEnd] whileFalse:[
- 		| fields line srcCode dstCode |
- 		line := stream nextLine copyUpTo: $#.
- 		fields := line withBlanksTrimmed findTokens: $;.
- 		(fields size > 2 and: [#('C' 'S') includes: (fields at: 2) withBlanksTrimmed]) ifTrue:[
- 			srcCode := Integer readFrom: (fields at: 1) withBlanksTrimmed base: 16.
- 			dstCode := Integer readFrom: (fields at: 3) withBlanksTrimmed base: 16.
- 			ToCasefold at: srcCode put: dstCode.
- 		].
- 	].
  
+ 	casefoldKeys := newToCasefold keys.
+ 	newToCasefold keysAndValuesDo: [ :sourceCode :destinationCode |
+ 		(self isUppercaseCode: sourceCode) ifTrue: [
+ 			"In most cases, uppercase letter are folded to lower case"
+ 			newToUpper at: destinationCode put: sourceCode.
+ 			newToLower at: sourceCode ifAbsentPut: destinationCode "Don't overwrite existing pairs. To avoid $k asUppercase to return the Kelvin character (8490)." ].
+ 		(self isLowercaseCode: sourceCode) ifTrue: [
+ 			"In a few cases, two upper case letters are folded to the same lower case.
+ 			We must find an upper case letter folded to the same letter"
+ 			casefoldKeys
+ 				detect: [ :each | 
+ 					(self isUppercaseCode: each) and: [
+ 						(newToCasefold at: each) = destinationCode ] ]
+ 				ifFound: [ :uppercaseCode |
+ 					newToUpper at: sourceCode put: uppercaseCode ]
+ 				ifNone: [ ] ] ].
+ 
+ 	"Compact the dictionaries."
+ 	newToCasefold compact.
+ 	newToUpper compact.
+ 	newToLower compact.
+ 	"Save in an atomic operation."
+ 	ToCasefold := newToCasefold.
+ 	ToUpper := newToUpper.
+ 	ToLower := newToLower
+ 	!
- 	ToCasefold keysAndValuesDo:
- 		[:k :v |
- 		(self isUppercaseCode: k)
- 			ifTrue:
- 				["In most cases, uppercase letter are folded to lower case"
- 				ToUpper at: v put: k.
- 				ToLower at: k put: v].
- 		(self isLowercaseCode: k)
- 			ifTrue:
- 				["In a few cases, two upper case letters are folded to the same lower case.
- 				We must find an upper case letter folded to the same letter"
- 				| up |
- 				up := ToCasefold keys detect: [:e | (self isUppercaseCode: e) and: [(ToCasefold at: e) = v]] ifNone: [nil].
- 				up ifNotNil: [ToUpper at: k put: up]]].!



More information about the Squeak-dev mailing list