[squeak-dev] The Trunk: Multilingual-ul.208.mcz
commits at source.squeak.org
Sun May 3 00:16:00 UTC 2015
Levente Uzonyi uploaded a new version of Multilingual to project The Trunk:
http://source.squeak.org/trunk/Multilingual-ul.208.mcz
==================== Summary ====================
Name: Multilingual-ul.208
Author: ul
Time: 1 May 2015, 3:25:18.828 pm
UUID: 82d19dac-c602-4c0d-bc9a-7858e3a3c283
Ancestors: Multilingual-ul.206
Improved Unicode caseMappings:
- Don't overwrite an existing mapping, because that leads to problems (like (Unicode toUppercaseCode: $k asciiValue) = 8490)
- Use PluggableDictionary class >> #integerDictionary for better lookup performance (about 16% faster) and for resistance to the compaction that is done at every release.
- Compact the dictionaries before saving.
- Save the new dictionaries atomically.
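
The overwrite hazard named in the first item can be sketched in a workspace. This is only an illustration of the general problem, not the structure of the method in the diff: the two-entry fold table, the variable names, and the order of the pairs are assumptions made for the example. It relies on the fact that both K (75) and the Kelvin sign (8490) fold to k (107).

```smalltalk
"Hypothetical miniature of the fold data: K (75) and the Kelvin sign (8490) both fold to k (107)."
| folds overwriting preserving |
folds := #( #(75 107) #(8490 107) ).
overwriting := Dictionary new.
preserving := Dictionary new.
folds do: [ :pair |
	"at:put: lets the later pair replace the earlier one."
	overwriting at: pair second put: pair first.
	"at:ifAbsentPut: keeps whichever pair arrived first."
	preserving at: pair second ifAbsentPut: [ pair first ] ].
overwriting at: 107. "8490, the Kelvin sign"
preserving at: 107. "75, the plain K"
```

When the pairs come from iterating a dictionary rather than an array, their order is not guaranteed, so which entry wins under at:put: depends on the hash layout.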
=============== Diff against Multilingual-ul.206 ===============
Item was changed:
----- Method: Unicode class>>initializeCaseMappings (in category 'casing') -----
initializeCaseMappings
"Unicode initializeCaseMappings"
+
+ UIManager default informUserDuring: [ :bar |
- ToCasefold := IdentityDictionary new.
- ToUpper := IdentityDictionary new.
- ToLower := IdentityDictionary new.
- UIManager default informUserDuring: [:bar|
| stream |
bar value: 'Downloading Unicode data'.
stream := HTTPClient httpGet: 'http://www.unicode.org/Public/UNIDATA/CaseFolding.txt'.
(stream isKindOf: RWBinaryOrTextStream) ifFalse:[^self error: 'Download failed'].
stream reset.
bar value: 'Updating Case Mappings'.
+ self parseCaseMappingFrom: stream ].!
- self parseCaseMappingFrom: stream.
- ].!
Item was changed:
----- Method: Unicode class>>parseCaseMappingFrom: (in category 'casing') -----
parseCaseMappingFrom: stream
"Parse the Unicode casing mappings from the given stream.
Handle only the simple mappings"
"
Unicode initializeCaseMappings.
"
+ | newToCasefold newToUpper newToLower casefoldKeys |
+ newToCasefold := PluggableDictionary integerDictionary.
+ newToUpper := PluggableDictionary integerDictionary.
+ newToLower := PluggableDictionary integerDictionary.
- ToCasefold := IdentityDictionary new: 2048.
- ToUpper := IdentityDictionary new: 2048.
- ToLower := IdentityDictionary new: 2048.
+ "Filter the mappings (Simple and Common) to newToCasefold."
+ stream contents linesDo: [ :line |
+ | data fields sourceCode destinationCode |
+ data := line copyUpTo: $#.
+ fields := data findTokens: '; '.
+ (fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue:[
+ sourceCode := Integer readFrom: (fields at: 1) base: 16.
+ destinationCode := Integer readFrom: (fields at: 3) base: 16.
+ newToCasefold at: sourceCode put: destinationCode ] ].
- [stream atEnd] whileFalse:[
- | fields line srcCode dstCode |
- line := stream nextLine copyUpTo: $#.
- fields := line withBlanksTrimmed findTokens: $;.
- (fields size > 2 and: [#('C' 'S') includes: (fields at: 2) withBlanksTrimmed]) ifTrue:[
- srcCode := Integer readFrom: (fields at: 1) withBlanksTrimmed base: 16.
- dstCode := Integer readFrom: (fields at: 3) withBlanksTrimmed base: 16.
- ToCasefold at: srcCode put: dstCode.
- ].
- ].
+ casefoldKeys := newToCasefold keys.
+ newToCasefold keysAndValuesDo: [ :sourceCode :destinationCode |
+ (self isUppercaseCode: sourceCode) ifTrue: [
+ "In most cases, uppercase letter are folded to lower case"
+ newToUpper at: destinationCode put: sourceCode.
+ newToLower at: sourceCode ifAbsentPut: destinationCode "Don't overwrite existing pairs. To avoid $k asUppercase to return the Kelvin character (8490)." ].
+ (self isLowercaseCode: sourceCode) ifTrue: [
+ "In a few cases, two upper case letters are folded to the same lower case.
+ We must find an upper case letter folded to the same letter"
+ casefoldKeys
+ detect: [ :each |
+ (self isUppercaseCode: each) and: [
+ (newToCasefold at: each) = destinationCode ] ]
+ ifFound: [ :uppercaseCode |
+ newToUpper at: sourceCode put: uppercaseCode ]
+ ifNone: [ ] ] ].
+
+ "Compact the dictionaries."
+ newToCasefold compact.
+ newToUpper compact.
+ newToLower compact.
+ "Save in an atomic operation."
+ ToCasefold := newToCasefold.
+ ToUpper := newToUpper.
+ ToLower := newToLower
+ !
- ToCasefold keysAndValuesDo:
- [:k :v |
- (self isUppercaseCode: k)
- ifTrue:
- ["In most cases, uppercase letter are folded to lower case"
- ToUpper at: v put: k.
- ToLower at: k put: v].
- (self isLowercaseCode: k)
- ifTrue:
- ["In a few cases, two upper case letters are folded to the same lower case.
- We must find an upper case letter folded to the same letter"
- | up |
- up := ToCasefold keys detect: [:e | (self isUppercaseCode: e) and: [(ToCasefold at: e) = v]] ifNone: [nil].
- up ifNotNil: [ToUpper at: k put: up]]].!