<div dir="ltr"><div>Ouch, yes, extracting the simple case mappings from the full CaseFolding data was probably a mistake...<br>Thanks for reviewing and, as we say, vieux motard que jamais (better late than never): the code is almost 5 years old.<br><br></div><div>The next job will be to comment the Unicode class and explain which Unicode operations are supported...<br></div><br>--------------------------<br><div><br>Multilingual-nice.123<br>
Author: <span>nice</span><br>
Time: 14 July 2010, 1:17:02.219 pm<br>
UUID: ec8f05b8-78a6-4496-aca9-8f9b2e54823d<br>
Ancestors: <span>Multilingual</span>-ul.122<br>
<br>
1) simplify a case of at:ifAbsentPut: pattern in SparseXTable<br>
2) provide a simple mapping of unicode upper/lower case characters as described at <a href="http://unicode.org/reports/tr21/tr21-5.html" target="_blank">http://unicode.org/reports/tr21/tr21-5.html</a><br>
<br>
Note 1: the Unicode class now provides two utilities that transform the case of a
String rather than of a Character. This enables future
enhancements such as handling special casings, where case folding changes
the number of characters.<br>
<br>
Note 2: no automatic initialization is performed yet. You'll have to execute this before using the above utilities:<br>
Unicode initializeCaseMappings.<br>
<br>
This is only an unoptimized, first attempt proposal. Comments and changes are of course welcome.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-05-03 2:15 GMT+02:00 <span dir="ltr">&lt;<a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Levente Uzonyi uploaded a new version of Multilingual to project The Trunk:<br>
<a href="http://source.squeak.org/trunk/Multilingual-ul.208.mcz" target="_blank">http://source.squeak.org/trunk/Multilingual-ul.208.mcz</a><br>
<br>
==================== Summary ====================<br>
<br>
Name: Multilingual-ul.208<br>
Author: ul<br>
Time: 1 May 2015, 3:25:18.828 pm<br>
UUID: 82d19dac-c602-4c0d-bc9a-7858e3a3c283<br>
Ancestors: Multilingual-ul.206<br>
<br>
Improved Unicode caseMappings:<br>
- Don't overwrite an existing mapping, because that leads to problems (like (Unicode toUppercaseCode: $k asciiValue) = 8490)<br>
- Use PluggableDictionary class >> #integerDictionary for better lookup performance (~+16%), and compaction resistance (done at every release).<br>
- Compact the dictionaries before saving.<br>
- Save the new dictionaries atomically.<br>
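<br>
The overwrite problem in the first item can be reproduced in a workspace. What follows is a minimal sketch, not the committed code: it assumes a plain Dictionary and two lines as they appear in CaseFolding.txt (LATIN CAPITAL LETTER K and KELVIN SIGN both fold to 006B), and the variable names are illustrative only.<br>
<pre>
```smalltalk
| lines overwriting keepingFirst |
"Two CaseFolding.txt entries that fold to the same code point (16r6B, $k)."
lines := #('004B; C; 006B; # LATIN CAPITAL LETTER K' '212A; C; 006B; # KELVIN SIGN').
overwriting := Dictionary new.
keepingFirst := Dictionary new.
lines do: [ :line |
	| fields source destination |
	"Same tokenizing as in parseCaseMappingFrom: drop the comment, split on ';' and ' '."
	fields := (line copyUpTo: $#) findTokens: '; '.
	source := Integer readFrom: (fields at: 1) base: 16.
	destination := Integer readFrom: (fields at: 3) base: 16.
	"at:put: lets the later Kelvin sign entry replace the K entry."
	overwriting at: destination put: source.
	"at:ifAbsentPut: keeps whichever entry was stored first."
	keepingFirst at: destination ifAbsentPut: [ source ] ].
overwriting at: 16r6B. "8490, the Kelvin sign"
keepingFirst at: 16r6B. "75, i.e. $K asciiValue"
```
</pre>
In this sketch the surviving entry depends on the order in which the lines are processed, so keeping the first entry only helps when the preferred mapping is seen first.<br>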
<br>
=============== Diff against Multilingual-ul.206 ===============<br>
<br>
Item was changed:<br>
----- Method: Unicode class>>initializeCaseMappings (in category 'casing') -----<br>
initializeCaseMappings<br>
"Unicode initializeCaseMappings"<br>
+<br>
+ UIManager default informUserDuring: [ :bar |<br>
- ToCasefold := IdentityDictionary new.<br>
- ToUpper := IdentityDictionary new.<br>
- ToLower := IdentityDictionary new.<br>
- UIManager default informUserDuring: [:bar|<br>
| stream |<br>
bar value: 'Downloading Unicode data'.<br>
stream := HTTPClient httpGet: '<a href="http://www.unicode.org/Public/UNIDATA/CaseFolding.txt" target="_blank">http://www.unicode.org/Public/UNIDATA/CaseFolding.txt</a>'.<br>
(stream isKindOf: RWBinaryOrTextStream) ifFalse:[^self error: 'Download failed'].<br>
stream reset.<br>
bar value: 'Updating Case Mappings'.<br>
+ self parseCaseMappingFrom: stream ].!<br>
- self parseCaseMappingFrom: stream.<br>
- ].!<br>
<br>
Item was changed:<br>
----- Method: Unicode class>>parseCaseMappingFrom: (in category 'casing') -----<br>
parseCaseMappingFrom: stream<br>
"Parse the Unicode casing mappings from the given stream.<br>
Handle only the simple mappings"<br>
"<br>
Unicode initializeCaseMappings.<br>
"<br>
<br>
+ | newToCasefold newToUpper newToLower casefoldKeys |<br>
+ newToCasefold := PluggableDictionary integerDictionary.<br>
+ newToUpper := PluggableDictionary integerDictionary.<br>
+ newToLower := PluggableDictionary integerDictionary.<br>
- ToCasefold := IdentityDictionary new: 2048.<br>
- ToUpper := IdentityDictionary new: 2048.<br>
- ToLower := IdentityDictionary new: 2048.<br>
<br>
+ "Filter the mappings (Simple and Common) to newToCasefold."<br>
+ stream contents linesDo: [ :line |<br>
+ | data fields sourceCode destinationCode |<br>
+ data := line copyUpTo: $#.<br>
+ fields := data findTokens: '; '.<br>
+ (fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue:[<br>
+ sourceCode := Integer readFrom: (fields at: 1) base: 16.<br>
+ destinationCode := Integer readFrom: (fields at: 3) base: 16.<br>
+ newToCasefold at: sourceCode put: destinationCode ] ].<br>
- [stream atEnd] whileFalse:[<br>
- | fields line srcCode dstCode |<br>
- line := stream nextLine copyUpTo: $#.<br>
- fields := line withBlanksTrimmed findTokens: $;.<br>
- (fields size > 2 and: [#('C' 'S') includes: (fields at: 2) withBlanksTrimmed]) ifTrue:[<br>
- srcCode := Integer readFrom: (fields at: 1) withBlanksTrimmed base: 16.<br>
- dstCode := Integer readFrom: (fields at: 3) withBlanksTrimmed base: 16.<br>
- ToCasefold at: srcCode put: dstCode.<br>
- ].<br>
- ].<br>
<br>
+ casefoldKeys := newToCasefold keys.<br>
+ newToCasefold keysAndValuesDo: [ :sourceCode :destinationCode |<br>
+ (self isUppercaseCode: sourceCode) ifTrue: [<br>
+ "In most cases, uppercase letter are folded to lower case"<br>
+ newToUpper at: destinationCode put: sourceCode.<br>
+ newToLower at: sourceCode ifAbsentPut: destinationCode "Don't overwrite existing pairs. To avoid $k asUppercase to return the Kelvin character (8490)." ].<br>
+ (self isLowercaseCode: sourceCode) ifTrue: [<br>
+ "In a few cases, two upper case letters are folded to the same lower case.<br>
+ We must find an upper case letter folded to the same letter"<br>
+ casefoldKeys<br>
+ detect: [ :each |<br>
+ (self isUppercaseCode: each) and: [<br>
+ (newToCasefold at: each) = destinationCode ] ]<br>
+ ifFound: [ :uppercaseCode |<br>
+ newToUpper at: sourceCode put: uppercaseCode ]<br>
+ ifNone: [ ] ] ].<br>
+<br>
+ "Compact the dictionaries."<br>
+ newToCasefold compact.<br>
+ newToUpper compact.<br>
+ newToLower compact.<br>
+ "Save in an atomic operation."<br>
+ ToCasefold := newToCasefold.<br>
+ ToUpper := newToUpper.<br>
+ ToLower := newToLower<br>
+ !<br>
- ToCasefold keysAndValuesDo:<br>
- [:k :v |<br>
- (self isUppercaseCode: k)<br>
- ifTrue:<br>
- ["In most cases, uppercase letter are folded to lower case"<br>
- ToUpper at: v put: k.<br>
- ToLower at: k put: v].<br>
- (self isLowercaseCode: k)<br>
- ifTrue:<br>
- ["In a few cases, two upper case letters are folded to the same lower case.<br>
- We must find an upper case letter folded to the same letter"<br>
- | up |<br>
- up := ToCasefold keys detect: [:e | (self isUppercaseCode: e) and: [(ToCasefold at: e) = v]] ifNone: [nil].<br>
- up ifNotNil: [ToUpper at: k put: up]]].!<br>
<br>
<br>
</blockquote></div><br></div></div>