[squeak-dev] The Trunk: Multilingual-ul.208.mcz
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Sun May 3 01:29:14 UTC 2015
Ouch, yes, extracting the simple case mappings from the full CaseFolding data
was probably a mistake...
Thanks for reviewing, and as we say, vieux motard que jamais (better late
than never) - the original commit is almost 5 years old.
The next job will be to comment the Unicode class and explain which Unicode
operations are supported...
--------------------------
Multilingual-nice.123
Author: nice
Time: 14 July 2010, 1:17:02.219 pm
UUID: ec8f05b8-78a6-4496-aca9-8f9b2e54823d
Ancestors: Multilingual-ul.122
1) simplify a case of at:ifAbsentPut: pattern in SparseXTable
2) provide a simple mapping of unicode upper/lower case characters as
described at http://unicode.org/reports/tr21/tr21-5.html
Note 1: The Unicode class now provides two utilities to transform the case of
a String rather than of a Character. This enables future enhancements such as
handling special casings, where case folding changes the number of
characters.
Note 2: No automatic initialization is performed yet. You'll have to
execute this before using the above utilities:
Unicode initializeCaseMappings.
This is only an unoptimized, first attempt proposal. Comments and changes
are of course welcome.
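
For readers following along, here is a minimal workspace sketch of why inverting case-folding data to get upper-case mappings can go wrong. CaseFolding.txt folds both U+004B (LATIN CAPITAL LETTER K) and U+212A (KELVIN SIGN) to U+006B (k), so a plain at:put: into the inverse table lets whichever entry comes last win. The names folds, overwriting and keepingFirst are hypothetical and are not part of the Unicode class; this is an illustration of the general problem, not the code of either commit.

```smalltalk
"Illustrative sketch only; all variable names here are hypothetical."
| folds overwriting keepingFirst |
folds := { 16r004B -> 16r006B.    "LATIN CAPITAL LETTER K folds to k"
           16r212A -> 16r006B }.  "KELVIN SIGN also folds to k"
overwriting := Dictionary new.
keepingFirst := Dictionary new.
folds do: [ :fold |
    "Plain at:put: replaces any earlier entry, so the last source wins."
    overwriting at: fold value put: fold key.
    "at:ifAbsentPut: leaves an existing entry alone, so the first source wins."
    keepingFirst at: fold value ifAbsentPut: [ fold key ] ].
overwriting at: 16r006B.   "8490, the Kelvin sign"
keepingFirst at: 16r006B.  "75, the ordinary capital K"
```

In this sketch the array fixes the order of the two entries. When the entries come from a dictionary instead, the enumeration order is not guaranteed, so "first wins" is only as good as the order in which the data happens to be visited.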
2015-05-03 2:15 GMT+02:00 <commits at source.squeak.org>:
> Levente Uzonyi uploaded a new version of Multilingual to project The Trunk:
> http://source.squeak.org/trunk/Multilingual-ul.208.mcz
>
> ==================== Summary ====================
>
> Name: Multilingual-ul.208
> Author: ul
> Time: 1 May 2015, 3:25:18.828 pm
> UUID: 82d19dac-c602-4c0d-bc9a-7858e3a3c283
> Ancestors: Multilingual-ul.206
>
> Improved Unicode caseMappings:
> - Don't overwrite an existing mapping, because that leads to problems
> (like (Unicode toUppercaseCode: $k asciiValue) = 8490)
> - Use PluggableDictionary class >> #integerDictionary for better lookup
> performance (~+16%), and compaction resistance (done at every release).
> - Compact the dictionaries before saving.
> - Save the new dictionaries atomically.
>
> =============== Diff against Multilingual-ul.206 ===============
>
> Item was changed:
> ----- Method: Unicode class>>initializeCaseMappings (in category 'casing') -----
> initializeCaseMappings
> "Unicode initializeCaseMappings"
> +
> + UIManager default informUserDuring: [ :bar |
> - ToCasefold := IdentityDictionary new.
> - ToUpper := IdentityDictionary new.
> - ToLower := IdentityDictionary new.
> - UIManager default informUserDuring: [:bar|
> | stream |
> bar value: 'Downloading Unicode data'.
> stream := HTTPClient httpGet: 'http://www.unicode.org/Public/UNIDATA/CaseFolding.txt'.
> (stream isKindOf: RWBinaryOrTextStream) ifFalse:[^self error: 'Download failed'].
> stream reset.
> bar value: 'Updating Case Mappings'.
> + self parseCaseMappingFrom: stream ].!
> - self parseCaseMappingFrom: stream.
> - ].!
>
> Item was changed:
> ----- Method: Unicode class>>parseCaseMappingFrom: (in category 'casing') -----
> parseCaseMappingFrom: stream
> "Parse the Unicode casing mappings from the given stream.
> Handle only the simple mappings"
> "
> Unicode initializeCaseMappings.
> "
>
> + | newToCasefold newToUpper newToLower casefoldKeys |
> + newToCasefold := PluggableDictionary integerDictionary.
> + newToUpper := PluggableDictionary integerDictionary.
> + newToLower := PluggableDictionary integerDictionary.
> - ToCasefold := IdentityDictionary new: 2048.
> - ToUpper := IdentityDictionary new: 2048.
> - ToLower := IdentityDictionary new: 2048.
>
> + "Filter the mappings (Simple and Common) to newToCasefold."
> + stream contents linesDo: [ :line |
> + | data fields sourceCode destinationCode |
> + data := line copyUpTo: $#.
> + fields := data findTokens: '; '.
> + (fields size > 2 and: [ #('C' 'S') includes: (fields at: 2) ]) ifTrue:[
> + sourceCode := Integer readFrom: (fields at: 1) base: 16.
> + destinationCode := Integer readFrom: (fields at: 3) base: 16.
> + newToCasefold at: sourceCode put: destinationCode ] ].
> - [stream atEnd] whileFalse:[
> - | fields line srcCode dstCode |
> - line := stream nextLine copyUpTo: $#.
> - fields := line withBlanksTrimmed findTokens: $;.
> - (fields size > 2 and: [#('C' 'S') includes: (fields at: 2) withBlanksTrimmed]) ifTrue:[
> - srcCode := Integer readFrom: (fields at: 1) withBlanksTrimmed base: 16.
> - dstCode := Integer readFrom: (fields at: 3) withBlanksTrimmed base: 16.
> - ToCasefold at: srcCode put: dstCode.
> - ].
> - ].
>
> + casefoldKeys := newToCasefold keys.
> + newToCasefold keysAndValuesDo: [ :sourceCode :destinationCode |
> + (self isUppercaseCode: sourceCode) ifTrue: [
> + "In most cases, uppercase letter are folded to lower case"
> + newToUpper at: destinationCode put: sourceCode.
> + newToLower at: sourceCode ifAbsentPut: destinationCode "Don't overwrite existing pairs. To avoid $k asUppercase to return the Kelvin character (8490)." ].
> + (self isLowercaseCode: sourceCode) ifTrue: [
> + "In a few cases, two upper case letters are folded to the same lower case.
> + We must find an upper case letter folded to the same letter"
> + casefoldKeys
> + detect: [ :each |
> + (self isUppercaseCode: each) and: [
> + (newToCasefold at: each) = destinationCode ] ]
> + ifFound: [ :uppercaseCode |
> + newToUpper at: sourceCode put: uppercaseCode ]
> + ifNone: [ ] ] ].
> +
> + "Compact the dictionaries."
> + newToCasefold compact.
> + newToUpper compact.
> + newToLower compact.
> + "Save in an atomic operation."
> + ToCasefold := newToCasefold.
> + ToUpper := newToUpper.
> + ToLower := newToLower
> + !
> - ToCasefold keysAndValuesDo:
> - [:k :v |
> - (self isUppercaseCode: k)
> - ifTrue:
> - ["In most cases, uppercase letter are folded to lower case"
> - ToUpper at: v put: k.
> - ToLower at: k put: v].
> - (self isLowercaseCode: k)
> - ifTrue:
> - ["In a few cases, two upper case letters are folded to the same lower case.
> - We must find an upper case letter folded to the same letter"
> - | up |
> - up := ToCasefold keys detect: [:e | (self isUppercaseCode: e) and: [(ToCasefold at: e) = v]] ifNone: [nil].
> - up ifNotNil: [ToUpper at: k put: up]]].!
>
>
>