[squeak-dev] The Trunk: Collections-topa.806.mcz

Tobias Pape Das.Linux at gmx.de
Thu Sep 13 15:11:31 UTC 2018


> On 13.09.2018, at 16:35, Levente Uzonyi <leves at caesar.elte.hu> wrote:
> 
> You're opening a can of worms with this. There are several other separator/white space characters missing from that list.

Yeah, that's listed below in a comment. I am hesitant to add the others because of WideString, so I just put them in a comment.

> Also, this change makes the various #*separator* implementations (e.g. #isSeparator) inconsistent, so I strongly disagree with this change.

Hmm. But isSeparator is Wrong, then… because nbsp _is_ a separator, right?
See the discussion with Ron.
On a related note, is a very fast #isSeparator important?
Otherwise I'd just propose 

	^ #( 9 10 12 13 32 160 ) includes: self asInteger
for now…

All other *separator* messages fall back to either Character>>#isSeparator or #separators from CharacterSet, which in turn is based on Character class>>#separators.
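
To see where the two base definitions diverge, something like the following workspace snippet could be used. It is only a sketch and has not been run against the current trunk; it assumes nothing beyond Character>>#isSeparator, Character class>>#separators and Character class>>#value:, and it only looks at 8-bit code points.

```smalltalk
"Workspace sketch: collect the code points 0-255 on which
 Character>>#isSeparator and Character class>>#separators disagree.
 An empty result means the two definitions agree for 8-bit characters."
| listed |
listed := Character separators.
(0 to: 255) select: [:v | | c |
	c := Character value: v.
	c isSeparator ~= (listed includes: c)]
```

Any code point that shows up in the result is one where the #isSeparator fast path and the separators list would have to be brought back in line.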



> 
> Levente
> 
> On Wed, 12 Sep 2018, commits at source.squeak.org wrote:
> 
>> Tobias Pape uploaded a new version of Collections to project The Trunk:
>> http://source.squeak.org/trunk/Collections-topa.806.mcz
>> 
>> ==================== Summary ====================
>> 
>> Name: Collections-topa.806
>> Author: topa
>> Time: 12 September 2018, 3:28:40.687052 pm
>> UUID: 46b95db5-a773-4113-92f0-5ee905404b49
>> Ancestors: Collections-cmm.805
>> 
>> Fix separators to include U+00A0 (no break space)
>> 
>> Thanks Ron!
>> 
>> =============== Diff against Collections-cmm.805 ===============
>> 
>> Item was changed:
>> ----- Method: Character class>>separators (in category 'instance creation') -----
>> separators
>> + 	"Answer a collection of space-like separator characters.
>> + 	Note that we do not consider spaces in >8bit code points yet.
>> + 	"
>> - 	"Answer a collection of the standard ASCII separator characters."
>> + 	^ #(9 "tab"
>> - 	^ #(32 "space"
>> - 		13 "cr"
>> - 		9 "tab"
>> 		10 "line feed"
>> + 		12 "form feed"
>> + 		13 "cr"
>> + 		32 "space"
>> + 		160 "non-breaking space, see Unicode Z general category")
>> + 		collect: [:v | Character value: v] as: String
>> + " To be considered:
>> + 16r1680 OGHAM SPACE MARK
>> + 16r2000 EN QUAD
>> + 16r2001 EM QUAD
>> + 16r2002 EN SPACE
>> + 16r2003 EM SPACE
>> + 16r2004 THREE-PER-EM SPACE
>> + 16r2005 FOUR-PER-EM SPACE
>> + 16r2006 SIX-PER-EM SPACE
>> + 16r2007 FIGURE SPACE
>> + 16r2008 PUNCTUATION SPACE
>> + 16r2009 THIN SPACE
>> + 16r200A HAIR SPACE
>> + 16r2028 LINE SEPARATOR
>> + 16r2029 PARAGRAPH SEPARATOR
>> + 16r202F NARROW NO-BREAK SPACE
>> + 16r205F MEDIUM MATHEMATICAL SPACE
>> + 16r3000 IDEOGRAPHIC SPACE
>> + "!
>> - 		12 "form feed")
>> - 		collect: [:v | Character value: v] as: String!
>> 
>> Item was changed:
>> + (PackageInfo named: 'Collections') postscript: 'CharacterSet cleanUp: false.'!
>> - (PackageInfo named: 'Collections') postscript: 'Character initializeClassificationTable'!
> 


