[squeak-dev] The Trunk: Collections-mt.838.mcz

commits at source.squeak.org commits at source.squeak.org
Thu Jul 11 05:20:54 UTC 2019


Marcel Taeumel uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

To String, adds simple analysis of natural language in source code. No word stemming.

1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can that be useful?

- Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
- Prefix emphasis for names lists of classes in code browsers: MCAddition, MCAncestry, etc.

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ 	"State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ 	| last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op"  |
+ 	
+ 	state := 0.
+ 	last := 1.
+ 	
+ 	1 to: self size do: [ :index |
+ 		char := self at: index.
+ 		"a"
+ 		char isLowercase ifTrue: [
+ 			(state < 3) ifTrue: [state := 1]. "*a -> a"
+ 			(state == 3) ifTrue: [
+ 				"AAa -> A + Aa (camel case follows uppercase)"
+ 				aBlock value: last value: index - 2.
+ 				last := index - 1.
+ 				state := 2].
+ 			(state > 3) ifTrue: [
+ 				"+a -> + | a (letter follows non-letter)" 
+ 				aBlock value: last value: index - 1.
+ 				last := index.
+ 				state := 1]] 
+ 		ifFalse: [
+ 			char isUppercase ifTrue: [
+ 				(state == 0)
+ 					ifTrue: [state := 2] "start -> A"
+ 					ifFalse: [
+ 						(state < 2 or: [state > 3]) ifTrue: [
+ 							"*A -> * | A (uppercase begins, flush before)"
+ 							aBlock value: last value: index - 1.
+ 							last := index.
+ 							state := 2] ifFalse: [
+ 								"AA -> AA (uppercase continues)"
+ 								state := 3]]]
+ 			ifFalse: [
+ 				("char == $: or:" char isSeparator) ifTrue: [
+ 					"skip colon/whitespace"
+ 					(state > 0) ifTrue: [
+ 						aBlock value: last value: index - 1.
+ 						state := 0].
+ 					last := index + 1]
+ 				ifFalse: [
+ 					char isDigit ifTrue: [
+ 						(state == 0)
+ 							ifTrue: [state := 4]
+ 							ifFalse: [
+ 							(state ~= 4) ifTrue: [
+ 								aBlock value: last value: index - 1.
+ 								last := index.
+ 								state := 4]]]
+ 						ifFalse: [
+ 							(state == 0)
+ 								ifTrue: [state := 5]
+ 								ifFalse: [
+ 									(state < 5) ifTrue: [
+ 										aBlock value: last value: index - 1.
+ 										last := index.
+ 										state := 5]]]]]]].
+ 	last <= self size ifTrue: [
+ 		aBlock value: last value: self size]!

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+ 	
+ 	^ Array streamContents: [:features |
+ 		self findFeaturesDo: [:feature | features nextPut: feature]]!

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ 	"Simple analysis for natural language in source code. No support for word stemming."
+ 
+ 	self findFeatureIndicesDo: [:start :end |
+ 		(self at: start) isLetter ifTrue: [
+ 			aBlock value: (self copyFrom: start to: end) asLowercase]].!

Item was changed:
  ----- Method: String>>findTokens: (in category 'accessing') -----
  findTokens: delimiters
+ 	"Answer the collection of tokens that result from parsing self."
+ 	
+ 	^ OrderedCollection streamContents: [:tokens |
+ 		self
+ 			findTokens: delimiters
+ 			do: [:token | tokens nextPut: token]]!
- 	"Answer the collection of tokens that result from parsing self.  Return strings between the delimiters.  Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character."
- 
- 	| tokens keyStart keyStop separators |
- 
- 	tokens := OrderedCollection new.
- 	separators := delimiters isCharacter 
- 		ifTrue: [Array with: delimiters]
- 		ifFalse: [delimiters].
- 	keyStop := 1.
- 	[keyStop <= self size] whileTrue:
- 		[keyStart := self skipDelimiters: separators startingAt: keyStop.
- 		keyStop := self findDelimiters: separators startingAt: keyStart.
- 		keyStart < keyStop
- 			ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- 	^tokens!

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+ 	
+ 	self
+ 		findTokens: delimiters
+ 		indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ 	"Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border.  Several delimiters in a row are considered as just one separation.  Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+ 	
+ 	| tokens keyStart keyStop separators |
+ 	separators := delimiters isCharacter 
+ 		ifTrue: [Array with: delimiters]
+ 		ifFalse: [delimiters].
+ 	keyStop := 1.
+ 	[keyStop <= self size] whileTrue: [
+ 		keyStart := self skipDelimiters: separators startingAt: keyStop.
+ 		keyStop := self findDelimiters: separators startingAt: keyStart.
+ 		keyStart < keyStop
+ 			ifTrue: [aBlock value: keyStart value: keyStop - 1]].!



More information about the Squeak-dev mailing list