[squeak-dev] The Inbox: Collections-mt.838.mcz

Marcel Taeumel marcel.taeumel at hpi.de
Thu Jul 4 14:56:05 UTC 2019


Credits go to Toni Mattis (https://github.com/amintos) for the idea and implementation! :-)

Thanks!
Am 04.07.2019 16:32:51 schrieb commits at source.squeak.org <commits at source.squeak.org>:
A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-mt.838.mcz

==================== Summary ====================

Name: Collections-mt.838
Author: mt
Time: 4 July 2019, 4:32:40.854026 pm
UUID: ac4ab442-79c0-d246-8dec-914be7ee5356
Ancestors: Collections-pre.837

To String, adds simple analysis of natural language in source code. No word stemming.

1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).
2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.

Try this:

HTTPDownloadRequest name findFeatures.
(Morph >> #drawOn:) getSource asString findFeatures.

Where can that be useful?

- Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.
- Prefix emphasis for names lists of classes in code browsers: MCAddition, MCAncestry, etc.

=============== Diff against Collections-pre.837 ===============

Item was added:
+ ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----
+ findFeatureIndicesDo: aBlock
+ "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"
+ | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op" |
+
+ state := 0.
+ last := 1.
+
+ 1 to: self size do: [ :index |
+ char := self at: index.
+ "a"
+ char isLowercase ifTrue: [
+ (state a"
+ (state == 3) ifTrue: [
+ "AAa -> A + Aa (camel case follows uppercase)"
+ aBlock value: last value: index - 2.
+ last := index - 1.
+ state := 2].
+ (state > 3) ifTrue: [
+ "+a -> + | a (letter follows non-letter)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 1]]
+ ifFalse: [
+ char isUppercase ifTrue: [
+ (state == 0)
+ ifTrue: [state := 2] "start -> A"
+ ifFalse: [
+ (state 3]) ifTrue: [
+ "*A -> * | A (uppercase begins, flush before)"
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 2] ifFalse: [
+ "AA -> AA (uppercase continues)"
+ state := 3]]]
+ ifFalse: [
+ ("char == $: or:" char isSeparator) ifTrue: [
+ "skip colon/whitespace"
+ (state > 0) ifTrue: [
+ aBlock value: last value: index - 1.
+ state := 0].
+ last := index + 1]
+ ifFalse: [
+ char isDigit ifTrue: [
+ (state == 0)
+ ifTrue: [state := 4]
+ ifFalse: [
+ (state ~= 4) ifTrue: [
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 4]]]
+ ifFalse: [
+ (state == 0)
+ ifTrue: [state := 5]
+ ifFalse: [
+ (state
+ aBlock value: last value: index - 1.
+ last := index.
+ state := 5]]]]]]].
+ last
+ aBlock value: last value: self size]!

Item was added:
+ ----- Method: String>>findFeatures (in category 'accessing - features') -----
+ findFeatures
+
+ ^ Array streamContents: [:features |
+ self findFeaturesDo: [:feature | features nextPut: feature]]!

Item was added:
+ ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----
+ findFeaturesDo: aBlock
+ "Simple analysis for natural language in source code. No support for word stemming."
+
+ self findFeatureIndicesDo: [:start :end |
+ (self at: start) isLetter ifTrue: [
+ aBlock value: (self copyFrom: start to: end) asLowercase]].!

Item was changed:
----- Method: String>>findTokens: (in category 'accessing') -----
findTokens: delimiters
+ "Answer the collection of tokens that result from parsing self."
+
+ ^ OrderedCollection streamContents: [:tokens |
+ self
+ findTokens: delimiters
+ do: [:token | tokens nextPut: token]]!
- "Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character."
-
- | tokens keyStart keyStop separators |
-
- tokens := OrderedCollection new.
- separators := delimiters isCharacter
- ifTrue: [Array with: delimiters]
- ifFalse: [delimiters].
- keyStop := 1.
- [keyStop
- [keyStart := self skipDelimiters: separators startingAt: keyStop.
- keyStop := self findDelimiters: separators startingAt: keyStart.
- keyStart
- ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].
- ^tokens!

Item was added:
+ ----- Method: String>>findTokens:do: (in category 'accessing') -----
+ findTokens: delimiters do: aBlock
+
+ self
+ findTokens: delimiters
+ indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!

Item was added:
+ ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----
+ findTokens: delimiters indicesDo: aBlock
+ "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character. Similar to #lineIndicesDo:."
+
+ | tokens keyStart keyStop separators |
+ separators := delimiters isCharacter
+ ifTrue: [Array with: delimiters]
+ ifFalse: [delimiters].
+ keyStop := 1.
+ [keyStop
+ keyStart := self skipDelimiters: separators startingAt: keyStop.
+ keyStop := self findDelimiters: separators startingAt: keyStart.
+ keyStart
+ ifTrue: [aBlock value: keyStart value: keyStop - 1]].!


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20190704/016cccd6/attachment.html>


More information about the Squeak-dev mailing list