<body><div id="__MailbirdStyleContent" style="font-size: 10pt;font-family: Arial;color: #000000">
Hi Levente,<div><br></div><div>thanks for the tips! I would appreciate another review of Collections-mt.839 and CollectionsTests-mt.313. :-)</div><div><br></div><div><a href="http://forum.world.st/The-Inbox-Collections-mt-839-mcz-tp5100939.html">http://forum.world.st/The-Inbox-Collections-mt-839-mcz-tp5100939.html</a></div><div><a href="http://forum.world.st/The-Inbox-CollectionsTests-mt-313-mcz-tp5100940.html">http://forum.world.st/The-Inbox-CollectionsTests-mt-313-mcz-tp5100940.html</a></div><div><br></div><div>Best,</div><div>Marcel</div><div><br></div><div class="mb_sig"></div><blockquote class="history_container" type="cite" style="border-left-style:solid;border-width:1px; margin-top:20px; margin-left:0px;padding-left:10px;">
<p style="color: #AAAAAA; margin-top: 10px;">On 05.07.2019 00:44:39, Levente Uzonyi &lt;leves@caesar.elte.hu&gt; wrote:</p><div style="font-family:Arial,Helvetica,sans-serif">On Thu, 4 Jul 2019, commits@source.squeak.org wrote:<br><br>> A new version of Collections was added to project The Inbox:<br>> http://source.squeak.org/inbox/Collections-mt.838.mcz<br>><br>> ==================== Summary ====================<br>><br>> Name: Collections-mt.838<br>> Author: mt<br>> Time: 4 July 2019, 4:32:40.854026 pm<br>> UUID: ac4ab442-79c0-d246-8dec-914be7ee5356<br>> Ancestors: Collections-pre.837<br>><br>> To String, adds simple analysis of natural language in source code. No word stemming.<br>><br>> 1) Refactor #findTokens: to look like #lines (i.e. #linesDo: and #lineIndicesDo:).<br>> 2) Add #findFeaturesDo: like #findTokens:do: and #linesDo:.<br>><br>> Try this:<br>><br>> HTTPDownloadRequest name findFeatures.<br>> (Morph >> #drawOn:) getSource asString findFeatures.<br>><br>> Where can that be useful?<br>><br>> - Automatic insertion of "*" for search terms like "WeakDictionary" to also find WeakIdentityDictionary etc.<br>> - Prefix emphasis for names lists of classes in code browsers: MCAddition, MCAncestry, etc.<br><br>Given the new methods' complexity, I think they deserve tests.<br><br>><br>> =============== Diff against Collections-pre.837 ===============<br>><br>> Item was added:<br>> + ----- Method: String>>findFeatureIndicesDo: (in category 'accessing - features') -----<br>> + findFeatureIndicesDo: aBlock<br>> + "State machine that separates camelCase, UPPERCase, number/operator combinations and skips colons"<br><br>I think an example would make it easier to understand what this <br>method does. 
(The same applies to #findTokens:, but I'm already familiar <br>with that.)<br><br>> + | last state char "0 = start, 1 = a, 2 = A, 3 = AA, 4 = num, 5 = op" |<br>> + <br>> + state := 0.<br>> + last := 1.<br>> + <br>> + 1 to: self size do: [ :index |<br>> + char := self at: index.<br>> + "a"<br>> + char isLowercase ifTrue: [<br>> + (state &lt; 3) ifTrue: [state := 1]. "*a -> a"<br>> + (state == 3) ifTrue: [<br><br>#= is optimized just as well as #== when the argument is a constant. Using <br>#= and dropping the unnecessary parentheses would make the code look a bit <br>less "C-style".<br><br>> + "AAa -> A + Aa (camel case follows uppercase)"<br>> + aBlock value: last value: index - 2.<br>> + last := index - 1.<br>> + state := 2].<br>> + (state > 3) ifTrue: [<br>> + "+a -> + | a (letter follows non-letter)" <br>> + aBlock value: last value: index - 1.<br>> + last := index.<br>> + state := 1]] <br>> + ifFalse: [<br>> + char isUppercase ifTrue: [<br>> + (state == 0)<br>> + ifTrue: [state := 2] "start -> A"<br>> + ifFalse: [<br>> + (state &lt; 2 or: [state > 3]) ifTrue: [<br>> + "*A -> * | A (uppercase begins, flush before)"<br>> + aBlock value: last value: index - 1.<br>> + last := index.<br>> + state := 2] ifFalse: [<br>> + "AA -> AA (uppercase continues)"<br>> + state := 3]]]<br>> + ifFalse: [<br>> + ("char == $: or:" char isSeparator) ifTrue: [<br>> + "skip colon/whitespace"<br>> + (state > 0) ifTrue: [<br>> + aBlock value: last value: index - 1.<br>> + state := 0].<br>> + last := index + 1]<br>> + ifFalse: [<br>> + char isDigit ifTrue: [<br>> + (state == 0)<br>> + ifTrue: [state := 4]<br>> + ifFalse: [<br>> + (state ~= 4) ifTrue: [<br>> + aBlock value: last value: index - 1.<br>> + last := index.<br>> + state := 4]]]<br>> + ifFalse: [<br>> + (state == 0)<br>> + ifTrue: [state := 5]<br>> + ifFalse: [<br>> + (state &lt; 5) ifTrue: [<br>> + aBlock value: last value: index - 1.<br>> + last := index.<br>> + state := 5]]]]]]].<br>> + last &lt;= self size ifTrue: [<br>> + aBlock value: last value: self size]!<br>><br>> Item was added:<br>> + ----- Method: String>>findFeatures (in category 'accessing - features') -----<br>> + findFeatures<br>> + <br>> + ^ Array streamContents: [:features |<br>> + self findFeaturesDo: [:feature | features nextPut: feature]]!<br>><br>> Item was added:<br>> + ----- Method: String>>findFeaturesDo: (in category 'accessing - features') -----<br>> + findFeaturesDo: aBlock<br>> + "Simple analysis for natural language in source code. No support for word stemming."<br>> + <br>> + self findFeatureIndicesDo: [:start :end |<br>> + (self at: start) isLetter ifTrue: [<br>> + aBlock value: (self copyFrom: start to: end) asLowercase]].!<br>><br>> Item was changed:<br>> ----- Method: String>>findTokens: (in category 'accessing') -----<br>> findTokens: delimiters<br>> + "Answer the collection of tokens that result from parsing self."<br>> + <br>> + ^ OrderedCollection streamContents: [:tokens |<br><br>#streamContents: should never be used with OrderedCollection.<br>OrderedCollection has its own API for adding elements (I would use #addLast: here), <br>which is much more efficient.<br><br>> + self<br>> + findTokens: delimiters<br>> + do: [:token | tokens nextPut: token]]!<br>> - "Answer the collection of tokens that result from parsing self. Return strings between the delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. 
Also, allow delimiters to be a single character."<br>> - <br>> - | tokens keyStart keyStop separators |<br>> - <br>> - tokens := OrderedCollection new.<br>> - separators := delimiters isCharacter <br>> - ifTrue: [Array with: delimiters]<br>> - ifFalse: [delimiters].<br>> - keyStop := 1.<br>> - [keyStop &lt;= self size] whileTrue:<br>> - [keyStart := self skipDelimiters: separators startingAt: keyStop.<br>> - keyStop := self findDelimiters: separators startingAt: keyStart.<br>> - keyStart &lt; keyStop<br>> - ifTrue: [tokens add: (self copyFrom: keyStart to: (keyStop - 1))]].<br>> - ^tokens!<br>><br>> Item was added:<br>> + ----- Method: String>>findTokens:do: (in category 'accessing') -----<br>> + findTokens: delimiters do: aBlock<br>> + <br>> + self<br>> + findTokens: delimiters<br>> + indicesDo: [:start :end | aBlock value: (self copyFrom: start to: end)].!<br>><br>> Item was added:<br>> + ----- Method: String>>findTokens:indicesDo: (in category 'accessing') -----<br>> + findTokens: delimiters indicesDo: aBlock<br>> + "Parse self to find tokens between delimiters. Any character in the Collection delimiters marks a border. Several delimiters in a row are considered as just one separation. Also, allow delimiters to be a single character. 
Similar to #lineIndicesDo:."<br>> + <br>> + | tokens keyStart keyStop separators |<br><br>There are a few opportunities to regain the performance lost with the <br>introduction of blocks and sends:<br>- the tokens temporary is unused<br>- self size should be cached in a temporary (size)<br>- instead of Array class >> #with:, a brace array ({delimiters}) should be used<br>- declaring the temporaries as | keyStop keyStart separators size | would probably yield the best <br>performance<br><br>Levente<br><br>> + separators := delimiters isCharacter <br>> + ifTrue: [Array with: delimiters]<br>> + ifFalse: [delimiters].<br>> + keyStop := 1.<br>> + [keyStop &lt;= self size] whileTrue: [<br>> + keyStart := self skipDelimiters: separators startingAt: keyStop.<br>> + keyStop := self findDelimiters: separators startingAt: keyStart.<br>> + keyStart &lt; keyStop<br>> + ifTrue: [aBlock value: keyStart value: keyStop - 1]].!<br><br></div></blockquote>
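<div><br></div><div>The review above asks for both tests and an example of what #findFeatureIndicesDo: does. The following is a sketch of SUnit tests that would serve both purposes. It rests on several assumptions. The method names testFindFeatures and testFindTokens, and placing them in StringTest, are hypothetical. This is not the content of CollectionsTests-mt.313. The expected values were traced by hand through the Collections-mt.838 code quoted above, so they may differ for Collections-mt.839.</div>
<pre>
```smalltalk
testFindFeatures
	"Hypothetical test. Expected values traced by hand through the
	 Collections-mt.838 version of #findFeatureIndicesDo: quoted above."
	self assert: #('http' 'download' 'request') equals: 'HTTPDownloadRequest' findFeatures.
	self assert: #('weak' 'identity' 'dictionary') equals: 'WeakIdentityDictionary' findFeatures.
	"A digit or a colon ends the current feature; #findFeaturesDo: then
	 drops every feature that does not start with a letter."
	self assert: #('utf' 'encode') equals: 'utf8Encode' findFeatures.
	self assert: #('draw' 'on') equals: 'drawOn:' findFeatures.
	self assert: #('self' 'draw' 'on' 'a' 'canvas') equals: 'self drawOn: aCanvas' findFeatures.
	"An empty receiver yields no features."
	self assert: #() equals: '' findFeatures

testFindTokens
	"Hypothetical test. A run of delimiters counts as one separation,
	 and a single Character is accepted as the delimiter argument."
	| ranges |
	self assert: #('a' 'b' 'c') equals: ('a,b,,c' findTokens: $,) asArray.
	"#findTokens:indicesDo: reports start and end indices of each token."
	ranges := OrderedCollection new.
	'ab  cd' findTokens: ' ' indicesDo: [:start :end | ranges addLast: start -> end].
	self assert: {1 -> 2. 5 -> 6} equals: ranges asArray
```
</pre>
<div>The #findTokens: result is converted with #asArray because, in Squeak, an OrderedCollection does not compare equal to an Array with the same elements.</div>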
</div></body>