[squeak-dev] The Inbox: Collections-cmm.576.mcz
Levente Uzonyi
leves at elte.hu
Thu Jul 24 21:16:14 UTC 2014
On Thu, 24 Jul 2014, commits at source.squeak.org wrote:
> Chris Muller uploaded a new version of Collections to project The Inbox:
> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>
> ==================== Summary ====================
>
> Name: Collections-cmm.576
> Author: cmm
> Time: 24 July 2014, 10:39:06.452 am
> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
> Ancestors: Collections-eem.575
>
> When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
There are two problems with this change
- it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).
- it introduces unwanted behavior. What if I want the index of the last matching element?
If you want the index of the first matching element, then you can do two
things:
- find the index after the binary search yourself (which is still not a
good idea because of the possible performance loss)
- change the binary search to behave as you'd like to
| haystack needle |
haystack := #(1 3 5 7 11 11 15 23).
needle := 11.
haystack
findBinaryIndex: [ :each |
needle = each
ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ]
ifFalse: [ needle - each ] ]
do: nil "We'll never get here, so it can be anything."
ifNone: [ :lower :upper |
" upper is the index of the leftmost match if there's a match, and its value is at least 1 "
(upper <= haystack size and: [ (haystack at: upper) = needle ])
ifTrue: [ upper ]
ifFalse: [ 0 ] ]
With a slight modification you can find the index of the last match too.
Levente
[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations
[2] Here's a small benchmark demonstrating the edge case with your change:
| haystack needle |
Smalltalk garbageCollect.
haystack := Array new: 100000 withAll: 11.
needle := 11.
{ [ haystack
findBinaryIndex: [ :each |
needle = each
ifTrue: [ -1 ]
ifFalse: [ needle - each ] ]
do: nil
ifNone: [ :lower :upper |
(upper <= haystack size and: [ (haystack at: upper) = needle ])
ifTrue: [ upper ]
ifFalse: [ 0 ] ] ] bench.
[ haystack
findBinaryIndex: [ :each | needle - each ]
do: [ :each | each ]
ifNone: [ 0 ] ] bench }
#('1,150,000 per second.' '741 per second.')
>
> =============== Diff against Collections-eem.575 ===============
>
> Item was changed:
> ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
> + findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
> - findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
> "Search for an element in the receiver using binary search.
> The argument aBlock is a one-element block returning
> 0 - if the element is the one searched for
> <0 - if the search should continue in the first half
> >0 - if the search should continue in the second half
> If found, evaluate actionBlock with the index as argument
> If no matching element is found, evaluate exceptionBlock,
> with the indexes of the 'bounding' elements as optional
> arguments. Warning: Might give invalid indexes, see
> examples below.
> Examples:
> + #(1 3 5 7 11 11 15 23)
> - #(1 3 5 7 11 15 23)
> findBinaryIndex: [ :arg | 11 - arg ]
> do: [ :found | found ]
> ifNone: [ :a :b | ('between: ', {a. b} printString)]
> #(1 3 5 7 11 15 23)
> findBinaryIndex: [ :arg | 12 - arg ]
> do: [ :found | found ]
> ifNone: [ :a :b | ('between: ', {a. b} printString) ]
> #(1 3 5 7 11 15 23) d
> findBinaryIndex: [ :arg | 0.5 - arg ]
> do: [ :found | found ]
> ifNone: [ :a :b | ('between: ', {a. b} printString) ]
> #(1 3 5 7 11 15 23)
> findBinaryIndex: [ :arg | 25 - arg ]
> do: [ :found | found ]
> ifNone: [ :a :b | ('between: ',{a. b} printString) ]
> "
> | index low high |
> low := 1.
> high := self size.
> + [ index := high + low // 2.
> + low > high ] whileFalse:
> + [ | test |
> + test := aBlock value: (self at: index).
> + test = 0
> + ifTrue:
> + [ "In case there are duplicate keys, walk backward to the first."
> + [ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
> + ^ actionBlock value: index ]
> + ifFalse:
> + [ test > 0
> - [
> - index := high + low // 2.
> - low > high ] whileFalse: [
> - | test |
> - test := aBlock value: (self at: index).
> - test = 0
> - ifTrue: [ ^actionBlock value: index ]
> - ifFalse: [ test > 0
> ifTrue: [ low := index + 1 ]
> ifFalse: [ high := index - 1 ] ] ].
> + ^ exceptionBlock
> + cull: high
> + cull: low!
> - ^exceptionBlock cull: high cull: low!
>
>
>
More information about the Squeak-dev
mailing list
|