On Thu, 24 Jul 2014, commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox: http://source.squeak.org/inbox/Collections-cmm.576.mcz
==================== Summary ====================
Name: Collections-cmm.576 Author: cmm Time: 24 July 2014, 10:39:06.452 am UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657 Ancestors: Collections-eem.575
When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
There are two problems with this change - it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard). - it introduces unwanted behavior. What if I want the index of the last matching element?
If you want the index of the first matching element, then you can do two things: - find the index after the binary search yourself (which is still not a good idea because of the possible performance loss) - change the binary search to behave as you'd like to
| haystack needle | haystack := #(1 3 5 7 11 11 15 23). needle := 11. haystack findBinaryIndex: [ :each | needle = each ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ] ifFalse: [ needle - each ] ] do: nil "We'll never get here, so it can be anything." ifNone: [ :lower :upper | " upper is the index of the leftmost match if there's a match, and its value is at least 1 " (upper <= haystack size and: [ (haystack at: upper) = needle ]) ifTrue: [ upper ] ifFalse: [ 0 ] ]
With a slight modification you can find the index of the last match too.
Levente
[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Land... [2] Here's a small benchmark demonstrating the edge case with your change:
| haystack needle | Smalltalk garbageCollect. haystack := Array new: 100000 withAll: 11. needle := 11. { [ haystack findBinaryIndex: [ :each | needle = each ifTrue: [ -1 ] ifFalse: [ needle - each ] ] do: nil ifNone: [ :lower :upper | (upper <= haystack size and: [ (haystack at: upper) = needle ]) ifTrue: [ upper ] ifFalse: [ 0 ] ] ] bench. [ haystack findBinaryIndex: [ :each | needle - each ] do: [ :each | each ] ifNone: [ 0 ] ] bench } #('1,150,000 per second.' '741 per second.')
=============== Diff against Collections-eem.575 ===============
Item was changed: ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
- findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
- findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock "Search for an element in the receiver using binary search. The argument aBlock is a one-element block returning 0 - if the element is the one searched for <0 - if the search should continue in the first half
0 - if the search should continue in the second halfIf found, evaluate actionBlock with the index as argument If no matching element is found, evaluate exceptionBlock, with the indexes of the 'bounding' elements as optional arguments. Warning: Might give invalid indexes, see examples below. Examples:
#(1 3 5 7 11 11 15 23)
#(1 3 5 7 11 15 23) findBinaryIndex: [ :arg | 12 - arg ] do: [ :found | found ] ifNone: [ :a :b | ('between: ', {a. b} printString) ] #(1 3 5 7 11 15 23) d findBinaryIndex: [ :arg | 0.5 - arg ] do: [ :found | found ] ifNone: [ :a :b | ('between: ', {a. b} printString) ] #(1 3 5 7 11 15 23) findBinaryIndex: [ :arg | 25 - arg ] do: [ :found | found ] ifNone: [ :a :b | ('between: ',{a. b} printString) ] " | index low high | low := 1. high := self size.#(1 3 5 7 11 15 23) findBinaryIndex: [ :arg | 11 - arg ] do: [ :found | found ] ifNone: [ :a :b | ('between: ', {a. b} printString)]
- [ index := high + low // 2.
- low > high ] whileFalse:
[ | test |
test := aBlock value: (self at: index).
test = 0
ifTrue:
[ "In case there are duplicate keys, walk backward to the first."
[ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
^ actionBlock value: index ]
ifFalse:
[ test > 0
- [
index := high + low // 2.
low > high ] whileFalse: [
| test |
test := aBlock value: (self at: index).
test = 0
ifTrue: [ ^actionBlock value: index ]
ifFalse: [ test > 0 ifTrue: [ low := index + 1 ] ifFalse: [ high := index - 1 ] ] ].
- ^ exceptionBlock
cull: high
cull: low!
- ^exceptionBlock cull: high cull: low!
On Thu, Jul 24, 2014 at 4:16 PM, Levente Uzonyi leves@elte.hu wrote:
On Thu, 24 Jul 2014, commits@source.squeak.org wrote:
Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.576.mcz
==================== Summary ====================
Name: Collections-cmm.576 Author: cmm Time: 24 July 2014, 10:39:06.452 am UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657 Ancestors: Collections-eem.575
When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
There are two problems with this change
- it breaks this method's runtime guarantee which is O(log(n)) for all
cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).
- it introduces unwanted behavior. What if I want the index of the last
matching element?
For me, it introduced the _wanted_ behavior. :) But yes, its very easy to imagine other cases where it wouldn't be needed and so the extra traversal would make it unwanted. Since what I need can be done on the outside, we don't need such unwantedness on the inside.
Incidentally, it seems like the current behavior might not always answer the last-matching element; like dividing odd sizes, remainders and one-off's, you might get 2nd from last matching or even one besides that? Boy, Franks words are ringing clear about tests for this.
If you want the index of the first matching element, then you can do two things:
- find the index after the binary search yourself (which is still not a
good idea because of the possible performance loss)
That's what I ended up doing. I know the composition of the data; very few duplicate keys. It's a perfect solution for me.
- change the binary search to behave as you'd like to
| haystack needle | haystack := #(1 3 5 7 11 11 15 23). needle := 11. haystack findBinaryIndex: [ :each | needle = each ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ] ifFalse: [ needle - each ] ] do: nil "We'll never get here, so it can be anything." ifNone: [ :lower :upper | " upper is the index of the leftmost match if there's a match, and its value is at least 1 " (upper <= haystack size and: [ (haystack at: upper) = needle ]) ifTrue: [ upper ] ifFalse: [ 0 ] ]
Ah, interesting..
With a slight modification you can find the index of the last match too.
Levente
[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_ Bachmann.E2.80.93Landau_notations [2] Here's a small benchmark demonstrating the edge case with your change:
| haystack needle | Smalltalk garbageCollect. haystack := Array new: 100000 withAll: 11. needle := 11. { [ haystack findBinaryIndex: [ :each | needle = each ifTrue: [ -1 ] ifFalse: [ needle - each ] ] do: nil ifNone: [ :lower :upper | (upper <= haystack size and: [ (haystack at: upper) = needle ]) ifTrue: [ upper ] ifFalse: [ 0 ] ] ] bench. [ haystack findBinaryIndex: [ :each | needle - each ] do: [ :each | each ] ifNone: [ 0 ] ] bench } #('1,150,000 per second.' '741 per second.')
LOL!!!!! Good one! Oh, okay, yes, you did call that an "edge" case. More like a "nil" case, but it sure made for a dramatic comparison! :-D
squeak-dev@lists.squeakfoundation.org