[squeak-dev] The Inbox: Collections-cmm.576.mcz

Levente Uzonyi leves at elte.hu
Thu Jul 24 21:16:14 UTC 2014


On Thu, 24 Jul 2014, commits at source.squeak.org wrote:

> Chris Muller uploaded a new version of Collections to project The Inbox:
> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>
> ==================== Summary ====================
>
> Name: Collections-cmm.576
> Author: cmm
> Time: 24 July 2014, 10:39:06.452 am
> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
> Ancestors: Collections-eem.575
>
> When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.

There are two problems with this change
- it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).
- it introduces unwanted behavior. What if I want the index of the last matching element?

If you want the index of the first matching element, then you can do two 
things:
- find the index after the binary search yourself (which is still not a 
good idea because of the possible performance loss)
- change the binary search to behave as you'd like to

| haystack needle |
haystack := #(1 3 5 7 11 11 15 23).
needle := 11.
haystack
 	findBinaryIndex: [ :each |
 		needle = each
 			ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ]
 			ifFalse: [ needle - each ] ]
 	do: nil "We'll never get here, so it can be anything."
 	ifNone:	[ :lower :upper |
 		" upper is the index of the leftmost match if there's a match, and its value is at least 1 "
 		(upper <= haystack size and: [ (haystack at: upper) = needle ])
 			ifTrue: [ upper ]
 			ifFalse: [ 0 ] ]

With a slight modification you can find the index of the last match too.


Levente

[1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations
[2] Here's a small benchmark demonstrating the edge case with your change:

| haystack needle |
Smalltalk garbageCollect.
haystack := Array new: 100000 withAll: 11.
needle := 11.
{ [ haystack
 	findBinaryIndex: [ :each |
 		needle = each
 			ifTrue: [ -1 ]
 			ifFalse: [ needle - each ] ]
 	do: nil
 	ifNone:	[ :lower :upper |
 		(upper <= haystack size and: [ (haystack at: upper) = needle ])
 			ifTrue: [ upper ]
 			ifFalse: [ 0 ] ] ] bench.
[ haystack
 	findBinaryIndex: [ :each | needle - each ]
 	do: [ :each | each ]
 	ifNone: [ 0 ] ] bench }
#('1,150,000 per second.' '741 per second.')

>
> =============== Diff against Collections-eem.575 ===============
>
> Item was changed:
>  ----- Method: SequenceableCollection>>findBinaryIndex:do:ifNone: (in category 'enumerating') -----
> + findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
> - findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
>  	"Search for an element in the receiver using binary search.
>  	The argument aBlock is a one-element block returning
>  		0 	- if the element is the one searched for
>  		<0	- if the search should continue in the first half
>  		>0	- if the search should continue in the second half
>  	If found, evaluate actionBlock with the index as argument
>  	If no matching element is found, evaluate exceptionBlock,
>  	with the indexes of the 'bounding' elements as optional
>  	arguments. 	Warning: Might give invalid indexes, see
>  	examples below.
>  	Examples:
> + 		#(1 3 5 7 11 11 15 23)
> - 		#(1 3 5 7 11 15 23)
>  			findBinaryIndex: [ :arg | 11 - arg ]
>  			do: [ :found | found ]
>  			ifNone: [ :a :b | ('between: ', {a. b} printString)]
>  		#(1 3 5 7 11 15 23)
>  			findBinaryIndex: [ :arg | 12 - arg ]
>  			do: [ :found | found ]
>  			ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>  		#(1 3 5 7 11 15 23) d
>  			findBinaryIndex: [ :arg | 0.5 - arg ]
>  			do: [ :found | found ]
>  			ifNone: [ :a :b | ('between: ', {a. b} printString) ]
>  		#(1 3 5 7 11 15 23)
>  			findBinaryIndex: [ :arg | 25 - arg ]
>  			do: [ :found | found ]
>  			ifNone: [ :a :b | ('between: ',{a. b} printString) ]
>  	"
>  	| index low high |
>  	low := 1.
>  	high := self size.
> + 	[ index := high + low // 2.
> + 	low > high ] whileFalse:
> + 		[ | test |
> + 		test := aBlock value: (self at: index).
> + 		test = 0
> + 			ifTrue:
> + 				[ "In case there are duplicate keys, walk backward to the first."
> + 				[ index > 1 and: [ (aBlock value: (self at: index - 1)) = 0 ] ] whileTrue: [ index := index - 1 ].
> + 				^ actionBlock value: index ]
> + 			ifFalse:
> + 				[ test > 0
> - 	[
> - 		index := high + low // 2.
> - 		low > high ] whileFalse: [
> - 			| test |
> - 			test := aBlock value: (self at: index).
> - 			test = 0
> - 				ifTrue: [ ^actionBlock value: index ]
> - 				ifFalse: [ test > 0
>  					ifTrue: [ low := index + 1 ]
>  					ifFalse: [ high := index - 1 ] ] ].
> + 	^ exceptionBlock
> + 		cull: high
> + 		cull: low!
> - 	^exceptionBlock cull: high cull: low!
>
>
>


More information about the Squeak-dev mailing list