[squeak-dev] The Inbox: Collections-cmm.576.mcz

Chris Muller asqueaker at gmail.com
Fri Jul 25 02:15:14 UTC 2014


On Thu, Jul 24, 2014 at 4:16 PM, Levente Uzonyi <leves at elte.hu> wrote:

> On Thu, 24 Jul 2014, commits at source.squeak.org wrote:
>
>  Chris Muller uploaded a new version of Collections to project The Inbox:
>> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Collections-cmm.576
>> Author: cmm
>> Time: 24 July 2014, 10:39:06.452 am
>> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
>> Ancestors: Collections-eem.575
>>
>> When a SequenceableCollection contains duplicate keys,
>> #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
>>
>
> There are two problems with this change
> - it breaks this method's runtime guarantee which is O(log(n)) for all
> cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually
> Big Theta instead of Big Omicron[1], but it's not on my keyboard).
> - it introduces unwanted behavior. What if I want the index of the last
> matching element?
>

For me, it introduced the _wanted_ behavior.  :)  But yes, its very easy to
imagine other cases where it wouldn't be needed and so the extra traversal
would make it unwanted.  Since what I need can be done on the outside, we
don't need such unwantedness on the inside.

Incidentally, it seems like the current behavior might not always answer
the last-matching element; like dividing odd sizes, remainders and
one-off's, you might get 2nd from last matching or even one besides that?
 Boy, Franks words are ringing clear about tests for this.


> If you want the index of the first matching element, then you can do two
> things:
> - find the index after the binary search yourself (which is still not a
> good idea because of the possible performance loss)
>

That's what I ended up doing.  I know the composition of the data; very few
duplicate keys.  It's a perfect solution for me.

- change the binary search to behave as you'd like to
>
> | haystack needle |
> haystack := #(1 3 5 7 11 11 15 23).
> needle := 11.
> haystack
>         findBinaryIndex: [ :each |
>                 needle = each
>                         ifTrue: [ -1 " Go to the left in case of a match
> to ensure that upper will be the index of the leftmost match " ]
>                         ifFalse: [ needle - each ] ]
>         do: nil "We'll never get here, so it can be anything."
>         ifNone: [ :lower :upper |
>                 " upper is the index of the leftmost match if there's a
> match, and its value is at least 1 "
>                 (upper <= haystack size and: [ (haystack at: upper) =
> needle ])
>                         ifTrue: [ upper ]
>                         ifFalse: [ 0 ] ]
>

Ah, interesting..


>
> With a slight modification you can find the index of the last match too.
>
>
> Levente
>
> [1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_
> Bachmann.E2.80.93Landau_notations
> [2] Here's a small benchmark demonstrating the edge case with your change:
>
> | haystack needle |
> Smalltalk garbageCollect.
> haystack := Array new: 100000 withAll: 11.
> needle := 11.
> { [ haystack
>         findBinaryIndex: [ :each |
>                 needle = each
>                         ifTrue: [ -1 ]
>                         ifFalse: [ needle - each ] ]
>         do: nil
>         ifNone: [ :lower :upper |
>                 (upper <= haystack size and: [ (haystack at: upper) =
> needle ])
>                         ifTrue: [ upper ]
>                         ifFalse: [ 0 ] ] ] bench.
> [ haystack
>         findBinaryIndex: [ :each | needle - each ]
>         do: [ :each | each ]
>         ifNone: [ 0 ] ] bench }
> #('1,150,000 per second.' '741 per second.')



LOL!!!!!  Good one!  Oh, okay, yes, you did call that an "edge" case.  More
like a "nil" case, but it sure made for a dramatic comparison!   :-D
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20140724/44e67150/attachment.htm


More information about the Squeak-dev mailing list