[squeak-dev] The Inbox: Collections-cmm.576.mcz
Chris Muller
asqueaker at gmail.com
Fri Jul 25 02:15:14 UTC 2014
On Thu, Jul 24, 2014 at 4:16 PM, Levente Uzonyi <leves at elte.hu> wrote:
> On Thu, 24 Jul 2014, commits at source.squeak.org wrote:
>
> Chris Muller uploaded a new version of Collections to project The Inbox:
>> http://source.squeak.org/inbox/Collections-cmm.576.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Collections-cmm.576
>> Author: cmm
>> Time: 24 July 2014, 10:39:06.452 am
>> UUID: 17b60ac6-55b3-4706-b1c8-d826e6ad5657
>> Ancestors: Collections-eem.575
>>
>> When a SequenceableCollection contains duplicate keys,
>> #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.
>>
>
> There are two problems with this change
> - it breaks this method's runtime guarantee which is O(log(n)) for all
> cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually
> Big Theta instead of Big Omicron[1], but it's not on my keyboard).
> - it introduces unwanted behavior. What if I want the index of the last
> matching element?
>
For me, it introduced the _wanted_ behavior. :) But yes, its very easy to
imagine other cases where it wouldn't be needed and so the extra traversal
would make it unwanted. Since what I need can be done on the outside, we
don't need such unwantedness on the inside.
Incidentally, it seems like the current behavior might not always answer
the last-matching element; like dividing odd sizes, remainders and
one-off's, you might get 2nd from last matching or even one besides that?
Boy, Franks words are ringing clear about tests for this.
> If you want the index of the first matching element, then you can do two
> things:
> - find the index after the binary search yourself (which is still not a
> good idea because of the possible performance loss)
>
That's what I ended up doing. I know the composition of the data; very few
duplicate keys. It's a perfect solution for me.
- change the binary search to behave as you'd like to
>
> | haystack needle |
> haystack := #(1 3 5 7 11 11 15 23).
> needle := 11.
> haystack
> findBinaryIndex: [ :each |
> needle = each
> ifTrue: [ -1 " Go to the left in case of a match
> to ensure that upper will be the index of the leftmost match " ]
> ifFalse: [ needle - each ] ]
> do: nil "We'll never get here, so it can be anything."
> ifNone: [ :lower :upper |
> " upper is the index of the leftmost match if there's a
> match, and its value is at least 1 "
> (upper <= haystack size and: [ (haystack at: upper) =
> needle ])
> ifTrue: [ upper ]
> ifFalse: [ 0 ] ]
>
Ah, interesting..
>
> With a slight modification you can find the index of the last match too.
>
>
> Levente
>
> [1] https://en.wikipedia.org/wiki/Big_O_notation#Family_of_
> Bachmann.E2.80.93Landau_notations
> [2] Here's a small benchmark demonstrating the edge case with your change:
>
> | haystack needle |
> Smalltalk garbageCollect.
> haystack := Array new: 100000 withAll: 11.
> needle := 11.
> { [ haystack
> findBinaryIndex: [ :each |
> needle = each
> ifTrue: [ -1 ]
> ifFalse: [ needle - each ] ]
> do: nil
> ifNone: [ :lower :upper |
> (upper <= haystack size and: [ (haystack at: upper) =
> needle ])
> ifTrue: [ upper ]
> ifFalse: [ 0 ] ] ] bench.
> [ haystack
> findBinaryIndex: [ :each | needle - each ]
> do: [ :each | each ]
> ifNone: [ 0 ] ] bench }
> #('1,150,000 per second.' '741 per second.')
LOL!!!!! Good one! Oh, okay, yes, you did call that an "edge" case. More
like a "nil" case, but it sure made for a dramatic comparison! :-D
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20140724/44e67150/attachment.htm
More information about the Squeak-dev
mailing list
|