<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jul 24, 2014 at 4:16 PM, Levente Uzonyi <span dir="ltr"><<a href="mailto:leves@elte.hu" target="_blank">leves@elte.hu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class="">On Thu, 24 Jul 2014, <a href="mailto:commits@source.squeak.org" target="_blank">commits@source.squeak.org</a> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Chris Muller uploaded a new version of Collections to project The Inbox:<br>
<a href="http://source.squeak.org/inbox/Collections-cmm.576.mcz" target="_blank">http://source.squeak.org/<u></u>inbox/Collections-cmm.576.mcz</a><br>
<br>
==================== Summary ====================<br>
<br>
Name: Collections-cmm.576<br>
Author: cmm<br>
Time: 24 July 2014, 10:39:06.452 am<br>
UUID: 17b60ac6-55b3-4706-b1c8-<u></u>d826e6ad5657<br>
Ancestors: Collections-eem.575<br>
<br>
When a SequenceableCollection contains duplicate keys, #findBinaryIndex:do:ifNone: should answer the index of the _earliest_.<br>
</blockquote>
<br></div>
There are two problems with this change<br>
- it breaks this method's runtime guarantee which is O(log(n)) for all cases. Worst case runtime becomes O(n) [2], instead of O(log(n)) (actually Big Theta instead of Big Omicron[1], but it's not on my keyboard).<br>
- it introduces unwanted behavior. What if I want the index of the last matching element?<br></blockquote><div><br></div><div>For me, it introduced the _wanted_ behavior. :) But yes, its very easy to imagine other cases where it wouldn't be needed and so the extra traversal would make it unwanted. Since what I need can be done on the outside, we don't need such unwantedness on the inside.</div>
<div><br></div><div>Incidentally, it seems like the current behavior might not always answer the last-matching element; like dividing odd sizes, remainders and one-off's, you might get 2nd from last matching or even one besides that? Boy, Franks words are ringing clear about tests for this.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
If you want the index of the first matching element, then you can do two things:<br>
- find the index after the binary search yourself (which is still not a good idea because of the possible performance loss)<br></blockquote><div><br></div><div>That's what I ended up doing. I know the composition of the data; very few duplicate keys. It's a perfect solution for me.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
- change the binary search to behave as you'd like to<br>
<br>
| haystack needle |<br>
haystack := #(1 3 5 7 11 11 15 23).<br>
needle := 11.<br>
haystack<br>
findBinaryIndex: [ :each |<br>
needle = each<br>
ifTrue: [ -1 " Go to the left in case of a match to ensure that upper will be the index of the leftmost match " ]<br>
ifFalse: [ needle - each ] ]<br>
do: nil "We'll never get here, so it can be anything."<br>
ifNone: [ :lower :upper |<br>
" upper is the index of the leftmost match if there's a match, and its value is at least 1 "<br>
(upper <= haystack size and: [ (haystack at: upper) = needle ])<br>
ifTrue: [ upper ]<br>
ifFalse: [ 0 ] ]<br></blockquote><div><br></div><div>Ah, interesting..</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
With a slight modification you can find the index of the last match too.<br>
<br>
<br>
Levente<br>
<br>
[1] <a href="https://en.wikipedia.org/wiki/Big_O_notation#Family_of_Bachmann.E2.80.93Landau_notations" target="_blank">https://en.wikipedia.org/wiki/<u></u>Big_O_notation#Family_of_<u></u>Bachmann.E2.80.93Landau_<u></u>notations</a><br>
[2] Here's a small benchmark demonstrating the edge case with your change:<br>
<br>
| haystack needle |<br>
Smalltalk garbageCollect.<br>
haystack := Array new: 100000 withAll: 11.<br>
needle := 11.<br>
{ [ haystack<br>
findBinaryIndex: [ :each |<br>
needle = each<br>
ifTrue: [ -1 ]<br>
ifFalse: [ needle - each ] ]<br>
do: nil<br>
ifNone: [ :lower :upper |<br>
(upper <= haystack size and: [ (haystack at: upper) = needle ])<br>
ifTrue: [ upper ]<br>
ifFalse: [ 0 ] ] ] bench.<br>
[ haystack<br>
findBinaryIndex: [ :each | needle - each ]<br>
do: [ :each | each ]<br>
ifNone: [ 0 ] ] bench }<br>
#('1,150,000 per second.' '741 per second.')</blockquote><div><br></div><div><br></div><div>LOL!!!!! Good one! Oh, okay, yes, you did call that an "edge" case. More like a "nil" case, but it sure made for a dramatic comparison! :-D</div>
<div><br></div></div></div></div>