[squeak-dev] Changeset: Interpolation search

Thiede, Christoph Christoph.Thiede at student.hpi.uni-potsdam.de
Tue Oct 6 12:20:54 UTC 2020


Hi all,


please find the attached changeset in which I'm proposing a set of new methods on SequenceableCollection that implement interpolation search in Squeak. Interpolation search works similar for linear search, but it is supplied a number indicating distance rather than a binary/ternary information like binary search. Interpolation search is a bit slower for small collections because handling the distance value introduces some overhead, but it outperforms the other in the case of a) very large collections and b) expensive block operations. In addition, the changeset contains tests for both all #findBinary: and all #findDelta: overloads.


The following example demonstrates usage + benchmarks of the new methods:


data := (1 to: 10000000) select: [:i | i isPrime].
probe := (1 to: 10000) collect: [:i | data atRandom].

[probe collect: [:p |
data findBinaryIndex: [:d | p <=> d]]] benchFor: 10 seconds. "--> '107 per second. 9.37 milliseconds per run. 5.21896 % GC time.'"
[probe collect: [:p |
data findDeltaIndex: [:d | p - d]]] benchFor: 10 seconds. "--> '196 per second. 5.1 milliseconds per run. 9.89901 % GC time.'"

(9.37 milliSeconds / 5.1 milliSeconds) asScaledDecimal reciprocal "--> 0.54429028s8"


Looking forward to your review! :-)

In particular, I already wondered whether #findDelta: is the best name for this feature. Does anyone know a better one?

We could also add a fourth parameter for specifying a custom interpolation function. However, I could not find any real-world example for this, so this would probably be speculative generality.


Best,

Christoph

________________________________


'From Squeak6.0alpha of 1 October 2020 [latest update: #19893] on 6 October 2020 at 1:53:27 pm'!

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:46'!
findBinary: aBlock do: actionBlock ifNone: exceptionBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If found, evaluate actionBlock with the found element as argument. If no matching element is found, evaluate exceptionBlock, with the 'bounding' elements (or nil) as optional arguments.
If aBlock performs expensive operations, also consider using #findDelta:do:ifNone: instead.
Examples:
#(1 3 5 7 11 15 23)
findBinary: [ :arg | 11 <=> arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findBinary: [ :arg | 12 <=> arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findBinary: [ :arg | 0.5 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findBinary: [ :arg | 25 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ',{a. b} printString) ]
"
^self
findBinaryIndex: aBlock
do: [ :foundIndex | actionBlock value: (self at: foundIndex) ]
ifNone: [ :prevIndex :nextIndex |
exceptionBlock
cull: (prevIndex > 0 ifTrue: [ self at: prevIndex ])
cull: (nextIndex <= self size ifTrue: [ self at: nextIndex ]) ]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:46'!
findBinaryIndex: aBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If no matching element is found, raise an error.
If aBlock performs expensive operations, also consider using #findDeltaIndex: instead.
Example:
#(1 3 5 7 11 15 23) findBinaryIndex: [ :arg | 11 <=> arg ]
"
^self findBinaryIndex: aBlock do: [ :found | found ] ifNone: [ self errorNotFound: aBlock]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:47'!
findBinaryIndex: aBlock do: actionBlock ifNone: exceptionBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If found, evaluate actionBlock with the index as argument. If no matching element is found, evaluate exceptionBlock, with the indexes of the 'bounding' elements as optional arguments.
Warning: Might give invalid indexes, see examples below.
If aBlock performs expensive operations, also consider using #findDeltaIndex:do:ifNone: instead.
Examples:
#(1 3 5 7 11 15 23)
findBinaryIndex: [ :arg | 11 <=> arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString)]
#(1 3 5 7 11 15 23)
findBinaryIndex: [ :arg | 1 <=> arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23) d
findBinaryIndex: [ :arg | 0.5 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findBinaryIndex: [ :arg | 25 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ',{a. b} printString) ]
"
| index low high test |
low := 1.
high := self size.
[ high < low ] whileFalse: [
index := high + low // 2.
(test := aBlock value: (self at: index)) < 0
ifTrue: [ high := index - 1 ]
ifFalse: [
0 < test
ifTrue: [ low := index + 1 ]
ifFalse: [ "test = 0"
^actionBlock value: index ] ] ].
^exceptionBlock cull: high cull: low! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:47'!
findBinaryIndex: aBlock ifNone: exceptionBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If no matching element is found, evaluate exceptionBlock, with the indexes of the 'bounding' elements as optional arguments.
Warning: Might give invalid indexes.
If aBlock performs expensive operations, also consider using #findDeltaIndex:ifNone: instead."

^self findBinaryIndex: aBlock do: [ :found | found ] ifNone: exceptionBlock! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:47'!
findBinary: aBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If no matching element is found, raise an error.
If aBlock performs expensive operations, also consider using #findDelta: instead.
Example:
#(1 3 5 7 11 15 23) findBinary: [ :arg | 11 <=> arg ]
"
^self findBinary: aBlock do: [ :found | found ] ifNone: [ self errorNotFound: aBlock ]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:47'!
findBinary: aBlock ifNone: exceptionBlock
"Search for an element in the receiver using binary search.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - if the search should continue in the first half;
>0 - if the search should continue in the second half.
If no matching element is found, evaluate exceptionBlock, with the 'bounding' elements (or nil) as optional arguments.
If aBlock performs expensive operations, also consider using #findDelta:ifNone: instead."

^self findBinary: aBlock do: [ :found | found ] ifNone: exceptionBlock! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:45'!
findDelta: aBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from).
Example:
#(1 3 5 7 11 15 23) findDelta: [ :arg | 11 - arg ]
"
^ self findDelta: aBlock do: [ :found | found ] ifNone: [ self errorNotFound: aBlock ]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:45'!
findDelta: aBlock do: actionBlock ifNone: exceptionBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
If found, evaluate actionBlock with the found element as argument. If no matching element is found, evaluate exceptionBlock, with the 'bounding' elements (or nil) as optional arguments.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from).
Examples:
#(1 3 5 7 11 15 23)
findDelta: [ :arg | 11 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findDelta: [ :arg | 12 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findDelta: [ :arg | 0.5 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findDelta: [ :arg | 25 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ',{a. b} printString) ]
"
^ self
findDeltaIndex: aBlock
do: [ :foundIndex | actionBlock value: (self at: foundIndex) ]
ifNone: [ :prevIndex :nextIndex |
exceptionBlock
cull: (prevIndex > 0 ifTrue: [ self at: prevIndex ])
cull: (nextIndex <= self size ifTrue: [ self at: nextIndex ]) ]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:45'!
findDelta: aBlock ifNone: exceptionBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
If no matching element is found, evaluate exceptionBlock, with the 'bounding' elements (or nil) as optional arguments.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from)."
^ self findDelta: aBlock do: [ :found | found ] ifNone: exceptionBlock! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:45'!
findDeltaIndex: aBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
If no matching element is found, raise an error.
Warning: Might give invalid indexes.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from).
Example:
#(1 3 5 7 11 15 23) findDeltaIndex: [ :arg | 11 - arg ]
"
^ self
findDeltaIndex: aBlock
do: [ :found | found ]
ifNone: [ self errorNotFound: aBlock ]! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/6/2020 13:52'!
findDeltaIndex: aBlock do: actionBlock ifNone: exceptionBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
If found, evaluate actionBlock with the index as argument. If no matching element is found, evaluate exceptionBlock, with the indexes of the 'bounding' elements as optional arguments.
Warning: Might give invalid indexes, see examples below.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from).
Examples:
#(1 3 5 7 11 15 23)
findDeltaIndex: [ :arg | 11 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString)]
#(1 3 5 7 11 15 23)
findDeltaIndex: [ :arg | 12 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23) d
findDeltaIndex: [ :arg | 0.5 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ', {a. b} printString) ]
#(1 3 5 7 11 15 23)
findDeltaIndex: [ :arg | 25 - arg ]
do: [ :found | found ]
ifNone: [ :a :b | ('between: ',{a. b} printString) ]
"
| index low high delta |
low := 1.
high := self size.
index := low + high // 2.
[ index between: low and: high ] whileTrue: [ | range |
index := ((delta := aBlock value: (self at: index)) < 0
ifTrue: [
high := index - 1.
(range := (self at: index) - (self at: low)) = 0
ifTrue: [ high ]
ifFalse: [ high + (high - low * delta // range) ] ]
ifFalse: [ 0 < delta ifTrue: [
low := index + 1.
(range := (self at: high) - (self at: index)) = 0
ifTrue: [ low ]
ifFalse: [ low + (high - low * delta // range) ] ]
ifFalse: [ "delta = 0"
^ actionBlock value: index ] ]) min: high max: low.
].
^ exceptionBlock cull: high cull: low! !

!SequenceableCollection methodsFor: 'enumerating' stamp: 'ct 10/4/2020 15:45'!
findDeltaIndex: aBlock ifNone: exceptionBlock
"Search for an element in the receiver using interpolation search with a linear interpolation function.
The argument aBlock is a one-element block returning
0 - if the element is the one searched for;
<0 - the distance to the searched value, if the search should continue in the first half;
>0 - the distance to the searched, if the search should continue in the second half.
If no matching element is found, evaluate exceptionBlock, with the indexes of the 'bounding' elements as optional arguments.
Warning: Might give invalid indexes.
Practical performance of interpolation search depends on whether the reduced number of probes is outweighed by the more complicated calculations needed for each probe. For more information, read the Wikipedia article (which the previous sentence was cited from)."
^ self findDeltaIndex: aBlock do: [ :found | found ] ifNone: exceptionBlock! !


!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:47'!
testFindBinary

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every equals: (
values findBinary: [:each | every <=> each])]
ifFalse: [
self
should: [values findBinary: [:each | every <=> each]]
raise: NotFound]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:48'!
testFindBinaryDoIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every negated equals: (values
findBinary: [:each | every <=> each]
do: [:each | each negated]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findBinary: [:each | every <=> each]
do: [self fail]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:49'!
testFindBinaryIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every equals: (values
findBinary: [:each | every <=> each]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findBinary: [:each | every <=> each]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:51'!
testFindBinaryIndex

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) equals: (
values findBinaryIndex: [:each | every <=> each])]
ifFalse: [
self
should: [values findBinaryIndex: [:each | every <=> each]]
raise: NotFound]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:52'!
testFindBinaryIndexDoIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) negated equals: ((values
findBinaryIndex: [:each | every <=> each]
do: [:result | result negated]
ifNone: [self fail]))]
ifFalse: [
self assert: self equals: (values
findBinaryIndex: [:each | every <=> each]
do: [:result | self fail]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 22:52'!
testFindBinaryIndexIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) equals: (values
findBinaryIndex: [:each | every <=> each]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findBinaryIndex: [:each | every <=> each]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/2/2020 23:03'!
testFindDelta

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every equals: (
values findDelta: [:each | every - each])]
ifFalse: [
self
should: [values findDelta: [:each | every - each]]
raise: NotFound]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/3/2020 00:12'!
testFindDeltaDoIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every negated equals: (values
findDelta: [:each | every - each]
do: [:each | each negated]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findDelta: [:each | every - each]
do: [self fail]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/3/2020 00:13'!
testFindDeltaIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: every equals: (values
findDelta: [:each | every - each]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findDelta: [:each | every - each]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/3/2020 00:13'!
testFindDeltaIndex

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) equals: (
values findDeltaIndex: [:each | every - each])]
ifFalse: [
self
should: [values findDeltaIndex: [:each | every - each]]
raise: NotFound]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/3/2020 00:13'!
testFindDeltaIndexDoIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) negated equals: ((values
findDeltaIndex: [:each | every - each]
do: [:result | result negated]
ifNone: [self fail]))]
ifFalse: [
self assert: self equals: (values
findDeltaIndex: [:each | every - each]
do: [:result | self fail]
ifNone: [self])]].! !

!SequenceableCollectionTest methodsFor: 'tests - enumerating' stamp: 'ct 10/3/2020 00:14'!
testFindDeltaIndexIfNone

| values |
values := (1 to: 100) select: #isPrime.
1 to: 100 do: [:every |
every isPrime
ifTrue: [
self assert: (values indexOf: every) equals: (values
findDeltaIndex: [:each | every - each]
ifNone: [self fail])]
ifFalse: [
self assert: self equals: (values
findDeltaIndex: [:each | every - each]
ifNone: [self])]].! !
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20201006/521fc02a/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: interpolation search.2.cs
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20201006/521fc02a/attachment-0001.ksh>


More information about the Squeak-dev mailing list