Set::atRandom is extremely nonuniform / suggested fixes
Joshua Scholar
jscholar at access4less.net
Mon Nov 29 16:11:21 UTC 2004
Set::atRandom and therefore everything derived from Set (like Dictionary)
has a terrible atRandom. An entry with 3 nils in front of it is 4 times as
likely to be chosen as one with no nils in front of it.
I recommend changing from
atRandom: aGenerator
"Answer a random element of the receiver. Uses aGenerator which
should be kept by the user in a variable and used every time. Use
this instead of #atRandom for better uniformity of random numbers
because only you use the generator. Causes an error if self has no
elements."
| ind |
self emptyCheck.
ind _ aGenerator nextInt: array size.
ind to: array size do:[:i|
(array at: i) == nil ifFalse:[^array at: i]].
1 to: ind do:[:i|
(array at: i) == nil ifFalse:[^array at: i]].
self errorEmptyCollection.
To
atRandom: aGenerator
"Answer a random element of the receiver. Uses aGenerator which
should be kept by the user in a variable and used every time. Use
this instead of #atRandom for better uniformity of random numbers
because only you use the generator. Causes an error if self has no
elements."
| ind entry |
entry _ nil.
[entry == nil] whileTrue: [
self emptyCheck.
ind _ aGenerator nextInt: array size.
entry _ array at: ind.
].
^ array at: entry.
....
Note I put the emptyCheck in the loop just in case there's a possiblilty of
another thread changing the object while it's being scanned..
A somewhat faster alternative in the bad case where you have a large array
with only a few entries is to do a few checks at random and then to settle
down to scanning the rest with a randomly chosen, relatively-prime span
size. For that you need something like:
atRandom: aGenerator
| ind entry i span foundrelativeprime|
self emptyCheck.
entry _ nil.
i_1;
"try 10 times with random samples"
[ entry == nil and: [i<10] ] whileTrue: [
ind _ aGenerator nextInt: array size.
entry _ array at: ind.
i _ i+1;
].
entry == nil ifTrue: [
"failed 10 random samples - the array must be almost empty"
"pick a random, relatively prime span and scan every entry
foundrelativeprime _ false.
[ foundrelativeprime] whileFalse: [
span _ 1 + (aGenerator nextInt: ((array size)-1)).
foundrelativeprime _ (1 = ((array size) gcd: span)).
].
i _ array size;
[i > 1] whileTrue: [
ind _ (ind+span) \\ ((array size).
entry _ array at: ind.
entry == nil ifFalse:[^entry]].
i _ i-1.
]
self errorEmptyCollection.
]
^entry.
More information about the Squeak-dev
mailing list
|