Set::atRandom is extremely nonuniform / suggested fixes

Joshua Scholar jscholar at access4less.net
Mon Nov 29 16:11:21 UTC 2004


Set::atRandom and therefore everything derived from Set (like Dictionary)
has a terrible atRandom.  An entry with 3 nils in front of it is 4 times as
likely to be chosen as one with no nils in front of it.

I recommend changing from

atRandom: aGenerator
 "Answer a random element of the receiver.  Uses aGenerator which
 should be kept by the user in a variable and used every time. Use
 this instead of #atRandom for better uniformity of random numbers
 because only you use the generator.  Causes an error if self has no
 elements."
 | ind |
 self emptyCheck.
 ind _ aGenerator nextInt: array size.
 ind to: array size do:[:i|
     (array at: i) == nil ifFalse:[^array at: i]].
 1 to: ind do:[:i|
     (array at: i) == nil ifFalse:[^array at: i]].
 self errorEmptyCollection.

To

atRandom: aGenerator
 "Answer a random element of the receiver.  Uses aGenerator which
 should be kept by the user in a variable and used every time. Use
 this instead of #atRandom for better uniformity of random numbers
 because only you use the generator.  Causes an error if self has no
 elements."
  | ind entry |
 entry _ nil.
 [entry == nil] whileTrue: [
      self emptyCheck.
      ind _ aGenerator nextInt: array size.
      entry _ array at: ind.
 ].
 ^ array at: entry.

....
Note I put the emptyCheck in the loop just in case there's a possiblilty of
another thread changing the object while it's being scanned..

A somewhat faster alternative in the bad case where you have a large array
with only a few entries is to do a few checks at random and then to settle
down to scanning the rest with a randomly chosen, relatively-prime span
size. For that you need something like:

atRandom: aGenerator
  | ind entry i span foundrelativeprime|
    self emptyCheck.
    entry _ nil.
    i_1;
    "try 10 times with random samples"
    [ entry == nil and: [i<10] ] whileTrue: [
          ind _ aGenerator nextInt: array size.
          entry _ array at: ind.
           i _ i+1;
    ].
    entry == nil  ifTrue: [
        "failed 10 random samples - the array must be almost empty"
        "pick a random, relatively prime span and scan every entry
        foundrelativeprime _ false.
        [ foundrelativeprime] whileFalse: [
            span _ 1 + (aGenerator nextInt: ((array size)-1)).
            foundrelativeprime _ (1 = ((array size) gcd: span)).
        ].
        i _ array size;
        [i > 1] whileTrue: [
            ind _ (ind+span) \\ ((array size).
            entry _ array at: ind.
            entry == nil ifFalse:[^entry]].
            i _ i-1.
        ]
        self errorEmptyCollection.
    ]
 ^entry.




More information about the Squeak-dev mailing list