[FIX][BUG] Set>>atRandom: is not random ( another - better? - solution)

Lothar Schenk lothar.schenk at gmx.de
Fri Feb 27 17:42:14 UTC 2004


I had tried to treat this bug, because Samir's original post was listed as 
'Unreviewed' in BFAV. Now, I see that there has already been a discussion 
about possible solutions by Torge Husfeldt and Daniel Vainsencher. Torge 
Husfeldt has already presented 2 possible solutions, both of which were not 
quite satisfactory due to different drawbacks. Torge's second solution is 
essentially the same as mine (though more succinctly implemented). I agree 
that it has the drawback of being time-consuming in that it has to iterate 
(on the average) over half the Set, which may be significant for large Sets.

I think, I have now a better solution. It takes advantage of the fact, that 
Sets are not guaranteed to be in a particular order, i.e. it should be 
possible to move the elements around within the array used for implementation 
without affecting the outside behavior. (SOMEBODY PLEASE CONFIRM THIS).

My new solution is essentially the same as the original one, except that 
afterwards (when a scan from a nil element was necessary) it exchanges the 
place of the element that is found with the one which was originally randomly 
chosen. This has the effect that subsequent hits behind the first slot will 
no longer lead to the same element.

As an example, if the Set is implemented by array
( nil 'a' 'b' nil nil nil)
and our random sequence begins with the elements #5 #1 ...)
then the original implementation hits nil at spot #5 and scans onward until it 
finds 'a' at spot #2, then it hits nil at spot #1 and scans onward, again 
finding 'a' at spot #2.
In my implementation, after the first scan, the 'a' gets moved to spot #5, so 
the array then is
(nil nil 'b' nil 'a' nil)
and the subsequent scan after the hit at #1 now produces 'b' as result.

I have tested this with the code given by Samir and found that this trick does 
indeed produce a more ideally uniform distribution, while not being as costly 
as the first solution.

Regards, Lothar 

===============================================
Subject: [FIX][BUG] Set>>atRandom: is not random
Date: Thu, 26 Feb 2004 21:22:49 +0100
From: Lothar Schenk <lothar.schenk at gmx.de>
To: The general-purpose Squeak developers list 
<squeak-dev at lists.squeakfoundation.org>

The problem still persists in a 3.7a image. Samir's explanation is correct.
The current implementation picks a random element in the array internal to
Set, and if this element is nil it starts to scan for a non-nil element from
there. This leads to a non-uniform distribution of the sampled elements when
nil-elements are present.

I have attached a fix.

Regards, Lothar

==============================================
Subject:		[BUG] Set>>atRandom: is not random
Author:		Samir Saidani
Date Posted: 26 June 2003
Archive ID: 10798

Comments:	Here is a little piece of code to reproduce the bug:
|s a|

s_ Set new add: 'a'; add: 'b'; yourself.
a_Random new.
i_0.
500 timesRepeat: [ ((s atRandom: a)='a') ifTrue:[ i_i+1 ]. ].
Transcript show: i; cr.

Results (~100) show that you have 1/5 to have the first element.

A workaround is to change a set as an array (probability is 1/2).

|s a|

s_ Set new add: 'a'; add: 'b'; yourself.
a_Random new.
i_0.
500 timesRepeat: [ ((s asArray atRandom: a)='a') ifTrue:[ i_i+1 ]. ].
Transcript show: i; cr.

Don't have time for the moment to fix it, but certainly due to the
"array" representation of a set (Set(1,2) appears as an array of size
5 #(nil 1 2 nil nil nil).

--
Samir SAIDANI

-------------------------------------------------------

-------------- next part --------------
'From Squeak3.7alpha of 11 September 2003 [latest update: #5707] on 27 February 2004 at 5:54:21 pm'!

!Set methodsFor: 'accessing' stamp: 'los 2/27/2004 17:54'!
atRandom: aGenerator

	"Answer a random element of the receiver.  Uses aGenerator which
	should be kept by the user in a variable and used every time. Use
	this instead of #atRandom for better uniformity of random numbers 
	because only you use the generator.  Causes an error if self has no 
	elements."
	
	| ind ind2 |
	
	self emptyCheck. "There must be at least one regular (not-nil) element in array"
	
	ind := aGenerator nextInt: array size.
	(array at: ind) ifNotNil: [ ^ array at: ind ].
	
	"We hit a nil element (empty slot) -
	 scan for the next regular (non-nil)  element"
	
	ind2 := ind.
	[ (array at: ind2) isNil ] whileTrue:
		[ ind2 := ind2 + 1.
		  (ind2 > array size) ifTrue: [ ind2 := 1 ] "wrap if beyond end of array" ].
	
	"Move the element we found to the original spot that was first selected
	 to counteract the effect that nil elements may have on the uniformity
	of the distribution of selected elements"
	
	array at: ind put: (array at: ind2).
	array at: ind2 put: nil.
	
	^ array at: ind! !



More information about the Squeak-dev mailing list