Random Numbers

Joachim Durchholz joachim.durchholz at munich.netsurf.de
Wed May 19 18:54:42 UTC 1999


Steve Wart wrote:
> 
> > * Why is the value, supposedly already truly random, run through the
> > Park-Miller generator once? Are you sure that this doesn't have some
> > damaging effect on the randomness of the result? Since the P-M
> > generator produces 32-bit values, it would at least seem to waste
> > half of the data you fetch.
> >
> I initially avoided the extra generator step, but I had problems
> scaling the number to fall between 0.0 and 1.0. The LavaRandom class
> is not using the standard generator, but one that is simplified to
> just scale the numbers. I agree that 64 bit numbers is probably
> overkill. I originally tried to interpret them as IEEE values from the
> random stream, but this approach was too naive.

On scaling:
A single byte can be interpreted as a number between 0 and 255 (then
it's called "unsigned") or between -128 and 127 (then it's called
"signed").
To scale such a byte to a number between 0.0 and 1.0, just divide the
byte value by the range of numbers expressible with it (which is 256 in
both cases: you have 256 values to select from). If your numbers are
signed, add 128 before dividing (or add 0.5 after dividing - this is
largely a matter of personal preference; for example, I prefer to stay
with integer arithmetic as long as possible because round-off errors are
much easier to control there).
In Smalltalk, you can live even easier: Just use the byte as the
numerator in a fraction with a denominator of 256. Any scaling will be
done by Squeak for you.

If you want more significant bits in your random numbers, just use more
bytes:

 Bytes  Unsigned Range    Signed Range
   2    0 ...      65536       -32768 ...      32767
   3    0 ...   16777216    -16777216 ...   16777215
   4    0 ... 4294967296  -2147483648 ... 2147483647

In hexadecimal notation, this is

 Bytes  Unsigned Range    Signed Range
   2    0 ...      10000  ffFFFF8000 ...       7FFF
   3    0 ...    1000000  ffFF800000 ...     7FFFFF
   4    0 ...  100000000  ff80000000 ...   7FFFFFFF
(Those two small f's under "Signed Range" indicate an infinity of F's
preceding the written number. In practice, processors store only a
finite number of F's and don't care what happens with that infinity of
digits, and everything happens to work out OK as long as there is no
overflow.)

The hexadecimal notation reveals the laws that govern these "magic
numbers", and they can indeed be computed as

 Bytes  Unsigned Range    Signed Range
   N    0 ...        2^N    -2^(N-1)      2^(N-1)-1

> > * Have you tested this generator using the techniques in Knuth
> > Volume 2 (3rd ed)? If not, than how do you know what you have?

I'm not sure whether your edition of Knuth is the same as the one I read
years ago. However, I found the tests rather incomprehensible.

I'd do something else: I'd use that network source of random numbers to
seed my RNG, and generate my own random numbers by myself. That way,
even if the network randoms somehow get out of control (be it by
accident, fault, or design), you still have a good random sequence.
Plus, generating a random number (even with a complicated algorithm) is
likely to be *much* faster than getting the numbers from the Net.

> My biggest concern is that the "scaling" I did damaged the randomness.

Multiplying random numbers with a fixed value doesn't alter their
randomness: if it was bad before it will be exactly as bad afterwards.
Round-off errors may damage the last significant bit of your random
number (and maybe, if you have a really bad math lib, the last two
bits).

> The thing I really like about the source data is that they are
> continually being tested for randomness and there are a number of
> nifty animated graphs showing time sequences of the random data.
> Other sites such as www.random.org do not post tests of their data.

Well, they're telling you lots of data about their numbers, but you
cannot be sure whether their data is correct unless you generate it for
yourself. So if you're concerned about the quality of the randomness,
you should do your own random generator anyway.

> > Note that the last digits are different when converted back to
> > integer. In hex, you can see that the value has been rounded during
> > conversion to float:
> >
> > 9223372036854775807 hex
> > '16r7FFFFFFFFFFFFFFF'
> >
> > 9223372036854779904 hex
> > '16r8000000000001000'
> >
> I guess this is one way that the scaling method could damage the
> randomness of the data. I suppose we could toss out the extra 12 bits
> of information if it does have any effect. Or maybe just go to 32 bit
> integers (which would double the useful life of the stream).

You don't have to explicitly toss out the extra bits. The float
multiplication will do that automatically for you.

> > >3. It could be used for doing cryptography in Squeak, not just
> > >   games

I wouldn't use a source of random numbers that's observable by anybody
else on the Internet. I might take these numbers as a seed for my random
number generator, but even that isn't very convincing: you have to worry
that nobody can guess the random seeds that your RNG was initially fed
with, so you'd have to store some randoms and wait some indefinite (but
long) time to make sure that nobody can guess it.
Even then you'd not be secure: The random number source has a rather low
bandwidth. So an attacker who knows that you're using this source just
records all the random numbers that the source ever emitted. This way,
he'll just have to check a few million keys (instead of 10^100 or a
similar astronomically high number of keys if he doesn't know where you
got your random seed from).
PGP (Windows version) lets you move the mouse around for a while. I
think it just intercepts mouse messages, taking the lowest bit from each
and appending it to the RNG key. Try to guess *that* number :)
Other sources include: The timer at program start-up, the number of free
bytes on the hard disk, a checksum of the BIOS area (if you can get at
it), a checksum of the Squeak changes file (or, maybe, just the last few
kilobytes of it - the first few K won't give you much randomness), the
total length of all file names on the disk (well, that might take too
long for impatient users with large disks), a passphrase that the user
enters (don't expect more than one or two bits of randomness per
character typed), etc. etc. Then, take all that data and run a
cryptographically secure checksum algorithm over it. This will spreak
all randomness across all the bits of the checksum. Take the checksum as
your random number seed.

Once you have a good seed, go get Bruce Schneier's Applied Cryptography.
His book contains a wealth of knowledge on cryptografically secure RNGs
and checksums, complete with algorithms and sources. Bruce's RNGs will
beat any WWW-based source of randomness hands down!

Regards,
Joachim
-- 
Please don't send unsolicited ads.





More information about the Squeak-dev mailing list