HA

HA



Home
Information
Products
Support
Store
How To Order

www.HEPArts.com


Up to random_bytes Page

Up to PKI Utilities

Random Number Generators

Overview

Random Number Generators are programs which produce a stream of numbers which appear to come in random order. On computers, these streams are reproducible, so they are called pseudorandom generators.

In Math, a recursion equation is a formula wherein the next number in the sequence is generated by operating on the previous number in the sequence. These all have the form:

       X(n+1) = f( X(n) )
Here, X indicates the value at the index n, and f() is the function applied to it. All computer generators are based on some functional recursion equation.

Only certain functions modify the input to yield "unpredictable" looking output. Even ones that look random to the eye can have subtle patterns in long sequences.

All generators on computers produce a "machine word" of random data at a time. As this is a finite size, eventually the function will see the same input it saw before, and it will repeat itself. This is known as the period of the function. For a good random number generator, this must be very long.

A generator with a short period is a bad generator. Also, a function could have localized loops into which it could fall, yielding a small number of outputs it produces over and over again. This also would be a bad generator.

John von Neumann suggested, in 1946, using the Squaring of a number and taking the middle half of the digits as the next number. Such a sequence looks pretty random empirically. However, it turned out to have several "ruts" into which it fell quite often. Many functions have been investigated, but only a few have proved to be satisfactory for use in computer generators.

Linear-Congruential

One very common and well studied type of generator is a Linear-Congruential Random Number Generator (LCRNG), which is defined by the following recursion equation:
	X(n+1) = (A * X(n) + C) mod M
The parameters are the the multiplier, A, the additive constant, C, and and Modulus, M. Each successive number, X(n+1), is derived from the previous number, X(n).

Lagged-Fibonacci

Another very common type of generator is a Lagged Fibonacci Random Number Generator (LFRNG), which is defined by the following recursion equation:
	X(n+1) = (X(n-j) + X(n-k)) mod M
The two indices, j and k, are the lags of the series. They indicate how far back in the series to reach for a previous value. In practice, these need to be large, on the order of a hundred, and they must be separated by the same order of distance. These generators have been used successfully since 1958. It was discovered in the 1994 that they can display a slight preference (1%) for 1-bits over 0-bits in the least-significant bit of the generated numbers in certain sequences.
[Vattulainen, et al, Physical Review Letters 73, (1994) 2513-2516.]
[or See Knuth, Vol-2, page-79.]

Using Luescher's discarding technique has proved to be an effective remediation of this discrepancy.
[Luescher, Martin, Computer Physics Communications 79, (1994) 100-110.]
Essentially, one generates a batch of numbers and uses only the first 10% and, then, goes on to generate the next batch.

Combining Components

Shuffling

One of the techniques for improving the confidence in random number generation is to combine the results of two independent sources. If each component is good, then the merged result should be better. This is a fairly intuitive argument, but there are mathematical proofs of some advantages.

As the component generators produce the next number by operating on the previous number, one might worry about correlations in a sequence of random numbers. One of the techniques for reducing this concern is to shuffle the newly generated numbers into a "deck" and to give out a number from this "deck" at each request. Which number taken from the "deck" is determined by another random number generator. With this arrangement, one will rarely get two numbers in sequence which were actually generated in sequence.

Modular Subtraction

Another method of combining two component generators is to use Modular Subtraction:
	Z(n) = ( X(n) - Y(n) ) mod M
where X and Y are values from the two component generators. This technique has the advantage that if the periods of the two component generators are relatively prime, then the period of the combined generator is proportional to the product of the component periods.

Selecting a Modulus

All the equations above need a modulus. For generation on a computer, an obvious choice would be to use the maximum word size as the modulus. Then, all of the arithmetic would be easy and quick, because the CPU hardware already takes the modulus of all full word operations.

These days, computers overwhelmingly have binary architectures, with word sizes of either 32-bits or 64-bits. In older days, there were 16-bit machines and some 35-bit and 60-bit mainframes. Regardless of the bit-size of the computer, the easy choice for the modulus would be 2^(bit-size).

Unfortunately, all choices of the other parameters for the equations yield unsatisfactory results with the modulus being a power of 2. We will show some of the failed Quality tests for a few such choices. Only the high-order bits of the resulting generation are reasonably random. The low-order bits are woefully bad.

If we choose a Prime Number for the modulus, however, all of the bits are good in the resulting generation. This is a clear choice, then, but it does uncover the next problem. The modulus must be less than the word size.

Getting to a Full Word

Choosing the modulus to be less than the word size makes it impossible to generate a full word of random data directly. [The reader may note that the rand() function specification in the Standard C-Library returns only 31-bits of random data. Of course, it has other dreadful problems.]

Actually, we have tried using double-word arithmetic with moduli bigger than the word size, to get the full word of random data in one calculation. Setting this up carries enough overhead, however, that the empirical result is that it take 50% longer to do this than to merge two results obtained in sequence from single-word arithmetic.

We have to merge two smaller results into a full word, then. If we have a 32-bit machine and a modulus nearly the word size, then we get about 31-bits of random data from each component. In order to fill out a full 32-bit word, we must merge two partial words by shifting one of them up. To be symmetric, we also shift the other number down by the same amount. To be certain of the coverage of the 32-bits, we shift up and down by 3-bits, which lets some of the data spill over both edges. The "merging" is performed as a "Exclusive-OR" (XOR) operation.

If we were to use a single generator, we would not want to merge together successively generated partials to make the final full word. The merging relies on the assumption that the component partial words are independent, which is clearly not true for immediate recursion. Here, using the Shuffling technique on the second partial word before merging would loosen its dependence on the first word.

In practice, however, the cost of running two independent generators in parallel is barely measurable, and this is what we do in the end. Nevertheless, we mention it here because we run Quality tests on each component of our final generator, and this shuffling is necessary to measure the component all by itself.

This concern does not exist, of course, in the final combined Z-generator, because the final full word is the merger of a partial word from the Z-generator of the "First Pair" with one from the "Second Pair". Since all of the generating parameters are different for the two pairs, there are no correlations between these components.

Cryptographic Digests

One of the more modern developments is the designing of complicated transformation functions for data encryption. Some of these transformations are invertible, and these can be used to encrypt data for later decryption and use by a party who knows the secret password. The old Data Encryption Standard (DES) from IBM and the new Advanced Encryption Standard (AES) from the National Institute of Standards and Technology are of this type of transformation.

Another class of transformations take a whole pile of data and "digest it", spitting out a small unit of data. This type of transformation is Many-to-One, and is, therefore, not invertible. The ubiquitous MD5 digest from RSA Security, Inc., and the newer Secure Hash Algorithm (SHA-1) from the National Institute of Standards and Technology are both of this type of transformation.

It is relatively common to run random data through one of these transformations to whiten it. Using any of these cryptographic functions to modify the final output should make the data impossible to distinguish from true noise. Of course, we test that this at least doesn't make things worse. We also test the time that this step adds to the final generation. It adds a little.

Seeding the Generators

All of the generation techniques are "feedback" loops. The next word is generated by operating on the previous word. The loop has to be started by choosing some "seed" value. This means that if it starts from the same place each time, the generator will always produce exactly the same stream of "random" data.

This fact is gainfully employed in the sciences where one uses a computer to simulate the performance of machines during the design phase. Any failures that are detected can be reproduced easily to track down where the trouble originates.

For use in computer security, however, a generator must be started in a new place each time, and it has to be a place that no one "outside" can guess. One needs a source of random data to seed the Random Number Generator.

Typically, one chooses a seed value from a "physical" generator. The ideal source is something like monitoring the potential drop across a resister or the charge on a capacitor in an electronic circuit. The least significant digits will vary with temperature, radio waves, magnetic fields as people walk by, etc.

Of course, it is more common to use something easier to obtain in an extant computer. One can use the least significant digits of the system clock, as these vary with temperature, system load, etc. These have the disadvantage that they can be seen by anyone on the system. For attended operation, it has proven effective to measure the time differentials of keyclicks of a real person typing on the keyboard. This is rarely available on a Server, however.

Another technique is to perform tasks in several threads of execution, where the result depends on the order in which the system parcels out CPU time-slicing. This "scheduling" dependence is unspecified on most platforms, and it is unpredictable in practice. We use this technique for generating physically random seeds. This procedure is inherently slow, but it only needs to generate the starting seed.

Our Choices

We combine two component Linear-Congruential Random Number Generators, X and Y, to form Z, using modular subtraction, as Knuth describes in his Equation-38 on page-108 of Vol-2.
	X(n+1) =  (Ax * X(n))  mod Mx
	Y(n+1) =  (Ay * X(n))  mod My
	Z(n+1) = ( X(n+1) - Y(n+1) ) mod Mx
We construct two of these Z-generators using independent parameters, which we will call the "First Pair" and the "Second Pair". The partial words from each of these are merged together by shifting the partial word from the "First Pair" down by 3-bits, shifting the partial word from the "Second Pair" up by 3-bits, and XOR-ing them together.

All the moduli are Prime Numbers. We pick A values which are of the order of magnitude of the square-root of the modulus, which allows some speedups in the arithmetic. The A values have been selected from values in Knuth's Volume-2, which have been tested extensively in the literature to perform well in the Spectral Tests of the randomness distributions. Since we are subtracting, the C parameters are less relevant, and we set them both to zero in the component generators, X and Y.

The "First Set" of parameters, then, are the following:

First  Pair X:  M = (2^31 -   1) = 2147483647, A = 48271
First  Pair Y:  M = (2^31 - 249) = 2147483399, A = 40692

Second Pair X:  M = (2^31 -  61) = 2147483587, A = 32647
Second Pair Y:  M = (2^31 - 171) = 2147483477, A = 16807

We choose three distinct sets of parameters, each with a unique choice of Prime Number for the Modulus. Which set is used is determined by the initial seed value during the initialization of the generator.

The "Second Set" of parameters, are the following:

First  Pair X:  M = (2^31 -  69) = 2147483579, A = 48271
First  Pair Y:  M = (2^31 -  19) = 2147483629, A = 40692

Second Pair X:  M = (2^30 - 135) = 1073741689, A = 32647
Second Pair Y:  M = (2^31 -  85) = 2147483563, A = 16807

The "Third Set" of parameters, are the following:

First  Pair X:  M = (2^31 -  99) = 2147483549, A = 48271
First  Pair Y:  M = (2^31 - 105) = 2147483543, A = 40692

Second Pair X:  M = (2^31 - 151) = 2147483497, A = 32647
Second Pair Y:  M = (2^31 - 159) = 2147483489, A = 16807

For the random_bytes program, we whiten the output by filling a block of 16 bytes with it and running it through the MD5 MAC digest.

Applying all of these techniques together is arguably way overkill. However, this program is used by humans and needs to run, therefore, only at Human Timescales. We have plenty of time to go through the excess randomization.

Test Results

We have performed extensive tests on our Random Number Generators. We have tested the components individually, as well as the combined generators. To see these results, continue on to the Randomness Validation page.


webmaster@HEPArts.com

Copyright © 2005 HEPArts, Inc. All Rights Reserved.