This course is a short series of lectures on Introductory Statistics. Topics
covered are listed in the Table of Contents. The notes were prepared by EwaPaszek and Marek Kimmel.
The development of this course has been supported by NSF 0203396 grant.
In this paragraph, our goals will be to look at, in more detail, how and whether particular types of pseudo-random variable generators work, and how, if necessary, we can implement a generator of our own choosing. Below a list of requirements is listed for our uniform random variable generator:
- A uniform marginal distribution,
- Independence of the uniform variables,
- Repeatability and portability,
- Computational speed.
Current algorithms
The generation of pseudo-random variates through algorithmic methods is a mature field in the sense that a great deal is known theoretically about different classes of algorithms, and in the sense that particular algorithms in each of those classes have been shown, upon testing, to have good statistical properties. In this section, let describe the main classes of generators, and then let
make specific recommendation about which generators should be implemented.
Congruential Generators
The most widely used and best understood class of pseudo-random number generators are those based on the linear congruential method introduced by
Lehmer (1951) . Such generators are based on the following formula:
${U}_{i}=\left(a{U}_{i-1}+c\right)\mathrm{mod}m,$
where
${U}_{i},i=\mathrm{1,2,...}$ are the output random integers;
${U}_{0}$ is the chosen starting value for the recursion, called
the seed and
a ,
c , and
m are prechosen constants.
to convert to uniform
$\left(\mathrm{0,1}\right)$ variates, we need only divide by
modulus m , that is, we use the sequence
$\left\{{U}_{i}/m\right\}$ .
The following properties of the algorithm are worth stating explicitly:
- Because of the “mod m” operation (for background on modular operations, see
Knuth, (1981) ), the only possible values the algorithm can produce are the integers
$\mathrm{0,1,2,...,}m-1.$ This follows because, by definition,
x mod
m is the remainder after
x is divided by
m .
- Because the current random integer
${U}_{i}$ depends only on the previous random integer
${U}_{i-1}$ once a previous value has been repeated, the entire sequence after it must be repeated. Such a repeating sequence is called
a cycle , and its
period is
the cycle length . Clearly,
the maximum period of the congruential generator is
m . For given choices of
a ,
c , and
m , a generator may contain many short cycles, (see the Example 1 below), and the cycle you enter will depend on the seed you start with. Notice that the generator with many short cycles is not a good one, since the output sequence will be one of a number of short series, each of which may not be uniformly distributed or randomly dispersed on the line or the plane. Moreover, if the simulation is long enough to cause the random numbers to repeat because of the short cycle length, the outputs will not be independent.
- If we are concern with a uniform
$\left(\mathrm{0,1}\right)$ variates, the finest partition of the interval
$\left(\mathrm{0,1}\right)$ that this generator can provide is
$\left[\mathrm{0,1}/m\mathrm{,2}/m\mathrm{,...,}\left(m-1/m\right)\right]$ . This is, of course, not truly a uniform
$\left(\mathrm{0,1}\right)$ distribution since, for any
k in
$\left(\mathrm{0,}m-1\right)$ , we have
$P\left[k/m<U<\left(k+1\right)/m\right]=0$ , not
$1/m$ are required by theory for continuous random variables.
- Choices of
a ,
c , and
m , will determine not only the fineness of the partition of
$\left(\mathrm{0,1}\right)$ and the cycle length, and therefore, the uniformity of the marginal distribution, but also the independence properties of the output sequence. Properly choosing
a ,
c , and
m is a science that incorporates both theoretical results and empirical tests. The first rule is to select the modulus m to be “as large as possible”, so that there is some hope to address point 3 above and to generate uniform variates with an approximately uniform marginal distribution. However, simply having m large is not enough; one may still find that the generator has many short cycles, or that the sequence is not approximately independent. See
example 1 below.