The Law of Large Numbers

The fundamental empirical fact upon which all applications of the theory of probability are based is expressed in the empirical law of large numbers, first formulated by Poisson (in his book, Recherches sur la probabilité des jugements, 1837):

In many different fields, empirical phenomena appear to obey a certain general law, which can be called the Law of Large Numbers. This law states that the ratios of numbers derived from the observation of a very large number of similar events remain practically constant, provided that these events are governed partly by constant factors and partly by variable factors whose variations are irregular and do not cause a systematic change in a definite direction. Certain values of these relations are characteristic of each given kind of event. With the increase in length of the series of observations the ratios derived from such observations come nearer and nearer to these characteristic constants. They could be expected to reproduce them exactly if it were possible to make series of observations of an infinite length.

In the mathematical theory of probability one may prove a proposition, called the mathematical law of large numbers, that may be used to gain insight into the circumstances under which the empirical law of large numbers is expected to hold. For an interesting philosophical discussion of the relation between the empirical and the mathematical laws of large numbers and for the foregoing quotation from Poisson the reader should consult Richard von Mises, Probability, Statistics, and Truth, second revised edition, Macmillan, New York, 1957, pp. 104–134.

A sequence of jointly distributed random variables $Z_1, Z_2, \ldots$ with finite means is said to obey the (classical) law of large numbers if

$$
\frac{1}{n} \sum_{j=1}^{n} \bigl( Z_j - E[Z_j] \bigr) \to 0 \tag{2.1}
$$

in some mode of convergence as $n$ tends to $\infty$. The sequence is said to obey the strong law of large numbers, the weak law of large numbers, or the quadratic mean law of large numbers, depending on whether the convergence in (2.1) is with probability one, in probability, or in quadratic mean. In this section we give conditions, both for independent and dependent random variables, for the law of large numbers to hold.

We consider first the case of independent random variables with finite means. We prove in section 3 that a sequence of independent identically distributed random variables obeys the weak law of large numbers if the common mean $m$ is finite. It may be proved (see Loève, Probability Theory, Van Nostrand, New York, 1955, p. 243) that the finiteness of $m$ also implies that the sequence of independent identically distributed random variables obeys the strong law of large numbers.
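The weak law for independent identically distributed random variables can be illustrated numerically. The following is a minimal Python sketch; the uniform(0, 1) population, the seed, and the sample sizes are assumptions chosen for illustration, not part of the text:

```python
import random

def sample_mean(n, rng):
    """Mean of n independent draws from a uniform(0, 1) population (true mean 0.5)."""
    return sum(rng.random() for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible
# By the weak law, the deviation |M_n - 0.5| should shrink as n grows.
deviations = {n: abs(sample_mean(n, rng) - 0.5) for n in (10, 1000, 100000)}
for n, d in sorted(deviations.items()):
    print(n, round(d, 4))
```

With this population the deviations shrink roughly like $n^{-1/2}$, in line with the quadratic mean law of large numbers.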

In theoretical exercise 4.2 we indicate the proof of the law of large numbers for independent, not necessarily identically distributed, random variables $Z_1, Z_2, \ldots$ with finite means: if, for some $\delta > 0$,

$$
\lim_{n \to \infty} \frac{1}{n^{1+\delta}} \sum_{j=1}^{n} E\bigl[\, \bigl| Z_j - E[Z_j] \bigr|^{1+\delta} \,\bigr] = 0, \tag{2.2}
$$

then

$$
\frac{1}{n} \sum_{j=1}^{n} \bigl( Z_j - E[Z_j] \bigr) \to 0 \quad \text{in probability.}
$$

Equation (2.2) is known as Markov’s condition for the validity of the weak law of large numbers for independent random variables.

In this section we consider the case of dependent random variables $Z_1, Z_2, \ldots$ with finite means (which we may take to be 0) and finite variances. We state conditions for the validity of the quadratic mean law of large numbers and the strong law of large numbers, which, while not the most general conditions that can be stated, appear to be general enough for most practical applications. Our conditions are stated in terms of the behavior, as $n$ tends to $\infty$, of the covariance

$$
E[Z_n M_n] \tag{2.3}
$$

between the $n$th summand $Z_n$ and the $n$th sample mean

$$
M_n = \frac{1}{n} \sum_{j=1}^{n} Z_j. \tag{2.4}
$$
Let us examine the possible behavior of $E[Z_n M_n]$ under various assumptions on the sequence $Z_1, Z_2, \ldots$ and under the assumption that the variances are uniformly bounded; that is, there is a constant $K$ such that

$$
E[Z_n^2] \le K \quad \text{for all } n. \tag{2.5}
$$

If the random variables are independent, then $E[Z_j Z_k] = 0$ if $j \ne k$. Consequently, $E[Z_n M_n] = \frac{1}{n} E[Z_n^2]$, which, under condition (2.5), tends to 0 as $n$ tends to $\infty$. This is also the case if the random variables are assumed to be orthogonal. The sequence of random variables is said to be orthogonal if $E[Z_j Z_k] = 0$ for any integer $j$ and integer $k \ne j$. Then, again, $E[Z_n M_n] = \frac{1}{n} E[Z_n^2]$.

More generally, let us consider random variables that are stationary (in the wide sense); this means that there is a function $R(v)$, defined for $v = 0, \pm 1, \pm 2, \ldots$, such that, for any integers $j$ and $k$,

$$
E[Z_j Z_k] = R(j - k). \tag{2.6}
$$

It is clear that an orthogonal sequence of random variables (in which all the random variables have the same variance $\sigma^2$) is stationary, with $R(v) = \sigma^2$ or 0, depending on whether $v = 0$ or $v \ne 0$. For a stationary sequence the value of $E[Z_n M_n]$ is given by

$$
E[Z_n M_n] = \frac{1}{n} \sum_{v=0}^{n-1} R(v). \tag{2.7}
$$
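To make formula (2.7) concrete, consider the illustrative moving-average sequence $Z_n = (e_n + e_{n+1})/\sqrt{2}$, where $e_1, e_2, \ldots$ are independent with mean 0 and variance 1 (this example is an assumption for illustration, not taken from the text); its covariance function $R(v)$ vanishes for $|v| > 1$, and (2.7) can be evaluated directly:

```python
def R(v):
    """Covariance function of the illustrative moving-average sequence
    Z_n = (e_n + e_{n+1}) / sqrt(2), with e_1, e_2, ... i.i.d., mean 0, variance 1."""
    if v == 0:
        return 1.0
    if abs(v) == 1:
        return 0.5
    return 0.0

def C(n):
    """E[Z_n M_n] for a wide-sense stationary sequence, computed via formula (2.7)."""
    return sum(R(v) for v in range(n)) / n

# For n >= 2 this equals 1.5 / n, which tends to 0.
print([round(C(n), 4) for n in (2, 10, 100)])  # → [0.75, 0.15, 0.015]
```

Since $E[Z_n M_n] = 1.5/n \to 0$ here, such a sequence obeys the quadratic mean law of large numbers by theorem 2A below.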
We now show that under condition (2.5) a necessary and sufficient condition for the sample mean $M_n$ to converge in quadratic mean to 0 is that $E[Z_n M_n]$ tends to 0. In theorem 2B we state conditions for the sample mean to converge with probability one to 0.

Theorem 2A. A sequence $Z_1, Z_2, \ldots$ of jointly distributed random variables with zero means and uniformly bounded variances obeys the quadratic mean law of large numbers (in the sense that $E[M_n^2] \to 0$ as $n \to \infty$) if and only if

$$
\lim_{n \to \infty} E[Z_n M_n] = 0. \tag{2.8}
$$

Proof

Since $\bigl| E[Z_n M_n] \bigr| \le \sqrt{E[Z_n^2]} \sqrt{E[M_n^2]}$, it is clear that if the quadratic mean law of large numbers holds and if the variances are bounded uniformly in $n$, then (2.8) holds. To prove the converse, we prove first the following useful identity:

$$
E[M_n^2] = \frac{1}{n^2} \sum_{j=1}^{n} \bigl\{ 2 j \, E[Z_j M_j] - E[Z_j^2] \bigr\}. \tag{2.9}
$$

To prove (2.9), we write the familiar formula

$$
E\bigl[ (Z_1 + \cdots + Z_n)^2 \bigr] = \sum_{j=1}^{n} \bigl\{ 2 \, E\bigl[ Z_j (Z_1 + \cdots + Z_j) \bigr] - E[Z_j^2] \bigr\}, \tag{2.10}
$$

from which (2.9) follows by dividing through by $n^2$. In view of (2.9), to complete the proof that (2.8) implies that $E[M_n^2]$ tends to 0, it suffices to show that (2.8) implies

$$
\lim_{n \to \infty} \frac{1}{n^2} \sum_{j=1}^{n} j \, E[Z_j M_j] = 0, \tag{2.11}
$$

since, under (2.5), the remaining term satisfies $\frac{1}{n^2} \sum_{j=1}^{n} E[Z_j^2] \le K/n$, which tends to 0.
To see (2.11), note that, since $\bigl| E[Z_j M_j] \bigr| \le K$ for every $j$, for any integers $m < n$

$$
\frac{1}{n^2} \sum_{j=1}^{n} j \, \bigl| E[Z_j M_j] \bigr| \le \frac{K m^2}{n^2} + \sup_{j > m} \bigl| E[Z_j M_j] \bigr|. \tag{2.12}
$$

Letting first $n$ tend to infinity and then $m$ tend to $\infty$ in (2.12), we see that (2.11) holds. The proof of theorem 2A is complete.
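The identity (2.10) on which the proof rests is pure algebra: it holds pathwise, term by term, for any real numbers, not only in expectation. A small Python check (the sample values are arbitrary, chosen only for illustration):

```python
def lhs(z):
    """Left side of (2.10), evaluated pathwise: (z_1 + ... + z_n)^2."""
    return sum(z) ** 2

def rhs(z):
    """Right side of (2.10), evaluated pathwise:
    sum over j of 2 * z_j * (z_1 + ... + z_j) - z_j**2."""
    total = 0.0
    partial = 0.0
    for zj in z:
        partial += zj            # running sum z_1 + ... + z_j
        total += 2 * zj * partial - zj ** 2
    return total

z = [0.3, -1.2, 2.5, 0.7, -0.4]  # arbitrary sample values
print(abs(lhs(z) - rhs(z)) < 1e-9)  # → True
```

Taking expectations of both sides of the pathwise identity yields (2.10), and dividing by $n^2$ yields (2.9).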

If it is known that $E[Z_n M_n]$ tends to 0 at the rate of some power of $1/n$, then we can conclude that convergence holds with probability one.

Theorem 2B. A sequence $Z_1, Z_2, \ldots$ of jointly distributed random variables with zero means and uniformly bounded variances obeys the strong law of large numbers (in the sense that $M_n \to 0$ with probability one as $n \to \infty$) if positive constants $K$ and $q$ exist such that for all integers $n$

$$
\bigl| E[Z_n M_n] \bigr| \le \frac{K}{n^q}. \tag{2.13}
$$
Remark: For a stationary sequence of random variables [in which case $E[Z_n M_n]$ is given by (2.7)] condition (2.13) holds if positive constants $K$ and $q$ exist such that for all integers $n$

$$
\left| \frac{1}{n} \sum_{v=0}^{n-1} R(v) \right| \le \frac{K}{n^q}. \tag{2.14}
$$

Proof

If (2.13) holds, then (assuming, as we may, that $0 < q < 1$)

$$
\frac{1}{n^2} \sum_{j=1}^{n} 2 j \, \bigl| E[Z_j M_j] \bigr| \le \frac{2K}{n^2} \sum_{j=1}^{n} j^{1-q} \le \frac{2K}{n^q}. \tag{2.15}
$$

By (2.15) and (2.9), it follows that for some constant $K_1$ and all $n$

$$
E[M_n^2] \le \frac{K_1}{n^q}. \tag{2.16}
$$
Choose now any integer $r$ such that $rq > 1$ and define a sequence of random variables $Y_1, Y_2, \ldots$ by taking for the $m$th member of the sequence the sample mean $M_{m^r}$; in symbols,

$$
Y_m = M_{m^r}. \tag{2.17}
$$
By (2.16), the sequence has a mean square satisfying

$$
E[Y_m^2] \le \frac{K_1}{m^{rq}}. \tag{2.18}
$$
If we sum (2.18) over all $m$, we obtain a convergent series, since $rq > 1$:

$$
\sum_{m=1}^{\infty} E[Y_m^2] \le K_1 \sum_{m=1}^{\infty} \frac{1}{m^{rq}} < \infty. \tag{2.19}
$$
Therefore, by theorem 1A, it follows that

$$
P\Bigl[ \lim_{m \to \infty} Y_m = 0 \Bigr] = 1. \tag{2.20}
$$
We have thus shown that a properly selected subsequence $\{M_{m^r}\}$ of the sequence $\{M_n\}$ converges to 0 with probability one. We complete the proof of theorem 2B by showing that the members of the sequence $\{M_n\}$, located between successive members of the subsequence $\{M_{m^r}\}$, do not tend to be too different from the members of the subsequence. More precisely, define

$$
D_m = \max_{m^r \le n < (m+1)^r} \bigl| M_n - M_{m^r} \bigr|. \tag{2.21}
$$

It is clear, in view of (2.20), that to show that $M_n \to 0$ with probability one it suffices to show that $D_m \to 0$ with probability one. Consequently, to complete the proof it suffices to show that

$$
P\Bigl[ \lim_{m \to \infty} D_m = 0 \Bigr] = 1. \tag{2.22}
$$
In view of theorem 1A, to show that (2.22) holds, it suffices to show that

$$
\sum_{m=1}^{\infty} E[D_m^2] < \infty. \tag{2.23}
$$
We prove that (2.23) holds by showing that for some constants $K_2$ and $K_3$

$$
E\biggl[ \Bigl( \frac{(m+1)^r - m^r}{m^r} \Bigr)^2 M_{m^r}^2 \biggr] \le \frac{K_2}{m^2}, \qquad
E\biggl[ \Bigl( \frac{1}{m^r} \sum_{j=m^r+1}^{(m+1)^r} |Z_j| \Bigr)^2 \biggr] \le \frac{K_3}{m^2}, \tag{2.24}
$$

since $E[D_m^2]$ is no greater than twice the sum of the two quantities bounded in (2.24).
To prove (2.24), we note that, for $m^r \le n < (m+1)^r$,

$$
M_n - M_{m^r} = \Bigl( \frac{m^r}{n} - 1 \Bigr) M_{m^r} + \frac{1}{n} \sum_{j=m^r+1}^{n} Z_j, \tag{2.25}
$$
from which it follows that

$$
D_m \le \frac{(m+1)^r - m^r}{m^r} \, \bigl| M_{m^r} \bigr| + \frac{1}{m^r} \sum_{j=m^r+1}^{(m+1)^r} |Z_j|, \tag{2.26}
$$

in which we use $m^r$ as a lower bound for $n$. By a calculus argument, using the law of the mean, one may show that for $r \ge 1$ and $m \ge 1$

$$
(m+1)^r - m^r \le r \, 2^{r-1} \, m^{r-1}. \tag{2.27}
$$
Consequently, (2.26), together with (2.27) and (2.16), implies the first part of (2.24). Similarly, by the Cauchy–Schwarz inequality and (2.5),

$$
E\biggl[ \Bigl( \sum_{j=m^r+1}^{(m+1)^r} |Z_j| \Bigr)^2 \biggr] \le \bigl( (m+1)^r - m^r \bigr)^2 K, \tag{2.28}
$$

from which one may infer the second part of (2.24). The proof of theorem 2B is now complete.

Exercises

2.1. Random digits. Consider a discrete random variable $X$ uniformly distributed over the numbers 0 to $M - 1$ for any integer $M$; that is, $P[X = k] = 1/M$ if $k = 0, 1, \ldots, M - 1$. Let $X_1, X_2, \ldots$ be a sequence of independent random variables identically distributed as $X$. For an integer $k$ from 0 to $M - 1$ define $R_n(k)$ as the fraction of the observations $X_1, \ldots, X_n$ equal to $k$. Prove that, with probability one, $R_n(k) \to 1/M$ as $n \to \infty$.
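A quick simulation of this exercise; the choice of 10 digit values, the seed, and the sample size are illustrative assumptions:

```python
import random
from collections import Counter

M = 10                     # number of digit values, chosen for illustration
n = 100000
rng = random.Random(1)     # fixed seed so the run is reproducible
digits = [rng.randrange(M) for _ in range(n)]     # i.i.d. uniform over 0..M-1
freq = {k: count / n for k, count in Counter(digits).items()}
# Every relative frequency R_n(k) should be close to 1/M = 0.1.
print(max(abs(f - 1 / M) for f in freq.values()))
```

With 100,000 observations each relative frequency lies within a few thousandths of $1/10$, as the strong law predicts.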

2.2. The distribution of digits in the decimal expansion of a random number. Let $X$ be a number chosen at random from the unit interval (that is, $X$ is a random variable uniformly distributed over the interval 0 to 1). Let $X_1, X_2, \ldots$ be the successive digits in the decimal expansion of $X$; that is,

$$
X = \sum_{n=1}^{\infty} \frac{X_n}{10^n}.
$$
Prove that the random variables $X_1, X_2, \ldots$ are independent discrete random variables uniformly distributed over the integers 0 to 9. Consequently, conclude that for any digit $k$ (say, the integer 7) the relative frequency of occurrence of $k$ in the decimal expansion of any number $x$ in the unit interval is equal to $1/10$ for all numbers $x$, except a set of numbers constituting a set of probability zero. Does the fact that only 3's occur in the decimal expansion of $1/3$ contradict the assertion?
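The decimal digits of a rational number in $[0, 1)$ can be produced exactly by long division in integer arithmetic, which avoids floating-point error; the helper below is illustrative:

```python
def decimal_digits(numerator, denominator, k):
    """First k decimal digits of the fraction numerator/denominator in [0, 1),
    computed by long division in exact integer arithmetic."""
    digits = []
    r = numerator % denominator
    for _ in range(k):
        r *= 10
        digits.append(r // denominator)
        r %= denominator
    return digits

# For 1/3 every digit is 3 -- one of the exceptional (probability-zero) numbers.
print(decimal_digits(1, 3, 5))   # → [3, 3, 3, 3, 3]
# For 1/4 = 0.250..., the expansion terminates.
print(decimal_digits(1, 4, 3))   # → [2, 5, 0]
```

Numbers such as $1/3$ belong to the exceptional set of probability zero, so they do not contradict the assertion of the exercise.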

2.3. Convergence of the sample distribution function and the sample characteristic function of dependent random variables. Let $X_1, X_2, \ldots$ be a sequence of random variables identically distributed as a random variable $X$ with distribution function $F(x) = P[X \le x]$ and characteristic function $\varphi(u) = E[e^{iuX}]$. The sample distribution function $F_n(x)$ is defined, for each real number $x$, as the fraction of observations among $X_1, \ldots, X_n$ which are less than or equal to $x$. The sample characteristic function $\varphi_n(u)$ is defined by

$$
\varphi_n(u) = \frac{1}{n} \sum_{j=1}^{n} e^{iuX_j}.
$$

Show that $F_n(x)$ converges in quadratic mean to $F(x)$, as $n \to \infty$, if and only if

$$
\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} P\bigl[ X_j \le x, \, X_n \le x \bigr] = F^2(x). \tag{2.30}
$$

Show that $\varphi_n(u)$ converges in quadratic mean to $\varphi(u)$ if and only if

$$
\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} E\bigl[ e^{iu(X_j - X_n)} \bigr] = \bigl| \varphi(u) \bigr|^2. \tag{2.31}
$$
Prove that (2.30) and (2.31) hold if the random variables are independent.

2.4. The law of large numbers does not hold for Cauchy distributed random variables. Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with probability density function $f(x) = 1/[\pi(1 + x^2)]$. Show that no finite constant $c$ exists to which the sample means $(X_1 + \cdots + X_n)/n$ converge in probability.
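A simulation suggests this failure of convergence: standard Cauchy variates can be generated by the inverse-CDF method, and the running sample means refuse to settle down (seed, sample size, and checkpoints are illustrative choices):

```python
import math
import random

def cauchy(rng):
    """Standard Cauchy variate via the inverse-CDF method: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(3)     # fixed seed so the run is reproducible
xs = [cauchy(rng) for _ in range(200000)]
running, means = 0.0, []
for i, x in enumerate(xs, 1):
    running += x
    if i % 50000 == 0:
        means.append(running / i)
# Unlike a population with a finite mean, the running means keep fluctuating:
# in fact each M_n is itself standard Cauchy distributed.
print([round(m, 2) for m in means])
```

The simulation cannot prove anything, of course; the exercise asks for a proof, which rests on the fact that the mean of the Cauchy distribution does not exist.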

2.5. Let $X_1, X_2, \ldots$ be a sequence of independent random variables identically distributed as a random variable $X$ with finite mean $E[X]$. Show that for any bounded continuous function $g(\cdot)$ of a real variable

$$
\lim_{n \to \infty} E\biggl[ g\Bigl( \frac{X_1 + \cdots + X_n}{n} \Bigr) \biggr] = g\bigl( E[X] \bigr). \tag{2.33}
$$

Consequently, conclude that, for $0 \le p \le 1$,

$$
\lim_{n \to \infty} \sum_{k=0}^{n} g\Bigl( \frac{k}{n} \Bigr) \binom{n}{k} p^k (1-p)^{n-k} = g(p). \tag{2.34}
$$
2.6. A probabilistic proof of Weierstrass' theorem. Extend (2.34) to show that to any continuous function $g(\cdot)$ on the interval $[0, 1]$ there exists a sequence of polynomials $B_n(\cdot)$ such that $B_n(p) \to g(p)$ uniformly on $[0, 1]$.
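The polynomials in question can be taken to be the Bernstein polynomials $B_n(p) = \sum_{k=0}^{n} g(k/n) \binom{n}{k} p^k (1-p)^{n-k}$, the left side of (2.34). A short Python sketch (the test function and evaluation grid are illustrative choices) shows the uniform error decreasing with $n$:

```python
from math import comb

def bernstein(g, n, p):
    """Value at p of the nth Bernstein polynomial of g on [0, 1]:
    B_n(p) = sum over k of g(k/n) * C(n, k) * p**k * (1-p)**(n-k)."""
    return sum(g(k / n) * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

g = lambda x: abs(x - 0.5)         # continuous on [0, 1], not differentiable at 1/2
grid = [i / 50 for i in range(51)]
err = {n: max(abs(bernstein(g, n, p) - g(p)) for p in grid) for n in (10, 50, 200)}
print({n: round(e, 4) for n, e in err.items()})
```

The maximum error over the grid shrinks roughly like $n^{-1/2}$ for this kinked test function, consistent with uniform convergence.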