Expectation of a Function With Respect to a Probability Law

Consider a numerical valued random phenomenon with probability function P[·]. The probability function determines a distribution of a unit mass on the real line, the amount of which lying on any (Borel) set B of real numbers is equal to P[B]. In order to summarize the characteristics of P[·] by a few numbers, we define in this section the notion of the expectation of a continuous function g(·) of a real variable, with respect to the probability function P[·], to be denoted by E[g(x)]. It will be seen that the expectation E[g(x)] has much the same properties as the average of g(·) with respect to a set of numbers.

For the case in which the probability function is specified by a probability mass function p(·), we define, in analogy with (1.12),

E[g(x)] = Σ g(x) p(x),     (2.1)

in which the sum extends over the probability mass points x of the probability law.

The sum written in (2.1) may involve the summation of a countably infinite number of terms and therefore is not always meaningful. For reasons made clear in section 1 of Chapter 8, the expectation is said to exist if

Σ |g(x)| p(x) < ∞.

In words, the expectation E[g(x)], defined in (2.1), exists if and only if the infinite series defining E[g(x)] is absolutely convergent. A test for convergence of an infinite series is given in theoretical exercise 2.1.
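For a probability law with finitely many mass points, the sum in (2.1) can be computed directly. The following sketch (our own illustration; the function and variable names are not the text's) evaluates it for the fair-die law, whose mean is 7/2 and whose mean square is 91/6.

```python
def expectation(g, pmf):
    """E[g(x)] = sum of g(x) p(x) over the probability mass points x."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair-die probability law: p(x) = 1/6 for x = 1, ..., 6.
die = {x: 1/6 for x in range(1, 7)}
mean = expectation(lambda x: x, die)             # 7/2
mean_square = expectation(lambda x: x * x, die)  # 91/6
```

With a finite support, absolute convergence is automatic, so existence never fails here.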

For the case in which the probability function is specified by a probability density function f(·), we define

E[g(x)] = ∫_{−∞}^{∞} g(x) f(x) dx.     (2.3)

The integral written in (2.3) is an improper integral and therefore is not always meaningful. Before one can speak of the expectation E[g(x)], one must verify its existence. The expectation is said to exist if

∫_{−∞}^{∞} |g(x)| f(x) dx < ∞.

In words, the expectation defined in (2.3) exists if and only if the improper integral defining E[g(x)] is absolutely convergent. In the case in which the functions g(·) and f(·) are continuous for all (but a finite number of values of) x, the integral in (2.3) may be defined as an improper Riemann¹ integral by the limit

E[g(x)] = lim_{a → −∞, b → ∞} ∫_a^b g(x) f(x) dx.

A useful tool for determining whether or not the expectation E[g(x)], given by (2.3), exists is the test for convergence of an improper integral given in theoretical exercise 2.1.
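As a numerical sketch of (2.3) (our own illustration, not part of the text), the improper integral may be approximated by a midpoint Riemann sum over a large truncated interval; the truncation limits are an assumption suitable only for densities with rapidly decaying tails.

```python
import math

def expectation(g, f, a=-50.0, b=50.0, n=200_000):
    """Approximate E[g(x)] = integral of g(x) f(x) dx by a midpoint
    Riemann sum, truncating the improper integral to [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += g(x) * f(x)
    return total * h

# Standard normal density: the mean is 0 and the mean square is 1.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
mean = expectation(lambda x: x, phi)
mean_square = expectation(lambda x: x * x, phi)
```

For a heavy-tailed density such as the Cauchy law discussed later, no choice of truncation limits rescues the mean; the truncated values grow without bound as the limits widen.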

A discussion of the definition of the expectation in the case in which the probability function must be specified by the distribution function F(·) is given in section 6.

The expectation E[g(x)] is sometimes called the ensemble average of the function g(·), in order to emphasize that the expectation (or ensemble average) is a theoretically computed quantity. It is not an average of an observed set of numbers, as was the case in section 1. We shall later consider averages with respect to observed values of random phenomena, and these will be called sample averages.

A special terminology is introduced to describe the expectation of various functions g(·).

We call E[x], the expectation of the function g(x) = x with respect to a probability law, the mean of the probability law. For a discrete probability law, with probability mass function p(·),

E[x] = Σ x p(x).

For a continuous probability law, with probability density function f(·),

E[x] = ∫_{−∞}^{∞} x f(x) dx.

It may be shown that the mean of a probability law has the following meaning. Suppose one makes a sequence of independent observations x₁, x₂, ..., xₙ, ... of a random phenomenon obeying the probability law and forms the successive arithmetic means

Mₙ = (x₁ + x₂ + ··· + xₙ)/n.

These successive arithmetic means, M₁, M₂, ..., Mₙ, ..., will (with probability one) tend to a limiting value if and only if the mean of the probability law is finite. Further, this limiting value will be precisely the mean of the probability law.
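This convergence of successive arithmetic means can be watched empirically. The simulation below (our own illustration; the seed and sample sizes are arbitrary) draws independent observations from the fair-die law, whose mean is 3.5.

```python
import random

random.seed(12345)  # fixed seed so the run is reproducible
observations = [random.randint(1, 6) for _ in range(200_000)]

# Successive arithmetic means M_n = (x_1 + ... + x_n) / n
def arithmetic_mean(xs, n):
    return sum(xs[:n]) / n

estimates = {n: arithmetic_mean(observations, n) for n in (100, 10_000, 200_000)}
```

As n grows, the estimates settle near 3.5, illustrating the limiting behavior described above.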

We call E[x²], the expectation of the function g(x) = x² with respect to a probability law, the mean square of the probability law. This notion is not to be confused with the square mean of the probability law, which is the square of the mean and which we denote by E²[x]. For a discrete probability law, with probability mass function p(·),

E[x²] = Σ x² p(x).

For a continuous probability law, with probability density function f(·),

E[x²] = ∫_{−∞}^{∞} x² f(x) dx.

More generally, for any integer n = 1, 2, ..., we call E[xⁿ], the expectation of g(x) = xⁿ with respect to a probability law, the nth moment of the probability law. Note that the first moment and the mean of a probability law are the same; also, the second moment and the mean square of a probability law are the same.

Next, for any real number a and integer n, we call E[(x − a)ⁿ] the nth moment of the probability law about the point a. Of especial interest is the case in which a is equal to the mean E[x]. We call E[(x − E[x])ⁿ] the nth moment of the probability law about its mean or, more briefly, the nth central moment of the probability law.

The second central moment is especially important and is called the variance of the probability law. Given a probability law, we shall use the symbols m and σ² to denote, respectively, its mean and variance; consequently,

m = E[x],     σ² = E[(x − m)²].

The square root σ of the variance is called the standard deviation of the probability law. The intuitive meaning of the variance is discussed in section 4.

Example 2A. The normal probability law with parameters m and σ is specified by the probability density function f(·), given by (4.11) of Chapter 4. Its mean is equal to

E[x] = (1/(σ√(2π))) ∫_{−∞}^{∞} x e^{−(x−m)²/2σ²} dx = (1/√(2π)) ∫_{−∞}^{∞} (m + σy) e^{−y²/2} dy,

where we have made the change of variable y = (x − m)/σ. Now

(1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} dy = 1,     (1/√(2π)) ∫_{−∞}^{∞} y e^{−y²/2} dy = 0.     (2.12)

Equation (2.12) follows from (2.20) and (2.22) of Chapter 4 and the fact that for any integrable function g(·)

∫_{−∞}^{∞} g(x) dx = ∫_0^∞ [g(x) + g(−x)] dx.     (2.13)

From (2.12) and (2.13) it follows that the mean E[x] is equal to m. Next, the variance is equal to

E[(x − m)²] = (σ²/√(2π)) ∫_{−∞}^{∞} y² e^{−y²/2} dy = σ².

Notice that the parameters m and σ in the normal probability law were chosen equal to the mean and standard deviation of the probability law.

The operation of taking expectations has certain basic properties with which one may perform various formal manipulations. To begin with, we have the following properties for any constant c and any functions g(·), g₁(·), and g₂(·) whose expectations exist:

E[c] = c,     (2.15)
E[c g(x)] = c E[g(x)],     (2.16)
E[g₁(x) + g₂(x)] = E[g₁(x)] + E[g₂(x)],     (2.17)
E[g₁(x)] ≤ E[g₂(x)]   if g₁(x) ≤ g₂(x) for every x,     (2.18)
|E[g(x)]| ≤ E[|g(x)|].     (2.19)

In words, the first three of these properties may be stated as follows: the expectation of a constant c [that is, of the function g(x), which is equal to c for every value of x] is equal to c; the expectation of the product of a constant and a function is equal to the constant multiplied by the expectation of the function; the expectation of a function which is the sum of two functions is equal to the sum of the expectations of the two functions.

Equations (2.15) to (2.19) are immediate consequences of the definition of expectation. We write out the details only for the case in which the expectations are taken with respect to a continuous probability law with probability density function f(·). Then, by the properties of integrals,

E[c] = ∫ c f(x) dx = c ∫ f(x) dx = c,
E[c g(x)] = ∫ c g(x) f(x) dx = c ∫ g(x) f(x) dx = c E[g(x)],
E[g₁(x) + g₂(x)] = ∫ [g₁(x) + g₂(x)] f(x) dx = ∫ g₁(x) f(x) dx + ∫ g₂(x) f(x) dx = E[g₁(x)] + E[g₂(x)],
E[g₁(x)] = ∫ g₁(x) f(x) dx ≤ ∫ g₂(x) f(x) dx = E[g₂(x)]   if g₁(x) ≤ g₂(x) for every x,

all integrals extending from −∞ to ∞.

Equation (2.19) follows from (2.18), applied first with g₁(x) = g(x) and g₂(x) = |g(x)| and then with g₁(x) = −g(x) and g₂(x) = |g(x)|.

Example 2B. To illustrate the use of (2.15) to (2.19), we note that for any constant c, E[x + c] = E[x] + c, E[(x + c)²] = E[x²] + 2cE[x] + c², and |E[x]| ≤ E[|x|].

We next derive an extremely important expression for the variance of a probability law:

σ² = E[x²] − E²[x].     (2.20)

In words, the variance of a probability law is equal to its mean square, minus its square mean. To prove (2.20), we write, letting m = E[x],

σ² = E[(x − m)²] = E[x² − 2mx + m²] = E[x²] − 2mE[x] + m² = E[x²] − E²[x].
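The identity (2.20) is easy to check numerically for a small discrete law (a sketch of our own; the particular mass function is arbitrary): the variance computed from its definition agrees with the mean square minus the square mean.

```python
pmf = {0: 0.2, 1: 0.5, 3: 0.3}   # an arbitrary probability mass function

mean = sum(x * p for x, p in pmf.items())
mean_square = sum(x * x * p for x, p in pmf.items())

variance_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
variance_by_identity = mean_square - mean ** 2   # equation (2.20)
```

For this law the mean is 1.4, the mean square is 3.2, and both computations of the variance give 1.24.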

In the remainder of this section we compute the mean and variance of various probability laws. A tabulation of the results obtained is given in Tables 3A and 3B at the end of section 3.

Example 2C. The Bernoulli probability law with parameter p, in which 0 ≤ p ≤ 1, is specified by the probability mass function p(·), given by p(x) = pˣ(1 − p)^{1−x} for x = 0 or 1. Its mean, mean square, and variance, letting q = 1 − p, are given by

E[x] = 0 · q + 1 · p = p,   E[x²] = 0² · q + 1² · p = p,   σ² = E[x²] − E²[x] = p − p² = pq.

Example 2D. The binomial probability law with parameters n and p is specified by the probability mass function given by (4.5) of Chapter 4. Its mean is given by

E[x] = Σ_{x=0}^n x C(n, x) pˣ q^{n−x} = np Σ_{x=1}^n C(n − 1, x − 1) p^{x−1} q^{n−x} = np,

in which C(n, x) = n!/(x!(n − x)!) denotes the binomial coefficient. Its mean square is given by

E[x²] = Σ_{x=0}^n x² C(n, x) pˣ q^{n−x}.

To evaluate E[x²], we write x² = x(x − 1) + x. Then

E[x²] = E[x(x − 1)] + E[x] = Σ_{x=0}^n x(x − 1) C(n, x) pˣ q^{n−x} + np.     (2.24)

Since x(x − 1) C(n, x) = n(n − 1) C(n − 2, x − 2), the sum in (2.24) is equal to

n(n − 1) p² Σ_{x=2}^n C(n − 2, x − 2) p^{x−2} q^{n−x} = n(n − 1) p².

Consequently, E[x²] = n(n − 1)p² + np, so that

σ² = E[x²] − E²[x] = n(n − 1)p² + np − n²p² = np(1 − p) = npq.
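The binomial conclusions E[x] = np and σ² = npq can be verified by direct summation over the probability mass function, as in this sketch of our own (function and variable names are not the text's):

```python
from math import comb

def binomial_mean_and_variance(n, p):
    q = 1 - p
    pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}
    mean = sum(x * px for x, px in pmf.items())
    mean_square = sum(x * x * px for x, px in pmf.items())
    return mean, mean_square - mean**2

mean, variance = binomial_mean_and_variance(10, 0.3)
# theory: mean = np = 3.0, variance = npq = 2.1
```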

Example 2E. The hypergeometric probability law with parameters N, n, and p is specified by the probability mass function given by (4.8) of Chapter 4. Its mean is given by

E[x] = Σ_x x C(Np, x) C(Nq, n − x) / C(N, n) = (Np / C(N, n)) Σ_{x′} C(Np − 1, x′) C(Nq, n − 1 − x′),

in which we have let x′ = x − 1. Now, letting n′ = n − 1 and using (2.37) of Chapter 4, the last sum written is equal to C(N − 1, n′). Consequently,

E[x] = Np C(N − 1, n − 1) / C(N, n) = np.

Next, we evaluate σ² by first evaluating E[x(x − 1)] and then using the fact that E[x²] = E[x(x − 1)] + E[x]. Now

E[x(x − 1)] = n(n − 1) p (Np − 1)/(N − 1),

so that

σ² = E[x(x − 1)] + E[x] − E²[x] = npq (N − n)/(N − 1).

Notice that the mean of the hypergeometric probability law is the same as that of the corresponding binomial probability law, whereas the variances differ by the factor (N − n)/(N − 1), which is approximately equal to 1 if the ratio n/N is a small number.
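The agreement of the means and the variance factor (N − n)/(N − 1) can be checked numerically, as in this sketch of our own (it assumes Np is an integer):

```python
from math import comb

def hypergeometric_mean_and_variance(N, n, p):
    """Moments of the hypergeometric law: n draws without replacement
    from N items, Np of which are 'successes' (Np assumed an integer)."""
    r = round(N * p)
    pmf = {x: comb(r, x) * comb(N - r, n - x) / comb(N, n)
           for x in range(max(0, n - (N - r)), min(n, r) + 1)}
    mean = sum(x * px for x, px in pmf.items())
    mean_square = sum(x * x * px for x, px in pmf.items())
    return mean, mean_square - mean**2

mean, variance = hypergeometric_mean_and_variance(N=50, n=10, p=0.4)
# theory: mean = np = 4.0, variance = npq (N - n)/(N - 1) = 2.4 * 40/49
```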

Example 2F. The uniform probability law over the interval a to b has probability density function given by (4.10) of Chapter 4. Its mean, mean square, and variance are given by

E[x] = (1/(b − a)) ∫_a^b x dx = (a + b)/2,
E[x²] = (1/(b − a)) ∫_a^b x² dx = (a² + ab + b²)/3,
σ² = E[x²] − E²[x] = (b − a)²/12.

Note that the variance of the uniform probability law depends only on the length of the interval, whereas the mean is equal to the mid-point of the interval. The higher moments of the uniform probability law are also easily obtained:

E[xⁿ] = (1/(b − a)) ∫_a^b xⁿ dx = (b^{n+1} − a^{n+1})/((n + 1)(b − a)).
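The moment formula for the uniform law can be confirmed against a direct numerical integration (a sketch of our own; the interval and the order of the moment are arbitrary):

```python
def uniform_moment_numeric(a, b, n, steps=100_000):
    """Midpoint Riemann sum for E[x^n] under the uniform density 1/(b - a)."""
    h = (b - a) / steps
    return sum((a + (k + 0.5) * h) ** n for k in range(steps)) * h / (b - a)

def uniform_moment_exact(a, b, n):
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

approx = uniform_moment_numeric(2.0, 5.0, 3)
exact = uniform_moment_exact(2.0, 5.0, 3)   # (5^4 - 2^4)/(4 * 3) = 50.75
```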

Example 2G. The Cauchy probability law with parameters α and β is specified by the probability density function

f(x) = (β/π) · 1/(β² + (x − α)²),   −∞ < x < ∞.     (2.31)

The mean of the Cauchy probability law does not exist, since

∫_{−∞}^{∞} |x| f(x) dx = ∞.

However, for 0 < r < 1, the rth absolute moments

E[|x|ʳ] = ∫_{−∞}^{∞} |x|ʳ f(x) dx

do exist, as one may see by applying theoretical exercise 2.1.
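The divergence of the first absolute moment, and the convergence of the absolute moments of order r < 1, can be seen numerically by truncating the integral at ±T and letting T grow (a sketch of our own, taking α = 0 and β = 1):

```python
import math

def cauchy_density(x):
    # Cauchy density with parameters α = 0, β = 1
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_abs_moment(r, T, steps=200_000):
    """Midpoint sum for the integral of |x|^r f(x) over [-T, T]."""
    h = 2.0 * T / steps
    return sum(abs(-T + (k + 0.5) * h) ** r * cauchy_density(-T + (k + 0.5) * h)
               for k in range(steps)) * h

# r = 1: the truncated integrals grow like (2/π) log T, so E[|x|] = ∞.
m1_near, m1_far = truncated_abs_moment(1.0, 1e2), truncated_abs_moment(1.0, 1e4)
# r = 1/2: the truncated integrals stabilize near √2, so E[|x|^(1/2)] exists.
mh_near, mh_far = truncated_abs_moment(0.5, 1e2), truncated_abs_moment(0.5, 1e4)
```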

Theoretical Exercises

2.1. Test for convergence or divergence of infinite series and improper integrals. Prove the following statements. Let g(·) be a continuous function. If, for some real number r > 1, the limits

lim_{x→∞} xʳ |g(x)|   and   lim_{x→−∞} |x|ʳ |g(x)|     (2.34)

both exist and are finite, then

Σ_{n=−∞}^{∞} g(n)   and   ∫_{−∞}^{∞} g(x) dx     (2.35)

converge absolutely; if, for some r ≤ 1, either of the limits in (2.34) exists and is not equal to 0, then the expressions in (2.35) fail to converge absolutely.

2.2. Pareto’s distribution with parameters r and A, in which r and A are positive, is defined by the probability density function

f(x) = (r/A)(A/x)^{r+1}  for x ≥ A;   f(x) = 0  for x < A.

Show that Pareto’s distribution possesses a finite nth moment if and only if n < r. Find the mean and variance of Pareto’s distribution in the cases in which they exist.

2.3. “Student’s” t-distribution with parameter n is defined as the continuous probability law specified by the probability density function

f(x) = (1/√(nπ)) · (Γ((n + 1)/2)/Γ(n/2)) · (1 + x²/n)^{−(n+1)/2},   −∞ < x < ∞.

Note that “Student’s” t-distribution with parameter n = 1 coincides with the Cauchy probability law given by (2.31) with α = 0 and β = 1. Show that for “Student’s” t-distribution with parameter n (i) the rth moment E[xʳ] exists only for r < n, (ii) if r < n and r is odd, then E[xʳ] = 0, (iii) if r < n and r is even, then

E[xʳ] = n^{r/2} · (1 · 3 · 5 ··· (r − 1)) / ((n − 2)(n − 4) ··· (n − r)).

Hint: Use (2.41) and (2.42) in Chapter 4.

2.4. A characterization of the mean. Consider a probability law with finite mean m and variance σ². Define, for every real number a,

E(a) = E[(x − a)²].

Show that E(a) = σ² + (a − m)². Consequently, E(a) is minimized at a = m, and its minimum value is the variance of the probability law.
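The identity E[(x − a)²] = σ² + (a − m)², and the fact that the mean minimizes the mean square deviation, can be explored numerically for a small discrete law (a sketch of our own; the mass function is arbitrary, with mean 1 and variance 0.5):

```python
pmf = {0: 0.25, 1: 0.5, 2: 0.25}   # mean m = 1, variance σ² = 0.5

def E(a):
    """E(a) = E[(x - a)^2] with respect to the law above."""
    return sum((x - a) ** 2 * p for x, p in pmf.items())

# E(a) = σ² + (a - m)², so it is smallest at a = m = 1, where it equals σ².
values = {a: E(a) for a in (0.0, 0.5, 1.0, 1.5, 2.0)}
```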

2.5. A geometrical interpretation of the mean of a probability law. Show that for a continuous probability law with probability density function f(·) and distribution function F(·)

∫_0^∞ x f(x) dx = ∫_0^∞ [1 − F(x)] dx,   ∫_{−∞}^0 x f(x) dx = −∫_{−∞}^0 F(x) dx.

Consequently the mean of the probability law may be written

E[x] = ∫_0^∞ [1 − F(x)] dx − ∫_{−∞}^0 F(x) dx.

These equations may be interpreted geometrically. Plot the graph of the distribution function y = F(x) on an xy-plane, as in Fig. 2A, and define the areas I and II as indicated: I is the area to the right of the y-axis bounded by y = F(x) and y = 1; II is the area to the left of the y-axis bounded by y = F(x) and y = 0. Then the mean is equal to area I, minus area II. Although we have proved this assertion only for the case of a continuous probability law, it holds for any probability law.
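The area identity can be checked numerically for a particular law (a sketch of our own): take X = Y − 1, where Y obeys an exponential law with parameter 1, so that F(x) = 1 − e^{−(x+1)} for x ≥ −1 and the mean is E[Y] − 1 = 0; areas I and II should then both equal e^{−1}.

```python
import math

# Distribution function of X = Y - 1, with Y exponential with parameter 1.
def F(x):
    return 1.0 - math.exp(-(x + 1.0)) if x >= -1.0 else 0.0

def midpoint_integral(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

area_I = midpoint_integral(lambda x: 1.0 - F(x), 0.0, 40.0)  # tail truncated at 40
area_II = midpoint_integral(F, -1.0, 0.0)
mean = area_I - area_II
```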


Fig. 2A. The mean of a probability law with distribution function F(·) is equal to the shaded area to the right of the y-axis, minus the shaded area to the left of the y-axis.

2.6. A geometrical interpretation of the higher moments. Show that the nth moment of a continuous probability law with distribution function F(·) can be expressed, for n = 1, 2, ..., as

E[xⁿ] = n ∫_0^∞ x^{n−1} [1 − F(x)] dx − n ∫_{−∞}^0 x^{n−1} F(x) dx.     (2.41)

Use (2.41) to interpret the nth moment in terms of area.

2.7. The relation between the moments and central moments of a probability law. Show that from a knowledge of the moments of a probability law one may obtain a knowledge of the central moments, and conversely. In particular, it is useful to have expressions for the first 4 central moments in terms of the moments. Show that, with m = E[x],

E[x − m] = 0,
E[(x − m)²] = E[x²] − E²[x],
E[(x − m)³] = E[x³] − 3E[x]E[x²] + 2E³[x],
E[(x − m)⁴] = E[x⁴] − 4E[x]E[x³] + 6E²[x]E[x²] − 3E⁴[x].
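The binomial-theorem identities expressing central moments in terms of moments can be verified numerically for an arbitrary discrete law (a sketch of our own):

```python
pmf = {-1: 0.3, 0: 0.4, 2: 0.3}   # an arbitrary probability mass function

def raw_moment(n):
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(n):
    m = raw_moment(1)
    return sum((x - m) ** n * p for x, p in pmf.items())

a1, a2, a3, a4 = (raw_moment(n) for n in (1, 2, 3, 4))

# Central moments computed from the raw moments:
mu2 = a2 - a1**2
mu3 = a3 - 3*a1*a2 + 2*a1**3
mu4 = a4 - 4*a1*a3 + 6*a1**2*a2 - 3*a1**4
```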

2.8. The square mean is less than or equal to the mean square. Show that

E²[x] ≤ E[x²].

Give an example of a probability law whose mean square is equal to its square mean.

2.9. The mean is not necessarily greater than or equal to the variance. The binomial and the Poisson are probability laws having the property that their mean is greater than or equal to their variance (show this); this circumstance has sometimes led to the belief that for the probability law of a random variable assuming only nonnegative values it is always true that σ² ≤ m. Prove that this is not the case by showing that σ² = q/p² > q/p = m if p < 1, for the probability law of the number of failures up to the first success in a sequence of independent repeated Bernoulli trials, in which p is the probability of success and q = 1 − p.

2.10. The median of a probability law. The mean of a probability law provides a measure of the “mid-point” of a probability distribution. Another such measure is provided by the median of a probability law, denoted by med, which is defined as a number such that the distribution function F(·) satisfies

F(med) ≥ 1/2   and   F(med − 0) ≤ 1/2,     (2.44)

in which F(med − 0) denotes the limit of F(x) as x tends to med from below. If the probability law is continuous, the median may be defined as a number med satisfying F(med) = 1/2. Thus med is the projection on the x-axis of the point in the xy-plane at which the line y = 1/2 intersects the curve y = F(x). A more probabilistic definition of the median is as a number med such that P[X ≤ med] ≥ 1/2 and P[X ≥ med] ≥ 1/2, in which X is an observed value of a random phenomenon obeying the given probability law. There may be an interval of points that satisfies (2.44); if this is the case, we take the mid-point of the interval as the median. Show that one may characterize the median as a number at which the function E[|x − a|], regarded as a function of a, achieves its minimum value; this minimum value is therefore E[|x − med|]. Hint: Although the assertion is true in general, show it only for a continuous probability law. Show, and use, the fact that for any numbers a and b with a < b

E[|x − b|] − E[|x − a|] = ∫_a^b [2F(x) − 1] dx.
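The minimizing property of the median can be seen directly for a small discrete law (a sketch of our own; for this law the median is 2, since P[X ≤ 2] = 0.7 and P[X ≥ 2] = 0.6):

```python
pmf = {0: 0.2, 1: 0.2, 2: 0.3, 5: 0.3}

def mean_absolute_deviation(a):
    """E[|x - a|] with respect to the law above."""
    return sum(abs(x - a) * p for x, p in pmf.items())

# Among the integer candidates 0..5, E[|x - a|] is smallest at the median a = 2.
best = min(range(6), key=mean_absolute_deviation)
```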

2.11. The mode of a continuous or discrete probability law. For a continuous probability law with probability density function f(·), a mode of the probability law is defined as a number a at which the probability density has a relative maximum; assuming that the probability density function is twice differentiable, a point a is a mode if f′(a) = 0 and f″(a) < 0. Since the probability density function is the derivative of the distribution function F(·), these conditions may be stated in terms of the distribution function: a point a is a mode if F″(a) = 0 and F‴(a) < 0. Similarly, for a discrete probability law with probability mass function p(·), a mode of the probability law is defined as a number a at which the probability mass function has a relative maximum; more precisely, p(a) > p(a′) for a′ equal to the largest probability mass point less than a and for a′ equal to the smallest probability mass point larger than a. A probability law is said to be (i) unimodal if it possesses just 1 mode, (ii) bimodal if it possesses exactly 2 modes, and so on. Give examples of continuous and discrete probability laws which are (a) unimodal, (b) bimodal. Give examples of continuous and discrete probability laws for which the mean, median, and mode (i) coincide, (ii) are all different.

2.12. The interquartile range of a probability law. Various measures of the dispersion of a probability distribution exist, in addition to the variance, which one may consider (especially if the variance is infinite). The most important of these is the interquartile range of the probability law, defined as follows: for any number α, between 0 and 1, define the αth percentile of the probability law as the number x_α satisfying F(x_α) = α. Thus x_α is the projection on the x-axis of the point in the xy-plane at which the line y = α intersects the curve y = F(x). The 0.5 percentile x_{0.5} is usually called the median. The interquartile range, defined as the difference x_{0.75} − x_{0.25}, may be taken as a measure of the dispersion of the probability law.
(i) Show that the ratio of the interquartile range to the standard deviation is (a) approximately 1.35, for the normal probability law with parameters m and σ, (b) log 3 ≈ 1.10, for the exponential probability law with parameter λ, (c) √3 ≈ 1.73, for the uniform probability law over the interval a to b.
(ii) Show that the Cauchy probability law specified by the probability density function (2.31) possesses neither a mean nor a variance. However, it possesses a median, equal to α, and an interquartile range given by x_{0.75} − x_{0.25} = 2β.
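The ratios in part (i) for the exponential and uniform laws follow directly from the percentile definition F(x_α) = α; the sketch below (our own, with arbitrary parameter values) computes them in closed form.

```python
import math

# Exponential law with parameter λ: F(x) = 1 - e^{-λx}, standard deviation 1/λ.
lam = 2.0
percentile_exp = lambda alpha: -math.log(1.0 - alpha) / lam
iqr_exp = percentile_exp(0.75) - percentile_exp(0.25)
ratio_exp = iqr_exp / (1.0 / lam)            # equals log 3 ≈ 1.10

# Uniform law over a to b: F(x) = (x - a)/(b - a), standard deviation (b - a)/√12.
a, b = 1.0, 4.0
percentile_unif = lambda alpha: a + alpha * (b - a)
iqr_unif = percentile_unif(0.75) - percentile_unif(0.25)
ratio_unif = iqr_unif / ((b - a) / math.sqrt(12.0))   # equals √3 ≈ 1.73
```

Note that both ratios are free of the parameters, as the exercise asserts.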

Exercises

In exercises 2.1 to 2.7, compute the mean and variance of the probability law specified by the probability density function, probability mass function, or distribution function given.

2.1 .

 

Answer

Mean (i) , (ii) 0 , (iii) ; variance (i) , (ii) , (iii) .

 

2.2 .

2.3 .

 

Answer

Mean (i) does not exist, (ii) 0, (iii) 0; variance (i) does not exist, (ii) 3, (iii) 1.

 

2.4 .

2.5 .

 

Answer

Mean (i) , (ii) 4 (iii) 4; variance (i) , (ii) , (iii) .

 

2.6 .

2.7 .

 

Answer

Mean (i) , (ii) ; variance (i) , (ii) .

 

2.8 . Compute the means and variances of the probability laws obeyed by the numerical valued random phenomena described in exercise 4.1 of Chapter 4.

2.9 . For what values of does the probability law, specified by the following probability density function, possess (i) a finite mean, (ii) a finite variance:

 

Answer

(i) ; (ii) .

 


  1. For the benefit of the reader acquainted with the theory of Lebesgue integration, let it be remarked that if the integral in (2.3) is defined as an integral in the sense of Lebesgue, then the notion of expectation may be defined for any Borel function g(·). ↩︎