Convergence of random variables
In probability theory, there are several different notions
of convergence of random variables. The convergence (in one of the senses presented below) of a sequence of random variables to some limiting random variable is an
important concept in probability theory and in its applications to statistics and
stochastic processes. For example, if the average of n
independent, identically distributed random variables
Yi, i = 1, ..., n, is given by
$$X_n = \frac{1}{n} \sum_{i=1}^{n} Y_i,$$
then as n goes to infinity, Xn converges in probability (see below) to the
common mean, μ, of the random variables Yi. This result
is known as the weak law of large numbers. Other forms of
convergence are important in other useful theorems, including the central limit theorem.
Throughout the following, we assume that (Xn) is a sequence of random variables, and X
is a random variable, and all of them are defined on the same probability space (Ω, F, P).
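The weak law of large numbers mentioned above can be illustrated numerically. The following is a minimal sketch, assuming Uniform(0, 1) summands (so μ = 0.5), a tolerance of 0.05, and a fixed seed; these choices and the helper names are illustrative and not part of the original text.

```python
import random

def sample_mean(n, rng):
    """Average of n independent Uniform(0, 1) draws (common mean mu = 0.5)."""
    return sum(rng.random() for _ in range(n)) / n

def prob_far_from_mean(n, eps, trials, rng, mu=0.5):
    """Monte Carlo estimate of Pr(|X_n - mu| >= eps)."""
    far = sum(1 for _ in range(trials) if abs(sample_mean(n, rng) - mu) >= eps)
    return far / trials

rng = random.Random(0)  # fixed seed so that the run is repeatable
estimates = {
    n: prob_far_from_mean(n, eps=0.05, trials=400, rng=rng)
    for n in (10, 100, 1000)
}
for n, p in estimates.items():
    print(f"n = {n:5d}: estimated Pr(|X_n - 0.5| >= 0.05) = {p:.3f}")
```

The estimated probabilities should shrink towards zero as n grows, which is what convergence in probability to μ asserts. A simulation of this kind only suggests the limit; it does not prove it.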
Convergence in distribution
We say that the sequence Xn converges towards X in distribution, if
$$\lim_{n \to \infty} \Pr(X_n \le a) = \Pr(X \le a)$$
for every real number a at which the cumulative distribution function of the
limiting random variable X is continuous. Essentially, this means that
the probability that the value of X is in a given range is very similar to the probability that the value of
Xn is in that range, if only n is large enough. This notion of convergence is used in the
central limit theorems.
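As a concrete instance, the definition can be checked exactly for standardized fair-coin sums, whose limit is the standard normal distribution by the central limit theorem. The sketch below is an illustration under assumptions not in the original text: the summands are Bernoulli(1/2), the point a = 1 is arbitrary, and the function names are hypothetical.

```python
import math

def binomial_cdf_half(n, k):
    """Exact Pr(S_n <= k) for S_n ~ Binomial(n, 1/2), using integer arithmetic."""
    if k < 0:
        return 0.0
    k = min(k, n)
    term, total = 1, 1  # term starts as C(n, 0)
    for j in range(k):
        term = term * (n - j) // (j + 1)  # C(n, j + 1) from C(n, j)
        total += term
    return total / 2 ** n

def standardized_cdf(n, a):
    """Pr(X_n <= a), where X_n = (S_n - n/2) / (sqrt(n)/2)."""
    threshold = n / 2 + a * math.sqrt(n) / 2
    return binomial_cdf_half(n, math.floor(threshold))

def normal_cdf(a):
    """Cumulative distribution function of the standard normal limit."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

a = 1.0
for n in (10, 100, 1000, 10000):
    gap = abs(standardized_cdf(n, a) - normal_cdf(a))
    print(f"n = {n:6d}: |Pr(X_n <= a) - Phi(a)| = {gap:.5f}")
```

Because the binomial distribution is discrete, the gap need not decrease monotonically in n, but it tends to zero at every a, since the normal distribution function is continuous everywhere.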
Convergence in distribution is the weakest form of convergence (it is sometimes called weak convergence), and
does not, in general, imply any other mode of convergence. However, convergence in distribution is implied by all other
modes of convergence mentioned in this article, and hence, it is the most common and often the most useful form of convergence of
random variables.
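A standard counterexample shows why convergence in distribution does not imply convergence in probability. The sketch below assumes a single fair coin X with values 0 and 1 and sets Xn = 1 − X for every n; the example and the helper names are illustrative and not taken from the original text.

```python
from fractions import Fraction

# Probability space: one fair coin; omega is the value of X.
outcomes = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def x(omega):
    return omega

def x_n(omega, n):
    return 1 - omega  # the same flipped coin for every n

def cdf_x(a):
    """Pr(X <= a)."""
    return sum((p for w, p in outcomes.items() if x(w) <= a), Fraction(0))

def cdf_x_n(a, n):
    """Pr(X_n <= a)."""
    return sum((p for w, p in outcomes.items() if x_n(w, n) <= a), Fraction(0))

def prob_far(n, eps):
    """Pr(|X_n - X| >= eps)."""
    return sum(
        (p for w, p in outcomes.items() if abs(x_n(w, n) - x(w)) >= eps),
        Fraction(0),
    )

points = (-1, 0, Fraction(1, 2), 1, 2)
print(all(cdf_x(a) == cdf_x_n(a, 5) for a in points))  # identical distributions
print(prob_far(5, Fraction(1, 2)))                      # yet X_n is never close to X
```

Here Xn has the same distribution as X for every n, so it converges in distribution trivially, while |Xn − X| = 1 on every outcome.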
A useful result, which may be employed in conjunction with laws of large numbers and the central limit theorem, is the continuous mapping theorem: if a function g:
R → R is continuous and Xn converges
in distribution to X, then g(Xn) converges in
distribution to g(X). (This may be proved using Skorokhod's representation theorem.)
Convergence in distribution is also called convergence in law, since the word "law" is sometimes used as a
synonym of "probability distribution."
Convergence in probability
We say that the sequence Xn converges towards X in probability if
$$\lim_{n \to \infty} \Pr(|X_n - X| \ge \varepsilon) = 0$$
for every ε > 0. Convergence in probability is, indeed, the (pointwise) convergence of probabilities. Pick any ε > 0 and any δ > 0. Let
Pn be the probability that Xn is outside a tolerance ε of
X. Then, if Xn converges in probability to X then there exists a value N
such that, for all n ≥ N, Pn is itself less than δ.
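The search for N can be made concrete in a toy model. The sketch below assumes, purely for illustration, that Xn = X + Bn with Bn a Bernoulli(1/n) variable, so that Pn = 1/n for any tolerance 0 < ε ≤ 1; the model and the function names are hypothetical.

```python
from fractions import Fraction

def p_n(n):
    """P_n = Pr(|X_n - X| >= eps) in the toy model: exactly 1/n for 0 < eps <= 1."""
    return Fraction(1, n)

def first_index_below(delta, max_n=10**6):
    """Smallest N with P_N < delta.

    In this toy model P_n is decreasing in n, so P_n < delta for all n >= N.
    For a non-monotone P_n this simple search would not be sufficient.
    """
    for n in range(1, max_n + 1):
        if p_n(n) < delta:
            return n
    raise ValueError("no index below delta found up to max_n")

for delta in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    print(f"delta = {delta}: N = {first_index_below(delta)}")
```

Exact fractions are used so that the strict inequality Pn < δ is not blurred by floating-point rounding.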
Convergence in probability is the notion of convergence used in the weak law of large numbers. Convergence in probability implies convergence in distribution. To prove this,
it is convenient to first establish the following simple lemma:
Lemma
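Let X and Y be random variables, c a real number, and ε > 0; then
$$\Pr(Y \le c) \le \Pr(X \le c + \varepsilon) + \Pr(|Y - X| > \varepsilon).$$
In fact,
$$\Pr(Y \le c) = \Pr(Y \le c,\ X \le c + \varepsilon) + \Pr(Y \le c,\ X > c + \varepsilon)$$
$$\le \Pr(X \le c + \varepsilon) + \Pr(Y - X < -\varepsilon)$$
$$\le \Pr(X \le c + \varepsilon) + \Pr(|Y - X| > \varepsilon),$$
since
$$\Pr(|Y - X| > \varepsilon) = \Pr(Y - X > \varepsilon) + \Pr(Y - X < -\varepsilon) \ge \Pr(Y - X < -\varepsilon).$$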
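The lemma's inequality, taken here in the form Pr(Y ≤ c) ≤ Pr(X ≤ c + ε) + Pr(|Y − X| > ε), can be sanity-checked by exhaustive enumeration on a small discrete example. The joint distribution below (weights proportional to 1 + xy on a 5 × 5 grid) is an arbitrary assumption chosen only so that X and Y are dependent; a check like this is no substitute for the proof.

```python
from fractions import Fraction
from itertools import product

values = range(5)
# Hypothetical joint pmf of (X, Y): weight 1 + x*y, normalised by the total 125.
joint = {(x, y): Fraction(1 + x * y, 125) for x, y in product(values, values)}

def prob(event):
    """Exact probability of an event given as a predicate on (x, y)."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))

def lemma_holds(c, eps):
    """Check Pr(Y <= c) <= Pr(X <= c + eps) + Pr(|Y - X| > eps)."""
    lhs = prob(lambda x, y: y <= c)
    rhs = prob(lambda x, y: x <= c + eps) + prob(lambda x, y: abs(y - x) > eps)
    return lhs <= rhs

checks = [
    lemma_holds(c, eps)
    for c in range(-1, 6)
    for eps in (Fraction(1, 2), 1, 2, 3)
]
print(all(checks))
```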
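Proof
For every ε > 0, the preceding lemma (applied first with Y = Xn and c = a, then with the roles of X and Xn exchanged and c = a − ε) gives:
$$\Pr(X_n \le a) \le \Pr(X \le a + \varepsilon) + \Pr(|X_n - X| > \varepsilon)$$
$$\Pr(X \le a - \varepsilon) \le \Pr(X_n \le a) + \Pr(|X_n - X| > \varepsilon)$$
So, we have:
$$\Pr(X \le a - \varepsilon) - \Pr(|X_n - X| > \varepsilon) \le \Pr(X_n \le a) \le \Pr(X \le a + \varepsilon) + \Pr(|X_n - X| > \varepsilon).$$
Taking the limit for n → ∞, and using the fact that Pr(|Xn − X| > ε) tends to zero, we
obtain:
$$\Pr(X \le a - \varepsilon) \le \liminf_{n \to \infty} \Pr(X_n \le a) \le \limsup_{n \to \infty} \Pr(X_n \le a) \le \Pr(X \le a + \varepsilon).$$
But Pr(X ≤ a) is the cumulative distribution function
FX(a), which is continuous at a by hypothesis, that is:
$$\lim_{\varepsilon \to 0^+} F_X(a - \varepsilon) = \lim_{\varepsilon \to 0^+} F_X(a + \varepsilon) = F_X(a),$$
and so, taking the limit for ε → 0+, we obtain:
$$\lim_{n \to \infty} \Pr(X_n \le a) = \Pr(X \le a).$$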
Almost sure convergence
We say that the sequence Xn converges almost surely or almost
everywhere or with probability 1 or strongly towards X if
$$\Pr\left(\lim_{n \to \infty} X_n = X\right) = 1.$$
This means that you are virtually guaranteed that the values of Xn approach the value of
X, in the sense (see almost surely) that events for which
Xn does not converge to X have probability 0. Using the probability space (Ω,
F, P) and the concept of the random variable as a function from Ω to R, this is equivalent to the
statement
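$$\Pr\left(\left\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1.$$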
Almost sure convergence implies convergence in probability, and hence implies convergence in distribution. It is the notion of
convergence used in the strong law of large numbers.
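The implication from almost sure convergence to convergence in probability cannot be reversed in general. A standard illustration, not taken from the original text, is the "typewriter" sequence on Ω = [0, 1) with Lebesgue measure: writing n = 2^m + k with 0 ≤ k < 2^m, let Xn(ω) = 1 if k/2^m ≤ ω < (k + 1)/2^m and 0 otherwise. The sketch below assumes this construction; the sample point ω = 1/3 and the function names are arbitrary.

```python
from fractions import Fraction

def block_and_offset(n):
    """Write n = 2**m + k with 0 <= k < 2**m and return (m, k)."""
    m = n.bit_length() - 1
    return m, n - (1 << m)

def x_n(n, omega):
    """Indicator of the k-th dyadic interval of length 2**-m."""
    m, k = block_and_offset(n)
    return 1 if Fraction(k, 2 ** m) <= omega < Fraction(k + 1, 2 ** m) else 0

def prob_one(n):
    """Pr(X_n = 1) = 2**-m, which tends to 0 as n grows."""
    m, _ = block_and_offset(n)
    return Fraction(1, 2 ** m)

omega = Fraction(1, 3)
# Indices n < 2**10 at which the path through omega takes the value 1.
hits = [n for n in range(1, 2 ** 10) if x_n(n, omega) == 1]
print(hits)
print(prob_one(2 ** 10))
```

Since Pr(Xn = 1) tends to zero, Xn converges to 0 in probability. Yet every block of indices 2^m ≤ n < 2^(m+1) contains exactly one n with Xn(ω) = 1, so the path Xn(ω) returns to 1 infinitely often and converges for no ω.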
Convergence in rth mean
We say that the sequence Xn converges in rth mean or in the
Lr norm towards X, if r ≥ 1,
E|Xn|^r < ∞ for all n, and
$$\lim_{n \to \infty} E\left(|X_n - X|^r\right) = 0,$$
where the operator E denotes the expected value. Convergence in
rth mean tells us that the expectation of the rth power of the absolute difference between
Xn and X converges to zero.
The most important cases of convergence in rth mean are:
- When Xn converges in rth mean to X for r = 1, we say that
Xn converges in mean to X.
- When Xn converges in rth mean to X for r = 2, we say that
Xn converges in mean square to X.
Convergence in rth mean, for r ≥ 1, implies convergence in probability (by Markov's inequality), while if r > s
≥ 1, convergence in rth mean implies convergence in sth mean. Hence, convergence in mean square implies
convergence in mean.
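Both implications follow from short standard estimates, sketched here as a reading aid rather than as part of the original text. The first line applies Markov's inequality (a generalized form of Chebyshev's inequality) to the non-negative variable |Xn − X|^r; the second is Lyapunov's inequality, a consequence of Jensen's inequality.

```latex
% For every \varepsilon > 0 and r \ge 1:
\Pr\left(|X_n - X| \ge \varepsilon\right)
  = \Pr\left(|X_n - X|^r \ge \varepsilon^r\right)
  \le \frac{E\left(|X_n - X|^r\right)}{\varepsilon^r}
  \longrightarrow 0 \quad (n \to \infty).

% For r > s \ge 1:
\left(E|X_n - X|^s\right)^{1/s} \le \left(E|X_n - X|^r\right)^{1/r}
  \longrightarrow 0 \quad (n \to \infty).
```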
Converse implications
The chain of implications between the various notions of convergence is noted in the respective sections above, but it is
sometimes important to establish converses to these implications. No implications other than those noted above hold in
general, but a number of special cases do permit converses:
- If Xn converges in distribution to a constant c, then Xn
converges in probability to c.
- If Xn converges in probability to X, and if Pr(|Xn| ≤
b) = 1 for all n and some b, then Xn converges in rth mean to
X for all r ≥ 1. In other words, if Xn converges in probability to
X and all random variables Xn are almost surely bounded by the same constant, then
Xn also converges to X in rth mean for every r ≥ 1.
- If, for all ε > 0,
$$\sum_{n} \Pr(|X_n - X| > \varepsilon) < \infty,$$
then Xn converges almost surely to X. In other words, if
Xn converges in probability to X sufficiently quickly (i.e. the above sum
converges for all ε > 0), then Xn also converges almost surely to X.
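To give a feel for what "sufficiently quickly" means in the last condition, the sketch below compares partial sums for two hypothetical tail probabilities, 1/n² and 1/n. Both rates are assumptions chosen for illustration; the criterion is sufficient for almost sure convergence, not necessary, so a divergent sum alone does not rule it out.

```python
import math

def partial_sum(tail_prob, n_max):
    """Sum of tail_prob(n) for n = 1, ..., n_max."""
    return sum(tail_prob(n) for n in range(1, n_max + 1))

def fast_tail(n):
    """Hypothetical Pr(|X_n - X| > eps) = 1/n**2 (summable)."""
    return 1.0 / n ** 2

def slow_tail(n):
    """Hypothetical Pr(|X_n - X| > eps) = 1/n (tends to 0, but not summable)."""
    return 1.0 / n

for n_max in (10, 1000, 100000):
    print(
        f"n_max = {n_max:6d}: "
        f"sum 1/n^2 = {partial_sum(fast_tail, n_max):.5f}, "
        f"sum 1/n = {partial_sum(slow_tail, n_max):.5f}"
    )
print(f"limit of the first sum: pi^2/6 = {math.pi ** 2 / 6:.5f}")
```

The first column stays below π²/6, so the criterion applies; the second grows without bound, so convergence in probability at rate 1/n is not fast enough for this criterion to apply.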
References
- G.R. Grimmett and D.R. Stirzaker (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford, pp.
271-285. ISBN 0198536658.