Lecture 2
Recall: a random variable is a measurable function X \colon \Omega \to \mathbb{R}\,, \quad \Omega \,\, \text{ sample space}
A random vector is a measurable function \mathbf{X}\colon \Omega \to \mathbb{R}^n
The components of a random vector \mathbf{X} are denoted by \mathbf{X}= (X_1, \ldots, X_n) with X_i \colon \Omega \to \mathbb{R} random variables
We denote a bivariate (two-dimensional) random vector by (X,Y), with X,Y \colon \Omega \to \mathbb{R} random variables
Notation: P(X=x, Y=y ) := P( \{X=x \} \cap \{ Y=y \})
The joint pmf can be used to compute the probability of A \subset \mathbb{R}^2 \begin{align*} P((X,Y) \in A) & := P( \{ \omega \in \Omega \colon ( X(\omega), Y(\omega) ) \in A \} ) \\ & = \sum_{(x,y) \in A} f_{X,Y} (x,y) \end{align*}
In particular we obtain \sum_{(x,y) \in \mathbb{R}^2} f_{X,Y} (x,y) = 1
Note: The marginal pmfs of X and Y are just the usual pmfs of X and Y
Marginals of X and Y can be computed from the joint pmf f_{X,Y}
To compute joint pmf one needs to consider all the cases f_{X,Y}(x,y) = P(X=x,Y=y) \,, \quad (x,y) \in \mathbb{R}^2
For example X=4 and Y=0 is only obtained for (2,2). Hence f_{X,Y}(4,0) = P(X=4,Y=0) = P(\{(2,2)\}) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}
Similarly X=5 and Y=3 is only obtained for (4,1) and (1,4). Thus f_{X,Y}(5,3) = P(X=5,Y=3) = P(\{(4,1)\} \cup \{(1,4)\}) = \frac{1}{36} + \frac{1}{36} = \frac{1}{18}
f_{X,Y}(x,y)=0 for most of the pairs (x,y). Indeed f_{X,Y}(x,y)=0 if x \notin X(\Omega) \quad \text{ or } \quad y \notin Y(\Omega)
We have X(\Omega)=\{2,3,4,5,6,7,8,9,10,11,12\}
We have Y(\Omega)=\{0,1,2,3,4,5\}
Hence f_{X,Y} only needs to be computed for pairs (x,y) satisfying 2 \leq x \leq 12 \quad \text{ and } \quad 0 \leq y \leq 5
Within this range, other values will be zero. For example f_{X,Y}(3,0) = P(X=3,Y=0) = P(\emptyset) = 0
Below are all the values for f_{X,Y}. Empty entries correspond to f_{X,Y}(x,y) = 0
| y \ x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1/36 | | 1/36 | | 1/36 | | 1/36 | | 1/36 | | 1/36 |
| 1 | | 1/18 | | 1/18 | | 1/18 | | 1/18 | | 1/18 | |
| 2 | | | 1/18 | | 1/18 | | 1/18 | | 1/18 | | |
| 3 | | | | 1/18 | | 1/18 | | 1/18 | | | |
| 4 | | | | | 1/18 | | 1/18 | | | | |
| 5 | | | | | | 1/18 | | | | | |
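The table can be double-checked by brute-force enumeration. Below is a minimal Python sketch, assuming (as the values above suggest) that X is the sum and Y the absolute difference of two fair dice:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = sum and Y = |difference| of two fair dice (assumed setup),
# obtained by enumerating the 36 equally likely outcomes.
f_XY = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x, y = d1 + d2, abs(d1 - d2)
    f_XY[(x, y)] = f_XY.get((x, y), Fraction(0)) + Fraction(1, 36)

print(f_XY[(4, 0)])              # 1/36, as computed above
print(f_XY[(5, 3)])              # 1/18, as computed above
print(sum(f_XY.values()) == 1)   # True: the entries sum to 1
```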
We can use the non-zero entries in the table for f_{X,Y} to compute: \begin{align*} {\rm I\kern-.3em E}[XY] & = 3 \cdot 1 \cdot \frac{1}{18} + 5 \cdot 1 \cdot \frac{1}{18} + 7 \cdot 1 \cdot \frac{1}{18} + 9 \cdot 1 \cdot \frac{1}{18} + 11 \cdot 1 \cdot \frac{1}{18} \\ & + 4 \cdot 2 \cdot \frac{1}{18} + 6 \cdot 2 \cdot \frac{1}{18} + 8 \cdot 2 \cdot \frac{1}{18} + 10\cdot 2 \cdot \frac{1}{18} \\ & + 5 \cdot 3 \cdot \frac{1}{18} + 7 \cdot 3 \cdot \frac{1}{18} + 9 \cdot 3 \cdot \frac{1}{18} \\ & + 6 \cdot 4 \cdot \frac{1}{18} + 8 \cdot 4 \cdot \frac{1}{18} \\ & + 7 \cdot 5 \cdot \frac{1}{18} \\ & = (35 + 56 + 63 + 56 + 35 ) \frac{1}{18} = \frac{245}{18} \end{align*}
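This value can be confirmed numerically under the same assumed two-dice setup (a self-contained sketch):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# E[XY] = sum over (x, y) of x * y * f_{X,Y}(x, y); only non-zero entries contribute.
counts = Counter((d1 + d2, abs(d1 - d2)) for d1, d2 in product(range(1, 7), repeat=2))
f_XY = {xy: Fraction(c, 36) for xy, c in counts.items()}

print(sum(x * y * p for (x, y), p in f_XY.items()))   # 245/18
```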
We want to compute the marginal of Y via the formula f_Y(y) = \sum_{x \in \mathbb{R}} f_{X,Y}(x,y)
Again looking at the table for f_{X,Y}, we get \begin{align*} f_Y(0) & = f_{X,Y}(2,0) + f_{X,Y}(4,0) + f_{X,Y}(6,0) \\ & + f_{X,Y}(8,0) + f_{X,Y}(10,0) + f_{X,Y}(12,0) \\ & = 6 \cdot \frac{1}{36} = \frac{3}{18} \end{align*}
Hence the pmf of Y is given by the table below
| y | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| f_Y(y) | \frac{3}{18} | \frac{5}{18} | \frac{4}{18} | \frac{3}{18} | \frac{2}{18} | \frac{1}{18} |
Note that f_Y is indeed a pmf, since \sum_{y \in \mathbb{R}} f_Y(y) = \sum_{y=0}^5 f_Y(y) = 1
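The marginal of Y can likewise be checked by summing the enumerated joint pmf over x (same assumed two-dice setup):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Marginal pmf of Y: f_Y(y) = sum over x of f_{X,Y}(x, y).
counts = Counter((d1 + d2, abs(d1 - d2)) for d1, d2 in product(range(1, 7), repeat=2))
f_XY = {xy: Fraction(c, 36) for xy, c in counts.items()}

f_Y = {}
for (x, y), p in f_XY.items():
    f_Y[y] = f_Y.get(y, Fraction(0)) + p

for y in sorted(f_Y):
    print(y, f_Y[y])           # 1/6, 5/18, 2/9, 1/6, 1/9, 1/18 (= 3/18, 5/18, 4/18, 3/18, 2/18, 1/18)
print(sum(f_Y.values()) == 1)  # True: f_Y is indeed a pmf
```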
Notation: The symbol \int_{\mathbb{R}^2} denotes the double integral \int_{-\infty}^\infty\int_{-\infty}^\infty
Note: The marginal pdfs of X and Y are just the usual pdfs of X and Y
Marginals of X and Y can be computed from the joint pdf f_{X,Y}
Let f \colon \mathbb{R}^2 \to \mathbb{R}. Then f is the joint pmf or joint pdf of some random vector (X,Y) iff f(x,y) \geq 0 for all (x,y) \in \mathbb{R}^2 and \sum_{(x,y) \in \mathbb{R}^2} f(x,y) = 1 (pmf case), respectively \int_{\mathbb{R}^2} f(x,y) \, dxdy = 1 (pdf case)
In the above setting:
| (X,Y) discrete random vector | (X,Y) continuous random vector |
|---|---|
| X and Y discrete | X and Y continuous |
| Joint pmf | Joint pdf |
| f_{X,Y}(x,y) := P(X=x,Y=y) | P((X,Y) \in A) = \int_A f_{X,Y}(x,y) \,dxdy |
| f_{X,Y} \geq 0 | f_{X,Y} \geq 0 |
| \sum_{(x,y)\in \mathbb{R}^2} f_{X,Y}(x,y)=1 | \int_{\mathbb{R}^2} f_{X,Y}(x,y) \, dxdy= 1 |
| Marginal pmfs | Marginal pdfs |
| f_X (x) := P(X=x) | P(a \leq X \leq b) = \int_a^b f_X(x) \,dx |
| f_Y (y) := P(Y=y) | P(a \leq Y \leq b) = \int_a^b f_Y(y) \,dy |
| f_X (x)=\sum_{y \in \mathbb{R}} f_{X,Y}(x,y) | f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x,y) \,dy |
| f_Y (y)=\sum_{x \in \mathbb{R}} f_{X,Y}(x,y) | f_Y(y) = \int_{\mathbb{R}} f_{X,Y}(x,y) \,dx |
Suppose we are given a discrete random vector (X,Y)
It might happen that the event \{X=x\} depends on \{Y=y\}
If P(Y=y)>0 we can define the conditional probability P(X=x|Y=y) := \frac{P(X=x,Y=y)}{P(Y=y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)} where f_{X,Y} is joint pmf of (X,Y) and f_Y the marginal pmf of Y
(X,Y) discrete random vector with joint pmf f_{X,Y} and marginal pmfs f_X, f_Y
For any x such that f_X(x)=P(X=x)>0 the conditional pmf of Y given that X=x is the function f(\cdot | x) defined by f(y|x) := P(Y=y|X=x) = \frac{f_{X,Y}(x,y)}{f_X(x)}
For any y such that f_Y(y)=P(Y=y)>0 the conditional pmf of X given that Y=y is the function f(\cdot | y) defined by f(x|y) := P(X=x|Y=y) =\frac{f_{X,Y}(x,y)}{f_Y(y)}
The conditional pmf f(y|x) is indeed a pmf: it is non-negative, and \sum_{y \in \mathbb{R}} f(y|x) = \frac{1}{f_X(x)} \sum_{y \in \mathbb{R}} f_{X,Y}(x,y) = \frac{f_X(x)}{f_X(x)} = 1
Similar reasoning yields that also f(x|y) is a pmf
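As a concrete illustration in the assumed two-dice example (X the sum, Y the absolute difference), here is a sketch of the conditional pmf of Y given X = 7; the value 7 is an arbitrary choice:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Conditional pmf of Y given X = 7: f(y|7) = f_{X,Y}(7, y) / f_X(7).
counts = Counter((d1 + d2, abs(d1 - d2)) for d1, d2 in product(range(1, 7), repeat=2))
f_XY = {xy: Fraction(c, 36) for xy, c in counts.items()}

x0 = 7
f_X_x0 = sum(p for (x, y), p in f_XY.items() if x == x0)           # f_X(7) = 1/6
f_cond = {y: p / f_X_x0 for (x, y), p in f_XY.items() if x == x0}
print(f_cond)                      # y = 1, 3, 5, each with conditional probability 1/3
print(sum(f_cond.values()) == 1)   # True: f(.|x) is indeed a pmf
```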
Notation: We will often write Y|X=x \sim f(\cdot|x) for the conditional distribution, and abbreviate {\rm I\kern-.3em E}[Y|X=x] by {\rm I\kern-.3em E}[Y|x]
(X,Y) continuous random vector with joint pdf f_{X,Y} and marginal pdfs f_X, f_Y
For any x such that f_X(x)>0 the conditional pdf of Y given that X=x is the function f(\cdot | x) defined by f(y|x) := \frac{f_{X,Y}(x,y)}{f_X(x)}
For any y such that f_Y(y)>0 the conditional pdf of X given that Y=y is the function f(\cdot | y) defined by f(x|y) := \frac{f_{X,Y}(x,y)}{f_Y(y)}
For the running example with joint pdf f_{X,Y}(x,y) = e^{-y} for 0 < x < y (and 0 otherwise), the marginal of X is f_X(x) = \int_x^\infty e^{-y} \, dy = e^{-x} for x>0, so the conditional distribution of Y given X=x is a shifted exponential f(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} e^{-(y-x)} & \text{ if } y > x \\ 0 & \text{ if } y \leq x \end{cases}
The conditional expectation of Y given X=x is \begin{align*} {\rm I\kern-.3em E}[Y|x] & = \int_{-\infty}^\infty y f(y|x) \, dy = \int_{x}^\infty y e^{-(y-x)} \, dy \\ & = -(y+1) e^{-(y-x)} \bigg|_{x}^\infty = x + 1 \end{align*} where we integrated by parts
Therefore conditional expectation of Y given X=x is {\rm I\kern-.3em E}[Y|x] = x + 1
This can also be interpreted as the random variable {\rm I\kern-.3em E}[Y|X] = X + 1
The conditional second moment of Y given X=x is \begin{align*} {\rm I\kern-.3em E}[Y^2|x] & = \int_{-\infty}^\infty y^2 f(y|x) \, dy = \int_{x}^\infty y^2 e^{-(y-x)} \, dy \\ & = -(y^2+2y+2) e^{-(y-x)} \bigg|_{x}^\infty = x^2 + 2x + 2 \end{align*} where we integrated by parts
The conditional variance of Y given X=x is {\rm Var}[Y|x] = {\rm I\kern-.3em E}[Y^2|x] - {\rm I\kern-.3em E}[Y|x]^2 = x^2 + 2x + 2 - (x+1)^2 = 1
This can also be interpreted as the random variable {\rm Var}[Y|X] = 1
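The conditional moments above can be sanity-checked numerically. The sketch below integrates against f(y|x) = e^{-(y-x)} on a truncated grid; the test point x = 1.5 is an arbitrary illustration choice:

```python
import numpy as np

# Numerical check of E[Y|x] = x + 1 and Var[Y|x] = 1 for f(y|x) = exp(-(y - x)), y > x.
x = 1.5
y = np.linspace(x, x + 40.0, 400_000)   # truncate the integral at x + 40 (tail is negligible)
dy = y[1] - y[0]
f = np.exp(-(y - x))                    # conditional pdf f(y|x)

m1 = np.sum(y * f) * dy                 # E[Y | X = x], expected x + 1 = 2.5
m2 = np.sum(y**2 * f) * dy              # E[Y^2 | X = x], expected x^2 + 2x + 2 = 7.25
print(m1, m2 - m1**2)                   # ~2.5 and ~1.0 (the conditional variance)
```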
Note: The formula {\rm I\kern-.3em E}[{\rm I\kern-.3em E}[X|Y]] = {\rm I\kern-.3em E}[X], proved below, contains an abuse of notation – the symbol {\rm I\kern-.3em E} has 3 different meanings (expectation with respect to the distribution of Y, the conditional distribution of X given Y, and the distribution of X)
Suppose (X,Y) is continuous
Recall that {\rm I\kern-.3em E}[X|Y] denotes the random variable g(Y) with g(y):= {\rm I\kern-.3em E}[X|y] := \int_{\mathbb{R}} xf(x|y) \, dx
Also recall that by definition f_{X,Y}(x,y)= f(x|y)f_Y(y)
Therefore \begin{align*} {\rm I\kern-.3em E}[{\rm I\kern-.3em E}[X|Y]] & = {\rm I\kern-.3em E}[g(Y)] = \int_{\mathbb{R}} g(y) f_Y(y) \, dy \\ & = \int_{\mathbb{R}} \left( \int_{\mathbb{R}} xf(x|y) \, dx \right) f_Y(y)\, dy = \int_{\mathbb{R}^2} x f(x|y) f_Y(y) \, dx dy \\ & = \int_{\mathbb{R}^2} x f_{X,Y}(x,y) \, dx dy = \int_{\mathbb{R}} x \left( \int_{\mathbb{R}} f_{X,Y}(x,y)\, dy \right) \, dx \\ & = \int_{\mathbb{R}} x f_{X}(x) \, dx = {\rm I\kern-.3em E}[X] \end{align*}
If (X,Y) is discrete the result follows by replacing integrals with series
Consider again the continuous random vector (X,Y) with joint pdf f_{X,Y}(x,y) := e^{-y} \,\, \text{ if } \,\, 0 < x < y \,, \quad f_{X,Y}(x,y) :=0 \,\, \text{ otherwise}
We have proven that {\rm I\kern-.3em E}[Y|X] = X + 1
We have also shown that f_X is exponential f_{X}(x) = \begin{cases} e^{-x} & \text{ if } x > 0 \\ 0 & \text{ if } x \leq 0 \end{cases}
From the knowledge of f_X we can compute {\rm I\kern-.3em E}[X] {\rm I\kern-.3em E}[X] = \int_0^\infty x e^{-x} \, dx = -(x+1)e^{-x} \bigg|_{x=0}^{x=\infty} = 1
Using the Theorem we can compute {\rm I\kern-.3em E}[Y] without computing f_Y: \begin{align*} {\rm I\kern-.3em E}[Y] & = {\rm I\kern-.3em E}[ {\rm I\kern-.3em E}[Y|X] ] \\ & = {\rm I\kern-.3em E}[X + 1] \\ & = {\rm I\kern-.3em E}[X] + 1 \\ & = 1 + 1 = 2 \end{align*}
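A Monte Carlo sketch of the same computation: since f_{X,Y}(x,y) = e^{-y} = e^{-x} e^{-(y-x)} for 0 < x < y, one way to sample from this joint pdf is to draw X \sim Exp(1) and set Y = X + W with W \sim Exp(1) independent of X:

```python
import numpy as np

# Monte Carlo check of E[X] = 1 and E[Y] = E[E[Y|X]] = E[X + 1] = 2 for the example density.
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=1.0, size=n)       # X ~ Exp(1), pdf e^{-x} for x > 0
Y = X + rng.exponential(scale=1.0, size=n)   # given X = x, Y has pdf e^{-(y-x)} for y > x

print(X.mean())   # ~1
print(Y.mean())   # ~2
```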
In previous example: the conditional distribution of Y given X=x was f(y|x) = \begin{cases} e^{-(y-x)} & \text{ if } y > x \\ 0 & \text{ if } y \leq x \end{cases}
In particular f(y|x) depends on x
This means that knowledge of X gives information on Y
When X does not give any information on Y we say that X and Y are independent
If X and Y are independent then X gives no information on Y (and vice-versa):
Conditional distribution: the distribution of Y given X=x is the same as the distribution of Y, since f(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{f_X(x)f_Y(y)}{f_X(x)} = f_Y(y)
Conditional probabilities: From the above we also obtain \begin{align*} P(Y \in A | x) & = \sum_{y \in A} f(y|x) = \sum_{y \in A} f_Y(y) = P(Y \in A) & \, \text{ discrete rv} \\ P(Y \in A | x) & = \int_{y \in A} f(y|x) \, dy = \int_{y \in A} f_Y(y) \, dy = P(Y \in A) & \, \text{ continuous rv} \end{align*}
(X,Y) random vector with joint pdf or pmf f_{X,Y}. The following are equivalent:
Suppose X and Y are independent random variables. Then
For any A,B \subset \mathbb{R} we have P(X \in A, Y \in B) = P(X \in A) P(Y \in B)
Suppose g(x) is a function of (only) x, h(y) is a function of (only) y. Then {\rm I\kern-.3em E}[g(X)h(Y)] = {\rm I\kern-.3em E}[g(X)]{\rm I\kern-.3em E}[h(Y)]
Define the function p(x,y):=g(x)h(y). Then \begin{align*} {\rm I\kern-.3em E}[g(X)h(Y)] & = {\rm I\kern-.3em E}(p(X,Y)) = \int_{\mathbb{R}^2} p(x,y) f_{X,Y}(x,y) \, dxdy \\ & = \int_{\mathbb{R}^2} g(x)h(y) f_X(x) f_Y(y) \, dxdy \\ & = \left( \int_{-\infty}^\infty g(x) f_X(x) \, dx \right) \left( \int_{-\infty}^\infty h(y) f_Y(y) \, dy \right) \\ & = {\rm I\kern-.3em E}[g(X)] {\rm I\kern-.3em E}[h(Y)] \end{align*}
The proof in the discrete case is the same: replace integrals with series
Define the product set A \times B :=\{ (x,y) \in \mathbb{R}^2 \colon x \in A , y \in B\}
Therefore we get \begin{align*} P(X \in A , Y \in B) & = \int_{A \times B} f_{X,Y}(x,y) \, dxdy \\ & = \int_{A \times B} f_X(x) f_Y(y) \, dxdy \\ & = \left(\int_{A} f_X(x) \, dx \right) \left(\int_{B} f_Y(y) \, dy \right) \\ & = P(X \in A) P(Y \in B) \end{align*}
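A simulation sketch of the factorization {\rm I\kern-.3em E}[g(X)h(Y)] = {\rm I\kern-.3em E}[g(X)]{\rm I\kern-.3em E}[h(Y)] proved above, with arbitrarily chosen independent X \sim Exp(1), Y \sim N(0,1) and test functions g(x) = x^2, h(y) = \cos(y) (all illustration choices, not part of the lecture examples):

```python
import numpy as np

# For independent X and Y, E[g(X)h(Y)] should match E[g(X)] * E[h(Y)].
rng = np.random.default_rng(1)
n = 2_000_000
X = rng.exponential(scale=1.0, size=n)   # independent draws
Y = rng.standard_normal(n)

g, h = X**2, np.cos(Y)
print(np.mean(g * h))              # ~ E[g(X)h(Y)]
print(np.mean(g) * np.mean(h))     # ~ E[g(X)] E[h(Y)], approximately equal to the line above
```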
Proof: Follows from the previous Theorem, applied with g(x) = e^{tx} and h(y) = e^{ty}: \begin{align*} M_{X + Y} (t) & = {\rm I\kern-.3em E}[e^{t(X+Y)}] = {\rm I\kern-.3em E}[e^{tX}e^{tY}] \\ & = {\rm I\kern-.3em E}[e^{tX}] {\rm I\kern-.3em E}[e^{tY}] \\ & = M_X(t) M_Y(t) \end{align*}
Suppose X \sim N (\mu_1, \sigma_1^2) and Y \sim N (\mu_2, \sigma_2^2) are independent normal random variables
We have seen in Slide 119 in Lecture 1 that for normal distributions M_X(t) = \exp \left( \mu_1 t + \frac{t^2 \sigma_1^2}{2} \right) \,, \qquad M_Y(t) = \exp \left( \mu_2 t + \frac{t^2 \sigma_2^2}{2} \right)
Since X and Y are independent, from previous Theorem we get \begin{align*} M_{X+Y}(t) & = M_{X}(t) M_{Y}(t) = \exp \left( \mu_1 t + \frac{t^2 \sigma_1^2}{2} \right) \exp \left( \mu_2 t + \frac{t^2 \sigma_2^2}{2} \right) \\ & = \exp \left( (\mu_1 + \mu_2) t + \frac{t^2 (\sigma_1^2 + \sigma_2^2)}{2} \right) \end{align*}
Therefore Z := X + Y has moment generating function M_{Z}(t) = M_{X+Y}(t) = \exp \left( (\mu_1 + \mu_2) t + \frac{t^2 (\sigma_1^2 + \sigma_2^2)}{2} \right)
The above is the mgf of a normal distribution with \text{mean }\quad \mu_1 + \mu_2 \quad \text{ and variance} \quad \sigma_1^2 + \sigma_2^2
By the Theorem in Slide 132 of Lecture 1 we have Z \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)
Sum of independent normals is normal
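A simulation sketch with arbitrary illustration parameters \mu_1 = 1, \sigma_1 = 2, \mu_2 = -3, \sigma_2 = 1:

```python
import numpy as np

# Z = X + Y with independent X ~ N(1, 2^2) and Y ~ N(-3, 1^2):
# Z should be N(1 + (-3), 2^2 + 1^2) = N(-2, 5).
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.normal(loc=1.0, scale=2.0, size=n)    # scale is the standard deviation
Y = rng.normal(loc=-3.0, scale=1.0, size=n)
Z = X + Y

print(Z.mean(), Z.var())   # ~ -2 and ~ 5, matching N(mu_1 + mu_2, sigma_1^2 + sigma_2^2)
```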
Given two random variables X and Y we said that
X and Y are independent if f_{X,Y}(x,y) = f_X(x) f_Y(y)
In this case there is no relationship between X and Y
This is reflected in the conditional distributions: X|Y \sim X \qquad \qquad Y|X \sim Y
If X and Y are not independent then there is a relationship between them
Answer: By introducing the notions of covariance and correlation
Notation: Given two rv X and Y we denote \begin{align*} & \mu_X := {\rm I\kern-.3em E}[X] \qquad & \mu_Y & := {\rm I\kern-.3em E}[Y] \\ & \sigma^2_X := {\rm Var}[X] \qquad & \sigma^2_Y & := {\rm Var}[Y] \end{align*}
The sign of {\rm Cov}(X,Y) gives information about the relationship between X and Y:
Consequently, the sign of {\rm Cov}(X,Y) = {\rm I\kern-.3em E}[(X-\mu_X)(Y-\mu_Y)] indicates which of these four cases tends to prevail:
| | X small: \, X<\mu_X | X large: \, X>\mu_X |
|---|---|---|
| Y small: \, Y<\mu_Y | (X-\mu_X)(Y-\mu_Y)>0 | (X-\mu_X)(Y-\mu_Y)<0 |
| Y large: \, Y>\mu_Y | (X-\mu_X)(Y-\mu_Y)<0 | (X-\mu_X)(Y-\mu_Y)>0 |
| | X small: \, X<\mu_X | X large: \, X>\mu_X |
|---|---|---|
| Y small: \, Y<\mu_Y | {\rm Cov}(X,Y)>0 | {\rm Cov}(X,Y)<0 |
| Y large: \, Y>\mu_Y | {\rm Cov}(X,Y)<0 | {\rm Cov}(X,Y)>0 |
Using linearity of {\rm I\kern-.3em E} and the fact that {\rm I\kern-.3em E}[c]=c for c \in \mathbb{R}: \begin{align*} {\rm Cov}(X,Y) : & = {\rm I\kern-.3em E}[ \,\, (X - {\rm I\kern-.3em E}[X]) (Y - {\rm I\kern-.3em E}[Y]) \,\, ] \\ & = {\rm I\kern-.3em E}\left[ \,\, XY - X {\rm I\kern-.3em E}[Y] - Y {\rm I\kern-.3em E}[X] + {\rm I\kern-.3em E}[X]{\rm I\kern-.3em E}[Y] \,\, \right] \\ & = {\rm I\kern-.3em E}[XY] - {\rm I\kern-.3em E}[ X {\rm I\kern-.3em E}[Y] ] - {\rm I\kern-.3em E}[ Y {\rm I\kern-.3em E}[X] ] + {\rm I\kern-.3em E}[{\rm I\kern-.3em E}[X] {\rm I\kern-.3em E}[Y]] \\ & = {\rm I\kern-.3em E}[XY] - {\rm I\kern-.3em E}[X] {\rm I\kern-.3em E}[Y] - {\rm I\kern-.3em E}[Y] {\rm I\kern-.3em E}[X] + {\rm I\kern-.3em E}[X] {\rm I\kern-.3em E}[Y] \\ & = {\rm I\kern-.3em E}[XY] - {\rm I\kern-.3em E}[X] {\rm I\kern-.3em E}[Y] \end{align*}
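Applied to the assumed two-dice example (X the sum, Y the absolute difference), this formula gives {\rm Cov}(X,Y) = \frac{245}{18} - 7 \cdot \frac{35}{18} = 0: X and Y are uncorrelated even though they are not independent (recall f_{X,Y}(3,0) = 0 while f_X(3) f_Y(0) > 0). A short sketch of this check:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Cov(X, Y) = E[XY] - E[X]E[Y] for X = sum and Y = |difference| of two fair dice.
counts = Counter((d1 + d2, abs(d1 - d2)) for d1, d2 in product(range(1, 7), repeat=2))
f_XY = {xy: Fraction(c, 36) for xy, c in counts.items()}

E_XY = sum(x * y * p for (x, y), p in f_XY.items())   # 245/18
E_X = sum(x * p for (x, y), p in f_XY.items())        # 7
E_Y = sum(y * p for (x, y), p in f_XY.items())        # 35/18
print(E_XY - E_X * E_Y)   # 0: uncorrelated, yet not independent
```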
Remark:
{\rm Cov}(X,Y) encodes only qualitative information about the relationship between X and Y
To obtain quantitative information we introduce the correlation
Correlation detects linear relationships between X and Y
For any random variables X and Y we have
Proof: Omitted, see page 172 of [1]
Proof:
Proof: Exercise
Everything we defined for bivariate vectors extends to multivariate vectors
The random vector \mathbf{X}\colon \Omega \to \mathbb{R}^n is called discrete if it takes at most countably many values (in which case it has a joint pmf f_{\mathbf{X}}), and continuous if its distribution is described by a joint pdf f_{\mathbf{X}}
Note: For all A \subset \mathbb{R}^n it holds P(\mathbf{X}\in A) = \sum_{\mathbf{x}\in A} f_{\mathbf{X}}(\mathbf{x})
Note: \int_A denotes an n-fold integral over all points \mathbf{x}\in A
The marginal pmf or pdf of any subset of the coordinates of (X_1,\ldots,X_n) can be computed by summing or integrating over the remaining coordinates
To ease notation, we only define marginals with respect to the first k coordinates
We now define conditional distributions given the first k coordinates
Similarly, we can define the conditional distribution given the i-th coordinate
\mathbf{X}=(X_1,\ldots,X_n) random vector with joint pmf or pdf f_{\mathbf{X}}. The following are equivalent:
Proof: Omitted. See [1] page 184
Example: X_1,\ldots,X_n \, independent \,\, \implies \,\, X_1^2, \ldots, X_n^2 \, independent
We have seen in Slide 119 in Lecture 1 that if X_i \sim N(\mu_i,\sigma_i^2) then M_{X_i}(t) = \exp \left( \mu_i t + \frac{t^2 \sigma_i^2}{2} \right)
Since X_1,\ldots,X_n are mutually independent, from the previous Theorem the sum Z := X_1 + \ldots + X_n has mgf \begin{align*} M_{Z}(t) & = \prod_{i=1}^n M_{X_i}(t) = \prod_{i=1}^n \exp \left( \mu_i t + \frac{t^2 \sigma_i^2}{2} \right) \\ & = \exp \left( (\mu_1 + \ldots + \mu_n) t + \frac{t^2 (\sigma_1^2 + \ldots +\sigma_n^2)}{2} \right) \\ & = \exp \left( \mu t + \frac{t^2 \sigma^2 }{2} \right) \end{align*} where we set \mu := \mu_1 + \ldots + \mu_n and \sigma^2 := \sigma_1^2 + \ldots + \sigma_n^2
Therefore Z has moment generating function M_{Z}(t) = \exp \left( \mu t + \frac{t^2 \sigma^2 }{2} \right)
The above is the mgf of a normal distribution with \text{mean }\quad \mu \quad \text{ and variance} \quad \sigma^2
Since mgfs characterize distributions (see Theorem in Slide 132 of Lecture 1), we conclude Z \sim N(\mu, \sigma^2 )
We have seen in Slide 126 in Lecture 1 that if X_i \sim \Gamma(\alpha_i,\beta) then M_{X_i}(t) = \frac{\beta^{\alpha_i}}{(\beta-t)^{\alpha_i}}
Since X_1,\ldots,X_n are mutually independent, the sum Z := X_1 + \ldots + X_n has mgf \begin{align*} M_{Z}(t) & = \prod_{i=1}^n M_{X_i}(t) = \prod_{i=1}^n \frac{\beta^{\alpha_i}}{(\beta-t)^{\alpha_i}} \\ & = \frac{\beta^{(\alpha_1 + \ldots + \alpha_n)}}{(\beta-t)^{(\alpha_1 + \ldots + \alpha_n)}} \\ & = \frac{\beta^{\alpha}}{(\beta-t)^{\alpha}} \end{align*} where we set \alpha := \alpha_1 + \ldots + \alpha_n
Therefore Z has moment generating function M_{Z}(t) = \frac{\beta^{\alpha}}{(\beta-t)^{\alpha}}
The above is the mgf of a Gamma distribution with \text{shape }\quad \alpha \quad \text{ and rate} \quad \beta
Since mgfs characterize distributions (see Theorem in Slide 132 of Lecture 1), we conclude Z \sim \Gamma(\alpha, \beta )
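A simulation sketch with arbitrary illustration parameters \alpha_i = 0.5, 2, 3.5 and common rate \beta = 2; note that numpy's gamma sampler is parametrized by shape and scale = 1/\beta:

```python
import numpy as np

# Z = X_1 + X_2 + X_3 with independent X_i ~ Gamma(alpha_i, beta), common rate beta.
# Then Z ~ Gamma(alpha, beta) with alpha = sum of the alpha_i,
# so E[Z] = alpha / beta and Var[Z] = alpha / beta^2.
rng = np.random.default_rng(3)
alphas, beta, n = [0.5, 2.0, 3.5], 2.0, 1_000_000
Z = sum(rng.gamma(a, scale=1.0 / beta, size=n) for a in alphas)

alpha = sum(alphas)                  # 6.0
print(Z.mean(), alpha / beta)        # ~3.0 vs 3.0
print(Z.var(), alpha / beta**2)      # ~1.5 vs 1.5
```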