# Probability woman exceeds 180cm in height
# P(X > 180) = 1 - P(X <= 180)
1 - pnorm(180, mean = 163, sd = 8)
[1] 0.01679331
Lecture 5
The one-sample variance test uses the chi-squared distribution
Recall: Chi-squared distribution with p degrees of freedom is \chi_p^2 = Z_1^2 + \ldots + Z_p^2 where Z_1, \ldots, Z_p are iid N(0, 1)
Assumption: Suppose we are given a sample X_1,\ldots, X_n iid from N(\mu,\sigma^2)
Goal: Estimate variance \sigma^2 of population
Test:
Suppose \sigma_0 is guess for \sigma
The one-sided hypothesis test for \sigma is H_0 \colon \sigma = \sigma_0 \qquad H_1 \colon \sigma > \sigma_0
Consider the sample variance S^2 = \frac{ \sum_{i=1}^n X_i^2 - n \overline{X}^2 }{n-1}
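As a quick check of this identity, a minimal sketch in R (the vector x is hypothetical data, not from the lecture):
# var() agrees with (sum(x^2) - n * mean(x)^2) / (n - 1)
x <- c(5.1, 4.8, 6.0, 5.5, 5.2)   # hypothetical sample
n <- length(x)
(sum(x^2) - n * mean(x)^2) / (n - 1)
var(x)   # same value, up to floating-point rounding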
Since we believe H_0, we have \sigma = \sigma_0
S^2 cannot be too far from the true variance \sigma^2
Therefore we cannot have that S^2 \gg \sigma^2 = \sigma_0^2
If we observe S^2 \gg \sigma_0^2 then our guess \sigma_0 is probably wrong
Therefore we reject H_0 if S^2 \gg \sigma_0^2
The rejection condition S^2 \gg \sigma_0^2 is equivalent to \frac{(n-1)S^2}{\sigma_0^2} \gg 1 where n is the sample size
We define our test statistic as \chi^2 := \frac{(n-1)S^2}{\sigma_0^2}
The rejection condition is hence \chi^2 \gg 1
Recall that \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2
Assuming \sigma=\sigma_0, we therefore have \chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2
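This distributional fact can be verified by simulation; below is a minimal sketch (the sample size n = 10 and \sigma = 2 are arbitrary choices, not from the lecture):
# Simulate (n-1) S^2 / sigma^2 for normal samples and compare its
# moments with those of chi-squared with n-1 degrees of freedom
set.seed(1)
n <- 10; sigma <- 2
stat <- replicate(10000, (n - 1) * var(rnorm(n, sd = sigma)) / sigma^2)
mean(stat)   # close to n - 1 = 9, the mean of chi-squared with 9 df
var(stat)    # close to 2 * (n - 1) = 18, its variance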
We reject H_0 if \chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \gg 1
As \chi^2 \sim \chi_{n-1}^2, we decide to reject H_0 if \chi^2 > \chi_{n-1}^2(0.05)
By definition the critical value \chi_{n-1}^2(0.05) is such that P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05
x^* := \chi_{n-1}^2(0.05) is point on x-axis such that P(\chi_{n-1}^2 > x^* ) = 0.05
For example, for n = 12 the critical value is \chi_{11}^2(0.05) = 19.68 (pictured in the lecture as the point cutting off the upper 5% tail of the \chi_{11}^2 density)
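This critical value can be computed with the qchisq command introduced later in this lecture:
# x* such that P(chi_11^2 > x*) = 0.05, i.e. the 0.95 quantile
qchisq(0.95, df = 11)   # approximately 19.68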
Given the test statistic \chi^2 the p-value is defined as p := P( \chi_{n-1}^2 > \chi^2 )
Notice that p < 0.05 \qquad \iff \qquad \chi^2 > \chi_{n-1}^2(0.05)
This is because \chi^2 > \chi_{n-1}^2(0.05) iff p = P(\chi_{n-1}^2 > \chi^2) < P(\chi_{n-1}^2 > \chi_{n-1}^2(0.05) ) = 0.05
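A quick numerical illustration of this equivalence, with n = 12 and hypothetical observed statistics:
# A statistic above the critical value 19.68 gives p < 0.05
1 - pchisq(25, df = 11)   # approximately 0.009 < 0.05
# A statistic below the critical value gives p > 0.05
1 - pchisq(15, df = 11)   # approximately 0.18 > 0.05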
Suppose we are given \sigma_0, a guess for \sigma
The one-sided hypothesis test is H_0 \colon \sigma = \sigma_0 \qquad H_1 \colon \sigma > \sigma_0
The variance ratio test consists of 3 steps:
1. Compute the test statistic \chi^2 = \frac{(n-1)S^2}{\sigma_0^2}
2. Compute the p-value p = P( \chi_{n-1}^2 > \chi^2 )
3. Reject H_0 if p < 0.05 (equivalently, if \chi^2 > \chi_{n-1}^2(0.05))
Example: Monthly data on the Consumer Expectation (CE) and Consumer Spending (CS) indices, with their difference CE - CS:

Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |
Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |
Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |
If X \sim N(\mu,\sigma^2) then P( \mu - 2 \sigma \leq X \leq \mu + 2\sigma ) \approx 0.95
Recall: \quad Difference = (CE - CS) \sim N(\mu,\sigma^2)
Hence if \sigma = 1, then P( \mu - 2 \leq {\rm CE} - {\rm CS} \leq \mu + 2 ) \approx 0.95
Meaning of variance ratio test: \sigma=1 \quad \implies \quad \text{CS index is within } \pm 2 \text{ of CE index with probability } 0.95
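The 0.95 figure is the two-sigma rule for the normal distribution; after standardizing it holds for any \mu and \sigma, and can be checked in R:
# P(mu - 2*sigma <= X <= mu + 2*sigma) = P(-2 <= Z <= 2) for Z ~ N(0,1)
pnorm(2) - pnorm(-2)
[1] 0.9544997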
Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |
The observed differences spread far more than \pm 2 around their mean, which already suggests that \sigma > 1
Goal: Perform chi-squared variance ratio test in R
For this we need to compute p-value p = P(\chi_{n-1}^2 > \chi^2)
Thus, we need to compute probabilities for chi-squared distribution in R
R command | Meaning |
---|---|
pnorm(x, mean = mu, sd = sig) | P(X \leq x) |
qnorm(p, mean = mu, sd = sig) | the quantile q with P(X \leq q) = p |
dnorm(x, mean = mu, sd = sig) | f(x), where f is the density of X |
rnorm(n, mean = mu, sd = sig) | n random samples from the distribution of X |
Note: Syntax of commands
norm = normal \qquad p = probability \qquad q = quantile
d = density \qquad r = random
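For example, for the height distribution X \sim N(163, 8^2) from the start of the lecture:
# The four normal commands for X ~ N(163, 8^2)
pnorm(180, mean = 163, sd = 8)   # P(X <= 180)
qnorm(0.5, mean = 163, sd = 8)   # median: returns 163
dnorm(163, mean = 163, sd = 8)   # density at x = 163
rnorm(3, mean = 163, sd = 8)     # three random heights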
The same four commands exist for the chi-squared distribution, where the argument df = n denotes n degrees of freedom:

R command | Meaning |
---|---|
pchisq(x, df = n) | P(X \leq x) |
qchisq(p, df = n) | the quantile q with P(X \leq q) = p |
dchisq(x, df = n) | f(x), where f is the density of X |
rchisq(m, df = n) | m random samples from the distribution of X |
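For example, for the \chi_{11}^2 distribution used earlier:
# The chi-squared commands with df = 11
pchisq(19.68, df = 11)   # approximately 0.95
qchisq(0.95, df = 11)    # critical value, approximately 19.68
dchisq(10, df = 11)      # density at x = 10
rchisq(3, df = 11)       # three random samples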
The \chi^2 statistic for the variance ratio test has distribution \chi_{n-1}^2 under H_0
Question: Compute the p-value p := P(\chi_{n-1}^2 > \chi^2)
Observe that p := P(\chi_{n-1}^2 > \chi^2) = 1 - P(\chi_{n-1}^2 \leq \chi^2)
The code is therefore
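# p-value for the one-sided variance test
# (chi_squared is the observed statistic, n the sample size,
#  both as computed in the example below)
1 - pchisq(chi_squared, df = n - 1)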
Back to Example: Monthly data on CE and CS
Question: Test the following hypothesis: H_0 \colon \sigma = 1 \qquad H_1 \colon \sigma > 1
Month | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Cons. Expectation | 66 | 53 | 62 | 61 | 78 | 72 | 65 | 64 | 61 | 50 | 55 | 51 |
Cons. Spending | 72 | 55 | 69 | 65 | 82 | 77 | 72 | 78 | 77 | 75 | 77 | 77 |
Difference | -6 | -2 | -7 | -4 | -4 | -5 | -7 | -14 | -16 | -25 | -22 | -26 |
# Monthly differences CE - CS, sample size, and test statistic (sigma_0 = 1)
d <- c(-6, -2, -7, -4, -4, -5, -7, -14, -16, -25, -22, -26)
n <- length(d)
chi_squared <- (n - 1) * var(d) / 1^2
# Compute p-value
p_value <- 1 - pchisq(chi_squared, df = n - 1)
# Print p-value
cat("The p-value for one-sided variance test is", p_value)
The p-value for one-sided variance test is 0
Recall: Chi-squared distribution with p degrees of freedom is \chi_p^2 = Z_1^2 + \ldots + Z_p^2 where Z_1, \ldots, Z_p are iid N(0, 1)
Chi-squared distribution was used to:
Describe distribution of sample variance S^2: \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2
Define t-distribution: t_p \sim \frac{U}{\sqrt{V/p}} where U \sim N(0,1) and V \sim \chi_p^2 are independent
The F-distribution is obtained as a ratio of 2 independent chi-squared random variables
Notation: F-distribution with p and q degrees of freedom is denoted by F_{p,q}
Similar to the proof (seen in Homework 3) that \frac{U}{\sqrt{V/p}} \sim t_p where U \sim N(0,1) and V \sim \chi_p^2 are independent
In our case we need to prove X := \frac{U/p}{V/q} \sim F_{p,q} where U \sim \chi_p^2 and V \sim \chi_q^2 are independent
U \sim \chi_{p}^2 and V \sim \chi_q^2 are independent. Therefore \begin{align*} f_{U,V} (u,v) & = f_U(u) f_V(v) \\ & = \frac{ 1 }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) 2^{(p+q)/2} } u^{\frac{p}{2} - 1} v^{\frac{q}{2} - 1} e^{-(u+v)/2} \end{align*}
Consider the change of variables x(u,v) := \frac{u/p}{v/q} \,, \quad y(u,v) := u + v
This way we have X = \frac{U/p}{V/q} \,, \qquad Y = U + V
We can compute f_X via f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy
Since f_{U,V} is known, f_{X,Y} is also known, via the change-of-variables formula (multiplying f_{U,V} by the Jacobian of the inverse transformation)
Moreover the integral f_{X}(x) = \int_{0}^\infty f_{X,Y}(x,y) \, dy can be computed explicitly, yielding the claimed density f_{X}(x) = \frac{ \Gamma \left(\frac{p+q}{2} \right) }{ \Gamma \left( \frac{p}{2} \right) \Gamma \left( \frac{q}{2} \right) } \left( \frac{p}{q} \right)^{p/2} \, \frac{ x^{ (p/2) - 1 } }{ [ 1 + (p/q) x ]^{(p+q)/2} }
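As a sanity check, this closed-form density matches R's built-in df function; a sketch with arbitrary choices of p, q, and x:
# Compare the derived F density with R's df()
p <- 5; q <- 7; x <- 1.3   # arbitrary choices
gamma((p + q) / 2) / (gamma(p / 2) * gamma(q / 2)) *
  (p / q)^(p / 2) * x^(p / 2 - 1) / (1 + (p / q) * x)^((p + q) / 2)
df(x, df1 = p, df2 = q)   # same value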
Suppose X \sim F_{p,q} with q>2. Then \mathbb{E}[X] = \frac{q}{q-2}
If X \sim F_{p,q} then 1/X \sim F_{q,p}
If X \sim t_q then X^2 \sim F_{1,q}
This requires a bit of work and is left as a homework assignment
By the Theorem in Slide 43 we have X \sim F_{p,q} \quad \implies \quad X = \frac{U/p}{V/q} with U \sim \chi_p^2 and V \sim \chi_q^2 independent. Therefore \frac{1}{X} = \frac{V/q}{U/p} \sim F_{q,p}
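These three properties can be sanity-checked numerically in R; a minimal sketch with arbitrary degrees of freedom p = 5, q = 7:
# Numerical checks of the three F-distribution properties
set.seed(1)
p <- 5; q <- 7
q / (q - 2)                          # E[X] = 1.4 for X ~ F_{5,7}
mean(rf(100000, df1 = p, df2 = q))   # sample mean, close to 1.4
pf(2, df1 = p, df2 = q)              # P(X <= 2) for X ~ F_{p,q}
1 - pf(1 / 2, df1 = q, df2 = p)      # P(1/X >= 1/2), same value
pf(4, df1 = 1, df2 = q)              # P(X^2 <= 4) for X ~ t_q ...
pt(2, df = q) - pt(-2, df = q)       # ... equals P(-2 <= X <= 2)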