(Intercept) refers to the coefficient \hat \alpha \qquad \implies \qquad \hat \alpha = 37.917
stock.price refers to the coefficient \hat \beta \qquad \implies \qquad \hat \beta = - 6.169
Interesting parts of Output
Other quantities of interest are:
Coefficient of determination R^2
\texttt{Multiple R-squared: 0.3953}
t-statistic for stock.price
\texttt{stock.price t-value: -4.502}
F-statistic
\texttt{F-statistic: 20.27}
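These quantities can also be read off programmatically from the fitted model object; the sketch below assumes the model is stored in fit.model, as in the plotting code further down.

```r
# Extract R^2, the coefficient table and the F-statistic from the summary
out <- summary(fit.model)

out$r.squared      # Multiple R-squared
out$coefficients   # Estimate, Std. Error, t value, Pr(>|t|) for each coefficient
out$fstatistic     # F-statistic with its degrees of freedom
```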
Plotting the regression line
# Data stored in stock.price and gold.price
# Plot the data
plot(stock.price, gold.price, xlab = "Stock Price", ylab = "Gold Price", pch = 16)

# Model stored in fit.model
# Plot the regression line
abline(fit.model, col = "red", lwd = 3)
Conclusion
We fitted a simple linear model of Gold Price against Stock Price
We obtained the regression line
{\rm I\kern-.3em E}[Y | x] = \hat \alpha + \hat \beta x = 37.917 - 6.169 \times x
The coefficient of determination is
R^2 = 0.395325 \geq 0.3
Hence the linear model explains the data to a reasonable extent:
Stock Price affects Gold Price
Since R^2 is not too large, other factors may also affect Gold Price
t-test and F-test for regression
From lm we also obtained
t-statistic for stock.price
\texttt{stock.price t-value: -4.502}
F-statistic
\texttt{F-statistic: 20.27}
The t-statistic and F-statistic for regression are mathematically difficult topics
In the next two parts we explain what they mean
We however omit mathematical details
If interested check out Section 11.3 of [1] and Chapter 11 of [2]
Part 3: t-test for simple regression
t-test for simple regression
Consider the simple linear regression model
Y_i = \alpha + \beta X_i + \varepsilon_i
We have that
X \,\, \text{ affects } \,\, Y \qquad \iff \qquad
\beta \neq 0
The estimator \hat \beta is a random quantity which depends on the sample
Theorem: Under the simple linear regression model, the estimator \hat \beta is normally distributed
\hat \beta \sim N \left(\beta , \frac{ \sigma^2 }{ S_{xx} } \right)
Proof: Quite difficult. If interested, see Theorem 11.3.3 in [1]
Construction of t-statistic for \hat \beta
From the previous Theorem, we know that
\hat \beta \sim N \left(\beta , \frac{ \sigma^2 }{ S_{xx} } \right)
In particular \hat \beta is an unbiased estimator for \beta
{\rm I\kern-.3em E}[ \hat \beta ] = \beta
This means \hat \beta is the Estimate for the unknown parameter \beta
The t-statistic is therefore
t = \frac{\text{Estimate } - \text{ Hypothesised Value}}{\mathop{\mathrm{e.s.e.}}}
= \frac{ \hat \beta - \beta }{ \mathop{\mathrm{e.s.e.}}}
Estimated Standard Error for \hat \beta
From the same Theorem, we know that
{\rm Var}[\hat \beta] = \frac{ \sigma^2 }{ S_{xx}} \quad \implies \quad {\rm SD}[\hat \beta] = \frac{ \sigma }{ \sqrt{S_{xx}} }
The standard deviation {\rm SD} cannot be used as error, since \sigma^2 is unknown
(Recall that \sigma^2 is the unknown variance of the error \varepsilon_i)
We however have an estimate for \sigma^2
\hat \sigma^2 = \frac{1}{n} \mathop{\mathrm{RSS}}= \frac1n \sum_{i=1}^n (y_i - \hat y_i)^2
\hat \sigma^2 was obtained from maximization of the likelihood function (Lecture 8)
It can be shown that
{\rm I\kern-.3em E}[ \hat\sigma^2 ] = \frac{n-2}{n} \, \sigma^2
(for a proof, see Section 11.3.4 in [1])
Therefore \hat\sigma^2 is not an unbiased estimator of \sigma^2
To obtain an unbiased estimator, we rescale \hat\sigma^2 and introduce S^2
S^2 := \frac{n}{n-2} \, \hat\sigma^2 = \frac{\mathop{\mathrm{RSS}}}{n-2}
This way, S^2 is an unbiased estimator for \sigma^2
{\rm I\kern-.3em E}[S^2] = \frac{n}{n-2} \, {\rm I\kern-.3em E}[\hat\sigma^2] = \frac{n}{n-2} \,\cdot \, \frac{n-2}{n} \, \cdot \, \sigma^2 = \sigma^2
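As an informal check of this unbiasedness, one can simulate from a simple regression model and average S^2 over many samples; the values of \alpha, \beta, \sigma and x below are arbitrary illustrative choices, not taken from the Stock/Gold data.

```r
# Simulation check: the average of S^2 = RSS/(n-2) should be close to sigma^2
set.seed(1)
n <- 30; alpha <- 2; beta <- -1; sigma <- 3     # illustrative values
x <- runif(n, 0, 10)

S2 <- replicate(10000, {
  y   <- alpha + beta * x + rnorm(n, sd = sigma)
  fit <- lm(y ~ x)
  sum(residuals(fit)^2) / (n - 2)
})

mean(S2)   # should be close to sigma^2 = 9
```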
Recall that the standard deviation of \hat \beta is
{\rm SD}[\hat \beta] = \frac{ \sigma }{ \sqrt{S_{xx}} } \,, \qquad \text{so that} \qquad U := \frac{ \hat \beta - \beta }{ \sigma / \sqrt{S_{xx}} } \, \sim \, N(0,1)
Moreover, it can be shown that
V = \frac{(n-2) S^2}{\sigma^2} \, \sim \, \chi_{n-2}^2
It can also be shown that U and V are independent
In summary, we have
t = \frac{ \hat \beta - \beta }{ S / \sqrt{S_{xx}} }
= \frac{ U }{ \sqrt{ V/(n-2) } }
U and V are independent, with
U \sim N(0,1) \,, \qquad \quad V \sim \chi_{n-2}^2
From the Theorem on t-distribution in Lecture 2, we conclude
t \sim t_{n-2}
The t-test for \beta
Assumption: Given data points (x_1, y_1), \ldots, (x_n,y_n), consider the simple linear regression model
Y_i = \alpha + \beta x_i + \varepsilon_i \,, \qquad \varepsilon_i \,\, \text{ iid } \,\, N(0,\sigma^2)
Goal: Statistical inference on the slope \beta
Hypotheses: If b is a guess for \beta, the two-sided hypothesis is
H_0 \colon \beta = b \,, \quad \qquad
H_1 \colon \beta \neq b
The one-sided alternative hypotheses are
H_1 \colon \beta < b \quad \text{ or } \quad
H_1 \colon \beta > b
Procedure: 3 Steps
Calculation: Compute the MLE \hat{\alpha}, \hat{\beta} and the predictions \hat{y}_i
\hat \beta = \frac{ S_{xy} }{ S_{xx} } \,, \qquad
\hat \alpha = \overline{y} - \hat \beta \overline{x} \,, \qquad \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i
Compute the Residual Sum of Squares and the estimator S^2
\mathop{\mathrm{RSS}}= \sum_{i=1}^n (y_i - \hat{y}_i)^2 \,, \qquad S^2 := \frac{\mathop{\mathrm{RSS}}}{n-2}
Finally, compute the \mathop{\mathrm{e.s.e.}} and t-statistic
\mathop{\mathrm{e.s.e.}}= \frac{S}{\sqrt{S_{xx}} } \,, \qquad t = \frac{\hat \beta - b }{ \mathop{\mathrm{e.s.e.}}} \ \sim \ t_{n-2}
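A rough R sketch of this calculation step; here x, y and the hypothesised value b are generic placeholders, not the Stock/Gold data.

```r
# t-test for the slope, step by step (generic data x, y and guess b)
n   <- length(x)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

beta.hat  <- Sxy / Sxx                 # MLE of the slope
alpha.hat <- mean(y) - beta.hat * mean(x)
y.hat     <- alpha.hat + beta.hat * x  # predictions

RSS <- sum((y - y.hat)^2)
S2  <- RSS / (n - 2)
ese <- sqrt(S2 / Sxx)                  # estimated standard error of beta.hat

t <- (beta.hat - b) / ese              # compare with the t_{n-2} distribution
```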
Interpretation: p is very small (hence the \texttt{***} rating)
Therefore, we reject the null hypothesis H_0 and conclude that \beta \neq 0
Since \beta \neq 0, we have that Stock Prices affect Gold Prices
The best estimate for \beta is the MLE \hat \beta = -6.169
\hat \beta < 0 and statistically significant: as Stock Prices increase, Gold Prices decrease
Warning
The t-statistic and p-value in the summary refer to the two-sided test
H_0 \colon \beta = 0 \,, \qquad
H_1 \colon \beta \neq 0
In this case, the t-statistic and p-value given in the summary are
t = \frac{\hat \beta - 0}{\mathop{\mathrm{e.s.e.}}} \,, \qquad p = 2P(t_{n- 2} > |t|)
If instead you are required to test
H_0 \colon \beta = b \,, \qquad
H_1 \colon \beta \neq b \,, \quad
\beta < b\,, \quad \text{ or } \,\,
\beta > b
The t-statistic has to be computed by hand
t = \frac{\hat \beta - b}{\mathop{\mathrm{e.s.e.}}}
\, \mathop{\mathrm{e.s.e.}} is in the 2nd row under \,\, \texttt{Std. Error}
The p-value must be computed by hand, according to the table
| Alternative | p-value | R command |
|---|---|---|
| $\beta \neq b$ | $2P(t_{n-2} > \lvert t \rvert)$ | `2 - 2 * pt(abs(t), df = n - 2)` |
| $\beta < b$ | $P(t_{n-2} < t)$ | `pt(t, df = n - 2)` |
| $\beta > b$ | $P(t_{n-2} > t)$ | `1 - pt(t, df = n - 2)` |
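For instance, to test the hypothetical guess b = -5 in the Stock/Gold model, one can recover the e.s.e. from the summary (Estimate divided by t value, i.e. 6.169 / 4.502) and then compute the statistic and p-value by hand; the value b = -5 is made up for illustration.

```r
# Hypothetical test of H0: beta = -5 vs H1: beta != -5 (two-sided)
beta.hat <- -6.169
ese      <- 6.169 / 4.502          # e.s.e. recovered from the summary
b        <- -5
n        <- 33                     # number of data points in this example

t <- (beta.hat - b) / ese
p <- 2 - 2 * pt(abs(t), df = n - 2)
```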
Part 4: t-test for general regression
Recall: The t-test for simple regression
In the previous Part, we considered the t-test for the simple linear regression model; the same ideas apply to the coefficients \beta_j of the general linear regression model
Our best guess for \beta_j is the ML Estimator
\hat{\beta}_j = (\hat \beta)_j \,, \qquad \hat \beta = (Z^T Z)^{-1} Z^T y
(\beta_j is the j-th component of the vector \beta)
To test hypotheses about \beta_j, we therefore need to
Know the distribution of \hat{\beta}_j
Construct t-statistic involving \hat{\beta}_j
Distribution of \hat \beta_j
Theorem
Consider the general linear regression model
Y_i = \beta_1 z_{i1} + \ldots + \beta_{p} z_{ip} + \varepsilon_i \,, \qquad
\varepsilon_i \, \text{ iid } \, N(0, \sigma^2)
The MLE \hat{\beta}_j is normally distributed
\hat \beta_j \sim N \left( \beta_j , \xi_{jj} \sigma^2 \right) \,,
where the numbers \xi_{jj} are the diagonal entries of the p \times p matrix
(Z^T Z)^{-1} =
\left(
\begin{array}{ccc}
\xi_{11} & \ldots & \xi_{1p} \\
\ldots & \ldots & \ldots \\
\xi_{p1} & \ldots & \xi_{pp} \\
\end{array}
\right)
Proof: Quite difficult. If interested, see Section 11.5 in [2]
Construction of the t-statistic for \hat{\beta}_j
From the previous Theorem, we know that
\hat \beta_j \sim N \left( \beta_j , \xi_{jj} \sigma^2 \right)
In particular, \hat{\beta}_j is an unbiased estimator for \beta_j
{\rm I\kern-.3em E}[ \hat{\beta}_j ] = \beta_j
This means \hat{\beta}_j is the Estimate for the unknown parameter \beta_j
The t-statistic is therefore
t = \frac{\text{Estimate } - \text{ Hypothesised Value}}{\mathop{\mathrm{e.s.e.}}}
= \frac{ \hat{\beta}_j - \beta_j }{ \mathop{\mathrm{e.s.e.}}}
Estimated Standard Error for \hat{\beta}_j
From the Theorem, we know that
{\rm Var}[\hat{\beta}_j] = \xi_{jj} \, \sigma^2 \qquad \implies \qquad
{\rm SD}[\hat \beta_j] = \xi_{jj}^{1/2} \, \sigma
The standard deviation {\rm SD} cannot be used as error, since \sigma^2 is unknown
We however have an estimate for \sigma^2
\hat \sigma^2 = \frac{1}{n} \mathop{\mathrm{RSS}}= \frac1n \sum_{i=1}^n (y_i - \hat y_i)^2
\hat \sigma^2 was obtained from maximization of the likelihood function (Lecture 8)
It can be shown that (see Section 11.5 in [2])
{\rm I\kern-.3em E}[ \hat\sigma^2 ] = \frac{n-p}{n} \, \sigma^2
Therefore \hat\sigma^2 is not an unbiased estimator of \sigma^2
To obtain an unbiased estimator, we rescale \hat\sigma^2 and introduce S^2
S^2 := \frac{n}{n-p} \, \hat\sigma^2 = \frac{\mathop{\mathrm{RSS}}}{n-p}
Assumption: Given data points (z_{i1}, \ldots, z_{ip}, y_i), consider the general model
Y_i = \beta_1 z_{i1} + \ldots + \beta_p z_{ip} + \varepsilon_i \,, \qquad \varepsilon_i \, \text{ iid } \, N(0,\sigma^2)
Goal: Statistical inference on the coefficients \beta_j
Hypotheses: If b_j is a guess for \beta_j, the two-sided hypothesis is
H_0 \colon \beta_j = b_j \,, \quad \qquad
H_1 \colon \beta_j \neq b_j
The one-sided alternative hypotheses are
H_1 \colon \beta_j < b_j \quad \text{ or } \quad
H_1 \colon \beta_j > b_j
Procedure: 3 Steps
Calculation: Write down the design matrix Z and compute (Z^TZ)^{-1}
Z :=
\left(
\begin{array}{ccc}
z_{11} & \ldots & z_{1p} \\
\ldots & \ldots & \ldots \\
z_{n1} & \ldots & z_{np} \\
\end{array}
\right) \,, \qquad
(Z^T Z)^{-1} =
\left(
\begin{array}{ccc}
\xi_{11} & \ldots & \xi_{1p} \\
\ldots & \ldots & \ldots \\
\xi_{p1} & \ldots & \xi_{pp} \\
\end{array}
\right)
Compute the MLE \hat{\beta}, predictions \hat{y}, RSS and S^2
\hat \beta = (Z^TZ)^{-1} Z^T y \,, \qquad
\hat{y} = Z \hat{\beta} \,, \qquad
\mathop{\mathrm{RSS}}= \sum_{i=1}^n (y_i - \hat{y}_i)^2 \,, \qquad S^2 := \frac{\mathop{\mathrm{RSS}}}{n-p}
Finally, compute the \mathop{\mathrm{e.s.e.}} and the t-statistic
\mathop{\mathrm{e.s.e.}}= \xi_{jj}^{1/2} \, S\,, \qquad t = \frac{\hat \beta_j - b_j }{ \mathop{\mathrm{e.s.e.}}} \ \sim \ t_{n-p}
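The matrix computations can be sketched directly in R; below, the design matrix Z, the response y, the index j and the guess b.j are generic placeholders.

```r
# t-test for the j-th coefficient in general regression, step by step
n <- nrow(Z)
p <- ncol(Z)

ZtZ.inv  <- solve(t(Z) %*% Z)          # (Z^T Z)^{-1}
beta.hat <- ZtZ.inv %*% t(Z) %*% y     # MLE
y.hat    <- Z %*% beta.hat             # predictions

RSS <- sum((y - y.hat)^2)
S2  <- RSS / (n - p)

ese <- sqrt(ZtZ.inv[j, j] * S2)        # xi_jj^{1/2} * S
t   <- (beta.hat[j] - b.j) / ese       # compare with the t_{n-p} distribution
```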
Interpretation: Reject H_0 when either
p < 0.05 \qquad \text{ or } \qquad t \in \,\,\text{Rejection Region}
where $T \sim t_{n-p}$:

| Alternative | Rejection Region | $t^*$ | p-value |
|---|---|---|---|
| $\beta_j \neq b_j$ | $\lvert t \rvert > t^*$ | $t_{n-p}(0.025)$ | $2P(T > \lvert t \rvert)$ |
| $\beta_j < b_j$ | $t < -t^*$ | $t_{n-p}(0.05)$ | $P(T < t)$ |
| $\beta_j > b_j$ | $t > t^*$ | $t_{n-p}(0.05)$ | $P(T > t)$ |
The t-test for \beta_j in R
First, fit the general regression model with lm
Then read the summary
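For instance, with a response y and predictors x2 and x3 (hypothetical variable names), the two steps look as follows.

```r
# Fit a general regression model and inspect the coefficient table
fit <- lm(y ~ x2 + x3)
summary(fit)
```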
Assume the hypotheses to test are
H_0 \colon \beta_j = 0 \,, \qquad
H_1 \colon \beta_j \neq 0
In this case, the t-statistic and p-value are explicitly given in the summary
t = \frac{\hat \beta_j - 0}{\mathop{\mathrm{e.s.e.}}} \,, \qquad p = 2P(t_{n- p} > |t|)
Read the t-statistic in j-th variable row under \,\, \texttt{t value}
Read the p-value in j-th variable row under \,\, \texttt{Pr(>|t|)}
If instead you are required to test
H_0 \colon \beta_j = b_j \,, \qquad
H_1 \colon \beta_j \neq b_j \,, \quad
\beta_j < b_j \,, \quad \text{ or } \,\,
\beta_j > b_j
The t-statistic has to be computed by hand
t = \frac{\hat{\beta}_j - b_j}{\mathop{\mathrm{e.s.e.}}}
\,\, \hat \beta_j is in j-th variable row under \,\, \texttt{Estimate}
\,\, \mathop{\mathrm{e.s.e.}} for \hat \beta_j is in j-th variable row under \,\, \texttt{Std. Error}
The p-value must be computed by hand, according to the table
| Alternative | p-value | R command |
|---|---|---|
| $\beta_j \neq b_j$ | $2P(t_{n-p} > \lvert t \rvert)$ | `2 - 2 * pt(abs(t), df = n - p)` |
| $\beta_j < b_j$ | $P(t_{n-p} < t)$ | `pt(t, df = n - p)` |
| $\beta_j > b_j$ | $P(t_{n-p} > t)$ | `1 - pt(t, df = n - p)` |
Worked Example: Stock and Gold prices
Recall that
X = Stock Price
Y = Gold Price
Goal: Test if an intercept is present, at level 0.05
Procedure: Consider the linear model
Y_i = \alpha + \beta x_i + \varepsilon_i \,, \quad \varepsilon_i \,\, \text{ iid } \,\, N(0,\sigma^2)
Test the hypotheses with two-sided alternative \begin{align*}
H_0 & \colon \alpha = 0 \\
H_1 & \colon \alpha \neq 0
\end{align*}
Testing for \alpha = 0
Recall that Stock Prices and Gold Prices are stored in R vectors
stock.price
gold.price
Fit the simple linear model with the following commands
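The commands below are the ones used elsewhere in these slides to create fit.model:

```r
# Fit simple linear regression model
fit.model <- lm(gold.price ~ stock.price)

# Print result to screen
summary(fit.model)
```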
Interpretation: p is very small (hence the \texttt{***} rating)
Therefore, we reject the null hypothesis H_0 and conclude that \alpha \neq 0
Since \alpha \neq 0, we deduce that the model has to include an intercept
The best estimate for \alpha is the MLE \hat \alpha = 37.917
Theoretical Example
Deriving \mathop{\mathrm{e.s.e.}} for simple linear regression
Consider the simple regression model
Y_i = \alpha + \beta x_i + \varepsilon_i
We have seen in the previous part that the \mathop{\mathrm{e.s.e.}} for \hat{\beta} is
\mathop{\mathrm{e.s.e.}}= \frac{S}{\sqrt{S_{xx}}} \,, \qquad S = \sqrt{ \frac{\mathop{\mathrm{RSS}}}{n-2} }
However, we have not computed the \mathop{\mathrm{e.s.e.}} for \hat{\alpha}
Goal: Compute \mathop{\mathrm{e.s.e.}} for \hat{\alpha} using the theory developed for general regression
Theory developed so far:
Consider the general linear model
Y_i = \beta_1 z_{i1} + \ldots +\beta_p z_{ip} + \varepsilon_i \,, \qquad \varepsilon_i \,\, \text{ iid } \,\, N(0, \sigma^2)
We have shown that the estimated standard error for \hat{\beta}_j is
\mathop{\mathrm{e.s.e.}}= \xi_{jj}^{1/2} \, S \,, \qquad
S = \sqrt{ \frac{\mathop{\mathrm{RSS}}}{n-p} }
where \xi_{jj} are the diagonal entries of the matrix
(Z^T Z)^{-1} =
\left(
\begin{array}{ccc}
\xi_{11} & \ldots & \xi_{1p} \\
\ldots & \ldots & \ldots \\
\xi_{p1} & \ldots & \xi_{pp} \\
\end{array}
\right)
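The computation itself is short; here is a sketch (not spelled out in the slides) of how the general formula specialises to the simple model, where p = 2 and the first coefficient is the intercept \alpha
Z =
\left(
\begin{array}{cc}
1 & x_1 \\
\ldots & \ldots \\
1 & x_n \\
\end{array}
\right) \,, \qquad
Z^T Z =
\left(
\begin{array}{cc}
n & \sum_i x_i \\
\sum_i x_i & \sum_i x_i^2 \\
\end{array}
\right) \,, \qquad
\det (Z^T Z) = n \sum_i x_i^2 - \Big( \sum_i x_i \Big)^2 = n S_{xx}
Inverting the 2 \times 2 matrix gives
(Z^T Z)^{-1} = \frac{1}{n S_{xx}}
\left(
\begin{array}{cc}
\sum_i x_i^2 & - \sum_i x_i \\
- \sum_i x_i & n \\
\end{array}
\right)
\qquad \implies \qquad
\xi_{11} = \frac{\sum_i x_i^2}{n S_{xx}} = \frac{1}{n} + \frac{\overline{x}^2}{S_{xx}}
Hence, with S = \sqrt{ \mathop{\mathrm{RSS}} / (n-2) } (since p = 2),
\mathop{\mathrm{e.s.e.}}(\hat \alpha) = \xi_{11}^{1/2} \, S = S \, \sqrt{ \frac{1}{n} + \frac{\overline{x}^2}{S_{xx}} }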
F-test for Overall Significance
Choosing the Full Model 2 is equivalent to rejecting H_0
\begin{align*}
H_0 & \colon \, \beta_2 = \beta_3 = \ldots = \beta_p = 0 \\
H_1 & \colon \text{ At least one of the } \beta_i \text{ is non-zero}
\end{align*}
The hypothesis for deciding between the two models is \begin{align*}
H_0 & \colon \, \beta_2 = \beta_3 = \ldots = \beta_p = 0 \\
H_1 & \colon \text{ At least one of the } \beta_i \text{ is non-zero}
\end{align*}
Suppose the null hypothesis H_0 holds
In this case, the Reduced and Full Models are essentially the same
Therefore, predictions of Full and Reduced Model will be similar
Hence, the \mathop{\mathrm{RSS}} for the 2 models are similar
The hypothesis for deciding between the two models is \begin{align*}
H_0 & \colon \, \beta_2 = \beta_3 = \ldots = \beta_p = 0 \\
H_1 & \colon \text{ At least one of the } \beta_i \text{ is non-zero}
\end{align*}
Suppose instead that the alternative hypothesis H_1 holds
In this case the predictions of Full and Reduced Model will be different
As already noted, in general it holds that
\mathop{\mathrm{RSS}}(1) \geq \mathop{\mathrm{RSS}}(p)
From H_1, we know that some of the extra parameters \beta_2 , \ldots, \beta_p are non-zero
Thus, the Full Model will give better predictions \implies \mathop{\mathrm{RSS}}(p) is much smaller
\mathop{\mathrm{RSS}}(1) \gg \mathop{\mathrm{RSS}}(p)
Recall: We want to test the Overall Significance of the parameters \beta_2,\ldots, \beta_p
This means deciding which model gives better predictions between \begin{align*}
\textbf{Model 1:} & \quad Y_i = \beta_1 + \varepsilon_i & \text{Reduced Model}\\[15pt]
\textbf{Model 2:} & \quad Y_i = \beta_1 + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i & \text{Full Model}
\end{align*}
To make a decision, we have formulated the hypothesis \begin{align*}
H_0 & \colon \, \beta_2 = \beta_3 = \ldots = \beta_p = 0 \\
H_1 & \colon \text{ At least one of the } \beta_i \text{ is non-zero}
\end{align*}
H_0 favors Model 1
H_1 favors Model 2
We use the F-statistic to decide between the two models
F = \frac{\mathop{\mathrm{RSS}}(1) - \mathop{\mathrm{RSS}}(p)}{ p - 1 } \bigg/
\frac{\mathop{\mathrm{RSS}}(p)}{n - p}
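In R, the same comparison can be performed by fitting both models and calling anova(), which reports this F-statistic together with its p-value; the variable names below are hypothetical.

```r
# Compare Reduced and Full Model with a partial F-test
reduced.model <- lm(y ~ 1)           # Model 1: intercept only
full.model    <- lm(y ~ x2 + x3)     # Model 2: intercept plus predictors
anova(reduced.model, full.model)     # F-statistic and Pr(>F)
```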
Goal: Testing the Overall Significance of the parameters \beta_2,\ldots, \beta_p
(i.e. decide which model gives better predictions)
Hypothesis: Choosing the Full Model 2 is equivalent to rejecting H_0
\begin{align*}
H_0 & \colon \, \beta_2 = \beta_3 = \ldots = \beta_p = 0 \\
H_1 & \colon \text{ At least one of the } \beta_i \text{ is non-zero}
\end{align*}
Procedure: 3 Steps
Calculation: Write design matrix Z, and compute MLE \hat{\beta} and predictions \hat{y}_i
Z =
\left(
\begin{array}{cccc}
1 & x_{12} & \ldots & x_{1p} \\
\ldots & \ldots & \ldots & \ldots \\
1 & x_{n2} & \ldots & x_{np} \\
\end{array}
\right) \,, \qquad \hat{\beta} = (Z^TZ)^{-1} Z^T y \,, \qquad \hat{y} = Z \hat{\beta}
Compute the \mathop{\mathrm{RSS}}, \mathop{\mathrm{TSS}} and R^2 coefficient for the Full Model
\mathop{\mathrm{RSS}}(p) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \,, \qquad
\mathop{\mathrm{TSS}}= \sum_{i=1}^n (y_i - \overline{y})^2\,, \qquad
R^2 = 1 - \frac{\mathop{\mathrm{RSS}}(p)}{\mathop{\mathrm{TSS}}}
Finally, compute the F-statistic for Overall Significance
F = \frac{R^2}{1 - R^2} \, \cdot \, \frac{n - p}{p - 1} \ \sim \ F_{p-1, n - p}
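As a quick sanity check with the numbers from the Stock/Gold example (R^2 = 0.395325, n = 33, p = 2), this formula reproduces the F-statistic reported by lm.

```r
# F-statistic from R^2: should be close to the reported 20.27
R2 <- 0.395325
n  <- 33
p  <- 2

F  <- (R2 / (1 - R2)) * (n - p) / (p - 1)
F
```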
Interpretation: Reject H_0 when either
p < 0.05 \qquad \text{ or } \qquad F \in \,\,\text{Rejection Region}
| Alternative | Rejection Region | $F^*$ | p-value |
|---|---|---|---|
| At least one of the $\beta_i$ is non-zero | $F > F^*$ | $F_{p-1,n-p}(0.05)$ | $P(F_{p-1,n-p} > F)$ |
The F-test for Overall Significance in R
Fit the Full Model with lm
Read the Summary
F-statistic is listed in the summary
p-value is listed in the summary
# Fit the Full Model
full.model <- lm(y ~ x2 + x3 + ... + xp)

# Read the summary
summary(full.model)

F-statistic for simple regression
Proposition:
1. The F-statistic for Overall Significance in simple regression is
F = t^2 \,, \qquad \quad t = \frac{\hat \beta}{ S / \sqrt{S_{xx}}}
where t is the t-statistic for \hat \beta
2. In particular, the p-values for the t-test and F-test coincide
p = 2P( t_{n-2} > |t| ) = P( |t_{n-2}| > |t| ) = P( F_{1,n-2} > F )
Proof: Will be left as an exercise

Worked Example: Stock and Gold prices
Recall that
X = Stock Price
Y = Gold Price
Consider the simple linear regression model
Y_i = \alpha + \beta x_i + \varepsilon_i
We want to test the Overall Significance of the parameter \beta
This means deciding which of the two models below gives better predictions
\begin{align*}
\textbf{Model 1:} & \quad Y_i = \alpha + \varepsilon_i & \text{Reduced Model}\\[15pt]
\textbf{Model 2:} & \quad Y_i = \alpha + \beta x_{i} + \varepsilon_i & \text{Full Model}
\end{align*}
Hypothesis: Choosing the Full Model 2 is equivalent to rejecting H_0
\begin{align*}
H_0 & \colon \, \beta = 0 \\
H_1 & \colon \, \beta \neq 0
\end{align*}
Let us perform the F-test in R
Recall that Stock Prices and Gold Prices are stored in the R vectors
stock.price
gold.price
Fit the simple linear model with the following commands
\text{gold.price } = \alpha + \beta \, \times ( \text{ stock.price } ) + \text{ error}

# Fit simple linear regression model
fit.model <- lm(gold.price ~ stock.price)

# Print result to screen
summary(fit.model)
Output: F-statistic and p-value
The computed F-statistic is F = 20.27
There are n = 33 data points, and p = 2 parameters in the Full Model
Therefore, the degrees of freedom are
{\rm df}_1 = p - 1 = 2 - 1 = 1 \,, \qquad
{\rm df}_2 = n - p = 33 - 2 = 31
These are listed in the output, after the F-statistic
In particular, we have that the F-statistic has distribution
F \ \sim \ F_{{\rm df}_1, {\rm df}_2} = F_{1,31}
Output: F-statistic and p-value
The computed p-value is
p = P( F_{1,31} > F ) = 8.904 \times 10^{-5}
Conclusion: Strong evidence (p < 0.001) that the parameter \beta is significant
Going back to the model, this means that Stock Price affects Gold Price
# Input the precise value, rather than the rounded one given in the Summary
F <- 20.267
n <- 33
p <- 2

p.value <- 1 - pf(F, df1 = p - 1, df2 = n - p)
cat("The p-value is", p.value)
The p-value is 8.904244e-05
Output: F-statistic and p-value
We can also find the critical value F_{1,31} (0.05) by looking up its closest values in Table 3
F_{1, 30} (0.05) = 4.17 \,, \qquad \quad F_{1, 40} (0.05) = 4.08
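The exact critical value for the correct degrees of freedom can be obtained directly in R with qf.

```r
# Exact critical value F_{1,31}(0.05); it lies between the tabulated 4.17 and 4.08
qf(0.95, df1 = 1, df2 = 31)
```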
The polynomial model artificially increases R^2, but its predictions are not better
Reason 2: Multicollinearity
An assumption of regression is independence of the variables X_i
In particular, this implies the X_i are uncorrelated
However, the Longley dataset contains several correlated variables
Example: GNP, Population and GNP Deflator are highly correlated
Growing economy (higher Population and GNP) increases inflation (GNP Deflator)
Multicollinearity leads to a strange phenomenon:
We have seen that GNP, Population and GNP Deflator do not affect Employed
However, common sense says they should
At the same time, high R^2 suggests a good model
What is going on?
Multicollinearity breaks the model: t-tests for individual coefficients are unreliable
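This can be checked directly, since the Longley data ship with R as the built-in data frame longley (with columns such as GNP.deflator, GNP and Population); a minimal sketch:

```r
# Pairwise correlations between three highly correlated Longley predictors
data(longley)
cor(longley[, c("GNP.deflator", "GNP", "Population")])
```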
References
[1]
Casella, G. and Berger, R. L., Statistical Inference, Second Edition, Brooks/Cole, 2002.
[2]
DeGroot, M. H. and Schervish, M. J., Probability and Statistics, Fourth Edition, Addison-Wesley, 2012.
[3]
Longley, J. W., An appraisal of least-squares programs from the point of view of the user, Journal of the American Statistical Association 62 (1967), 819–841.