Write on paper and Scan in Black and White using a Scanner or Scanner App (Tiny Scanner, Scanner Pro, …)
Important: I will not mark
Assignments submitted outside of Canvas
Assignments submitted after the deadline
References
Main textbooks
Slides are self-contained and based on the book
[1] Bingham, N. H. and Fry, J. M. Regression: Linear models in statistics. Springer, 2010
References
Main textbooks
... and also on the book
[2] Fry, J. M. and Burke, M. Quantitative methods in finance using R. Open University Press, 2022
References
Secondary References
[3] Casella, G. and Berger R. L. Statistical inference. Second Edition, Brooks/Cole, 2002
[4] DeGroot M. H. and Schervish M. J. Probability and Statistics. Fourth Edition, Addison-Wesley, 2012
Probability & Statistics manual
Easier Probability & Statistics manual
References
Secondary References
[5] Dalgaard, P. Introductory statistics with R. Second Edition, Springer, 2008
[6] Davies, T. M. The book of R. No Starch Press, 2016
Concise Statistics with R
Comprehensive R manual
Part 2: Introduction
The nature of Statistics
Statistics is a mathematical subject
Maths skills will give you a head start
There are other occasions where common sense and detective skills can be more important
Provides an early example of mathematics working in concert with the available computation
The nature of Statistics
We will use a combination of hand calculation and software
Recognises that you are maths students
Software (R) is really useful, particularly for dissertations
Please bring your laptop into class
Download R onto your laptop
Overview of the module
The module has 11 lectures, divided into two parts:
Part I - Mathematical statistics
Part II - Applied statistics
Overview of the module
Part I - Mathematical statistics
Introduction to statistics
Normal distribution family and one-sample hypothesis tests
Two-sample hypothesis tests
The chi-squared test
Non-parametric statistics
The maths of regression
Overview of the module
Part II - Applied statistics
An introduction to practical regression
The extra sum of squares principle and regression modelling assumptions
Violations of regression assumptions – Autocorrelation
Violation of regression assumptions – Multicollinearity
Dummy variable regression models
Simple but useful questions
Generic data:
What is a typical observation?
What is the mean?
How spread out is the data?
What is the variance?
Regression:
What happens to Y as X increases?
increases?
decreases?
nothing?
Statistics answers these questions systematically
important for large datasets
The same mathematical machinery (normal family of distributions) can be applied to both questions
Analysing a general dataset
Two basic questions:
Location or mean
Spread or variance
Statistics enables us to answer these systematically:
One sample and two-sample t-test
Chi-squared test and F-test
Recall the following sketch
Curve represents data distribution
Motivating regression
Basic question in regression:
What happens to Y as X increases?
increases?
decreases?
nothing?
In this way regression can be seen as a more advanced version of high-school maths
Positive gradient
As X increases Y increases
Negative gradient
As X increases Y decreases
Zero gradient
Changes in X do not affect Y
Real data example
Real data is more imperfect
But the same basic idea applies
Example:
X= Stock price
Y= Gold price
Real data example
What does real data look like?
Dataset with 33 entries for Stock and Gold price pairs
Obs   Stock Price   Gold Price
 1      3.230426      9.402434
 2      2.992937      8.987918
 3      2.194025     10.120387
 4      2.602475      9.367327
 5      2.963497      8.708742
 6      4.224242      8.494215
 7      7.433981      8.739684
 8      5.060836      8.609681
 9      3.903316      7.552746
10      4.260542      9.834538
11      3.469490      9.406448
12      2.948513     10.62240
13      3.354562     13.12062
14      3.930106     15.05097
15      3.693491     13.39932
16      3.076129     15.34968
17      2.934277     14.83910
18      2.658664     16.01850
19      2.450606     17.25952
20      2.489758     18.26270
21      2.591093     18.13104
22      2.520800     20.20052
23      2.471447     24.13767
24      2.062430     30.07695
25      1.805153     35.69485
26      1.673950     39.29658
27      1.620848     39.52317
28      1.547374     36.12564
29      1.721679     31.01106
30      1.974891     29.60810
31      2.168978     35.00593
32      2.277214     37.62929
33      2.993353     41.45828
Real data example
Visualizing the data
Plot Stock Price against Gold Price (an R sketch follows after this slide)
Observation:
As Stock price decreases, Gold price increases
Why? This might be because:
Stock price decreases
People invest in secure assets (Gold)
Gold demand increases
Gold price increases
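A minimal R sketch of this plot, assuming the 33 pairs above are stored in a data frame called stock_gold with columns Stock and Gold (hypothetical names):

```r
# Scatter plot of Gold price against Stock price
plot(Gold ~ Stock, data = stock_gold,
     xlab = "Stock Price", ylab = "Gold Price",
     main = "Gold price against Stock price")
```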
Don’t panic
Regression problems can look a lot harder than they really are
Basic question remains the same: what happens to Y as X increases?
Beware of jargon. Various authors distinguish between
Two variable regression model
Multiple regression model
Analysis of Variance
Analysis of Covariance
Despite these apparent differences:
Mathematical methodology stays (essentially) the same
regression-fitting commands in R stay (essentially) the same
Part 3: Probability revision
Probability revision
We start with reviewing some fundamental Probability notions
You saw these in the Y1 module Introduction to Probability & Statistics
We will adopt a slightly more mature mathematical approach
Remember: The mathematical description might look (a bit) different, but the concepts are the same
Topics reviewed:
Sample space
Events
Probability measure
Conditional probability
Events independence
Random Variable
Distribution
cdf
pmf
pdf
Sample space
Definition: Sample space
A set Ω of all possible outcomes of some experiment
Examples:
Coin toss: results in Heads (H) or Tails (T), so Ω = {H, T}
Student grade for Statistical Models: a number between 0 and 100, so Ω = {x ∈ R : 0 ≤ x ≤ 100} = [0, 100]
Events
Definition: Event
A subset E of the sample space Ω (including ∅ and Ω itself)
Operations with events:
Union of two events A and B: A ∪ B := {x ∈ Ω : x ∈ A or x ∈ B}
Intersection of two events A and B: A ∩ B := {x ∈ Ω : x ∈ A and x ∈ B}
Events
More Operations with events:
Complement of an event A: A^c := {x ∈ Ω : x ∉ A}
Infinite union of a family of events A_i with i ∈ I:
⋃_{i∈I} A_i := {x ∈ Ω : x ∈ A_i for some i ∈ I}
Infinite intersection of a family of events A_i with i ∈ I:
⋂_{i∈I} A_i := {x ∈ Ω : x ∈ A_i for all i ∈ I}
Events
Example: Consider the sample space and events Ω := (0, 1], A_i := [1/i, 1], i ∈ N. Then ⋃_{i∈N} A_i = (0, 1] and ⋂_{i∈N} A_i = {1}
Events
Definition: Disjoint
Two events A and B are disjoint if A ∩ B = ∅. Events A_1, A_2, … are pairwise disjoint if A_i ∩ A_j = ∅, ∀ i ≠ j
Events
Definition: Partition
The collection of events A1,A2,… is a partition of Ω if
A1,A2,… are pairwise disjoint
Ω = ⋃_{i=1}^∞ A_i
What’s a Probability?
To each event E⊂Ω we would like to associate a number P(E)∈[0,1]
The number P(E) is called the probability of E
The number P(E) models the frequency of occurrence of E:
P(E) small means E has low chance of occurring
P(E) large means E has high chance of occurring
Technical issue:
One cannot, in general, associate a number P(E) to every subset E of Ω
The probability function P is only defined on a smaller family of events
Such a family of events is called a σ-algebra
σ-algebras
Definition: sigma-algebra
Let B be a collection of events. We say that B is a σ-algebra if
∅∈B
If A∈B then Ac∈B
If A_1, A_2, … ∈ B then ⋃_{i=1}^∞ A_i ∈ B
Remarks:
Since ∅∈B and ∅c=Ω, we deduce that Ω∈B
Thanks to De Morgan's Laws we have that A_1, A_2, … ∈ B ⟹ ⋂_{i=1}^∞ A_i ∈ B
σ-algebras
Examples
Suppose Ω is any set:
Then B={∅,Ω} is a σ-algebra
The power set of Ω, B = Power(Ω) := {A : A ⊂ Ω}, is a σ-algebra
σ-algebras
Examples
If Ω has n elements then B = Power(Ω) contains 2^n sets
If Ω = {1, 2, 3} then B = Power(Ω) = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}
If Ω is uncountable then the power set of Ω is not easy to describe
Lebesgue σ-algebra
Question
R is uncountable. Which σ-algebra do we consider?
Definition: Lebesgue sigma-algebra
The Lebesgue σ-algebra on R is the smallest σ-algebra L containing all sets of the form (a,b),(a,b],[a,b),[a,b] for all a,b∈R
Lebesgue σ-algebra
Important
Therefore the events of R are
Intervals
Unions and intersections of intervals
Countable unions and intersections of intervals
Warning
I have only told you that the Lebesgue σ-algebra L exists
Explicitly showing that L exists is not easy, see [7]
Probability measure
Suppose we are given:
Ω sample space
B a σ-algebra on Ω
Definition: Probability measure
A probability measure on Ω is a map P : B → [0,1] such that the Axioms of Probability hold:
P(Ω) = 1
If A_1, A_2, … are pairwise disjoint then P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i)
Properties of Probability
Let A,B∈B. As a consequence of the Axioms of Probability:
P(∅)=0
If A and B are disjoint then P(A∪B)=P(A)+P(B)
P(Ac)=1−P(A)
P(A)=P(A∩B)+P(A∩Bc)
P(A∪B)=P(A)+P(B)−P(A∩B)
If A⊂B then P(A)≤P(B)
Properties of Probability
Suppose A is an event and B_1, B_2, … a partition of Ω. Then P(A) = ∑_{i=1}^∞ P(A ∩ B_i)
Suppose A_1, A_2, … are events. Then P(⋃_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ P(A_i)
Example: Fair Coin Toss
The sample space for coin toss is Ω={H,T}
We take as σ-algebra the power set of Ω: B = {∅, {H}, {T}, {H,T}}
We suppose that the coin is fair
This means P:B→[0,1] satisfies P({H})=P({T})
Assuming the above we get 1=P(Ω)=P({H}∪{T})=P({H})+P({T})=2P({H})
Therefore P({H}) = P({T}) = 1/2
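A quick simulation check of this model in R (not part of the formal argument): the empirical frequency of H approaches 1/2.

```r
set.seed(1)
tosses <- sample(c("H", "T"), size = 1e5, replace = TRUE)  # fair coin: both outcomes equally likely
mean(tosses == "H")  # approximately 0.5
```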
Conditional Probability
Definition: Conditional Probability
Let A, B be events in Ω with P(B) > 0. The conditional probability of A given B is P(A|B) := P(A∩B) / P(B)
Conditional Probability
Intuition
The conditional probability P(A|B) = P(A∩B) / P(B) represents the probability of A, knowing that B has happened:
If B has happened, then B is the new sample space
Therefore A∩Bc cannot happen, and we are only interested in A∩B
Hence it makes sense to define P(A∣B)∝P(A∩B)
We divide P(A∩B) by P(B) so that P(A∣B)∈[0,1] is still a probability
The function A↦P(A∣B) is a probability measure on Ω
Bayes’ Rule
For two events A and B it holds
P(A|B) = P(B|A) P(A) / P(B)
Given a partition A_1, A_2, … of the sample space we have
P(A_i|B) = P(B|A_i) P(A_i) / ∑_{j=1}^∞ P(B|A_j) P(A_j)
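A small numerical illustration of Bayes' rule in R, with hypothetical values for P(A), P(B|A) and P(B|A^c), where B is decomposed over the partition A, A^c:

```r
p_A          <- 0.01   # hypothetical P(A)
p_B_given_A  <- 0.90   # hypothetical P(B | A)
p_B_given_Ac <- 0.05   # hypothetical P(B | A^c)

p_B <- p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)  # law of total probability
p_A_given_B <- p_B_given_A * p_A / p_B               # Bayes' rule
p_A_given_B  # about 0.154
```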
Independence
Definition
Two events A and B are independent if P(A∩B) = P(A) P(B). A collection of events A_1, …, A_n is mutually independent if for any sub-collection A_{i_1}, …, A_{i_k} it holds P(⋂_{j=1}^k A_{i_j}) = ∏_{j=1}^k P(A_{i_j})
Random Variables
Motivation
Consider the experiment of flipping a coin 50 times
The sample space consists of 2^50 elements
Elements are vectors of 50 entries recording the outcome H or T of each flip
This is a very large sample space!
Suppose we are only interested in X = number of H in 50 flips
Then the new sample space is the set of integers {0,1,2,…,50}
This is much smaller!
X is called a Random Variable
Random Variables
Assume given
Ω sample space
B a σ-algebra of events on Ω
P:B→[0,1] a probability measure
Definition: Random variable
A function X:Ω→R
We will abbreviate Random Variable with rv
Random Variables
Technical remark
Definition: Random variable
A measurable function X:Ω→R
Technicality: X is a measurable function if {X ∈ I} := {ω ∈ Ω : X(ω) ∈ I} ∈ B, ∀ I ∈ L, where
L is the Lebesgue σ-algebra on R
B is the given σ-algebra on Ω
Random Variables
Notation
In particular I∈L can be of the form (a,b),(a,b],[a,b),[a,b],∀a,b∈R
In this case the set {X∈I}∈B is denoted by, respectively: {a<X<b},{a<X≤b},{a≤X<b},{a≤X≤b}
If a=b=x then I=[x,x]={x}. Then we denote {X∈I}={X=x}
Distribution
Why do we require measurability?
Answer: Because it allows us to define a new probability measure on R
Definition: Distribution
The distribution of a random variable X : Ω → R is the probability measure on R given by P_X : L → [0,1], P_X(I) := P({X ∈ I}), ∀ I ∈ L
Note:
One can show that PX satisfies the Probability Axioms
Thus PX is a probability measure on R
In the future we will denote P(X∈I):=P({X∈I})
Distribution
Why is the distribution useful?
Answer: Because it allows us to define a random variable X
by specifying the distribution values P(X∈I)
rather than defining an explicit function X:Ω→R
Important: More often than not
We care about the distribution of X
We do not care about how X is defined
Example - Three coin tosses
Sample space Ω given by the 8 possible outcomes ω:
Ω = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
The probability of each outcome is the same: P(ω) = 1/2 × 1/2 × 1/2 = 1/8, ∀ ω ∈ Ω
Define the random variable X:Ω→R by X(ω):= Number of H in ω
ω       HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(ω)      3    2    2    2    1    1    1    0
Example - Three coin tosses
Recall the definition of X
ω       HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(ω)      3    2    2    2    1    1    1    0
The range of X is {0,1,2,3}
Hence the only interesting values of PX are P(X=0),P(X=1),P(X=2),P(X=3)
Example - Three coin tosses
Recall the definition of X
ω       HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(ω)      3    2    2    2    1    1    1    0
We compute:
P(X=0) = P(TTT) = 1/8
P(X=1) = P(TTH) + P(THT) + P(HTT) = 3/8
P(X=2) = P(HHT) + P(HTH) + P(THH) = 3/8
P(X=3) = P(HHH) = 1/8
Example - Three coin tosses
Recall the definition of X
ω       HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X(ω)      3    2    2    2    1    1    1    0
The distribution of X is summarized in the table below
x         0    1    2    3
P(X=x)   1/8  3/8  3/8  1/8
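The same distribution can be checked in R: the number of heads in 3 fair tosses is Binomial(3, 1/2), so dbinom reproduces the table above.

```r
dbinom(0:3, size = 3, prob = 0.5)  # 0.125 0.375 0.375 0.125, i.e. 1/8 3/8 3/8 1/8
```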
Cumulative Distribution Function
Recall: The distribution of a rv X : Ω → R is the probability measure on R given by P_X : L → [0,1], P_X(I) := P(X ∈ I), ∀ I ∈ L
Definition: cdf
The cumulative distribution function or cdf of a rv X : Ω → R is F_X : R → R, F_X(x) := P(X ≤ x)
Cumulative Distribution Function
Intuition
FX is the primitive of PX:
Recall from Analysis: The primitive of a continuous function g : R → R is G(x) := ∫_{−∞}^x g(y) dy
Note that PX is not a function but a distribution
However the definition of cdf as a primitive still makes sense
PX will be the derivative of FX - In a suitable generalized sense
Recall from Analysis: Fundamental Theorem of Calculus says G′(x)=g(x)
Since FX is the primitive of PX, it will still hold FX′=PX in the sense of distributions
Distribution Function
Example
Consider again 3 coin tosses and the rv X(ω):= Number of H in ω
We computed that the distribution PX of X is
x         0    1    2    3
P(X=x)   1/8  3/8  3/8  1/8
One can compute
F_X(x) = 0 if x < 0
F_X(x) = 1/8 if 0 ≤ x < 1
F_X(x) = 1/2 if 1 ≤ x < 2
F_X(x) = 7/8 if 2 ≤ x < 3
F_X(x) = 1 if x ≥ 3
For example F_X(2.1) = P(X ≤ 2.1) = P(X = 0, 1 or 2) = P(X=0) + P(X=1) + P(X=2) = 1/8 + 3/8 + 3/8 = 7/8
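A one-line R check of this value using the Binomial(3, 1/2) cdf:

```r
pbinom(2.1, size = 3, prob = 0.5)  # P(X <= 2.1) = 0.875 = 7/8
```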
Cumulative Distribution Function
Example
Plot of FX: it is a step function
FX′=0 except at x=0,1,2,3
FX jumps at x=0,1,2,3
Size of jump at x is P(X=x)
FX′=PX in the sense of distributions (Advanced analysis concept - not covered)
Discrete Random Variables
In the previous example:
The cdf FX had jumps
Hence F_X was discontinuous
We take this as definition of discrete rv
Definition
X:Ω→R is discrete if FX has jumps
Probability Mass Function
In this slide X is a discrete rv
Therefore FX has jumps
Definition
The Probability Mass Function or pmf of a discrete rv X is f_X : R → R, f_X(x) := P(X = x)
Consider again 3 coin tosses and the RV X(ω):= Number of H in ω
The pmf of X is fX(x):=P(X=x), which we have already computed
x                  0    1    2    3
f_X(x) = P(X=x)   1/8  3/8  3/8  1/8
Example 2 - Geometric Distribution
Suppose p∈(0,1) is a given probability of success
Hence 1−p is probability of failure
Consider the random variable X= Number of attempts to obtain first success
Since each trial is independent, the pmf of X is f_X(x) = P(X = x) = (1 − p)^(x−1) p, ∀ x ∈ N
This is called geometric distribution
Example 2 - Geometric Distribution
We want to compute the cdf of X. For x ∈ N:
F_X(x) = P(X ≤ x) = ∑_{k=1}^x P(X = k) = ∑_{k=1}^x f_X(k) = ∑_{k=1}^x (1−p)^(k−1) p = p (1 − (1−p)^x) / (1 − (1−p)) = 1 − (1−p)^x
where we used the formula for the sum of a geometric series: ∑_{k=1}^x t^(k−1) = (1 − t^x) / (1 − t), t ≠ 1
Example 2 - Geometric Distribution
F_X is flat between two consecutive natural numbers: F_X(x + k) = P(X ≤ x + k) = P(X ≤ x) = F_X(x) for all x ∈ N, k ∈ [0, 1)
Therefore FX has jumps and X is discrete
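A quick R check of the cdf formula. Note that R's pgeom counts the number of failures before the first success, so our X (number of attempts) corresponds to shifting the argument by 1:

```r
p <- 0.3
x <- 1:10
all.equal(pgeom(x - 1, prob = p), 1 - (1 - p)^x)  # TRUE: matches F_X(x) = 1 - (1-p)^x
```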
Continuous Random Variables
Recall: X is discrete if FX has jumps
Definition: Continuous Random Variable
X:Ω→R is continuous if FX is continuous
Probability Mass Function?
Suppose X is a continuous rv
Therefore FX is continuous
Question
Can we define the Probability Mass Function for X?
Answer:
Yes we can, but it would be useless - pmf carries no information
This is because fX(x)=P(X=x)=0,∀x∈R
Probability Mass Function?
Indeed, for all ε>0 we have {X=x}⊂{x−ε<X≤x}
Therefore by the properties of probability we have P(X = x) ≤ P(x − ε < X ≤ x) = P(X ≤ x) − P(X ≤ x − ε) = F_X(x) − F_X(x − ε), where we also used the definition of F_X
Since F_X is continuous we get 0 ≤ P(X = x) ≤ lim_{ε→0} [F_X(x) − F_X(x − ε)] = 0
Then fX(x)=P(X=x)=0 for all x∈R
Probability Density Function
pmf carries no information for continuous RV
We instead define the pdf
Definition
The Probability Density Function or pdf of a continuous rv X is a function f_X : R → R such that F_X(x) = ∫_{−∞}^x f_X(t) dt, ∀ x ∈ R
Technical issue:
If X is continuous, a pdf need not exist in general
Such counterexamples are rare, so we will always assume that the pdf exists
Probability Density Function
Properties
Proposition
Suppose X is a continuous rv. Then:
The cdf F_X is continuous and differentiable (a.e.) with F_X′ = f_X
Probabilities can be computed via P(a ≤ X ≤ b) = ∫_a^b f_X(t) dt, ∀ a, b ∈ R, a ≤ b
Example - Logistic Distribution
The random variable X has logistic distribution if its pdf is f_X(x) = e^(−x) / (1 + e^(−x))^2
Example - Logistic Distribution
The random variable X has logistic distribution if its pdf is f_X(x) = e^(−x) / (1 + e^(−x))^2
The cdf can be computed to be F_X(x) = ∫_{−∞}^x f_X(t) dt = 1 / (1 + e^(−x))
The RHS is known as logistic function
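These formulas agree with R's built-in standard logistic density and cdf, dlogis and plogis, as a quick sketch confirms:

```r
x <- seq(-3, 3, by = 0.5)
all.equal(dlogis(x), exp(-x) / (1 + exp(-x))^2)  # TRUE: pdf
all.equal(plogis(x), 1 / (1 + exp(-x)))          # TRUE: cdf
```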
Example - Logistic Distribution
Application: Logistic function models expected score in chess (see Wikipedia)
R_A is the Elo rating of player A, R_B is the Elo rating of player B
E_A is the expected score of player A: E_A := P(A wins) + (1/2) P(A draws)
E_A is modelled by the logistic function E_A := 1 / (1 + 10^((R_B − R_A)/400))
Example: A beginner is rated 1000, an International Master is rated 2400: R_Begin = 1000, R_IM = 2400, E_Begin = 1 / (1 + 10^(1400/400)) ≈ 0.000316
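A minimal R sketch of this calculation (elo_expected is an illustrative helper, not a standard function):

```r
# Expected score of player A against player B under the logistic Elo model
elo_expected <- function(R_A, R_B) 1 / (1 + 10^((R_B - R_A) / 400))
elo_expected(1000, 2400)  # about 0.000316
```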
Characterization of pmf and pdf
Theorem
Let f:R→R. Then f is pmf or pdf of a RV X iff
f(x)≥0 for all x∈R
∑_x f(x) = 1 (pmf) or ∫_{−∞}^∞ f(x) dx = 1 (pdf)
In the above setting:
The RV X has distribution P(X = x) = f(x) (pmf) or P(a ≤ X ≤ b) = ∫_a^b f(t) dt (pdf)
The symbol X∼f denotes that X has distribution f
Summary - Random Variables
Suppose X:Ω→R is RV
Cumulative Distribution Function (cdf): F_X(x) := P(X ≤ x)
Discrete RV:
F_X has jumps
Probability Mass Function (pmf): f_X(x) := P(X = x)
f_X ≥ 0 and ∑_x f_X(x) = 1
F_X(x) = ∑_{k ≤ x} f_X(k)
P(a ≤ X ≤ b) = ∑_{k=a}^b f_X(k)
Continuous RV:
F_X is continuous
Probability Density Function (pdf): f_X(x) := F_X′(x)
f_X ≥ 0 and ∫_{−∞}^∞ f_X(x) dx = 1
F_X(x) = ∫_{−∞}^x f_X(t) dt
P(a ≤ X ≤ b) = ∫_a^b f_X(t) dt
Part 4: Moment generating functions
Functions of Random Variables
X:Ω→R random variable and g:R→R function
Then Y := g(X) : Ω → R is a random variable
For A ⊂ R we define the pre-image g^(−1)(A) := {x ∈ R : g(x) ∈ A}
For a single-element set A = {y} we write g^(−1)({y}) = g^(−1)(y) = {x ∈ R : g(x) = y}
The distribution of Y is P(Y ∈ A) = P(g(X) ∈ A) = P(X ∈ g^(−1)(A))
Functions of Random Variables
Question: What is the relationship between fX and fY?
X discrete: Then Y is discrete and f_Y(y) = P(Y = y) = ∑_{x ∈ g^(−1)(y)} P(X = x) = ∑_{x ∈ g^(−1)(y)} f_X(x)
X and Y continuous: Then F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ∈ {x ∈ R : g(x) ≤ y}) = ∫_{{x ∈ R : g(x) ≤ y}} f_X(t) dt
Functions of Random Variables
Issue: The below set may be tricky to compute {x∈R:g(x)≤y}
However it can be easily computed if g is strictly monotone:
g strictly increasing: meaning that x_1 < x_2 ⟹ g(x_1) < g(x_2)
g strictly decreasing: meaning that x_1 < x_2 ⟹ g(x_1) > g(x_2)
X discrete: Then Y is discrete and f_Y(y) = ∑_{x ∈ g^(−1)(y)} f_X(x)
X and Y continuous: Then F_Y(y) = ∫_{{x ∈ R : g(x) ≤ y}} f_X(t) dt
X and Y continuous and
g strictly increasing: F_Y(y) = F_X(g^(−1)(y))
g strictly decreasing: F_Y(y) = 1 − F_X(g^(−1)(y))
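A short R check of the strictly increasing case: take X ∼ N(0,1) and g(x) = e^x, so Y = e^X is lognormal and F_Y(y) = F_X(log y).

```r
y <- c(0.5, 1, 2, 5)
all.equal(plnorm(y), pnorm(log(y)))  # TRUE: lognormal cdf equals F_X(g^{-1}(y))
```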
Expected Value
Expected value is the average value of a random variable
Definition
X rv and g:R→R function. The expected value or mean of g(X) is IE[g(X)]
If X discrete: IE[g(X)] := ∑_x g(x) f_X(x) = ∑_x g(x) P(X = x)
If X continuous: IE[g(X)] := ∫_{−∞}^∞ g(x) f_X(x) dx
Expected Value
Properties
In particular we have:
If X discrete: IE[X] = ∑_x x f_X(x) = ∑_x x P(X = x)
If X continuous: IE[X] = ∫_{−∞}^∞ x f_X(x) dx
Expected Value
Properties
Theorem
X rv, g, h : R → R functions and a, b, c ∈ R. The expected value is linear:
IE[a g(X) + b h(X) + c] = a IE[g(X)] + b IE[h(X)] + c    (1)
In particular
IE[aX] = a IE[X]    (2)
IE[c] = c    (3)
Expected Value
Proof of Theorem
Equation (2) follows from (1) by setting g(x)=x and b=c=0
Equation (3) follows from (1) by setting a=b=0
To show (1), suppose X is continuous and set p(x) := a g(x) + b h(x) + c. Then
IE[a g(X) + b h(X) + c] = IE[p(X)] = ∫_R p(x) f_X(x) dx = ∫_R (a g(x) + b h(x) + c) f_X(x) dx
= a ∫_R g(x) f_X(x) dx + b ∫_R h(x) f_X(x) dx + c ∫_R f_X(x) dx = a IE[g(X)] + b IE[h(X)] + c
where in the last step we used that ∫_R f_X(x) dx = 1
If X is discrete just replace integrals with series in the above argument
Expected Value
Further Properties
Below are further properties of IE, which we do not prove
Theorem
Suppose X and Y are rv. The expected value is:
Monotone: X≤Y⟹IE[X]≤IE[Y]
Non-degenerate: IE[∣X∣]=0⟹X=0
X=Y⟹IE[X]=IE[Y]
Variance
Variance measures how much a rv X deviates from IE[X]
Definition: Variance
The variance of a random variable X is Var[X] := IE[(X − IE[X])^2]
Proposition: For a, b ∈ R it holds Var[aX + b] = a^2 Var[X]
Proof: Using linearity of IE, the fact that IE[c] = c for constants, and the formula Var[X] = IE[X^2] − IE[X]^2 (see next slide):
Var[aX + b] = IE[(aX + b)^2] − (IE[aX + b])^2 = IE[a^2 X^2 + b^2 + 2abX] − (a IE[X] + b)^2
= a^2 IE[X^2] + b^2 + 2ab IE[X] − a^2 IE[X]^2 − b^2 − 2ab IE[X] = a^2 (IE[X^2] − IE[X]^2) = a^2 Var[X]
Variance
How to compute the Variance
We have Var[X] = IE[X^2] − IE[X]^2
X discrete: IE[X] = ∑_x x f_X(x), IE[X^2] = ∑_x x^2 f_X(x)
X continuous: IE[X] = ∫_{−∞}^∞ x f_X(x) dx, IE[X^2] = ∫_{−∞}^∞ x^2 f_X(x) dx
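A simulation sketch of the identity Var[X] = IE[X^2] − IE[X]^2, using a Uniform(0, 1) sample for concreteness (its variance is 1/12):

```r
set.seed(42)
x <- runif(1e6)         # X ~ Uniform(0, 1)
mean(x^2) - mean(x)^2   # close to 1/12 = 0.0833...
var(x)                  # sample variance, also close to 1/12
```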
Example - Gamma distribution
Definition
The Gamma distribution with parameters α, β > 0 is f(x) := β^α x^(α−1) e^(−βx) / Γ(α), x > 0, where Γ is the Gamma function Γ(a) := ∫_0^∞ x^(a−1) e^(−x) dx
Example - Gamma distribution
Definition
Properties of Γ:
The Gamma function coincides with the factorial on natural numbers: Γ(n) = (n−1)!, ∀ n ∈ N
More generally Γ(a) = (a−1) Γ(a−1), ∀ a > 1
The definition of Γ implies the normalization of the Gamma distribution: ∫_0^∞ f(x) dx = ∫_0^∞ β^α x^(α−1) e^(−βx) / Γ(α) dx = 1
Example - Gamma distribution
Definition
X has Gamma distribution with parameters α,β if
the pdf of X is f_X(x) = β^α x^(α−1) e^(−βx) / Γ(α) if x > 0, and f_X(x) = 0 if x ≤ 0
In this case we write X∼Γ(α,β)
α is shape parameter
β is rate parameter
Example - Gamma distribution
Plot
Plotting Γ(α,β) for parameters (2,1) and (3,2)
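A minimal R sketch of such a plot using the built-in Gamma density dgamma (shape = α, rate = β):

```r
curve(dgamma(x, shape = 2, rate = 1), from = 0, to = 8,
      ylab = "density", main = "Gamma densities")
curve(dgamma(x, shape = 3, rate = 2), add = TRUE, lty = 2)
legend("topright", legend = c("Gamma(2, 1)", "Gamma(3, 2)"), lty = c(1, 2))
```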
Example - Gamma distribution
Expected value
Let X ∼ Γ(α, β). We have: IE[X] = ∫_{−∞}^∞ x f_X(x) dx = ∫_0^∞ x β^α x^(α−1) e^(−βx) / Γ(α) dx = (β^α / Γ(α)) ∫_0^∞ x^α e^(−βx) dx
Example - Gamma distribution
Expected value
Recall the previous calculation: IE[X] = (β^α / Γ(α)) ∫_0^∞ x^α e^(−βx) dx. Change variable y = βx and recall the definition of Γ:
∫_0^∞ x^α e^(−βx) dx = ∫_0^∞ (1/β^α) (βx)^α e^(−βx) (1/β) β dx = (1/β^(α+1)) ∫_0^∞ y^α e^(−y) dy = Γ(α+1) / β^(α+1)
Similarly, for the second moment: IE[X^2] = (β^α / Γ(α)) ∫_0^∞ x^(α+1) e^(−βx) dx. The same change of variable y = βx gives
∫_0^∞ x^(α+1) e^(−βx) dx = ∫_0^∞ (1/β^(α+1)) (βx)^(α+1) e^(−βx) (1/β) β dx = (1/β^(α+2)) ∫_0^∞ y^(α+1) e^(−y) dy = Γ(α+2) / β^(α+2)
Example - Gamma distribution
Variance
Therefore IE[X^2] = (β^α / Γ(α)) ∫_0^∞ x^(α+1) e^(−βx) dx = (β^α / Γ(α)) Γ(α+2) / β^(α+2) = Γ(α+2) / (β^2 Γ(α))
Now use the formula Γ(α+1) = α Γ(α) twice: Γ(α+2) = (α+1) Γ(α+1) = (α+1) α Γ(α)
Substituting we get IE[X^2] = Γ(α+2) / (β^2 Γ(α)) = (α+1) α / β^2
Example - Gamma distribution
Variance
Therefore IE[X] = (β^α / Γ(α)) Γ(α+1) / β^(α+1) = α/β (using Γ(α+1) = α Γ(α)) and IE[X^2] = (α+1) α / β^2, and the variance is Var[X] = IE[X^2] − IE[X]^2 = (α+1) α / β^2 − α^2 / β^2 = α / β^2
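A quick simulation check of these formulas in R, for the illustrative choice α = 2, β = 3:

```r
set.seed(1)
x <- rgamma(1e6, shape = 2, rate = 3)
mean(x)  # close to alpha/beta   = 2/3
var(x)   # close to alpha/beta^2 = 2/9
```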
Moment generating function
We abbreviate Moment generating function with MGF
MGF is almost the Laplace transform of the probability density function
MGF provides a short-cut to calculating mean and variance
MGF gives a way of proving distributional results for sums of independent random variables
Moment generating function
Definition
The moment generating function or MGF of a rv X is M_X(t) := IE[e^(tX)], ∀ t ∈ R
In particular we have:
X discrete: M_X(t) = ∑_x e^(tx) f_X(x)
X continuous: M_X(t) = ∫_{−∞}^∞ e^(tx) f_X(x) dx
Moment generating function
Computing moments
Theorem
If X has MGF M_X then IE[X^n] = M_X^(n)(0), where we denote M_X^(n)(0) := (d^n/dt^n) M_X(t) evaluated at t = 0
The quantity IE[Xn] is called n-th moment of X
Moment generating function
Proof of Theorem
Suppose X is continuous and that we can exchange derivative and integral:
d/dt M_X(t) = d/dt ∫_{−∞}^∞ e^(tx) f_X(x) dx = ∫_{−∞}^∞ (d/dt e^(tx)) f_X(x) dx = ∫_{−∞}^∞ x e^(tx) f_X(x) dx = IE[X e^(tX)]
Evaluating at t = 0: d/dt M_X(t) at t = 0 gives IE[X e^0] = IE[X]
Moment generating function
Proof of Theorem
Proceeding by induction we obtain: (d^n/dt^n) M_X(t) = IE[X^n e^(tX)]. Evaluating at t = 0 yields the claim: (d^n/dt^n) M_X(t) at t = 0 gives IE[X^n e^0] = IE[X^n]
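A numerical sketch of the theorem: estimate M_X(t) by Monte Carlo and differentiate it at 0 by a central difference; this recovers IE[X]. Here X ∼ Exp(rate = 2), so IE[X] = 1/2 (an illustrative choice).

```r
set.seed(1)
x <- rexp(1e6, rate = 2)
M <- function(t) mean(exp(t * x))  # Monte Carlo estimate of M_X(t)
h <- 1e-3
(M(h) - M(-h)) / (2 * h)           # central difference at 0, approximately E[X] = 0.5
```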
Moment generating function
Notation
For the first 3 derivatives we use the special notations M_X′, M_X′′ and M_X′′′
Example - Normal distribution
Definition
The normal distribution with mean μ and variance σ^2 is f(x) := (1/√(2πσ^2)) exp(−(x−μ)^2 / (2σ^2)), x ∈ R
X has normal distribution with mean μ and variance σ^2 if f_X = f
In this case we write X ∼ N(μ, σ^2)
The standard normal distribution is denoted N(0, 1)
Example - Normal distribution
Plot
Plotting N(μ,σ2) for parameters (0,1) and (3,2)
Example - Normal distribution
Moment generating function
The equation for the normal pdf is f_X(x) = (1/(√(2π) σ)) exp(−(x−μ)^2 / (2σ^2)). Being a pdf, we must have ∫ f_X(x) dx = 1. This yields:
∫_{−∞}^∞ exp(−x^2/(2σ^2) + μx/σ^2) dx = exp(μ^2/(2σ^2)) √(2π) σ    (1)
Example - Normal distribution
Moment generating function
We have
M_X(t) := IE[e^(tX)] = ∫_{−∞}^∞ e^(tx) f_X(x) dx = ∫_{−∞}^∞ e^(tx) (1/(√(2π) σ)) exp(−(x−μ)^2/(2σ^2)) dx
= (1/(√(2π) σ)) ∫_{−∞}^∞ e^(tx) exp(−x^2/(2σ^2) − μ^2/(2σ^2) + xμ/σ^2) dx
= exp(−μ^2/(2σ^2)) (1/(√(2π) σ)) ∫_{−∞}^∞ exp(−x^2/(2σ^2) + (tσ^2 + μ)x/σ^2) dx
Example - Normal distribution
Moment generating function
We have shown
M_X(t) = exp(−μ^2/(2σ^2)) (1/(√(2π) σ)) ∫_{−∞}^∞ exp(−x^2/(2σ^2) + (tσ^2 + μ)x/σ^2) dx    (2)
Replacing μ by (tσ^2 + μ) in (1) we obtain
∫_{−∞}^∞ exp(−x^2/(2σ^2) + (tσ^2 + μ)x/σ^2) dx = exp((tσ^2 + μ)^2/(2σ^2)) √(2π) σ    (3)
Substituting (3) into (2) and simplifying we get M_X(t) = exp(μt + t^2 σ^2 / 2)
Example - Normal distribution
Mean
Recall the mgf M_X(t) = exp(μt + t^2 σ^2 / 2). The first derivative is M_X′(t) = (μ + σ^2 t) exp(μt + t^2 σ^2 / 2). Therefore the mean is IE[X] = M_X′(0) = μ
Example - Normal distribution
Variance
The first derivative of the mgf is M_X′(t) = (μ + σ^2 t) exp(μt + t^2 σ^2 / 2). The second derivative is then M_X′′(t) = σ^2 exp(μt + t^2 σ^2 / 2) + (μ + σ^2 t)^2 exp(μt + t^2 σ^2 / 2). Therefore the second moment is IE[X^2] = M_X′′(0) = σ^2 + μ^2
Example - Normal distribution
Variance
We have seen that IE[X] = μ and IE[X^2] = σ^2 + μ^2. Therefore the variance is Var[X] = IE[X^2] − IE[X]^2 = σ^2 + μ^2 − μ^2 = σ^2
Example - Gamma distribution
Moment generating function
Suppose X ∼ Γ(α, β). This means f_X(x) = β^α x^(α−1) e^(−βx) / Γ(α) if x > 0, and f_X(x) = 0 if x ≤ 0
We have already seen that IE[X] = α/β and Var[X] = α/β^2
We want to compute the mgf M_X to derive IE[X] and Var[X] again
Example - Gamma distribution
Moment generating function
We compute, for t < β (so that the integral converges):
M_X(t) = IE[e^(tX)] = ∫_{−∞}^∞ e^(tx) f_X(x) dx = ∫_0^∞ e^(tx) β^α x^(α−1) e^(−βx) / Γ(α) dx = (β^α / Γ(α)) ∫_0^∞ x^(α−1) e^(−(β−t)x) dx
Example - Gamma distribution
Moment generating function
From the previous slide we have M_X(t) = (β^α / Γ(α)) ∫_0^∞ x^(α−1) e^(−(β−t)x) dx. Change variable y = (β−t)x and recall the definition of Γ:
∫_0^∞ x^(α−1) e^(−(β−t)x) dx = ∫_0^∞ (1/(β−t)^(α−1)) [(β−t)x]^(α−1) e^(−(β−t)x) (1/(β−t)) (β−t) dx = (1/(β−t)^α) ∫_0^∞ y^(α−1) e^(−y) dy = Γ(α) / (β−t)^α
From the mgf M_X(t) = β^α / (β−t)^α we compute the first derivative:
M_X′(t) = d/dt [β^α (β−t)^(−α)] = β^α (−α) (β−t)^(−α−1) (−1) = α β^α (β−t)^(−α−1)
Example - Gamma distribution
Expectation
From the first derivative M_X′(t) = α β^α (β−t)^(−α−1) we compute the expectation IE[X] = M_X′(0) = α β^α β^(−α−1) = α/β
Example - Gamma distribution
Variance
From the first derivative M_X′(t) = α β^α (β−t)^(−α−1) we compute the second derivative
M_X′′(t) = d/dt [α β^α (β−t)^(−α−1)] = α β^α (−α−1) (β−t)^(−α−2) (−1) = α(α+1) β^α (β−t)^(−α−2)
Example - Gamma distribution
Variance
From the second derivative M_X′′(t) = α(α+1) β^α (β−t)^(−α−2) we compute the second moment IE[X^2] = M_X′′(0) = α(α+1) β^α β^(−α−2) = α(α+1) / β^2
Example - Gamma distribution
Variance
From the first and second moments IE[X] = α/β and IE[X^2] = α(α+1)/β^2 we can compute the variance Var[X] = IE[X^2] − IE[X]^2 = α(α+1)/β^2 − α^2/β^2 = α/β^2
Moment generating function
The mgf characterizes a distribution
Theorem
Let X and Y be random variables with mgfs M_X and M_Y respectively. Assume there exists ε > 0 such that M_X(t) = M_Y(t), ∀ t ∈ (−ε, ε). Then X and Y have the same cdf: F_X(u) = F_Y(u), ∀ u ∈ R
In other words: same mgf ⟹ same distribution
Example
Suppose X is a random variable such that M_X(t) = exp(μt + t^2 σ^2 / 2). As this is the mgf of a normal distribution, by the previous Theorem we infer X ∼ N(μ, σ^2)
Suppose Y is a random variable such that M_Y(t) = β^α / (β−t)^α. As this is the mgf of a Gamma distribution, by the previous Theorem we infer Y ∼ Γ(α, β)
References
[1]
Bingham, Nicholas H., Fry, John M., Regression: Linear models in statistics, Springer, 2010.
[2]
Fry, John M., Burke, Matt, Quantitative methods in finance using R, Open University Press, 2022.
[3]
Casella, George, Berger, Roger L., Statistical inference, second edition, Brooks/Cole, 2002.
[4]
DeGroot, Morris H., Schervish, Mark J., Probability and statistics, Fourth Edition, Addison-Wesley, 2012.
[5]
Dalgaard, Peter, Introductory statistics with R, Second Edition, Springer, 2008.
[6]
Davies, Tilman M., The book of R, No Starch Press, 2016.
[7]
Rosenthal, Jeffrey S., A first look at rigorous probability theory, Second Edition, World Scientific Publishing, 2006.