Maximum Likelihood Estimators - Matlab example
As a motivation, let us look at one Matlab example. Let us generate a random sample of size 100 from the beta distribution Beta(5, 2). We will give the definition of the beta distribution later; at this point we only need to know that it is a continuous distribution on the interval [0, 1]. The sample can be generated by typing 'X=betarnd(5,2,100,1)'. Let us fit different distributions using the distribution fitting tool 'dfittool'.
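In Matlab this takes just two commands; a minimal sketch ('dfittool' accepts the data vector directly):

X = betarnd(5,2,100,1);  % random sample of size 100 from Beta(5,2)
dfittool(X)              % open the distribution fitting tool on X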
Besides the graphs, the distribution fitting tool outputs the following information:
Distribution:    Normal
Log likelihood:  55.2571
Domain:          -Inf < y < Inf
Mean:            0.742119
Variance:        0.0195845

Parameter   Estimate    Std. Err.
mu          0.742119    0.0139945
sigma       0.139945    0.00997064

Estimated covariance of parameter estimates:
          mu             sigma
mu        0.000195845    6.01523e-020
sigma     6.01523e-020   9.94136e-005
Distribution:    Beta
Log likelihood:  63.8445
Domain:          0 < y < 1
Mean:            0.741371
Variance:        0.0184152

Parameter   Estimate   Std. Err.
a           6.97783    1.08827
b           2.43424    0.378351

Estimated covariance of parameter estimates:
       a          b
a      1.18433    0.370094
b      0.370094   0.143149
The value 'Log likelihood' indicates that the tool uses maximum likelihood estimators to fit the distribution, which will be the topic of the next few lectures. Notice the 'Parameter estimates': given the data, 'dfittool' estimates the unknown parameters of the distribution and then graphs the p.d.f. or c.d.f. corresponding to these parameters.
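These numbers can also be reproduced at the command line. A minimal sketch, assuming X is the sample generated above (the exact values depend on the random sample):

nparam = mle(X)     % MLEs [mu, sigma] of the normal parameters
                    % (normfit would return the unbiased estimate of sigma)
bparam = betafit(X) % MLEs [a, b] of the beta parameters
% the reported 'Log likelihood' is the log-likelihood at the fitted
% parameters, e.g. for the beta fit:
sum(log(betapdf(X, bparam(1), bparam(2))))
% approximate covariance matrix of the estimates (a, b):
mlecov(bparam, X, 'pdf', @betapdf)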
One can ask several questions about this example:
1. How do we estimate the unknown parameters of a distribution given data from this distribution?
2. How good are these estimates? Are they close to the actual 'true' parameters?
3. Does the data come from a particular type of distribution, for example a normal or beta distribution?
As another example, consider the body measurement data from [1], in particular the waist girth of 260 women. The Wikipedia article about the normal distribution gives a reference to the 1932 book "Problems of Relative Growth" by Julian Huxley for an explanation of why the sizes of full-grown animals are approximately log-normal. One short explanation is consistency between linear and volume dimensions: if linear dimensions are log-normal and volume dimensions are proportional to the cube of the linear dimensions, then volume dimensions are also log-normal. The assumption that sizes are normal would violate this consistency, since the cube of a normal random variable is not normal. We observe, however, that the fit of the women's waist data with a log-normal distribution is not very accurate. Later in the class we will learn several statistical tests to decide whether data comes from a certain distribution or a family of distributions, but here is a preview of what's to come. The chi-squared goodness-of-fit test rejects the hypothesis that the distribution of the logarithms of women's waists is normal:
[h,p,stats]=chi2gof(log_women_waist)
h = 1, p = 5.2297e-004
stats =
    chi2stat: 22.0027
          df: 5
       edges: [1x9 double]
           O: [21 44 67 60 28 18 12 10]
           E: [1x8 double]
and so does the Lilliefors test (an adjusted Kolmogorov-Smirnov test):
[h,p,stats]=lillietest(log_women_waist)
h = 1, p = 0, stats = 0.0841.
The same tests accept the hypothesis that the other variables have log-normal distributions.
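For example, if the logarithms of men's waist girths are stored in a vector log_men_waist (a hypothetical name, used here only for illustration), the same checks read:

[h,p] = chi2gof(log_men_waist)    % h = 0 means normality of the logs is not rejected
[h,p] = lillietest(log_men_waist)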
One can also fit a Gamma distribution to the women's waist data. As the fitted curves show, the Gamma distribution fits the data better than the log-normal and much better than the normal. To find the parameters of the fitted Gamma distribution we use Matlab's 'gamfit' function (the data is shifted toward zero first, since the Gamma distribution is supported on [0, ∞)):
param=gamfit(women_waist_shift)
param = 2.8700 4.4960.
Chi-squared goodness-of-fit test for a specific (fitted) Gamma distribution:
[h,p,stats]=chi2gof(women_waist_shift,'cdf',@(z)gamcdf(z,param(1),param(2)))
h = 0, p = 0.9289, stats = chi2stat: 2.4763, df: 7
accepts the hypothesis that the sample has a Gamma distribution with parameters (2.87, 4.496). This test is not quite 'accurate' in a sense that will be explained later: the parameters were estimated from the same data that is used to test the fit. One can also check that the Gamma distribution fits the other variables well: men's waist girth, weight of men, and weight of women.
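To repeat the check for another variable, say a hypothetical shifted vector men_waist_shift, the same two commands apply:

param = gamfit(men_waist_shift);   % MLE of the Gamma parameters
[h,p] = chi2gof(men_waist_shift,'cdf',@(z)gamcdf(z,param(1),param(2)))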
Example: the uniform distribution U[0, ϕ] on the interval [0, ϕ].
This distribution has p.d.f.

$$ f(x \mid \varphi) = \begin{cases} 1/\varphi, & 0 \le x \le \varphi, \\ 0, & \text{otherwise.} \end{cases} $$

Given a sample $X_1, \ldots, X_n$ from $U[0, \varphi]$, the likelihood function is

$$ L(\varphi) = \prod_{i=1}^{n} f(X_i \mid \varphi) = \frac{1}{\varphi^n}\, I\bigl(X_1, \ldots, X_n \in [0, \varphi]\bigr). $$

Here the indicator function $I(A)$ equals 1 if the event $A$ happens and 0 otherwise. The indicator above means that the likelihood is equal to 0 if at least one of the factors is 0, and this happens if at least one observation falls outside of the 'allowed' interval $[0, \varphi]$. Another way to say this is that the likelihood is 0 whenever the maximum of the observations exceeds $\varphi$, i.e.

$$ L(\varphi) = 0 \ \text{ if } \max_{i \le n} X_i > \varphi, \qquad L(\varphi) = \frac{1}{\varphi^n} \ \text{ if } \max_{i \le n} X_i \le \varphi. $$

Since $1/\varphi^n$ is decreasing in $\varphi$, the likelihood is maximized at the smallest $\varphi$ allowed by the constraint, so the MLE is $\hat{\varphi} = \max(X_1, \ldots, X_n)$.
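As a quick numerical sanity check of this formula (a sketch; the sample is random, so the estimate varies from run to run):

phi = 3;                      % true parameter
X = unifrnd(0, phi, 100, 1);  % sample of size 100 from U[0, phi]
phi_hat = max(X)              % MLE; always slightly below the true phi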
Sometimes it is not so easy to find the maximum of the likelihood function as in the example above, and one might have to do it numerically. Also, the MLE does not always exist. Consider the same uniform distribution, but define the p.d.f. by $f(x \mid \varphi) = 1/\varphi$ for $0 \le x < \varphi$ and $f(x \mid \varphi) = 0$ otherwise. The difference is that we 'excluded' the point $\varphi$ by setting $f(\varphi \mid \varphi) = 0$. Then the likelihood function is

$$ L(\varphi) = \frac{1}{\varphi^n}\, I\Bigl(\max_{i \le n} X_i < \varphi\Bigr), $$

and the maximum at the point $\varphi = \max_{i \le n} X_i$ is not achieved: the likelihood approaches $1/(\max_{i \le n} X_i)^n$ as $\varphi$ decreases to $\max_{i \le n} X_i$, but equals 0 there. Of course, this is an artificial example, but it shows that sometimes one needs to be careful.
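To see the non-attainment numerically, here is a small sketch (the numbers come from a random sample and are purely illustrative):

X = unifrnd(0, 3, 10, 1);          % sample of size 10 from U[0, 3]
m = max(X);
L = @(phi) (phi > m) ./ phi.^10;   % likelihood under the modified density
[L(m) L(m+0.001) L(m+0.01)]        % 0 at phi = m, positive and decreasing just above m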
References:
[1] Heinz, G., Peterson, L. J., Johnson, R. W., and Kerk, C. J. (2003). "Exploring Relationships in Body Dimensions". Journal of Statistics Education, 11(2).