Suppose that we have a random sample from a population of interest. We may have a theoretical model for the way that the population is distributed. However, there may be several population parameters of which we do not know the values. Maximum likelihood estimation is one way to determine these unknown parameters.

The basic idea behind maximum likelihood estimation is that we determine the values of these unknown parameters by choosing them to maximize an associated joint probability density function or probability mass function. We will see this in more detail in what follows, and then work through some examples of maximum likelihood estimation.

### Steps for Maximum Likelihood Estimation

The above discussion can be summarized by the following steps:

- Start with a sample of independent random variables X_{1}, X_{2}, . . . X_{n} from a common distribution, each with probability density function f(x; θ_{1}, . . . θ_{k}). The thetas are unknown parameters.
- Since our sample is independent, the probability of obtaining the specific sample that we observe is found by multiplying our probabilities together. This gives us a likelihood function L(θ_{1}, . . . θ_{k}) = f( x_{1}; θ_{1}, . . . θ_{k}) f( x_{2}; θ_{1}, . . . θ_{k}) . . . f( x_{n}; θ_{1}, . . . θ_{k}) = Π f( x_{i}; θ_{1}, . . . θ_{k}).
- Next we use calculus to find the values of theta that maximize our likelihood function L.
- More specifically, we differentiate the likelihood function L with respect to θ if there is a single parameter. If there are multiple parameters we calculate partial derivatives of L with respect to each of the theta parameters.
- To continue the process of maximization, set the derivative of L (or partial derivatives) equal to zero and solve for theta.
- We can then use other techniques (such as a second derivative test) to verify that we have found a maximum for our likelihood function.
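The steps above can be sketched numerically. The sketch below replaces the calculus step with a brute-force grid search over candidate parameter values; the normal density, the data, and the function names are made-up illustrations, not part of any particular library.

```python
import math

def mle_grid(density, data, grid):
    """Form the likelihood as a product of densities (step 2) and pick the
    grid point that maximizes it (standing in for the calculus of steps 3-5)."""
    def likelihood(theta):
        prod = 1.0
        for xi in data:
            prod *= density(xi, theta)
        return prod
    return max(grid, key=likelihood)

# Illustration: a normal density with known variance 1 and unknown mean theta.
def normal_density(x, theta):
    return math.exp(-(x - theta) ** 2 / 2) / math.sqrt(2 * math.pi)

data = [1.8, 2.2, 2.0, 1.9, 2.1]          # hypothetical observations
grid = [i / 10 for i in range(0, 41)]     # candidate values 0.0, 0.1, ..., 4.0
print(mle_grid(normal_density, data, grid))  # 2.0, the sample mean
```

For the normal mean, the grid search lands on the sample mean, which is exactly what the calculus delivers in closed form.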

### Example

Suppose we have a package of seeds, each of which has a constant probability *p* of success of germination. We plant *n* of these and count the number of those that sprout. Assume that each seed sprouts independently of the others. How do we determine the maximum likelihood estimator of the parameter *p*?

We begin by noting that each seed is modeled by a Bernoulli distribution with a success probability of *p*. We let *X* be either 0 or 1, and the probability mass function for a single seed is *f*( x ; *p* ) = *p*^{x} (1 - *p*)^{1 - x}.

Our sample consists of *n* different *X_{i}*, each of which has a Bernoulli distribution. The seeds that sprout have *X_{i}* = 1 and the seeds that fail to sprout have *X_{i}* = 0. The likelihood function is given by:

L( *p* ) = Π *p*^{xi} (1 - *p*)^{1 - xi}

We see that it is possible to rewrite the likelihood function by using the laws of exponents:

L( *p* ) = *p*^{Σ xi} (1 - *p*)^{n - Σ xi}

Next we differentiate this function with respect to *p*. We assume that the values for all of the *X_{i}* are known, and hence are constant. To differentiate the likelihood function we need to use the product rule along with the power rule:

L'( *p* ) = Σ x_{i} *p*^{-1 + Σ xi} (1 - *p*)^{n - Σ xi} - (*n* - Σ x_{i}) *p*^{Σ xi} (1 - *p*)^{n - 1 - Σ xi}

We rewrite some of the negative exponents and have:

L'( *p* ) = (1/*p*) Σ x_{i} *p*^{Σ xi} (1 - *p*)^{n - Σ xi} - 1/(1 - *p*) (*n* - Σ x_{i}) *p*^{Σ xi} (1 - *p*)^{n - Σ xi}

= [(1/*p*) Σ x_{i} - 1/(1 - *p*) (*n* - Σ x_{i})] *p*^{Σ xi} (1 - *p*)^{n - Σ xi}

Now, in order to continue the process of maximization, we set this derivative equal to zero and solve for *p:*

0 = [(1/*p*) Σ x_{i} - 1/(1 - *p*) (*n* - Σ x_{i})] *p*^{Σ xi} (1 - *p*)^{n - Σ xi}

Since *p* and (1 - *p*) are nonzero we have that

0 = (1/*p*) Σ x_{i }- 1/(1 - *p*) (*n* - *Σ x _{i}*).

Multiplying both sides of the equation by *p*(1- *p*) gives us:

0 = (1 - *p*) Σ x_{i }- *p* (*n* - *Σ x _{i}*).

We expand the right hand side and see:

0 = Σ x_{i} - *p* Σ x_{i} - *p* *n* + *p* Σ x_{i} = Σ x_{i} - *p* *n*.

Thus Σ x_{i} = *p* *n* and (1/n) Σ x_{i} = p. This means that the maximum likelihood estimator of *p* is the sample mean; more specifically, it is the sample proportion of the seeds that germinated. This is perfectly in line with what intuition would tell us: to estimate the proportion of seeds that will germinate, we use the proportion that germinated in a sample from the population of interest.
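As a quick numerical check of this result (the germination data below are made up for illustration), we can compute the sample proportion and confirm that it maximizes the likelihood over a fine grid of candidate values of *p*:

```python
# Hypothetical seed outcomes: 1 = sprouted, 0 = did not sprout.
x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n = len(x)
s = sum(x)

def likelihood(p):
    """Bernoulli likelihood L(p) = p^(Σ xi) * (1 - p)^(n - Σ xi)."""
    return p ** s * (1 - p) ** (n - s)

p_hat = s / n  # closed-form MLE: the sample proportion

# Brute-force check: p_hat should beat every candidate on a grid.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=likelihood)
print(p_hat)   # 0.7
print(best)    # 0.7
```

The grid maximizer agrees with the closed-form sample proportion, as the derivation predicts.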

### Modifications to the Steps

There are some modifications to the above list of steps. For example, as we have seen above, it is typically worthwhile to spend some time using algebra to simplify the expression of the likelihood function. The reason for this is to make the differentiation easier to carry out.

Another change to the above list of steps is to consider natural logarithms. The maximum for the function L will occur at the same point as it will for the natural logarithm of L. Thus maximizing ln L is equivalent to maximizing the function L.

Many times, due to the presence of exponential functions in L, taking the natural logarithm of L will greatly simplify some of our work.
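We can illustrate that L and ln L peak at the same point numerically. This is a sketch with made-up Bernoulli data: since ln is strictly increasing, the grid point that maximizes the likelihood also maximizes its logarithm.

```python
import math

# Hypothetical Bernoulli outcomes (same style of data as the seed example).
x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n, s = len(x), sum(x)

def L(p):
    # Likelihood: p^(Σ xi) * (1 - p)^(n - Σ xi)
    return p ** s * (1 - p) ** (n - s)

def lnL(p):
    # Log-likelihood: Σ xi * ln p + (n - Σ xi) * ln(1 - p)
    return s * math.log(p) + (n - s) * math.log(1 - p)

grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=L) == max(grid, key=lnL))  # True: same maximizer
```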

### Example

We see how to use the natural logarithm by revisiting the example from above. We begin with the likelihood function:

L( *p* ) = *p*^{Σ xi} (1 - *p*)^{n - Σ xi}.

We then use our logarithm laws and see that:

R( *p* ) = ln L( *p* ) = Σ x_{i} ln *p* + (*n* - Σ x_{i}) ln(1 - *p*).

We already see that the derivative is much easier to calculate:

R'( *p *) = (1/*p*)Σ x_{i }- 1/(1 - *p*)(*n* - *Σ x _{i}*) .

Now, as before, we set this derivative equal to zero and multiply both sides by *p *(1 - *p*):

0 = (1-* p* ) Σ x_{i }- *p*(*n* - *Σ x _{i}*) .

We solve for *p *and find the same result as before.

The use of the natural logarithm of L(p) is helpful in another way. It is much easier to calculate a second derivative of R(p) to verify that we truly do have a maximum at the point (1/n)Σ x_{i }= p.
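That second-derivative check is easy to carry out numerically. In the sketch below (with hypothetical germination data), R''(p) = -Σ x_{i}/p² - (n - Σ x_{i})/(1 - p)², which is negative for every 0 < p < 1, confirming a maximum:

```python
# Hypothetical germination outcomes, as in the earlier example.
x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n, s = len(x), sum(x)

def second_derivative(p):
    """R''(p) = -Σ xi / p^2 - (n - Σ xi) / (1 - p)^2."""
    return -s / p ** 2 - (n - s) / (1 - p) ** 2

p_hat = s / n  # the candidate maximizer (1/n) Σ xi
print(second_derivative(p_hat) < 0)  # True: R is concave there, so a maximum
```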

### Example

For another example, suppose that we have a random sample X_{1}, X_{2}, . . . X_{n} from a population that we are modelling with an exponential distribution. The probability density function for one random variable is of the form *f*( *x* ) = θ^{-1} *e*^{-x/θ}.

The likelihood function is given by the joint probability density function. This is a product of several of these density functions:

L(θ) = Π θ^{-1} *e*^{-xi/θ} = θ^{-n} *e*^{-Σ xi/θ}

Once again it is helpful to consider the natural logarithm of the likelihood function. Differentiating this will require less work than differentiating the likelihood function:

R(θ) = ln L(θ) = ln [θ^{-n} *e*^{-Σ xi/θ}]

We use our laws of logarithms and obtain:

R(θ) = ln L(θ) = - *n* ln θ - Σ *x_{i}*/θ

We differentiate with respect to θ and have:

R'(θ) = - *n*/θ + Σ *x_{i}*/θ^{2}

Set this derivative equal to zero and we see that:

0 = - *n*/θ + Σ *x_{i}*/θ^{2}.

Multiply both sides by *θ*^{2 }and the result is:

0 = - *n* θ + Σ *x_{i}*.

Now use algebra to solve for θ:

θ = (1/n)Σ*x _{i}*.

We see from this that the sample mean is what maximizes the likelihood function. The parameter θ to fit our model should simply be the mean of all of our observations.
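We can check this exponential result numerically as well. The waiting-time data below are made up for illustration; the sketch computes the sample mean and confirms that it yields a higher log-likelihood than nearby candidates:

```python
import math

# Hypothetical observations modeled as exponential with unknown mean theta.
x = [2.1, 0.7, 3.4, 1.2, 0.6]
n = len(x)

theta_hat = sum(x) / n  # MLE: the sample mean

def log_likelihood(theta):
    """R(theta) = -n ln(theta) - (Σ xi)/theta."""
    return -n * math.log(theta) - sum(x) / theta

# The sample mean should beat candidates on either side of it.
print(theta_hat)  # 1.6
print(all(log_likelihood(theta_hat) >= log_likelihood(t)
          for t in (theta_hat - 0.5, theta_hat + 0.5)))  # True
```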

### Connections

There are other types of estimators. One alternative approach is to seek an unbiased estimator. For this type, we must calculate the expected value of our statistic and determine whether it matches the corresponding parameter.