Maximum likelihood estimation

Overview

Maximum likelihood estimation (MLE) is a method that, given a sample x, finds the parameter θ that makes the likelihood as large as possible.

Method

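Suppose [math]\displaystyle{ D_\theta = (X_1, X_2, \cdots, X_n) }[/math] is a collection of random variables whose distribution is determined by a parameter [math]\displaystyle{ \theta }[/math], [math]\displaystyle{ f_\theta }[/math] is the joint probability density function or probability mass function of [math]\displaystyle{ D_\theta }[/math], and the values [math]\displaystyle{ x_1, x_2, \cdots, x_n }[/math] have been observed for these random variables. The likelihood [math]\displaystyle{ \mathcal{L}(\theta) }[/math] is then the joint density (or mass) evaluated at the observed values, viewed as a function of [math]\displaystyle{ \theta }[/math]: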

[math]\displaystyle{ \mathcal{L}(\theta) = f_{\theta}(x_1, x_2, \cdots, x_n) }[/math]

The estimate [math]\displaystyle{ \widehat{\theta} }[/math] is the value of [math]\displaystyle{ \theta }[/math] that maximizes the likelihood:

[math]\displaystyle{ \widehat{\theta} = \underset{\theta}{\operatorname{argmax}}\ \mathcal{L}(\theta) }[/math]
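As a minimal sketch of the argmax, assume a hypothetical experiment in which a coin with unknown heads probability θ lands heads k = 7 times in n = 10 tosses, so that f_θ is the binomial probability mass function. The snippet evaluates ℒ(θ) on a grid of candidate values and keeps the largest one; the grid search is only an illustration, not how MLE is usually computed.

```python
from math import comb

# Hypothetical data: k heads observed in n tosses of a coin.
n, k = 10, 7

def likelihood(theta: float) -> float:
    """Binomial pmf of observing k heads, viewed as a function of theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Candidate parameter values 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]

# argmax over the grid: the candidate with the largest likelihood.
theta_hat = max(grid, key=likelihood)
print(theta_hat)  # 0.7
```

The grid maximum coincides with k/n = 0.7, which is what setting the derivative of the likelihood to zero gives for this model.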

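If [math]\displaystyle{ X_1, X_2, \cdots, X_n }[/math] are independent and identically distributed, the joint density factors into a product, and [math]\displaystyle{ \mathcal{L} }[/math] can be written as follows, where [math]\displaystyle{ f_\theta }[/math] now denotes the common density (or mass) function of each [math]\displaystyle{ X_i }[/math]: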

[math]\displaystyle{ \mathcal{L}(\theta) = \prod_i f_{\theta}(x_i) }[/math]

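Because the logarithm is strictly increasing, [math]\displaystyle{ \log \mathcal{L} }[/math] is maximized at the same [math]\displaystyle{ \widehat{\theta} }[/math] as [math]\displaystyle{ \mathcal{L} }[/math] itself. The maximum values differ, but the maximizer does not. Turning the product into a sum usually makes the calculation much simpler: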

[math]\displaystyle{ \mathcal{L}^*(\theta) = \log \mathcal{L}(\theta) = \sum_i \log f_{\theta}(x_i) }[/math]
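The benefit of the logarithm also shows up in floating-point arithmetic. The sketch below assumes hypothetical i.i.d. Bernoulli(θ) data with 7,000 ones and 3,000 zeros. The raw product of probabilities underflows to 0.0 and so cannot tell candidate values of θ apart, while the sum of log-probabilities stays finite and its argmax is still 0.7.

```python
import math

# Hypothetical i.i.d. Bernoulli(theta) sample: 7,000 ones and 3,000 zeros.
sample = [1] * 7000 + [0] * 3000

def bernoulli_pmf(x: int, theta: float) -> float:
    return theta if x == 1 else 1.0 - theta

def likelihood(theta: float) -> float:
    # Product of f_theta(x_i): underflows for large samples.
    return math.prod(bernoulli_pmf(x, theta) for x in sample)

def log_likelihood(theta: float) -> float:
    # Sum of log f_theta(x_i): stays in a representable range.
    return sum(math.log(bernoulli_pmf(x, theta)) for x in sample)

# Candidate parameter values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

print(likelihood(0.7))                # 0.0 (underflow)
print(max(grid, key=log_likelihood))  # 0.7
```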

Example: Gaussian distribution

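Suppose the values [math]\displaystyle{ x_1, x_2, \cdots, x_n }[/math] are sampled from a normal distribution whose mean [math]\displaystyle{ \mu }[/math] and variance [math]\displaystyle{ \sigma^2 }[/math] are unknown, and we want to estimate the mean and variance of that distribution from these values. The parameter to estimate is then [math]\displaystyle{ \theta = (\mu, \sigma) }[/math]. The probability density function of the normal distribution is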

[math]\displaystyle{ f_{\mu, \sigma}(x_i) = \frac{1}{\sqrt{2 \pi} \sigma} \exp(\frac{-(x_i - \mu)^2}{2 \sigma^2}) }[/math]

Since [math]\displaystyle{ x_1, x_2, \cdots, x_n }[/math] are sampled independently, the likelihood is

[math]\displaystyle{ \mathcal{L}(\theta) = \prod_i f_{\mu, \sigma}(x_i) = \prod_i \frac{1}{\sqrt{2 \pi} \sigma} \exp(\frac{-(x_i - \mu)^2}{2 \sigma^2}) }[/math]

Taking the logarithm of both sides gives

[math]\displaystyle{ \mathcal{L}^*(\theta) = -\frac{n}{2} \log{2\pi} - n \log \sigma - \frac{1}{2 \sigma^2} \sum_i {(x_i - \mu)^2} }[/math]
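As a sanity check on the algebra, the sketch below compares the closed-form log-likelihood with the direct sum of log-densities. It assumes a hypothetical five-point sample and arbitrary test values μ = 2.5 and σ = 0.8. The two should agree up to floating-point rounding.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f_{mu,sigma}(x) of the normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def log_likelihood_sum(xs, mu, sigma):
    # Direct definition: sum_i log f_{mu,sigma}(x_i)
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)

def log_likelihood_closed(xs, mu, sigma):
    # Closed form: -n/2 log(2 pi) - n log(sigma) - sum_i (x_i - mu)^2 / (2 sigma^2)
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2))

xs = [2.1, 1.9, 3.4, 2.8, 2.2]  # hypothetical sample
print(math.isclose(log_likelihood_sum(xs, 2.5, 0.8),
                   log_likelihood_closed(xs, 2.5, 0.8)))  # True
```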

To find the parameters that maximize this expression, take the partial derivative with respect to [math]\displaystyle{ \mu }[/math] and find the value at which it equals zero:

[math]\displaystyle{ \frac{\partial}{\partial \mu} \mathcal{L}^*(\theta) = \frac{1}{\sigma^2} \sum_i (x_i - \mu) }[/math]

[math]\displaystyle{ = \frac{1}{\sigma^2} (\sum_i x_i - n \mu) }[/math]
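The derivative can also be checked numerically. This sketch assumes a hypothetical sample, a fixed σ, and an arbitrarily chosen step size h. It compares the analytic expression (Σx_i − nμ)/σ² with a central finite difference of the log-likelihood.

```python
import math

def log_likelihood(xs, mu, sigma):
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2))

def dlog_dmu(xs, mu, sigma):
    # Analytic partial derivative: (sum_i x_i - n * mu) / sigma^2
    return (sum(xs) - len(xs) * mu) / sigma**2

xs = [2.1, 1.9, 3.4, 2.8, 2.2]  # hypothetical sample
mu, sigma, h = 2.0, 0.8, 1e-5

# Central difference; exact up to rounding here because the log-likelihood is quadratic in mu.
numeric = (log_likelihood(xs, mu + h, sigma) - log_likelihood(xs, mu - h, sigma)) / (2 * h)
print(dlog_dmu(xs, mu, sigma), numeric)
```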

Setting this to zero gives [math]\displaystyle{ \widehat{\mu} = \frac{1}{n} \sum_i x_i }[/math], the mean of the sampled values. Likewise, taking the partial derivative with respect to [math]\displaystyle{ \sigma }[/math] gives

[math]\displaystyle{ \frac{\partial}{\partial \sigma} \mathcal{L}^*(\theta) = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_i (x_i - \mu)^2 }[/math]

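Setting this to zero and replacing [math]\displaystyle{ \mu }[/math] with [math]\displaystyle{ \widehat{\mu} }[/math], because both parameters are maximized together, gives

[math]\displaystyle{ \widehat{\sigma}^2 = \frac{1}{n} \sum_i (x_i - \widehat{\mu})^2 }[/math]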
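Putting the two results together gives the closed-form Gaussian MLE sketched below. The data are hypothetical: they are drawn with random.gauss from an assumed μ = 5, σ = 2 with a fixed seed. Comparing against statistics.pvariance confirms that the variance estimate divides by n. Perturbing either parameter away from the estimate should lower the log-likelihood.

```python
import math
import random
from statistics import pvariance

def gaussian_mle(xs):
    """Closed-form MLE: sample mean and the 1/n average squared deviation."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return mu_hat, var_hat

def log_likelihood(xs, mu, sigma):
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2))

random.seed(0)
xs = [random.gauss(5.0, 2.0) for _ in range(1000)]  # hypothetical data

mu_hat, var_hat = gaussian_mle(xs)
sigma_hat = math.sqrt(var_hat)
print(mu_hat, sigma_hat)  # estimates of the assumed 5 and 2

# The closed form agrees with the population variance (divisor n).
print(math.isclose(var_hat, pvariance(xs)))  # True

# Moving either parameter away from the estimate lowers the log-likelihood.
best = log_likelihood(xs, mu_hat, sigma_hat)
steps = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
print(all(log_likelihood(xs, mu_hat + a, sigma_hat + b) < best for a, b in steps))  # True
```

statistics.variance, which divides by n − 1, would not match var_hat here.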