2086 Lecture 3 Estimation And Maximum Likelihood
Lecture Note: Lecture 3 Notes.pdf
$\hat{\ }$ (hat): a hat on any variable denotes the estimate of that variable, e.g. $\hat{\mu}$ is the estimate of $\mu$.
Sum of squared errors (SSE):
$SSE(\mu) = \sum_{i=1}^{n} (y_i - \mu)^2$
This is the sum of the squared errors between every sample value $y_i$ and a candidate mean $\mu$. We can use it to find the estimated mean $\hat{\mu}$.
The smaller a squared error is, the closer that point is to $\mu$, so the SSE measures how close $\mu$ is to the samples overall. Since the mean should be the point that is closest to all sample points taken together, the best $\hat{\mu}$ is the value of $\mu$ that makes the SSE smallest.
Setting the derivative of the SSE with respect to $\mu$ to zero gives the equation for $\hat{\mu}$:
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i$
This estimated mean is also a good guess for the mean of the real data (the population): the samples are chosen at random from the real data, so the sample mean should be close to the population mean, especially as the sample size grows.
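The claim that the sample mean minimizes the SSE can be checked numerically. The sketch below uses made-up sample values and a coarse grid of candidate means, both chosen only for illustration; the function names are hypothetical.

```python
def sse(samples, mu):
    """Sum of squared errors between each sample and a candidate mean mu."""
    return sum((y - mu) ** 2 for y in samples)


def sample_mean(samples):
    """Closed-form minimizer of the SSE: the arithmetic mean."""
    return sum(samples) / len(samples)


# Hypothetical data, chosen only for illustration.
samples = [2.0, 4.0, 4.0, 5.0, 10.0]
mu_hat = sample_mean(samples)

# Compare against a grid of other candidate means: 0.0, 0.1, ..., 10.0.
candidates = [i / 10 for i in range(0, 101)]
best_on_grid = min(candidates, key=lambda mu: sse(samples, mu))

print("sample mean:", mu_hat, "SSE:", sse(samples, mu_hat))
print("best candidate on the grid:", best_on_grid)
```

No candidate on the grid gives a smaller SSE than the sample mean, which is what the derivation predicts.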
Maximum Likelihood Estimation (MLE)
To find the best parameters for a probability distribution, suppose we have a sample $y$ where $y = \{y_1, y_2, \ldots, y_n\}$.
$p(y \mid \theta)$ stands for the probability distribution of the data under the parameters $\theta$.
MLE finds the parameter value that makes the probability of the observed $y$ under the distribution as large as possible:
$\hat{\theta}_{ML} = \arg\max_{\theta} \, p(y \mid \theta)$
For example, suppose we flip a coin 5 times and get the result $y = \{1, 1, 1, 0, 0\}$.
Suppose the candidate parameters are $\theta \in \{0.1, 0.2, 0.6, 1\}$.
Using MLE, the best estimate among these candidates is $\hat{\theta} = 0.6$, because under the Bernoulli distribution 0.6 is the candidate that gives the highest probability of observing $y$.
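The coin example can be reproduced in a few lines. This is a minimal sketch that assumes the flips are independent, so the probability of the whole sequence is the product of the per-flip probabilities; `bernoulli_likelihood` is a name introduced here for illustration.

```python
def bernoulli_likelihood(y, theta):
    """Probability of the 0/1 sequence y when P(1) = theta, assuming iid flips."""
    likelihood = 1.0
    for yi in y:
        likelihood *= theta if yi == 1 else (1.0 - theta)
    return likelihood


y = [1, 1, 1, 0, 0]
candidates = [0.1, 0.2, 0.6, 1.0]

# Evaluate the likelihood of the observed flips under each candidate.
likelihoods = {theta: bernoulli_likelihood(y, theta) for theta in candidates}
theta_hat = max(likelihoods, key=likelihoods.get)

for theta, lik in likelihoods.items():
    print(theta, lik)
print("best candidate:", theta_hat)
```

Note that the candidate 1 has likelihood 0: a coin that always lands on 1 could never have produced the two 0s in the sample.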
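Since all $y_i$ in $y$ are assumed to be iid, $p(y \mid \theta) = p(y_1 \mid \theta)\, p(y_2 \mid \theta) \cdots p(y_n \mid \theta)$, which can be written as:
$p(y \mid \theta) = \prod_{i=1}^{n} p(y_i \mid \theta)$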
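We can also use the negative log-likelihood instead of the likelihood itself. The function is:
$L(y \mid \theta) = -\log p(y \mid \theta)$
Where:
$\hat{\theta}_{ML} = \arg\min_{\theta} \, L(y \mid \theta)$
because the logarithm is increasing, so the $\theta$ that minimizes the negative log-likelihood is the same $\theta$ that maximizes the likelihood.
We can also rewrite this equation using log(ab) = log a + log b:
$L(y \mid \theta) = -\sum_{i=1}^{n} \log p(y_i \mid \theta)$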
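We know that a turning point of a function is a local maximum or minimum, and that the first derivative of the function is 0 at such a point.
We can therefore find $\hat{\theta}$ by setting the derivative of the negative log-likelihood to zero and solving for $\theta$:
$\frac{d}{d\theta} \left[ -\log p(y \mid \theta) \right] = 0$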
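As a worked example, the derivative condition can be applied to the coin-flip (Bernoulli) model from earlier. Here $k$ is a symbol introduced for illustration: it is the number of 1s among the $n$ flips, and $L$ denotes the negative log-likelihood.

$$
\begin{aligned}
L(y \mid \theta) &= -\sum_{i=1}^{n} \log p(y_i \mid \theta) = -k \log \theta - (n - k) \log(1 - \theta) \\
\frac{dL}{d\theta} &= -\frac{k}{\theta} + \frac{n - k}{1 - \theta} = 0 \\
\hat{\theta} &= \frac{k}{n}
\end{aligned}
$$

For $y = \{1, 1, 1, 0, 0\}$ we have $k = 3$ and $n = 5$, so $\hat{\theta} = 3/5 = 0.6$, which agrees with the best candidate found in the coin example.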
Estimator
Point estimation: an estimator such as MLE gives us a single specific value for the parameter, denoted $\hat{\theta}(y)$.
Sampling distribution
When we use MLE to compute the sample mean on different samples, we get a different sample mean each time. These values are close to the actual mean, but a single estimate does not tell us how accurate it is.
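To quantify how close our sample mean is, we first assume the population is normally distributed, where
$Y \sim N(\mu, \sigma^2)$
We also have the samples
$Y_1, Y_2, \ldots, Y_n$, each drawn independently from this population.
Here is the addition rule for normal distributions: if $Y_1 \sim N(\mu_1, \sigma_1^2)$ and $Y_2 \sim N(\mu_2, \sigma_2^2)$ are independent, then
$Y_1 + Y_2 \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
This means that if we add the $n$ samples from the population together, we get a combined normal distribution where
$\sum_{i=1}^{n} Y_i \sim N(n\mu, n\sigma^2)$
We can then divide this by $n$, which divides the variance by $n^2$, and we have
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} Y_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
This shows that as the sample size $n$ grows, the variance shrinks, which means there is less randomness in $\hat{\mu}$.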
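The shrinking variance of the sample mean can be seen in a small simulation. The sketch below assumes a normal population with made-up parameters (mean 0, standard deviation 2) and a fixed random seed; it repeatedly draws samples of size n and compares the empirical variance of the sample means with $\sigma^2 / n$.

```python
import random
import statistics


def sample_mean_variance(n, mu=0.0, sigma=2.0, trials=2000, seed=0):
    """Empirical variance of the sample mean over many samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return statistics.pvariance(means)


sigma = 2.0
for n in (5, 20, 80):
    empirical = sample_mean_variance(n, sigma=sigma)
    theoretical = sigma ** 2 / n
    print(f"n={n}: empirical={empirical:.4f}, theoretical={theoretical:.4f}")
```

The empirical values will not match the theoretical ones exactly because the number of trials is finite, but they should decrease at roughly the same 1/n rate.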
Analyzing the Estimator
To find out how good our estimator is, we have 4 different metrics: bias, variance, mean squared error and consistency.
Bias
This is the equation for the bias of an estimator:
$b_{\theta}(\hat{\theta}) = E[\hat{\theta}(Y)] - \theta$
bias = expectation of the estimated parameter minus the actual parameter
In other words, it measures the difference between the average estimated parameter and the actual parameter.
If $b > 0$, the expected estimate is greater than the actual value, so the estimator over-estimates.
If $b < 0$, the expected estimate is smaller than the actual value, so the estimator under-estimates.
If $b = 0$, the two match, and the estimator is unbiased.
Variance
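This is the formula for the variance of an estimator. It is the same as the variance formula for a random variable, but applied to the estimated parameter:
$Var_{\theta}(\hat{\theta}) = E\left[ \left( \hat{\theta}(Y) - E[\hat{\theta}(Y)] \right)^2 \right]$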
Mean Squared Error(MSE)
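MSE measures how well our estimator does the estimating:
$MSE_{\theta}(\hat{\theta}) = E\left[ \left( \hat{\theta}(Y) - \theta \right)^2 \right]$
$(\hat{\theta}(Y) - \theta)^2$ represents the squared error, the squared distance between the estimated parameter and the actual parameter. The expectation simply gives us the expected, or average, squared error.
We can also write this formula using bias and variance:
$MSE_{\theta}(\hat{\theta}) = b_{\theta}(\hat{\theta})^2 + Var_{\theta}(\hat{\theta})$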
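Bias, variance and MSE can be approximated by simulation, which also lets us check that MSE equals squared bias plus variance. The sketch below assumes a normal population with a made-up true mean of 5. It compares the sample mean with `shrunk_mean`, a hypothetical and deliberately biased estimator included only for contrast.

```python
import random


def evaluate_estimator(estimator, theta, n, sigma=1.0, trials=2000, seed=1):
    """Approximate bias, variance and MSE by repeated sampling from N(theta, sigma^2)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    average = sum(estimates) / trials
    bias = average - theta
    variance = sum((e - average) ** 2 for e in estimates) / trials
    mse = sum((e - theta) ** 2 for e in estimates) / trials
    return bias, variance, mse


def sample_mean(sample):
    return sum(sample) / len(sample)


def shrunk_mean(sample):
    # Hypothetical, deliberately biased estimator: pulls the mean toward 0.
    return 0.8 * sample_mean(sample)


for name, est in (("sample mean", sample_mean), ("shrunk mean", shrunk_mean)):
    bias, variance, mse = evaluate_estimator(est, theta=5.0, n=10)
    print(f"{name}: bias={bias:.3f} var={variance:.3f} "
          f"mse={mse:.3f} bias^2+var={bias ** 2 + variance:.3f}")
```

The shrunk estimator has lower variance than the sample mean but a bias near -1, so its MSE is much larger here. This is why bias and variance are considered together.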
Consistency
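Consistency tells us whether an estimator gets closer to the actual parameter as the sample size grows.
It states that when $n \to \infty$, a consistent estimator satisfies $\hat{\theta}(Y) \to \theta$. For example, an estimator whose bias and variance both go to 0 as $n \to \infty$ is consistent: in the limit it has neither systematic error nor random variation.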
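As a worked check, consider the sample mean $\hat{\mu}$ under the normal population assumed in the sampling distribution section. This sketch relies on the standard sufficient condition that an estimator is consistent if its bias and variance both vanish as $n$ grows.

$$
\begin{aligned}
b(\hat{\mu}) &= E[\hat{\mu}] - \mu = \mu - \mu = 0 \\
Var(\hat{\mu}) &= \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty \\
MSE(\hat{\mu}) &= b(\hat{\mu})^2 + Var(\hat{\mu}) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty
\end{aligned}
$$

So the sample mean is unbiased for every $n$, and its MSE shrinks to 0, which makes it a consistent estimator of $\mu$.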