2086 Lecture 6 Linear Regression
Lecture Note: Lecture 6 Notes.pdf
Linear Regression
Linear regression describes the relationship between a predicted outcome $y$ and its factors $x$. A factor is also called a predictor or regressor.
Single Linear Regression
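The single (simple) linear regression formula includes one coefficient $\beta_1$ and the intercept $\beta_0$:
$$\hat{y} = \beta_0 + \beta_1 x$$
$\beta_0$ is the intercept, showing what the predicted outcome is when the predictor $x = 0$.
We also use the residual error $e_i = y_i - \hat{y}_i$ to show how well the model fits the data.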
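As a minimal sketch of the formula and the residuals, assuming the intercept and coefficient are already known (the names `beta0`, `beta1` and the sample values are made up for illustration):

```python
def predict(beta0, beta1, x):
    """Predicted outcome for one predictor value x."""
    return beta0 + beta1 * x


def residuals(beta0, beta1, xs, ys):
    """Residual error (actual minus predicted) for each sample."""
    return [y - predict(beta0, beta1, x) for x, y in zip(xs, ys)]


# Illustrative coefficients and data, not taken from the lecture.
beta0, beta1 = 1.0, 2.0
xs = [0.0, 1.0, 2.0]
ys = [1.5, 2.5, 5.0]

at_zero = predict(beta0, beta1, 0.0)   # equals the intercept
res = residuals(beta0, beta1, xs, ys)
```

When the predictor is 0 the prediction reduces to the intercept, and a residual of 0 means the line passes exactly through that sample.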
Multiple Linear Regression
Multiple linear regression has multiple predictors $x_1, \dots, x_p$, with one coefficient per predictor, $\beta_1, \dots, \beta_p$. The formula of multiple linear regression is:
$$\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$
$\beta_0$ is still the intercept, showing what the predicted outcome is when all predictors $x_1 = \dots = x_p = 0$.
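The multiple-predictor formula can be sketched as an intercept plus a weighted sum. The function name `predict_multi` and the numbers below are assumptions for illustration only:

```python
def predict_multi(beta0, betas, x):
    """Intercept plus the sum of coefficient * predictor over all predictors."""
    if len(betas) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return beta0 + sum(b * xj for b, xj in zip(betas, x))


# Illustrative values: two predictors, hence two coefficients.
beta0 = 1.0
betas = [2.0, -1.0]

at_origin = predict_multi(beta0, betas, [0.0, 0.0])  # all predictors zero
example = predict_multi(beta0, betas, [3.0, 4.0])    # 1 + 2*3 - 1*4
```

With every predictor set to 0 the sum vanishes and only the intercept remains.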
Least Squares
Similar to minimizing SSE for an estimator, linear regression has its own way of measuring how well the model fits across the samples.
RSS, the residual sum of squares, represents the goodness of fit:
$$\mathrm{RSS}(\beta_0, \beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
It is the sum of squared differences between the predicted values and the real sample values. The smaller the RSS, the more closely the model fits the sample.
The principle of least squares states that if we want to find the best model, then we are looking for the model that minimizes the RSS, i.e.
$$(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta} \mathrm{RSS}(\beta_0, \beta)$$
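For a single predictor, the RSS-minimizing intercept and slope have a well-known closed form. The sketch below uses that closed form; the function names and the data are assumptions for illustration, and the closed form is the standard textbook result rather than something stated in these notes:

```python
def rss(beta0, beta1, xs, ys):
    """Residual sum of squares of the line beta0 + beta1 * x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_least_squares(xs, ys):
    """Closed-form least-squares estimates for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx              # slope
    beta0 = y_bar - beta1 * x_bar  # intercept
    return beta0, beta1


# Illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

b0, b1 = fit_least_squares(xs, ys)
best = rss(b0, b1, xs, ys)

# Nudging either parameter away from the estimate increases the RSS.
worse_slope = rss(b0, b1 + 0.1, xs, ys)
worse_intercept = rss(b0 + 0.1, b1, xs, ys)
```

Comparing `best` with the two perturbed values is a quick sanity check that the estimates sit at the minimum of the RSS.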
Score
We can use RSS to judge how good our model is: the smaller, the better. But this raises a problem: RSS is a unit-dependent metric. Say we measure a person's height; the actual height is 184 cm and the predicted value is 180 cm.
Then the RSS in cm is $(184 - 180)^2 = 16$, while the RSS in mm is $(1840 - 1800)^2 = 1600$. The same prediction gives a value 100 times larger, so the raw number is not meaningful on its own.
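The height example can be checked directly. This sketch assumes a single sample and a helper named `rss`, both chosen for illustration:

```python
def rss(actual, predicted):
    """Residual sum of squares between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Same person, same prediction, two different units.
rss_cm = rss([184.0], [180.0])    # heights in centimetres
rss_mm = rss([1840.0], [1800.0])  # heights in millimetres

ratio = rss_mm / rss_cm
```

Changing the unit by a factor of 10 changes the RSS by a factor of 100, even though the quality of the prediction is identical.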
The way to solve this is to measure how good the RSS is relative to the sum of squared errors around the sample mean. We call it TSS, the total sum of squares:
$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
This sets a baseline, where we only use the sample mean $\bar{y}$ as the prediction.
The $R^2$ value is then defined as
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
This represents how good the RSS (using the model) is compared to the TSS (without using the model). If $R^2$ is close to 0, the RSS is close to the TSS, so the model describes the data no better than the sample mean does. If $R^2$ is close to 1, the RSS is much smaller than the TSS, the ratio RSS/TSS is close to 0, and our model fits the data well.
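A minimal sketch of the score, assuming the usual definition of one minus RSS divided by TSS (the function name `r_squared` and the data are illustrative):

```python
def r_squared(ys, preds):
    """1 - RSS/TSS, where TSS uses the sample mean as the baseline."""
    y_bar = sum(ys) / len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, preds))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - rss / tss


# Illustrative data with predictions from a fitted line.
ys = [2.0, 4.0, 5.0, 8.0]
preds = [1.9, 3.8, 5.7, 7.6]

score = r_squared(ys, preds)

# Predicting the sample mean everywhere makes RSS equal to TSS.
mean_only = r_squared(ys, [sum(ys) / len(ys)] * len(ys))

# Rescaling the units (for example cm to mm) leaves the score unchanged.
score_scaled = r_squared([10 * y for y in ys], [10 * p for p in preds])
```

Unlike the RSS, the score is a ratio, so the unit cancels out and the scaled and unscaled scores agree up to floating-point error.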
Choosing Predictors
Overfitting and Underfitting
Overfitting of a linear regression model means the model lacks generalization. The model learns too much from the training data, including its randomness, so it fits noise and cannot predict new data correctly.
Underfitting of a linear regression model means the model did not learn enough. Without having learned enough of the rules in the data, it cannot make correct predictions.
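Hypothesis testing to remove predictors
We can also use hypothesis testing to decide which predictors to keep.
Let's say we want to check whether predictor $x_j$ is important or not; then we should check its coefficient $\beta_j$.
If $\beta_j = 0$, then $x_j$ is not important. We use this as the null hypothesis $H_0: \beta_j = 0$ to find a p-value, then use the p-value to judge whether $x_j$ is important: a small p-value is evidence against $H_0$, suggesting the predictor should be kept.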
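One common way to carry out this test for a single predictor is a t-statistic: the estimated coefficient divided by its standard error. The sketch below is an illustration under stated assumptions, not necessarily the procedure used in the lecture. In particular, it approximates the p-value with a standard normal distribution (reasonable only for larger samples) because the Python standard library has no Student-t distribution. The function name and the data are made up.

```python
import math
from statistics import NormalDist


def slope_test(xs, ys):
    """Return (slope, t statistic, approximate two-sided p-value) for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)           # unbiased estimate of the noise variance
    se = math.sqrt(sigma2 / sxx)     # standard error of the slope (fails if rss == 0)
    t = beta1 / se
    # Assumption: normal approximation in place of the t distribution with n - 2 df.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return beta1, t, p


# Illustrative data with a clear upward trend.
xs = [float(i) for i in range(1, 11)]
ys = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5, 14.5, 15.5, 18.5, 19.5]
slope, t_stat, p_value = slope_test(xs, ys)

# Illustrative data with no linear trend at all.
flat_slope, flat_t, flat_p = slope_test([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 1.0])
```

A small p-value, as in the first data set, suggests the coefficient is not zero and the predictor is worth keeping. A large one, as in the second, gives no evidence against removing it.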
Information Criteria
Besides hypothesis testing, we can use a formula called an information criterion:
$$\mathrm{IC} = L(y \mid \hat{\theta}) + \alpha(n, k)$$
Here $L(y \mid \hat{\theta})$ represents the minimized negative log-likelihood of the estimated model, and $\hat{\theta}$ represents the maximum likelihood estimate of the parameters.
Maximizing the likelihood alone tends to overfit, because adding predictors never lowers the maximized likelihood, so we would end up including as many predictors as possible. So we need $\alpha(n, k)$, the complexity penalty: the more predictors $k$, the higher the penalty. If we use AIC as the complexity penalty, then on the negative log-likelihood scale $\alpha(n, k) = k$.
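As a sketch of how such a criterion trades fit against complexity, the code below assumes Gaussian noise, for which the minimized negative log-likelihood can be written in terms of the RSS. It uses the AIC penalty on the negative log-likelihood scale, which is simply the number of fitted parameters. The function names, the data, and the convention of counting the noise variance as a parameter are assumptions for illustration.

```python
import math


def neg_log_likelihood(rss, n):
    """Minimized Gaussian negative log-likelihood, with variance estimate rss / n."""
    sigma2 = rss / n
    return 0.5 * n * math.log(2.0 * math.pi * sigma2) + 0.5 * n


def aic(rss, n, k):
    """Negative log-likelihood plus the AIC penalty k (number of parameters)."""
    return neg_log_likelihood(rss, n) + k


# Illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
n = len(ys)

# Model A: intercept only (predict the sample mean).
y_bar = sum(ys) / n
rss_mean = sum((y - y_bar) ** 2 for y in ys)

# Model B: intercept plus one predictor, fitted by least squares.
x_bar = sum(xs) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar
rss_line = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Assumed parameter counts: A has (mean, variance), B has (intercept, slope, variance).
aic_mean = aic(rss_mean, n, k=2)
aic_line = aic(rss_line, n, k=3)
```

The model with the smaller criterion value is preferred. Here the extra predictor reduces the RSS by enough to outweigh its one-unit penalty, so `aic_line` is the smaller of the two.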