January 1, 1970
2086 Lecture 8 Model Selection And Penalized Regression
Lecture Note: Lecture 8 Notes.pdf
Underfitting, Overfitting and MSPE
- Underfitting: model too simple, misses real signal, high bias.
- Overfitting: model too complex, learns noise, poor generalization.
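For test data $(x'_1, y'_1), \dots, (x'_{n'}, y'_{n'})$ with predictions $\hat{y}'_i$, the mean-squared prediction error (MSPE) is:
$$\text{MSPE} = \frac{1}{n'} \sum_{i=1}^{n'} \left( y'_i - \hat{y}'_i \right)^2$$
Expected prediction error at a point $x$ can be decomposed as:
$$E\left[ (y - \hat{f}(x))^2 \right] = \text{Bias}\left[ \hat{f}(x) \right]^2 + \text{Var}\left[ \hat{f}(x) \right] + \sigma^2$$
where $\sigma^2$ is the irreducible noise variance.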
Goal of model selection: trade off bias and variance to minimize prediction error.
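A minimal sketch of the underfitting/overfitting picture, assuming a made-up signal `true_f`, a noise level of 0.3, and polynomial models of degree 1, 3 and 9 (all hypothetical choices, not from the lecture). It compares training error with MSPE on fresh test data.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical signal, chosen only for illustration
    return np.sin(2 * np.pi * x)

def make_data(n):
    # Noisy observations of the hypothetical signal
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, 0.3, n)
    return x, y

def mspe(y_true, y_pred):
    # Mean-squared prediction error
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)

results = {}
for degree in (1, 3, 9):
    coefs = P.polyfit(x_train, y_train, degree)          # least-squares fit
    train_err = mspe(y_train, P.polyval(x_train, coefs))
    test_err = mspe(y_test, P.polyval(x_test, coefs))
    results[degree] = (train_err, test_err)
    print(f"degree={degree}  train MSE={train_err:.3f}  test MSPE={test_err:.3f}")
```

Training error can only go down as the degree grows, so it cannot be used to pick the model; the test MSPE is the quantity that reflects generalization.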
Selecting Predictors
Hypothesis testing and multiple testing
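For each predictor $x_j$, test:
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_A: \beta_j \neq 0$$
and keep the predictor if its p-value falls below the significance level $\alpha$.
If we test many predictors at the same level $\alpha$, the expected number of false positives grows with the number of tests.
The Bonferroni correction uses the stricter threshold:
$$p\text{-value} < \frac{\alpha}{m}$$
where $m$ is the number of tests.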
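As an illustration, the sketch below simulates 100 tests in which every null hypothesis is true and compares the usual threshold (here written `alpha`) with the Bonferroni threshold `alpha / n_tests`. The number of tests, the level 0.05, and the use of z-statistics are assumptions made for the example.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

n_tests = 100   # number of predictors tested (hypothetical)
alpha = 0.05    # significance level (hypothetical)

# Under the null hypothesis each z-statistic is standard normal
z = rng.standard_normal(n_tests)

# Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_values = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

# Every rejection here is a false positive, since all nulls are true
naive_hits = int(np.sum(p_values < alpha))

bonferroni_threshold = alpha / n_tests
bonferroni_hits = int(np.sum(p_values < bonferroni_threshold))

print("false positives at alpha:", naive_hits)
print("false positives at alpha / n_tests:", bonferroni_hits)
```

Without correction, about `alpha * n_tests = 5` false positives are expected on average; with the correction, the probability of even one false positive is at most `alpha`.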
Information criteria
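Use negative log-likelihood plus a complexity penalty:
$$\text{IC}(M) = -\log p(y \mid \hat{\theta}_M, M) + \text{pen}(n, k_M)$$
Common forms:
- AIC: $\text{pen}(n, k_M) = k_M$
- BIC: $\text{pen}(n, k_M) = \frac{k_M}{2} \log n$
Here $\hat{\theta}_M$ is the maximum likelihood estimate for model $M$, $n$ is the sample size, and $k_M$ is the number of predictors in model $M$. Choose the model with the smallest score.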
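A rough sketch of scoring candidate models, assuming a Gaussian linear model and the scaling in which the criterion is the negative log-likelihood plus a penalty of `k` for AIC and `(k / 2) * log(n)` for BIC, with `k` the number of predictors. The simulated data and the candidate sets are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.standard_normal((n, 5))
# Hypothetical truth: only the first two predictors matter
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

def neg_log_lik(X_sub, y):
    # Gaussian negative log-likelihood at the ML estimates (sigma^2 = RSS / n)
    n = len(y)
    design = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    sigma2 = rss / n
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * n

def aic(X_sub, y):
    return neg_log_lik(X_sub, y) + X_sub.shape[1]

def bic(X_sub, y):
    k = X_sub.shape[1]
    return neg_log_lik(X_sub, y) + 0.5 * k * np.log(len(y))

candidates = {"x1": [0], "x1,x2": [0, 1], "all five": [0, 1, 2, 3, 4]}
scores = {name: (aic(X[:, cols], y), bic(X[:, cols], y))
          for name, cols in candidates.items()}
for name, (a, b) in scores.items():
    print(f"{name:9s} AIC={a:.1f}  BIC={b:.1f}")
```

The likelihood term always improves when predictors are added; the penalty is what stops the largest model from winning automatically.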
Cross-validation (CV)
Core idea: simulate future prediction with repeated train/test splits.
Basic steps:
- Split data into training and testing sets.
- Fit model on training set.
- Evaluate prediction error on testing set.
- Repeat and average errors.
Common variants:
- $K$-fold CV (usually $K = 5$ or $K = 10$),
- Leave-one-out CV (LOO CV).
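The basic steps can be sketched as follows for a least-squares linear model. The helper names `k_fold_indices` and `cv_mspe` are made up for this example, and the data are simulated.

```python
import numpy as np

def k_fold_indices(n, k, rng):
    # Shuffle the row indices and cut them into k nearly equal folds
    return np.array_split(rng.permutation(n), k)

def cv_mspe(X, y, k=10, seed=0):
    """K-fold CV estimate of prediction error for least-squares regression."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for test_idx in k_fold_indices(n, k, rng):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds (intercept column added by hand)
        design_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        beta, *_ = np.linalg.lstsq(design_train, y[train_mask], rcond=None)
        # Evaluate on the held-out fold
        design_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors.append(np.mean((y[test_idx] - design_test @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 3))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.standard_normal(60)  # hypothetical data

print("10-fold CV error:", round(cv_mspe(X, y, k=10), 3))
print("LOO CV error:    ", round(cv_mspe(X, y, k=len(y)), 3))
```

Setting `k` equal to the sample size gives leave-one-out CV, at the cost of one model fit per observation.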
Penalized Regression
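Instead of hard include/exclude decisions, shrink coefficients:
$$\hat{\beta}_\lambda = \arg\min_{\beta_0, \beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \, P(\beta) \right\}$$
where $P(\beta)$ is a penalty on the size of the coefficients.
- $\lambda \ge 0$ controls penalty strength.
- $\lambda \to 0$: close to least squares.
- larger $\lambda$: stronger shrinkage, lower complexity.
- usually do not penalize intercept $\beta_0$.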
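Predictors should be standardized before penalization, so that the penalty treats all coefficients on the same scale:
$$x_{ij} \leftarrow \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of predictor $j$.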
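A small sketch of standardization, assuming the mean and standard deviation are computed on the training data and then reused for any new data. The function name `standardize` and the toy matrix are hypothetical.

```python
import numpy as np

def standardize(X_train, X_new=None):
    """Centre and scale columns using the training mean and standard deviation."""
    mean = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    target = X_train if X_new is None else X_new
    return (target - mean) / sd

# Two predictors on very different scales (made-up values)
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0],
              [4.0, 4000.0]])
Z = standardize(X)
print(Z.round(3))
```

Without this step, a predictor measured in large units would have a small coefficient and escape the penalty almost entirely.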
This framework also extends to logistic regression (penalized likelihood).
Ridge regression
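Ridge penalizes the sum of squared coefficients:
$$\hat{\beta}^{\text{ridge}}_\lambda = \arg\min_{\beta_0, \beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$
Strengths: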
- very stable,
- handles correlated predictors well,
- computationally efficient.
Weakness:
- coefficients are shrunk but usually not exactly zero (no direct variable selection).
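A minimal sketch of ridge regression via its closed-form solution, assuming standardized predictors and a centred response so the intercept drops out of the penalized problem. The function name `ridge_coefficients` and the two nearly identical predictors are made up for the example.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (Z'Z + lam * I) b = Z'y for standardized Z and centred y."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y_c = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y_c)

rng = np.random.default_rng(5)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n)      # hypothetical truth

for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda={lam:6.1f}  beta={np.round(ridge_coefficients(X, y, lam), 3)}")
```

With strongly correlated predictors the least-squares coefficients (`lam = 0`) can be erratic; increasing `lam` pulls them toward each other and toward zero, but they do not become exactly zero.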
Lasso regression
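Lasso penalizes the sum of absolute coefficients:
$$\hat{\beta}^{\text{lasso}}_\lambda = \arg\min_{\beta_0, \beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$
Strengths: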
- stable,
- can set some coefficients exactly to zero,
- performs variable selection + estimation together.
Weaknesses:
- may bias large coefficients downward,
- can be less stable than ridge when predictors are strongly correlated (it tends to keep one predictor from a correlated group and drop the rest),
- may still overfit in some real datasets.
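One common way to fit the lasso is cyclic coordinate descent with soft-thresholding. The sketch below assumes the objective "residual sum of squares plus `lam` times the sum of absolute coefficients", standardized predictors and a centred response; the function names and simulated data are hypothetical, and this is not necessarily the algorithm used in the lecture.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t, and set it to exactly zero if |z| <= t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise ||y - X b||^2 + lam * sum(|b_j|) by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = np.sum(X ** 2, axis=0)   # x_j' x_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of x_j with the residual that excludes predictor j
            rho = X[:, j] @ residual + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

rng = np.random.default_rng(6)
n, p = 200, 6
X_raw = rng.standard_normal((n, p))
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)   # hypothetical truth
y = y - y.mean()

for lam in (0.0, 50.0, 200.0):
    beta = lasso_cd(X, y, lam)
    print(f"lambda={lam:6.1f}  nonzero={int(np.sum(beta != 0))}  beta={np.round(beta, 2)}")
```

The soft-threshold step is what produces exact zeros: any coefficient whose inner product with the partial residual is at most `lam / 2` in absolute value is removed from the model.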
Choosing $\lambda$
Standard approach:
- Define a grid of $\lambda$ values.
- Use CV to estimate error for each $\lambda$.
- Choose the $\lambda$ with the smallest CV error.
- Refit the final model on all data using this $\lambda$.
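The standard approach can be sketched with ridge regression as the penalized model. The grid, the number of folds, the helper names `ridge_fit` and `cv_error`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Standardize X, centre y, solve the ridge equations; return a predictor."""
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Z = (X - mean) / sd
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(X.shape[1]), Z.T @ (y - y_mean))
    return lambda X_new: y_mean + ((X_new - mean) / sd) @ beta

def cv_error(X, y, lam, k=5, seed=0):
    # Same seed for every lam, so all values are compared on identical folds
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predict = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - predict(X[test_idx])) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
X = rng.standard_normal((80, 20))
y = X[:, 0] - X[:, 1] + rng.standard_normal(80)   # hypothetical data

lambda_grid = np.logspace(-3, 3, 13)                       # 1. grid of values
cv_errors = np.array([cv_error(X, y, lam) for lam in lambda_grid])  # 2. CV error
best_lambda = float(lambda_grid[np.argmin(cv_errors)])     # 3. smallest CV error
final_predict = ridge_fit(X, y, best_lambda)               # 4. refit on all data

print("chosen lambda:", best_lambda)
```

Standardization is redone inside each training fold, so the held-out fold never influences the fitted model.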
Bias-Variance View of Penalization
Least squares can have low bias but high variance.
Penalization introduces some bias but can greatly reduce variance.
A good $\lambda$ reduces total prediction error through this trade-off.
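A toy calculation makes the trade-off concrete. Assume a single predictor scaled so that x'x = 1, in which case the least-squares estimate is distributed as N(beta, sigma^2) and the ridge estimate is the least-squares estimate divided by (1 + lambda). The values `beta_true = 1` and `sigma = 1` are hypothetical.

```python
import numpy as np

beta_true = 1.0   # hypothetical true coefficient
sigma = 1.0       # hypothetical noise standard deviation

def ridge_bias_variance(lam):
    """Squared bias and variance of beta_hat = (x'y) / (1 + lam) when x'x = 1."""
    shrink = 1.0 / (1.0 + lam)
    bias_sq = ((shrink - 1.0) * beta_true) ** 2
    variance = (shrink * sigma) ** 2
    return bias_sq, variance

for lam in (0.0, 0.5, 1.0, 4.0):
    b2, var = ridge_bias_variance(lam)
    print(f"lambda={lam:3.1f}  bias^2={b2:.3f}  variance={var:.3f}  total={b2 + var:.3f}")

# Monte Carlo check of the lambda = 1 row
rng = np.random.default_rng(7)
ols_draws = beta_true + sigma * rng.standard_normal(100_000)   # x'y ~ N(beta, sigma^2)
mc_mse = float(np.mean((ols_draws / (1.0 + 1.0) - beta_true) ** 2))
print("simulated MSE at lambda=1:", round(mc_mse, 3))
```

In this setting least squares (`lam = 0`) has zero bias and variance 1, while `lam = 1` gives squared bias 0.25 and variance 0.25: the estimator is biased, yet its total error is halved. Shrinking too hard (`lam = 4`) lets the bias term dominate again.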