1970年1月1日
2086 Lecture 9 Trees And Nearest Neighbour Methods
No description yet.
Lecture Note: Lecture 9 Notes.pdf
Previous: 2086 Lecture 8 - Model Selection and Penalized Regression
Machine Learning and CV Revisited
Use cross-validation to choose model complexity parameter (e.g. number of leaves, regularization strength).
General -fold CV idea:
- split into folds,
- train on folds, test on held-out fold,
- repeat over folds and repetitions,
- choose complexity with smallest average prediction error.
Decision Trees
[!NOTE] Other Unit In FIT3152, we also mentioned deeper content at 3152 Lecture 6 - Decision Tree
A decision tree recursively splits predictor space into disjoint regions:
Each leaf stores a simple local model:
- regression: average/normal model in leaf,
- classification: class probability (e.g. Bernoulli parameter).
Tree complexity is mainly controlled by number of leaves .
Forward growing (greedy)
- Start with root node.
- Try candidate splits on predictors.
- Choose split with best score improvement.
- Repeat until no useful split.
Common scoring idea: negative log-likelihood (or information criterion / CV score).
Splitting criterion for binary targets
In a leaf with ones and zeros:
Minimized negative log-likelihood:
Purity interpretation:
- highest uncertainty around ,
- pure leaf ( or ) gives lower loss.
For numeric predictors, split by threshold:
choose that gives best score.
Trees + CV (grow then prune)
Typical practice:
- Grow a large overfitted tree.
- Prune back to candidate sizes .
- Use CV to choose best .
- Fit final tree with chosen on full data.
Tree strengths / weaknesses
Strengths:
- interpretable,
- handles nonlinearities and interactions naturally,
- works with continuous and categorical predictors,
- embedded variable selection (unused variables are excluded).
Weaknesses:
- unstable (small data perturbations can change tree),
- search space is huge (greedy procedure may miss global optimum),
- can be inefficient for simple linear relationships.
Random Forests
Random forest = ensemble of many trees.
Training idea
- Grow trees.
- At each split, only a random subset of predictors is considered.
- This reduces correlation between trees and mitigates greedy instability.
Prediction
- Regression: average predictions across trees.
- Classification: average class probabilities (or majority vote).
Why it works:
- single tree: low bias, high variance;
- averaging many trees: keeps low bias, reduces variance.
Pros / Cons
Pros:
- strong predictive accuracy,
- much more stable than one tree,
- handles nonlinear and mixed-type data well.
Cons:
- much less interpretable than a single tree,
- many hyperparameters and algorithm choices.
k-Nearest Neighbours (k-NN)
k-NN is a non-parametric method: it does not fit an explicit global model.
Given training pairs and new point :
- Compute distances
- Sort by distance.
- Take nearest targets .
- Aggregate:
For regression (simple average):
For classification: majority vote (or averaged probabilities).
Distance and weighting
Common distance: Euclidean
Can use weighted aggregation so closer neighbours have larger weights.
Practical tuning
Need to tune:
- neighborhood size ,
- distance metric,
- weighting/kernel function.
Standard approach: use CV (often LOO-CV) to choose settings with smallest prediction error.
k-NN strengths / weaknesses
Strengths:
- conceptually simple,
- weak assumptions,
- flexible with suitable distance functions.
Weaknesses:
- many tuning choices,
- variable selection is not automatic,
- no interpretability (no explicit model parameters).