2086 Lecture 9 Trees And Nearest Neighbour Methods

Previous: 2086 Lecture 8 - Model Selection and Penalized Regression

Machine Learning and CV Revisited

Use cross-validation (CV) to choose the model complexity parameter $\gamma$ (e.g. number of leaves, regularization strength).
General $K$-fold CV idea:

split into $K$ folds,
train on $K-1$ folds, test on held-out fold,
repeat over folds and repetitions,
choose complexity with smallest average prediction error.
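
The steps above can be sketched in plain Python. This is a minimal illustration, not the lecture's implementation: the piecewise-constant "bin" model, the helper names (`kfold_indices`, `fit_bins`, `predict_bins`, `cv_error`, `choose_gamma`) and the toy data are all assumptions made for the example, with the number of bins playing the role of $\gamma$. It uses a single repetition of $K$-fold CV.

```python
import random

def kfold_indices(n, K, seed=0):
    """Shuffle 0..n-1 and deal the indices into K nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::K] for f in range(K)]

def fit_bins(xs, ys, gamma):
    """Hypothetical model: piecewise-constant fit on [0, 1) with gamma equal-width bins."""
    overall = sum(ys) / len(ys)
    sums, counts = [0.0] * gamma, [0] * gamma
    for x, y in zip(xs, ys):
        b = min(int(x * gamma), gamma - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def predict_bins(model, x):
    gamma = len(model)
    return model[min(int(x * gamma), gamma - 1)]

def cv_error(xs, ys, gamma, K=5, seed=0):
    """Squared prediction error, pooled over all held-out points."""
    total = 0.0
    for held_out in kfold_indices(len(xs), K, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_bins([xs[i] for i in train], [ys[i] for i in train], gamma)
        total += sum((ys[i] - predict_bins(model, xs[i])) ** 2 for i in held_out)
    return total / len(xs)

def choose_gamma(xs, ys, candidates, K=5, seed=0):
    """Return the candidate with the smallest CV error, plus all scores."""
    scores = {g: cv_error(xs, ys, g, K, seed) for g in candidates}
    return min(scores, key=scores.get), scores

# Toy data (an assumption): a noisy step function on [0, 1).
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
ys = [(1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.1) for x in xs]
best, scores = choose_gamma(xs, ys, candidates=[1, 2, 4, 8, 16, 32])
```

A single bin cannot represent the step, so its CV error is much larger than that of two bins; very fine binnings start fitting noise instead.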

Decision Trees

[!NOTE] Other unit: FIT3152 covers decision trees in more depth; see 3152 Lecture 6 - Decision Tree.

A decision tree recursively splits predictor space into disjoint regions:

$$R_1,\dots,R_L,\qquad R_i\cap R_j=\varnothing\ (i\neq j)$$

Each leaf stores a simple local model:

regression: average/normal model in leaf,
classification: class probability (e.g. Bernoulli parameter).

Tree complexity is mainly controlled by the number of leaves $L$.

Forward growing (greedy)

Start with root node.
Try candidate splits on predictors.
Choose split with best score improvement.
Repeat until no useful split.

Common scoring idea: negative log-likelihood (or information criterion / CV score).

Splitting criterion for binary targets

In a leaf with $n=n_1+n_0$ observations ($n_1$ ones and $n_0$ zeros), the Bernoulli likelihood and its maximum-likelihood estimate are:

$$p(y\mid\theta)=\theta^{n_1}(1-\theta)^{n_0},\qquad \hat\theta=\frac{n_1}{n}$$

Minimized negative log-likelihood:

$$-\log p(y\mid\hat\theta)=-n_1\log\left(\frac{n_1}{n}\right)-n_0\log\left(\frac{n_0}{n}\right)$$

Purity interpretation:

the loss is largest (for fixed $n$) at $n_1/n=0.5$, where uncertainty is highest,
a pure leaf ($n_1=0$ or $n_1=n$) attains the minimum loss of zero, using the convention $0\log 0=0$.
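
A small numeric check of the purity interpretation, as a sketch. The function name `bernoulli_nll` and the example counts are assumptions for illustration; zero counts are skipped so that $0\log 0$ is treated as $0$.

```python
import math

def bernoulli_nll(n1, n0):
    """Minimized negative log-likelihood of a leaf with n1 ones and n0 zeros."""
    n = n1 + n0
    # Terms with a zero count contribute nothing (convention 0 * log 0 = 0).
    return -sum(c * math.log(c / n) for c in (n1, n0) if c > 0)

balanced = bernoulli_nll(5, 5)   # equals 10 * log(2), the worst case for n = 10
skewed = bernoulli_nll(9, 1)     # mostly one class: smaller loss
pure = bernoulli_nll(10, 0)      # pure leaf: zero loss
```

For a fixed leaf size the loss decreases as the leaf becomes purer, which is what makes it usable as a split score.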

For numeric predictors, split by threshold:

$$x_j\le c \quad \text{vs}\quad x_j>c$$

choose $c$ that gives best score.
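
Putting the greedy procedure, the Bernoulli score and the threshold rule together gives roughly the following. This is a hedged sketch rather than the lecture's algorithm: the names (`nll`, `best_split`, `grow`, `predict`, `count_leaves`), the dictionary tree representation, the choice of candidate thresholds as midpoints between consecutive observed values, and the `min_gain` stopping rule are all assumptions made for illustration.

```python
import math

def nll(labels):
    """Minimized Bernoulli negative log-likelihood of a list of 0/1 labels."""
    n, n1 = len(labels), sum(labels)
    return -sum(c * math.log(c / n) for c in (n1, n - n1) if c > 0)

def best_split(X, y):
    """Return (gain, feature j, threshold c) for the best single split, or None."""
    best, parent = None, nll(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            gain = parent - nll(left) - nll(right)
            if best is None or gain > best[0]:
                best = (gain, j, c)
    return best

def grow(X, y, min_gain=1e-9):
    """Greedily split until no split improves the score by more than min_gain."""
    split = best_split(X, y)
    if split is None or split[0] <= min_gain:
        return {"leaf": True, "theta": sum(y) / len(y), "n": len(y)}
    _, j, c = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= c]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > c]
    return {"leaf": False, "feature": j, "threshold": c,
            "left": grow([r for r, _ in left], [v for _, v in left], min_gain),
            "right": grow([r for r, _ in right], [v for _, v in right], min_gain)}

def predict(tree, row):
    """Follow the splits down to a leaf and return its estimated P(y = 1)."""
    while not tree["leaf"]:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["theta"]

def count_leaves(tree):
    if tree["leaf"]:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Toy data (an assumption): y = 1 only when both predictors exceed 0.5.
X = [[a / 4, b / 4] for a in range(5) for b in range(5)]
y = [1 if row[0] > 0.5 and row[1] > 0.5 else 0 for row in X]
tree = grow(X, y)
```

With a tiny `min_gain` the tree keeps splitting until every leaf is pure, which is the "large overfitted tree" that pruning and CV are then meant to cut back.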

Trees + CV (grow then prune)

Typical practice:

Grow a large overfitted tree.
Prune back to candidate sizes $L=1,\dots,L_{\max}$.
Use CV to choose best $L$ .
Fit final tree with chosen $L$ on full data.

Tree strengths / weaknesses

Strengths:

interpretable,
handles nonlinearities and interactions naturally,
works with continuous and categorical predictors,
embedded variable selection (unused variables are excluded).

Weaknesses:

unstable (small data perturbations can change tree),
search space is huge (greedy procedure may miss global optimum),
can be inefficient for simple linear relationships.

Random Forests

Random forest = ensemble of many trees.

Training idea

Grow $q$ trees.
At each split, only a random subset of predictors is considered.
This reduces correlation between trees and mitigates greedy instability.

Prediction

Regression: average predictions across trees.
Classification: average class probabilities (or majority vote).
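
The aggregation rules can be written down directly. The sketch below assumes each tree's output is already available; the function names and the per-tree numbers are made up for illustration and no trees are actually fitted here.

```python
from collections import Counter

def forest_regress(tree_predictions):
    """Regression: average the numeric predictions of the q trees."""
    return sum(tree_predictions) / len(tree_predictions)

def forest_average_probs(tree_probs):
    """Classification: average per-class probabilities (one dict per tree)."""
    classes = set().union(*tree_probs)
    q = len(tree_probs)
    return {c: sum(p.get(c, 0.0) for p in tree_probs) / q for c in classes}

def forest_majority_vote(tree_labels):
    """Classification: the most common predicted label across trees."""
    return Counter(tree_labels).most_common(1)[0][0]

# Hypothetical outputs of q = 3 trees for one test point.
tree_probs = [{0: 0.10, 1: 0.90}, {0: 0.55, 1: 0.45}, {0: 0.55, 1: 0.45}]
avg = forest_average_probs(tree_probs)
label_from_probs = max(avg, key=avg.get)
label_from_votes = forest_majority_vote([max(p, key=p.get) for p in tree_probs])
```

In this example the two rules disagree: one confident tree pulls the averaged probability of class 1 above 0.5, while two of the three trees vote for class 0.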

Why it works:

single tree: low bias, high variance;
averaging many trees: keeps low bias, reduces variance.
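
A standard variance identity makes this argument concrete. Assume, for illustration only, that the $q$ tree predictions $T_1,\dots,T_q$ at a fixed point each have variance $\sigma^2$ and pairwise correlation $\rho$ (the symbols $T_b$, $\sigma^2$ and $\rho$ are not from the lecture). Then

$$\operatorname{Var}\!\left(\frac{1}{q}\sum_{b=1}^{q}T_b\right)=\rho\sigma^2+\frac{1-\rho}{q}\sigma^2$$

The second term shrinks as $q$ grows, while the first is limited by the correlation between trees. Under these assumptions, restricting each split to a random subset of predictors helps precisely because it lowers $\rho$.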

Pros / Cons

Pros:

strong predictive accuracy,
much more stable than one tree,
handles nonlinear and mixed-type data well.

Cons:

much less interpretable than a single tree,
many hyperparameters and algorithm choices.

k-Nearest Neighbours (k-NN)

k-NN is a non-parametric method: it does not fit an explicit global model.

Given training pairs $(x_i,y_i)$ and a new point $x'$:

Compute distances

$$d_i=d(x_i,x'),\quad i=1,\dots,n$$

Sort by distance.
Take the $k$ nearest targets $y_{(1)},\dots,y_{(k)}$.
Aggregate:

$$\hat y'=f\!\left(y_{(1)},\dots,y_{(k)}\right)$$

For regression (simple average):

$$\hat y'=\frac{1}{k}\sum_{i=1}^k y_{(i)}$$

For classification: majority vote (or averaged probabilities).
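
The four steps translate almost line for line into code. This is a minimal sketch with assumed function names and toy data, using Euclidean distance; ties in the majority vote are broken in favour of the label seen first among the nearest neighbours, which is a choice made here rather than something the notes specify.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest_targets(X, y, x_new, k, dist=euclidean):
    """Compute distances, sort, and return the targets of the k nearest points."""
    order = sorted(range(len(X)), key=lambda i: dist(X[i], x_new))
    return [y[i] for i in order[:k]]

def knn_regress(X, y, x_new, k):
    """Regression: simple average of the k nearest targets."""
    nearest = k_nearest_targets(X, y, x_new, k)
    return sum(nearest) / len(nearest)

def knn_classify(X, y, x_new, k):
    """Classification: majority vote among the k nearest targets."""
    nearest = k_nearest_targets(X, y, x_new, k)
    return Counter(nearest).most_common(1)[0][0]

# Toy data (an assumption): one predictor, four training points.
X = [[0.0], [1.0], [2.0], [10.0]]
y_numeric = [0.0, 1.0, 2.0, 10.0]
y_labels = ["a", "a", "b", "b"]

y_hat = knn_regress(X, y_numeric, [1.2], k=3)      # averages targets 1.0, 2.0, 0.0
label_hat = knn_classify(X, y_labels, [1.2], k=3)  # votes: a, b, a
```

Note that nothing is "trained": all the work happens at prediction time, which is what the absence of an explicit global model means in practice.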

Distance and weighting

Common distance: Euclidean

$$d(x,x')=\left(\sum_{j=1}^p(x_j-x'_j)^2\right)^{1/2}$$

Can use weighted aggregation so closer neighbours have larger weights.
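
One common way to do this is inverse-distance weighting, sketched below. The notes do not say which weighting scheme is used, so the weight $1/(d+\varepsilon)$, the function name and the `eps` guard against division by zero are assumptions. `math.dist` computes the Euclidean distance (Python 3.8+).

```python
import math

def weighted_knn_regress(X, y, x_new, k, eps=1e-12):
    """k-NN regression where each neighbour is weighted by 1 / (distance + eps)."""
    pairs = sorted(
        ((math.dist(xi, x_new), yi) for xi, yi in zip(X, y)),
        key=lambda pair: pair[0],
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in pairs]
    return sum(w * yi for w, (_, yi) in zip(weights, pairs)) / sum(weights)

# Toy data (an assumption): the query point sits much closer to x = 1 than to x = 0.
X = [[0.0], [1.0], [3.0]]
y = [0.0, 10.0, 30.0]
weighted = weighted_knn_regress(X, y, [0.9], k=2)
unweighted = (0.0 + 10.0) / 2  # simple average of the same two neighbours
```

Here the weighted prediction is close to 9, pulled towards the nearer neighbour, whereas the unweighted average of the same two neighbours is 5.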

Practical tuning

Need to tune:

neighbourhood size $k$,
distance metric,
weighting/kernel function.

Standard approach: use CV (often leave-one-out CV, LOO-CV, where each observation is held out in turn) to choose the settings with the smallest prediction error.
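
A minimal sketch of choosing $k$ by LOO-CV for k-NN regression, under stated assumptions: Euclidean distance, unweighted averaging, squared-error loss, candidate values of $k$ no larger than $n-1$, and a made-up noisy sine data set. The function names are hypothetical.

```python
import math
import random

def knn_predict(X, y, x_new, k):
    """Unweighted k-NN regression with Euclidean distance."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_new))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / len(nearest)

def loo_cv_error(X, y, k):
    """Hold out each point in turn, predict it from the rest, average squared errors."""
    total = 0.0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        total += (y[i] - knn_predict(X_train, y_train, X[i], k)) ** 2
    return total / len(X)

def choose_k(X, y, candidates):
    """Return the candidate k with the smallest LOO-CV error, plus all errors."""
    errors = {k: loo_cv_error(X, y, k) for k in candidates}
    return min(errors, key=errors.get), errors

# Toy data (an assumption): one period of a sine curve with Gaussian noise.
rng = random.Random(0)
X = [[i / 20] for i in range(21)]
y = [math.sin(2 * math.pi * x[0]) + rng.gauss(0, 0.2) for x in X]
best_k, errors = choose_k(X, y, candidates=[1, 2, 3, 5, 10, 20])
```

Using all 20 remaining points as neighbours reduces the prediction to a global mean, so its LOO-CV error is far larger than that of a small neighbourhood. The same loop could be extended over distance metrics or weighting schemes.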

k-NN strengths / weaknesses

Strengths:

conceptually simple,
weak assumptions,
flexible with suitable distance functions.

Weaknesses:

many tuning choices,
variable selection is not automatic,
little interpretability (no explicit model parameters to inspect).
