
January 1, 1970

2086 Lecture 4 - Central Limit Theorem and Confidence Intervals


Lecture Note CLT: Lecture 4 Notes (Part I).pdf
Lecture Note CI: Lecture 4 Notes (Part II).pdf

Previous: 2086 Lecture 3 - Estimation and Maximum Likelihood
Next: 2086 Lecture 5 - Hypothesis Testing

Central limit theorem (CLT)

The central limit theorem (CLT) is one of the most important results in statistics. It applies to almost any population distribution, such as Binomial, Uniform or Bernoulli, as long as the distribution has a finite variance. Suppose we repeatedly draw samples of size $n$ and compute the mean of each sample. The CLT says that the distribution of those sample means approaches a normal distribution as $n$ increases.

Fact 1 (Central Limit Theorem): Let $Y_1, \ldots, Y_n$ be i.i.d. random variables (RVs) with $\mathbb{E}[Y_i] = \mu$ and $\mathbb{V}[Y_i] = \sigma^2 < \infty$. Then, as $n$ grows,

$$\sum_{i=1}^n Y_i \stackrel{d}{\to} N(n\mu, n\sigma^2)$$

The sample mean is $\hat{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$. Expectation and variance obey two scaling rules: $\mathbb{E}[cY] = c\,\mathbb{E}[Y]$ and $\mathbb{V}[cY] = c^2\,\mathbb{V}[Y]$. Applying them with $c = \frac{1}{n}$ lets us restate the CLT for the sample mean:

$$\hat{Y} \stackrel{d}{\to} N\left(\mu, \frac{\sigma^2}{n}\right)$$
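Here is the scaling step written out. Because the $Y_i$ are independent, the sum has mean $n\mu$ and variance $n\sigma^2$. Multiplying by $c = \frac{1}{n}$ then gives the two parameters of the sample mean's distribution:

```latex
\mathbb{E}[\hat{Y}] = \frac{1}{n}\,\mathbb{E}\Big[\sum_{i=1}^n Y_i\Big] = \frac{1}{n}\cdot n\mu = \mu,
\qquad
\mathbb{V}[\hat{Y}] = \frac{1}{n^2}\,\mathbb{V}\Big[\sum_{i=1}^n Y_i\Big] = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
```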

The larger the sample size $n$, the smaller the variance $\sigma^2/n$ of the sample mean, so $\hat{Y}$ concentrates more tightly around $\mu$.
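The CLT can also be checked numerically. Below is a minimal simulation sketch using only the Python standard library. The Uniform(0, 1) population, the sample sizes and the seed are illustrative choices, not taken from the lecture. Uniform(0, 1) has $\mu = 1/2$ and $\sigma^2 = 1/12$, so the sample means should centre on 0.5 with variance close to $1/(12n)$:

```python
import random
import statistics


def sample_means(n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples samples of size n from Uniform(0, 1); return each sample's mean."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(num_samples)]


# Uniform(0, 1) has mu = 1/2 and sigma^2 = 1/12, so the CLT predicts
# that the sample means are approximately N(1/2, 1/(12 n)).
for n in (1, 5, 30):
    means = sample_means(n, num_samples=20_000)
    print(
        f"n={n:>2}  mean of means={statistics.fmean(means):.4f}  "
        f"variance of means={statistics.variance(means):.5f}  "
        f"predicted sigma^2/n={1 / (12 * n):.5f}"
    )
```

You could also plot a histogram of `means` for each `n`. It would show the shape moving from flat (n = 1) towards a bell curve as `n` grows.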

Interval Estimation

A point estimate returns a single best guess $\hat{\theta}$ for the parameter. Because our sample is limited, that guess will almost never equal the true value exactly. Interval estimation instead returns a range of values that is likely to contain the true parameter.

An interval estimate is written as a pair of endpoints, both computed from the observed data $\mathbf{y}$:

$$T(\mathbf{y}) = \left( \hat{\theta}^{-}(\mathbf{y}),\, \hat{\theta}^{+}(\mathbf{y}) \right) \subset \mathbb{R}$$

The standard way to construct such an interval is the confidence interval.

Confidence Interval (CI)

A confidence interval is an interval estimate $T(\mathbf{y})$ with a specified coverage probability. We say that $T(\mathbf{y})$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$ when:

$$\mathbb{P}(\theta \in T(\mathbf{y})) = 1 - \alpha.$$

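Here $\theta$ is a fixed, unknown constant. The randomness is in the interval $T(\mathbf{y})$, which changes from sample to sample. Suppose we drew many different samples from the population and built a $100(1 - \alpha)\%$ confidence interval from each one, for example a 95% CI with $\alpha = 0.05$. About $100(1 - \alpha)\%$ of those intervals would contain the true parameter $\theta$.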
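The repeated-sampling interpretation can be checked by simulation. This is a minimal Python sketch under illustrative assumptions: normal data with $\mu = 10$, $\sigma = 2$ and $n = 25$, with $\sigma$ treated as known. It builds one 95% interval per simulated sample and reports the fraction of intervals that contain the true $\mu$:

```python
import random
from statistics import NormalDist, fmean


def known_variance_ci(sample: list[float], sigma: float, alpha: float = 0.05) -> tuple[float, float]:
    """100(1 - alpha)% CI for a normal mean when the standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / len(sample) ** 0.5   # z * sqrt(sigma^2 / n)
    mu_hat = fmean(sample)
    return mu_hat - half_width, mu_hat + half_width


def coverage(mu: float, sigma: float, n: int, trials: int, alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # one fresh sample per trial
        lo, hi = known_variance_ci(sample, sigma, alpha)
        hits += lo <= mu <= hi                             # True counts as 1
    return hits / trials


print(coverage(mu=10.0, sigma=2.0, n=25, trials=10_000))  # should be close to 0.95
```

Any single interval either contains $\mu$ or it does not. The 95% describes the procedure over many samples, not one particular interval.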

CI for Normal Mean with Known Variance

For i.i.d. observations from $N(\mu, \sigma^2)$ with known variance $\sigma^2$, the $100(1 - \alpha)\%$ confidence interval for $\mu$ is:

$$\left( \hat{\mu} - z_{\alpha/2} \sqrt{\frac{\sigma^2}{n}}, \; \hat{\mu} + z_{\alpha/2} \sqrt{\frac{\sigma^2}{n}} \right)$$

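where $\hat{\mu}$ is the sample mean and $z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal. That is, $z_{\alpha/2}$ is the value with $\mathbb{P}(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$. A standard z-table lists $\mathbb{P}(Z \le z)$, so find the entry equal to $1 - \alpha/2$ and read off $z$. For a 95% CI, $\alpha = 0.05$ and $z_{0.025} \approx 1.96$.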
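The z-table lookup can be replaced by the inverse CDF of the standard normal. Python's standard library provides it as `statistics.NormalDist().inv_cdf`. The interval itself comes from standardizing: $(\hat{\mu} - \mu)/\sqrt{\sigma^2/n} \sim N(0, 1)$. The sketch below computes $z_{\alpha/2}$ for common levels and then applies the formula. The sample mean, variance and $n$ in the worked part are hypothetical numbers chosen for illustration:

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}: the value with P(Z > z) = alpha/2 for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)  # inv_cdf(p) gives z with P(Z <= z) = p


for alpha in (0.10, 0.05, 0.01):
    print(f"{100 * (1 - alpha):.0f}% CI: z = {z_critical(alpha):.3f}")
# 90% CI: z = 1.645
# 95% CI: z = 1.960
# 99% CI: z = 2.576

# Hypothetical numbers: sample mean 5.2, known variance 4, n = 16.
mu_hat, sigma2, n = 5.2, 4.0, 16
half_width = z_critical(0.05) * (sigma2 / n) ** 0.5  # z_{alpha/2} * sqrt(sigma^2 / n)
print(mu_hat - half_width, mu_hat + half_width)      # roughly (4.22, 6.18)
```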

CI for the Difference of Two Normal Means with Unknown Variance

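Take two independent samples $A$ and $B$, with sizes $n_A, n_B$, sample means $\hat{\mu}_A, \hat{\mu}_B$ and variances $\sigma_A^2, \sigma_B^2$. From these we can build a confidence interval for the difference $\mu_A - \mu_B$. When the variances are unknown and both samples are large, $\sigma_A^2$ and $\sigma_B^2$ are replaced by the sample variances, and the CLT keeps the interval approximately valid. The $100(1 - \alpha)\%$ confidence interval is: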

$$\left( \hat{\mu}_A - \hat{\mu}_B - z_{\alpha/2} \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}, \; \hat{\mu}_A - \hat{\mu}_B + z_{\alpha/2} \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} \right)$$
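Below is a minimal Python sketch of this two-sample interval. It assumes the large-sample setting, replacing the unknown $\sigma_A^2, \sigma_B^2$ with sample variances (which use the $n - 1$ denominator). The function name `diff_of_means_ci` and the simulated groups are hypothetical, for illustration only:

```python
import random
from statistics import NormalDist, fmean, variance


def diff_of_means_ci(a: list[float], b: list[float], alpha: float = 0.05) -> tuple[float, float]:
    """Approximate 100(1 - alpha)% CI for mu_A - mu_B from two independent samples.

    Assumption: the unknown variances sigma_A^2, sigma_B^2 are replaced by the
    sample variances (n - 1 denominator); this large-sample z-interval relies on the CLT.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)                     # z_{alpha/2}
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5  # sqrt(s_A^2/n_A + s_B^2/n_B)
    diff = fmean(a) - fmean(b)                                  # mu_hat_A - mu_hat_B
    return diff - z * se, diff + z * se


# Hypothetical data: two simulated groups whose true means differ by 1.
rng = random.Random(42)
group_a = [rng.gauss(5.0, 2.0) for _ in range(200)]
group_b = [rng.gauss(4.0, 1.0) for _ in range(200)]
print(diff_of_means_ci(group_a, group_b))
```

With small samples, plugging in sample variances makes this z-interval less reliable, and this sketch does not cover that case.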
