## Central Limit Theorem (CLT)

If the population distribution of a variable \(X\) has population mean \(\mu\) and (finite) population variance \(\sigma^2\), then the *sampling distribution of the sample mean* becomes closer and closer to a Normal distribution as the sample size \(n\) increases.

We can write: \[
\overline{X} \, \dot \sim \, N\left(\mu, \frac{\sigma^2}{n}\right)
\] for large values of \(n\), where the symbol \(\dot \sim\) means *approximately distributed as*.

*We already figured out the mean and variance of \(\overline{X}\); the CLT adds information about its shape.*

## Central Limit Theorem (CLT)

Slightly more formally:

Let \(X_1, X_2, \ldots, X_n\) be an i.i.d. sample from some population distribution \(F\) with mean \(\mu\) and variance \(\sigma^2 < \infty\). Then as the sample size \(n \rightarrow \infty\) we have

\[ \frac{\overline{X} - \mu}{\sigma / \sqrt{n}} \rightarrow_D N\left(0,1\right) \]
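A quick simulation can check the standardized form. This is a sketch that assumes an Exponential(rate = 1) population (mean 1, variance 1), picked only because it is clearly skewed; the sample size, number of replications and seed are also arbitrary choices.

```r
# Sketch: check the standardized sample mean against N(0, 1) by simulation.
# Assumption (illustrative only): population is Exponential(rate = 1),
# which is right-skewed with mu = 1 and sigma = 1.
set.seed(1)
n     <- 100    # sample size (arbitrary choice)
reps  <- 10000  # number of simulated samples (arbitrary choice)
mu    <- 1
sigma <- 1

# Each replication draws n values and records their sample mean
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Standardize with the standard error sigma / sqrt(n)
z <- (xbar - mu) / (sigma / sqrt(n))

c(mean = mean(z), sd = sd(z))  # should be near 0 and 1
mean(z <= 1.96)                # compare with pnorm(1.96), about 0.975
```

Standardizing by \(\sigma / \sqrt{n}\) matters: dividing by \(\sigma\) alone gives values with standard deviation about \(1/\sqrt{n} = 0.1\), which collapse toward 0 instead of settling on \(N(0, 1)\).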

## Your Turn

Consider the following population distribution.

The following three histograms represent:

- a sample of size 1000 from the population
- 1000 sample means of samples of size 10 from the population
- 1000 sample means of samples of size 100 from the population
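One way to produce three histograms like these is sketched below. The population in the figure is not specified in the text, so this sketch assumes an Exponential(rate = 1) population purely for illustration.

```r
# Sketch: the three kinds of data described above, assuming an
# Exponential(rate = 1) population (the actual population is not given here).
set.seed(2)
raw      <- rexp(1000, rate = 1)                       # 1000 draws from the population
means10  <- replicate(1000, mean(rexp(10, rate = 1)))  # 1000 means, n = 10
means100 <- replicate(1000, mean(rexp(100, rate = 1))) # 1000 means, n = 100

# Plot on a common x-axis so the change in spread and shape is visible
op <- par(mfrow = c(1, 3))
xl <- range(raw)
hist(raw,      main = "Population sample", xlim = xl, xlab = "x")
hist(means10,  main = "Means, n = 10",     xlim = xl, xlab = "sample mean")
hist(means100, main = "Means, n = 100",    xlim = xl, xlab = "sample mean")
par(op)

# Spread shrinks roughly like sigma / sqrt(n): near 1, 0.32 and 0.1 here
sapply(list(raw = raw, n10 = means10, n100 = means100), sd)
```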

## Which is which?

## How big is big enough?

What value of \(n\) is *big enough* for the approximation to be *good enough*?

**It depends on the population**: the more skewed or heavy-tailed the population distribution, the larger \(n\) needs to be.

**See more in lab…**
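For a Gamma population the exact distribution of the sample mean is known, so the CLT error can be computed directly. This sketch assumes Gamma(shape = k, rate = 1) populations (mean \(k\), variance \(k\)) chosen for illustration: \(k = 1\) is the strongly skewed Exponential, \(k = 10\) is much less skewed. It compares \(P(\overline{Y} \le \mu + 2\sigma/\sqrt{n})\) with the CLT value \(P(Z \le 2)\).

```r
# Sketch: exact vs CLT-approximate probability for the sample mean.
# Assumed populations (illustration only): Gamma(shape = k, rate = 1),
# with mean k and variance k. The sum of n i.i.d. draws is exactly
# Gamma(shape = n * k, rate = 1), so P(Ybar <= q) can be computed exactly.
clt_error <- function(n, k) {
  q      <- k + 2 * sqrt(k / n)                     # two standard errors above the mean
  exact  <- pgamma(n * q, shape = n * k, rate = 1)  # exact P(Ybar <= q)
  approx <- pnorm(2)                                # CLT approximation: P(Z <= 2)
  abs(exact - approx)
}

ns <- c(2, 5, 10, 30, 100)
errors <- rbind(
  exponential = sapply(ns, clt_error, k = 1),   # k = 1: Exponential, strongly skewed
  gamma10     = sapply(ns, clt_error, k = 10)   # k = 10: much less skewed
)
colnames(errors) <- paste0("n=", ns)
round(errors, 4)
```

Because the sum has an exact Gamma distribution here, no simulation is needed. At every \(n\) in the table the more skewed Exponential population has the larger approximation error.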

## Your turn

Using the CLT to approximate the sampling distribution

**Population:** \(\sim (\mu = 20, \sigma^2 = 4)\)

**Sample:** \(n = 16\) i.i.d. from the population

**Sample statistic:** Sample mean

What is the approximate distribution for \(\overline{Y}\)?

## Applying CLT

Same setup: we have a sample of size \(n = 16\) from a population with population mean \(\mu = 20\) and population variance \(\sigma^2 = 4\).

**What is the probability the sample mean is less than 20.5?**

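CLT says: \(\overline{Y} \, \dot \sim \, N\left(20, \frac{4}{16}\right) = N(20, 0.25)\), so \(\overline{Y}\) is approximately Normal with mean 20 and standard deviation \(\sqrt{0.25} = 0.5\).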

## Using a Standard Normal Probability Table

To use a Standard Normal table, we first need to transform our random variable (\(\overline{Y}\)) to a Standard Normal.

We can convert a probability for \(\overline{Y}\) to a standard Normal probability by:

- Subtracting the mean of \(\overline{Y}\) from both sides, then
- Dividing both sides by the standard deviation of \(\overline{Y}\) (square root of the variance)

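\[ \begin{aligned} P( \overline{Y} \le 20.5 ) &= P\left( \frac{\overline{Y} - 20}{\sqrt{4/16}} \le \frac{20.5 - 20}{\sqrt{4/16}} \right) \\ &\approx P\left( Z \le \frac{0.5}{0.5} \right) = P(Z \le 1.00) \end{aligned} \]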

\[ \text{where } Z \sim N(0, 1) \]
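The same standardization can be done in R instead of with a table. A minimal sketch using base R's `pnorm()`, with the running example's values:

```r
# Running example: population mean 20, variance 4, sample size 16
mu     <- 20
sigma2 <- 4
n      <- 16

se <- sqrt(sigma2 / n)   # standard deviation of Ybar: sqrt(4 / 16) = 0.5
z  <- (20.5 - mu) / se   # standardized cutoff: (20.5 - 20) / 0.5 = 1
pnorm(z)                 # P(Z <= 1), approximately 0.8413
```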

## Using a Standard Normal Probability Table

Now look up \(z = 1.00\) in a Standard Normal table.

\(P(\overline{Y} \le 20.5) \approx 0.8413\)

## Using a Standard Normal Probability Table

**What if \(z\) is negative?**

Standard Normal is symmetric around zero.

\(P(Z \le -z) = 1 - P(Z \le z)\)
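The symmetry rule can be checked numerically; here \(z = 1.5\) is an arbitrary choice.

```r
# Symmetry of the standard Normal: P(Z <= -z) = 1 - P(Z <= z)
z <- 1.5
pnorm(-z)                            # area to the left of -1.5
1 - pnorm(z)                         # same value, using only the positive z
all.equal(pnorm(-z), 1 - pnorm(z))   # TRUE
```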

## Density

Density: the height of the probability density function at \(x\): \(f(x; \mu, \sigma)\)

In R: `dnorm(x, mean = mu, sd = sigma)`
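A short illustration of `dnorm()`. Note that the third argument is the standard deviation, not the variance, so for the running example (\(\sigma^2 = 4\), \(n = 16\)) the spread of \(\overline{Y}\) is passed as `sqrt(4 / 16)`.

```r
# Standard Normal density at its peak, 1 / sqrt(2 * pi), about 0.3989
dnorm(0)

# dnorm() takes the standard deviation, not the variance.
# Running example: Ybar is approximately N(20, 4/16), so sd = sqrt(4/16) = 0.5
dnorm(20, mean = 20, sd = sqrt(4 / 16))   # about 0.7979, twice the N(0, 1) peak
```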

## Cumulative Probability

Cumulative Probability: the area under the probability density function to the left of \(x\): \(F(x; \mu, \sigma)\)

In R: `pnorm(q, mean = mu, sd = sigma)`

Area to the right?
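Two equivalent ways to get an area to the right in base R, sketched with the running example \(\overline{Y} \, \dot \sim \, N(20, 0.5^2)\) and the cutoff 21:

```r
# Area to the right of q: subtract from 1, or ask for the upper tail directly
p_right_1 <- 1 - pnorm(21, mean = 20, sd = 0.5)
p_right_2 <- pnorm(21, mean = 20, sd = 0.5, lower.tail = FALSE)
c(p_right_1, p_right_2)   # both about 0.0228
```

`lower.tail = FALSE` avoids the loss of precision in `1 - pnorm()` when the right-tail area is extremely small.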

## Quantiles

Quantile: the \(p\)th quantile is the value \(x\) such that the area to the left of \(x\) under the density function is \(p\): \(x = F^{-1}(p; \mu, \sigma)\)

In R: `qnorm(p, mean = mu, sd = sigma)`
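Since `qnorm()` is the inverse of `pnorm()`, a round trip recovers the starting value. A small sketch, again using the running example's \(N(20, 0.5^2)\):

```r
# qnorm() inverts pnorm(): the p-th quantile has area p to its left
qnorm(0.975)                          # about 1.96
pnorm(qnorm(0.975))                   # back to 0.975

# Running example: the value that Ybar falls below with probability pnorm(1)
qnorm(pnorm(1), mean = 20, sd = 0.5)  # 20.5, i.e. 20 + 1 * 0.5
```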

## What should these return?

`qnorm(0, mean = 0, sd = 1)`

`qnorm(1, mean = 0, sd = 1)`

`dnorm(-Inf)`

`dnorm(Inf)`

`pnorm(Inf)`

`pnorm(-Inf)`
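Once you have made your guesses, these calls (all using the default `mean = 0`, `sd = 1`) can be run to check them; the comments give what R returns.

```r
qnorm(0, mean = 0, sd = 1)   # -Inf: no finite value has area 0 to its left
qnorm(1, mean = 0, sd = 1)   #  Inf: no finite value has area 1 to its left
dnorm(-Inf)                  # 0: the density vanishes in the tails
dnorm(Inf)                   # 0
pnorm(Inf)                   # 1: all of the area
pnorm(-Inf)                  # 0: none of the area
```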

## Exercise: Find probability

Return to example: \(\mu = 20\), \(\sigma^2 = 4\), \(n = 16\)

**What is \(P(\overline{Y} < 20.5)\)? What is \(P(\overline{Y} < 21)\)?**
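A sketch of the computation with `pnorm()`, passing the mean and standard deviation of \(\overline{Y}\) directly instead of standardizing by hand. Because the approximating Normal is continuous, \(<\) and \(\le\) give the same probability.

```r
mu <- 20; sigma2 <- 4; n <- 16
se <- sqrt(sigma2 / n)            # sd of Ybar = 0.5

pnorm(20.5, mean = mu, sd = se)   # about 0.8413 (z = 1)
pnorm(21,   mean = mu, sd = se)   # about 0.9772 (z = 2)
```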

## Exercise: Find sample size for specific variance in sample mean

\(\mu = 20\), \(\sigma^2 = 4\).

**What should \(n\) be so \(Var(\overline{Y}) = 0.5\)?**
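One way to set this up, using \(Var(\overline{Y}) = \sigma^2 / n\) for an i.i.d. sample:

\[
Var(\overline{Y}) = \frac{\sigma^2}{n} = \frac{4}{n} = 0.5 \quad \Longrightarrow \quad n = \frac{4}{0.5} = 8
\]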

## Exercise: Find sample size for an interval with desired probability

**What should \(n\) be, so that \(P(19.5 < \overline{Y} < 20.5) = 0.9\)?**
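A sketch of one approach, assuming \(\mu = 20\) and \(\sigma^2 = 4\) carry over from the previous exercise. Under the CLT approximation \(\overline{Y} \, \dot \sim \, N(20, 4/n)\), the interval \(20 \pm 0.5\) must cover the central 90% of that Normal, so \(0.5 = z_{0.95} \cdot 2 / \sqrt{n}\). Solving gives a non-integer \(n\), so the code rounds up, which keeps the approximate probability at or above 0.9.

```r
# Sketch: smallest n with P(19.5 < Ybar < 20.5) >= 0.9 under the CLT approximation
# Assumes mu = 20 and sigma^2 = 4; the interval is mu +/- 0.5
sigma      <- 2     # sqrt(4)
half_width <- 0.5
prob       <- 0.9

# The central 90% leaves 5% in each tail, so use the 0.95 quantile
z <- qnorm(1 - (1 - prob) / 2)          # about 1.645
# Solve half_width = z * sigma / sqrt(n) for n
n_exact <- (z * sigma / half_width)^2   # about 43.3
n <- ceiling(n_exact)                   # round up: 44

# Check the approximate probability at the chosen n
se <- sigma / sqrt(n)
pnorm(20.5, mean = 20, sd = se) - pnorm(19.5, mean = 20, sd = se)   # a little above 0.9
```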