Central Limit Theorem ST551 Lecture 6

Central Limit Theorem (CLT)

If the population distribution of a variable $X$ has population mean $\mu$ and (finite) population variance $\sigma^2$, then the sampling distribution of the sample mean becomes closer and closer to a Normal distribution as the sample size $n$ increases.

We can write: $\bar{X} \,\dot\sim\, N(\mu, \sigma^2/n)$ for large values of $n$, where the symbol $\dot\sim$ means "approximately distributed as".

We already worked out the mean and variance of the sample mean; the CLT adds the shape information.

Central Limit Theorem (CLT)

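Slightly more formally:

Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from some population distribution $F$ with mean $\mu$ and variance $\sigma^2 < \infty$. Then as the sample size $n \to \infty$ we have

$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{D} N(0,1)$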
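The formal statement and the "approximately distributed as" shorthand say the same thing. A short sketch of the link, using the standard fact that if $Z \sim N(0,1)$ then $a + bZ \sim N(a, b^2)$:

```latex
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \,\dot\sim\, N(0, 1)
\quad\Longrightarrow\quad
\bar{X} = \mu + \frac{\sigma}{\sqrt{n}} \cdot \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
\,\dot\sim\, N\!\left(\mu, \frac{\sigma^2}{n}\right)
\quad \text{for large } n.
```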

Your Turn

Consider the following population distribution.

The following three histograms represent:

  1. A sample of size 1000 from the population

  2. 1000 sample means of samples of size 10 from the population

  3. 1000 sample means of samples of size 100 from the population

Which is which?

How big is big enough?

What value of n is big enough for the approximation to be good enough?

It depends on the population

See more in lab…
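One way to explore this is by simulation. The sketch below is only an illustration: the Exponential(1) population (mean 1, variance 1), the seed, and the helper name sim_means are assumptions, not the population used in lecture or lab.

```r
# Assumed population for illustration: Exponential with rate 1 (mean 1, variance 1)
set.seed(551)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: draw `reps` samples of size n and return their sample means
sim_means <- function(n, reps = 1000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_10  <- sim_means(10)
means_100 <- sim_means(100)

# The centre stays near the population mean; the spread shrinks like sigma / sqrt(n)
c(mean = mean(means_10),  sd = sd(means_10),  theory_sd = 1 / sqrt(10))
c(mean = mean(means_100), sd = sd(means_100), theory_sd = 1 / sqrt(100))

# Side-by-side histograms to compare the shapes for the two sample sizes
par(mfrow = c(1, 2))
hist(means_10,  main = "n = 10",  xlab = "sample mean")
hist(means_100, main = "n = 100", xlab = "sample mean")
```

Swapping rexp for a different random generator (for example runif or rbinom) is a quick way to see how the answer changes with the population.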

Your turn

Using the CLT to approximate the sampling distribution

Population: $\mu = 20$, $\sigma^2 = 4$
Sample: $n = 16$ i.i.d. from the population
Sample statistic: Sample mean

What is the approximate distribution for $\bar{Y}$?

Applying CLT

Same setup: we have a sample of size $n = 16$ from a population with population mean $\mu = 20$ and population variance $\sigma^2 = 4$.

What is the probability the sample mean is less than 20.5?

CLT says: $\bar{Y} \,\dot\sim\, N(\mu, \sigma^2/n) = N(20, 4/16) = N(20, 0.25)$

Using a Standard Normal Probability Table

To use a Standard Normal table, we first need to transform our random variable ($\bar{Y}$) to a Standard Normal.

We can convert a probability for $\bar{Y}$ to a standard Normal probability by:

  1. Subtracting the mean of $\bar{Y}$ from both sides, then
  2. Dividing both sides by the standard deviation of $\bar{Y}$ (square root of the variance; here $\sqrt{4/16} = 0.5$)

$P(\bar{Y} \le 20.5) = P\left(\frac{\bar{Y} - 20}{0.5} \le \frac{20.5 - 20}{0.5}\right) = P(Z \le 1)$

where $Z \sim N(0,1)$

Using a Standard Normal Probability Table

Now look up $z = 1.00$ in the Standard Normal table.

$P(\bar{Y} \le 20.5) = P(Z \le 1.00) = 0.8413$

Using a Standard Normal Probability Table

What if z is negative?

Standard Normal is symmetric around zero.

$P(Z \le -z) = 1 - P(Z \le z)$
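The table lookup and the symmetry rule can be checked numerically with R's pnorm, which returns the area to the left of a value under a Normal curve. A minimal sketch, with z = 1 chosen to match the running example:

```r
z <- 1

pnorm(z)       # P(Z <= 1): the table value, 0.8413 to four decimals
pnorm(-z)      # P(Z <= -1): the lower tail
1 - pnorm(z)   # the same lower-tail value, obtained by symmetry
```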

Density

Density: height of probability density function at x: f(x;μ,σ)

In R: dnorm(x, mean = mu, sd = sigma)
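A brief sketch of dnorm in use. The values of x, mu and sigma are illustrative, taken from the running example where the sample mean is approximately Normal with mean 20 and standard deviation 0.5:

```r
# Height of the standard Normal density at its peak: 1 / sqrt(2 * pi)
dnorm(0, mean = 0, sd = 1)

# Density of N(mean = 20, sd = 0.5) evaluated at 20.5
x <- 20.5; mu <- 20; sigma <- 0.5
dnorm(x, mean = mu, sd = sigma)

# The same height from the Normal density formula written out by hand
exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
```

Note that dnorm takes the standard deviation (sd), not the variance.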

Cumulative Probability

Cumulative Probability: the area under the probability density function to the left of x: F(x;μ,σ)

In R: pnorm(q, mean = mu, sd = sigma)

Area to the right?
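A small sketch of cumulative probabilities with pnorm, covering both tails. The cutoffs 1.96 and 12 and the N(10, sd = 2) distribution are arbitrary illustrative choices:

```r
# Area to the left of 1.96 under the standard Normal
pnorm(1.96, mean = 0, sd = 1)

# Area to the right: take the complement, or ask for the upper tail directly
1 - pnorm(1.96, mean = 0, sd = 1)
pnorm(1.96, mean = 0, sd = 1, lower.tail = FALSE)

# A non-standard Normal gives the same area once the cutoff is standardized:
# (12 - 10) / 2 = 1, so both calls return P(Z <= 1)
pnorm(12, mean = 10, sd = 2)
pnorm(1)
```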

Quantiles

Quantile: The $p$th quantile is the value $x$ such that the area to the left of $x$ under the density function is $p$. For the standard Normal this is $F^{-1}(p; \mu = 0, \sigma^2 = 1)$.

In R: qnorm(p, mean = mu, sd = sigma)
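A short sketch of qnorm as the inverse of pnorm. The probabilities 0.5 and 0.975 and the value 1.25 are illustrative choices:

```r
# Median of the standard Normal
qnorm(0.5, mean = 0, sd = 1)

# The 0.975 quantile of the standard Normal, roughly 1.96
qnorm(0.975, mean = 0, sd = 1)

# qnorm() undoes pnorm(): this recovers 1.25 up to floating-point error
qnorm(pnorm(1.25))
```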

What should these return?

qnorm(0, mean = 0, sd = 1)

qnorm(1, mean = 0, sd = 1)

dnorm(-Inf)

dnorm(Inf)

pnorm(Inf)

pnorm(-Inf)

Exercise: Find probability

Return to example: $\mu = 20$, $\sigma^2 = 4$, $n = 16$

What is $P(\bar{Y} < 20.5)$? What is $P(\bar{Y} < 21)$?
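One possible way to set this up in R, assuming the CLT approximation is adequate at n = 16. The variable names (sigma2, se) are illustrative:

```r
mu <- 20; sigma2 <- 4; n <- 16

# Standard deviation of the sample mean: sqrt(sigma^2 / n) = 0.5
se <- sqrt(sigma2 / n)

pnorm(20.5, mean = mu, sd = se)   # P(Ybar < 20.5), corresponds to z = 1
pnorm(21,   mean = mu, sd = se)   # P(Ybar < 21),   corresponds to z = 2
```

Passing mean and sd directly lets pnorm do the standardizing, so no table lookup is needed.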

Exercise: Find sample size for specific variance in sample mean

$\mu = 20$, $\sigma^2 = 4$.

What should $n$ be so that $\mathrm{Var}(\bar{Y}) = 0.5$?
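A sketch of the calculation, using the fact that the variance of the sample mean of an i.i.d. sample is the population variance divided by $n$:

```latex
\mathrm{Var}(\bar{Y}) = \frac{\sigma^2}{n} = 0.5
\quad\Longrightarrow\quad
n = \frac{\sigma^2}{0.5} = \frac{4}{0.5} = 8.
```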

Exercise: Find sample size for an interval with desired probability

What should $n$ be, so that $P(19.5 < \bar{Y} < 20.5) = 0.9$?
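One way to approach this, assuming the same population as the previous exercises ($\mu = 20$, $\sigma^2 = 4$) and that the CLT approximation holds. Standardizing gives $P(-0.5\sqrt{n}/\sigma < Z < 0.5\sqrt{n}/\sigma) = 0.9$, so $0.5\sqrt{n}/\sigma$ must equal the 0.95 quantile of the standard Normal. The variable names below are illustrative:

```r
mu <- 20; sigma <- 2   # sigma^2 = 4 (assumed: same population as before)
half_width <- 0.5      # the interval is mu +/- 0.5
target <- 0.9          # desired probability

# Central probability 0.9 leaves 0.05 in each tail, so use the 0.95 quantile
z <- qnorm(1 - (1 - target) / 2)

# Solve half_width = z * sigma / sqrt(n) for n, then round up to a whole number
n_exact <- (z * sigma / half_width)^2
n <- ceiling(n_exact)
n

# Check: the probability actually achieved with the rounded-up n
se <- sigma / sqrt(n)
coverage <- pnorm(20.5, mean = mu, sd = se) - pnorm(19.5, mean = mu, sd = se)
coverage
```

Rounding up rather than to the nearest integer keeps the achieved probability at or above the target.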