Warm-up: from last time's slides:
A random sample of \(n = 25\) Corvallis residents had an average IQ score of 104. Assume a population variance of \(\sigma^2 = 225\). What’s the mean IQ for Corvallis residents? Is it plausible the mean for Corvallis residents is greater than 100?
Find the point estimate, the Z-statistic and p-value, and a 95% confidence interval.
```r
qnorm(0.975)  # 1.96
pnorm(1.33)   # 0.9082409
```
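A minimal R sketch of the full warm-up calculation (every number here comes from the problem statement):

```r
ybar   <- 104   # sample mean
n      <- 25    # sample size
sigma2 <- 225   # known population variance
mu0    <- 100   # null value

z <- (ybar - mu0) / sqrt(sigma2 / n)    # Z-statistic: (104 - 100) / 3 = 1.33
p <- 1 - pnorm(z)                       # one-sided p-value for H_A: mu > 100, about 0.09
ci <- ybar + c(-1, 1) * qnorm(0.975) * sqrt(sigma2 / n)  # 95% CI: (98.12, 109.88)
c(z = z, p = p); ci
```

The point estimate is \(\overline{Y} = 104\); with a p-value near 0.09 and a 95% CI that contains 100, the data are only weakly suggestive that the mean exceeds 100.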
Finish last time’s slides
t-tests
Inference for a population mean
What do we do if we don't know \(\sigma^2\)? Realistically, this is almost always the case.
We can estimate \(\sigma^2\), just like we estimated \(\mu\):
- We used the sample mean to estimate the population mean
- We can use the sample variance to estimate the population variance
Sample variance
The sample variance for a sample \(Y_1, \ldots, Y_n\) is: \[ s^2 = \frac{1}{n-1}\sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 \]
Facts about the sampling distribution of the sample variance:
- The mean is \(\sigma^2\), i.e. \(s^2\) is an unbiased estimate of \(\sigma^2\)
- As the sample size \(n\) gets larger, \(s^2\) gets closer and closer to the true population variance \(\sigma^2\), i.e. \(s^2\) is a consistent estimate of \(\sigma^2\) (see the simulation sketch below)
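A quick simulation sketch of both facts; the population settings and sample sizes below are illustrative assumptions, not part of the slides:

```r
set.seed(1)                              # illustrative seed
sigma2 <- 225                            # true population variance

# Unbiased: the average of many s^2 values is close to sigma^2
s2_small <- replicate(10000, var(rnorm(25, mean = 100, sd = sqrt(sigma2))))
mean(s2_small)                           # close to 225

# Consistent: with a larger n, each s^2 lands much closer to sigma^2
s2_large <- replicate(10000, var(rnorm(2500, mean = 100, sd = sqrt(sigma2))))
sd(s2_small); sd(s2_large)               # the spread shrinks as n grows
```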
t-statistic
If we replace \(\sigma^2\) with \(s^2\) in the Z-statistic for testing \(H_0: \mu = \mu_0\), we get a t-statistic:
\[ Z(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{\sigma^2/n}} \rightarrow t(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{s^2/n}} \]
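In R this is a one-line change from the Z-statistic: replace the known \(\sigma^2\) with var(y). A sketch with a small hypothetical sample:

```r
y   <- c(98, 112, 107, 95, 104, 110)   # hypothetical data
mu0 <- 100
t_stat <- (mean(y) - mu0) / sqrt(var(y) / length(y))   # var() uses the n - 1 divisor
t_stat
```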
Reference distribution
We compared the Z-statistic to N\((0, 1)\)
- Why? N\((0, 1)\) is the distribution we expect for \(Z\), when the null hypothesis is true
What should we compare a t-statistic to?
- \(s^2\) will sometimes be smaller than \(\sigma^2\), sometimes bigger
- This introduces additional variability into our test statistic
t-distribution
The null distribution for a t-statistic is the t-distribution.
The t-distribution is a family of distributions defined by a single parameter: the degrees of freedom.
Notation:
- \(t_{(1)}\) - t-distribution with 1 degree of freedom
- \(t_{(3)}\) - t-distribution with 3 degrees of freedom
- \(t_{(v)}\) - t-distribution with \(v\) degrees of freedom
It looks a lot like a standard Normal, but with heavier tails and a lower peak.
t-distribution
As \(v \rightarrow \infty\), \(t_{(v)}\) approaches the standard Normal density.
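A quick numerical check of both claims, comparing the upper tail probability \(P(T > 2)\) across degrees of freedom:

```r
# Heavier tails than N(0, 1), shrinking toward the Normal as df grows
sapply(c(1, 3, 10, 30, 100), function(v) 1 - pt(2, df = v))
1 - pnorm(2)   # the limiting Normal value, about 0.023
```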
t-test: Inference for population mean
If the population is exactly Normal:
- \(\overline{Y}\) is exactly Normal.
- The t-statistic has exactly a t-distribution with \(n-1\) degrees of freedom.
If the population is anything with finite variance:
- \(\overline{Y}\) is approximately Normal.
- The t-statistic has approximately a t-distribution with \(n-1\) degrees of freedom.
t-test: Inference for population mean
Rather than coming from a Standard Normal:
- Rejection region critical values come from t-distribution quantiles
- CI multipliers come from t-distribution quantiles
- P-values come from the cumulative distribution function of the t-distribution
In R: pt(q, df) gives the CDF, qt(p, df) the quantile function, and dt(x, df) the density of the t-distribution.
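For example, with \(n = 25\) (24 degrees of freedom), compare these to the Normal values from the warm-up:

```r
qt(0.975, df = 24)   # 2.0639, vs. qnorm(0.975) = 1.96
pt(1.33, df = 24)    # about 0.901, vs. pnorm(1.33) = 0.9082
dt(0, df = 24)       # about 0.395, vs. dnorm(0) = 0.3989
```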
t-test: Summary
Data setting: one sample, no explanatory variable; \(Y_1, \ldots, Y_n\) i.i.d. from a population with unknown variance \(\sigma^2\)
Null hypothesis: \(H_0: \mu = \mu_0\)
Test statistic: \[ t(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{s^2/n}} \]
Reference distribution: \(t(\mu_0) \,\dot\sim\, t_{(n-1)}\), i.e. approximately a t-distribution with \(n-1\) degrees of freedom
t-test: Summary
Rejection region for a level \(\alpha\) test
| One sided \(H_A: \mu < \mu_0\) | Two sided \(H_A: \mu \ne \mu_0\) | One sided \(H_A: \mu > \mu_0\) |
|---|---|---|
| \(t(\mu_0) < t_{(n-1),\,\alpha}\) | \(\lvert t(\mu_0)\rvert > t_{(n-1),\,1-\alpha/2}\) | \(t(\mu_0) > t_{(n-1),\,1-\alpha}\) |
\(t_{(n-1),\,\alpha}\) = qt(alpha, df = n - 1), the \(\alpha\) quantile of the t-distribution with \(n-1\) degrees of freedom
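For instance, the critical values for a level \(\alpha = 0.05\) test with \(n = 25\):

```r
alpha <- 0.05; n <- 25
qt(alpha, df = n - 1)         # -1.711: cutoff for H_A: mu < mu_0
qt(1 - alpha/2, df = n - 1)   #  2.064: cutoff for |t(mu_0)| when H_A: mu != mu_0
qt(1 - alpha, df = n - 1)     #  1.711: cutoff for H_A: mu > mu_0
```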
t-test: Summary
P-values given an observed \(t(\mu_0) = t\)
| One sided \(H_A: \mu < \mu_0\) | Two sided \(H_A: \mu \ne \mu_0\) | One sided \(H_A: \mu > \mu_0\) |
|---|---|---|
| \(F_t(t; n-1)\) | \(2\left(1 - F_t(\lvert t\rvert; n-1)\right)\) | \(1 - F_t(t; n-1)\) |
\(F_t(t; n-1)\) = pt(t, df = n - 1), the CDF of the t-distribution with \(n-1\) degrees of freedom
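For instance, with an observed \(t = 1.33\) and \(n = 25\):

```r
t_obs <- 1.33; n <- 25
pt(t_obs, df = n - 1)                   # one-sided p-value for H_A: mu < mu_0
2 * (1 - pt(abs(t_obs), df = n - 1))    # two-sided p-value for H_A: mu != mu_0
1 - pt(t_obs, df = n - 1)               # one-sided p-value for H_A: mu > mu_0
```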
\((1-\alpha)100\%\) Confidence intervals
\[ \left( \overline{Y} - t_{(n-1)1 - \alpha/2} \, \sqrt{\frac{s^2}{n}}, \, \overline{Y} + t_{(n-1)1 - \alpha/2} \, \sqrt{\frac{s^2}{n}} \right) \]
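A sketch in R, reusing the warm-up summaries (\(\overline{Y} = 104\), \(n = 25\)) but now treating \(s^2 = 225\) as estimated from the sample rather than known:

```r
ybar <- 104; s2 <- 225; n <- 25
ybar + c(-1, 1) * qt(0.975, df = n - 1) * sqrt(s2 / n)
# (97.81, 110.19): a bit wider than the Normal-based interval (98.12, 109.88)
```

The t multiplier (2.064) exceeds the Normal multiplier (1.96); the wider interval is the price of estimating \(\sigma^2\).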
Standard error
\[ Var(\overline{Y}) = \frac{\sigma^2}{n} \] We estimate \(\sigma^2\) with \(s^2\), so we estimate \(Var(\overline{Y})\) with
\[ \widehat{Var}(\overline{Y}) = \frac{s^2}{n} \]
The square root of this is often called the standard error of the mean:
\[ \text{SE}(\overline{Y}) = \frac{s}{\sqrt{n}} \]
In general, the term standard error refers to the estimated standard deviation of an estimator.
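With raw data, R's built-in t.test() returns the t-statistic, degrees of freedom, p-value, and confidence interval in one call; a sketch using the hypothetical sample from earlier:

```r
y <- c(98, 112, 107, 95, 104, 110)            # hypothetical data
sd(y) / sqrt(length(y))                       # SE of the mean: s / sqrt(n)
t.test(y, mu = 100, alternative = "greater")  # one-sided t-test of H_0: mu = 100
```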
Next time
Population proportions (a special case of population means)