T-tests ST551 Lecture 10

Warm-up, from last time’s slides:

A random sample of \(n = 25\) Corvallis residents had an average IQ score of 104. Assume a population variance of \(\sigma^2 = 225\). What’s the mean IQ for Corvallis residents? Is it plausible the mean for Corvallis residents is greater than 100?

Find the point estimate, the Z-statistic and p-value, and a 95% confidence interval.

qnorm(0.975) = 1.96
pnorm(1.33) = 0.9082409
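A minimal R sketch of the warm-up calculation. It assumes the one-sided alternative \(H_A: \mu > 100\) suggested by the question; the variable names are illustrative only.

```r
# Warm-up: population variance is known, so use the Z-statistic
n      <- 25
y_bar  <- 104
sigma2 <- 225
mu0    <- 100

se <- sqrt(sigma2 / n)              # sqrt(225 / 25) = 3
z  <- (y_bar - mu0) / se            # 4 / 3, about 1.33

# One-sided p-value for H_A: mu > 100
p_value <- 1 - pnorm(z)

# 95% confidence interval: y_bar +/- qnorm(0.975) * se
ci <- y_bar + c(-1, 1) * qnorm(0.975) * se

print(c(z = z, p_value = p_value))
print(ci)
```

Using the exact \(z = 4/3\) gives a p-value of about 0.091 (the rounded pnorm(1.33) gives about 0.092), and the interval is roughly (98.1, 109.9).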

Finish last time’s slides

t-tests

Inference for a population mean

What do we do if we don’t know \(\sigma^2\)? Realistically, this is almost always the case.

We can estimate \(\sigma^2\), just like we estimated \(\mu\):

  • We used the sample mean to estimate the population mean
  • We can use the sample variance to estimate the population variance

Sample variance

The sample variance for a sample \(Y_1, \ldots, Y_n\) is: \[ s^2 = \frac{1}{n-1}\sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2 \]
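The formula translates directly into R. This sketch uses a small made-up sample (not from the lecture) and checks it against base R's var(), which also divides by \(n - 1\); the function name sample_variance is hypothetical.

```r
# Sample variance computed directly from the definition
sample_variance <- function(y) {
  n <- length(y)
  sum((y - mean(y))^2) / (n - 1)   # note the n - 1 divisor
}

y <- c(98, 104, 110, 95, 113)      # illustrative data, not from the lecture
sample_variance(y)
var(y)                              # base R's var() uses the same n - 1 divisor
```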

Facts about the sampling distribution of the sample variance:

  • The mean is \(\sigma^2\), i.e. \(s^2\) is an unbiased estimator of \(\sigma^2\)
  • As the sample size \(n\) gets larger, \(s^2\) gets closer and closer to the true population variance \(\sigma^2\), i.e. \(s^2\) is a consistent estimator of \(\sigma^2\)
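Both facts can be checked by simulation. This is a Monte Carlo sketch under assumed settings (Normal population with \(\sigma^2 = 9\), \(n = 5\), an arbitrary seed); it also shows that dividing by \(n\) instead of \(n - 1\) is biased low.

```r
# Monte Carlo check that s^2 is unbiased for sigma^2 (here sigma^2 = 9)
set.seed(551)                      # arbitrary seed for reproducibility
sigma2 <- 9
n      <- 5
reps   <- 20000

s2 <- replicate(reps, var(rnorm(n, mean = 0, sd = sqrt(sigma2))))
mean(s2)                           # close to 9

# Dividing by n instead of n - 1 is biased low: its mean is sigma2 * (n - 1) / n
mean(s2 * (n - 1) / n)             # close to 7.2

# Consistency: a single s^2 from a very large sample lands near sigma^2
s2_big <- var(rnorm(1e5, mean = 0, sd = sqrt(sigma2)))
s2_big
```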

t-statistic

If we replace \(\sigma^2\) with \(s^2\) in the Z-statistic for testing \(H_0: \mu = \mu_0\), we get a t-statistic:

\[ Z(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{\sigma^2/n}} \rightarrow t(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{s^2/n}} \]
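A short R sketch of the t-statistic, using illustrative data (not from the lecture) and a hypothetical helper t_stat(); the result is cross-checked against the statistic reported by base R's t.test().

```r
# t-statistic for H0: mu = mu0, with s^2 in place of sigma^2
t_stat <- function(y, mu0) {
  (mean(y) - mu0) / sqrt(var(y) / length(y))
}

y <- c(98, 104, 110, 95, 113)      # illustrative data, not from the lecture
t_stat(y, mu0 = 100)

# Cross-check against the statistic reported by t.test()
unname(t.test(y, mu = 100)$statistic)
```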

Reference distribution

We compared the Z-statistic to N\((0, 1)\)

  • Why? N\((0, 1)\) is the distribution we expect for \(Z\) when the null hypothesis is true

What should we compare a t-statistic to?

  • \(s^2\) will sometimes be smaller than \(\sigma^2\), sometimes bigger
  • This introduces additional variability into the test statistic, so its null distribution is more spread out than N\((0, 1)\)
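To see why N\((0, 1)\) is the wrong reference here, this simulation sketch (assumed settings: Normal population, \(n = 5\), \(H_0\) true, arbitrary seed) counts how often \(|t|\) exceeds the Normal cutoff 1.96 versus the \(t_{(4)}\) cutoff.

```r
# Under H0, how often does |t| exceed the Normal cutoff when n = 5?
set.seed(10)                       # arbitrary seed for reproducibility
n    <- 5
mu0  <- 0
reps <- 20000

t_vals <- replicate(reps, {
  y <- rnorm(n, mean = mu0, sd = 2)   # H0 is true; sigma is treated as unknown
  (mean(y) - mu0) / sqrt(var(y) / n)
})

rate_norm <- mean(abs(t_vals) > qnorm(0.975))         # noticeably above 0.05
rate_t    <- mean(abs(t_vals) > qt(0.975, df = n - 1))  # close to 0.05
c(normal_cutoff = rate_norm, t_cutoff = rate_t)
```

Using the Normal cutoff rejects a true null far more often than the nominal 5%.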

t-distribution

The null distribution for a t-statistic is the t-distribution.

The t-distribution is a family of distributions indexed by a single parameter, the degrees of freedom.

Notation:

  • \(t_{(1)}\) - t-distribution with 1 degree of freedom
  • \(t_{(3)}\) - t-distribution with 3 degrees of freedom
  • \(t_{(v)}\) - t-distribution with \(v\) degrees of freedom

It looks a lot like the standard Normal (symmetric and bell-shaped around 0), but with a lower peak and heavier tails.
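A quick numerical comparison with dt(), dnorm(), pt(), and pnorm(), using \(t_{(3)}\) as an example: the t density is lower at 0 and higher out at 3, and it puts more probability in the tail.

```r
# Compare the t_(3) density with the standard Normal density
x <- c(0, 1, 2, 3)
rbind(normal = dnorm(x),
      t_3    = dt(x, df = 3))

# Tail probability P(T < -3) is larger for t_(3) than for N(0, 1)
c(normal = pnorm(-3), t_3 = pt(-3, df = 3))
```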

t-distribution

As \(v \rightarrow \infty\), the \(t_{(v)}\) density approaches the standard Normal density.
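One way to see the convergence is through the quantiles used for 95% intervals: qt(0.975, df) shrinks toward qnorm(0.975) as the degrees of freedom grow. The df values below are arbitrary choices.

```r
# 97.5% quantiles of t_(v) shrink toward the Normal quantile as v grows
v   <- c(1, 5, 30, 100, 1000)
t_q <- qt(0.975, df = v)
round(rbind(df = v, t_quantile = t_q), 3)
qnorm(0.975)
```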

t-test: Inference for population mean

  • If the population is exactly Normal:

    • \(\overline{Y}\) is exactly Normal.
    • The t-statistic has exactly a t-distribution with \(n-1\) degrees of freedom.
  • If the population has any distribution with finite variance:

    • \(\overline{Y}\) is approximately Normal for large \(n\) (Central Limit Theorem).
    • The t-statistic approximately follows a t-distribution with \(n-1\) d.f.
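A simulation sketch of the "approximately" case, under assumed settings: a skewed Exponential population with mean 1, samples of size \(n = 40\), and an arbitrary seed. It estimates how often the nominal 95% t-interval covers the true mean.

```r
# Coverage of the nominal 95% t-interval when the population is skewed
# (Exponential with mean 1), sample size n = 40
set.seed(2024)                     # arbitrary seed for reproducibility
n    <- 40
mu   <- 1
reps <- 5000

covered <- replicate(reps, {
  y  <- rexp(n, rate = 1)
  me <- qt(0.975, df = n - 1) * sd(y) / sqrt(n)   # margin of error
  (mean(y) - me < mu) && (mu < mean(y) + me)
})

coverage <- mean(covered)
coverage
```

Coverage lands near 0.95; for a skewed population like this it tends to fall slightly short, and the approximation improves as \(n\) grows.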

t-test: Inference for population mean

Instead of coming from the standard Normal:

  • Rejection region critical values come from t-distribution quantiles
  • CI multipliers come from t-distribution quantiles
  • P-values come from the cumulative distribution function of the t-distribution

In R: pt(q, df) (CDF), qt(p, df) (quantile function), dt(x, df) (density)
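A small illustration of the three functions with an arbitrary choice of 10 degrees of freedom; qt() is the inverse of pt(), so composing them returns the original probability.

```r
df <- 10

dt(0, df)                  # density of t_(10) at 0
pt(1.5, df)                # CDF: P(T <= 1.5)
qt(0.95, df)               # quantile: value with 95% of the distribution below it

# qt() inverts pt(): converting the quantile back to a probability recovers 0.95
pt(qt(0.95, df), df)
```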

t-test: Summary

Data setting: one sample, no explanatory variable. \(Y_1, \ldots, Y_n\) i.i.d. from a population with unknown mean \(\mu\) and unknown variance \(\sigma^2\).

Null hypothesis: \(H_0: \mu = \mu_0\)

Test statistic: \[ t(\mu_0) = \frac{\overline{Y} - \mu_0}{\sqrt{s^2/n}} \]

Reference distribution: \(t(\mu_0) \mathrel{\dot\sim} t_{(n-1)}\) (exact if the population is Normal, approximate otherwise)

t-test: Summary

Rejection region for a level \(\alpha\) test:

  • One-sided, \(H_A: \mu < \mu_0\): reject \(H_0\) if \(t(\mu_0) < t_{(n-1)\alpha}\)
  • Two-sided, \(H_A: \mu \ne \mu_0\): reject \(H_0\) if \(|t(\mu_0)| > t_{(n-1)1-\alpha/2}\)
  • One-sided, \(H_A: \mu > \mu_0\): reject \(H_0\) if \(t(\mu_0) > t_{(n-1)1 - \alpha}\)

\(t_{(n-1) \alpha}\) = qt(alpha, df = n - 1)
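The three rejection rules can be written as one R function. The helper name t_test_reject() and the example values (t = 2.1, n = 15) are hypothetical; the cutoffs come from qt() exactly as in the rules above.

```r
# Hypothetical helper: does a level-alpha t-test reject H0: mu = mu0?
t_test_reject <- function(t, n, alpha = 0.05,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  switch(alternative,
    less      = t < qt(alpha, df),
    greater   = t > qt(1 - alpha, df),
    two.sided = abs(t) > qt(1 - alpha / 2, df)
  )
}

# Example: observed t = 2.1 from a sample of n = 15
t_test_reject(2.1, n = 15, alternative = "greater")    # cutoff qt(0.95, 14), about 1.76
t_test_reject(2.1, n = 15, alternative = "two.sided")  # cutoff qt(0.975, 14), about 2.14
```

The same observed statistic can reject against a one-sided alternative but not against the two-sided one, because the two-sided cutoff splits \(\alpha\) between both tails.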

t-test: Summary

P-values, given an observed value \(t(\mu_0) = t\):

  • One-sided, \(H_A: \mu < \mu_0\): \(F_t(t; n-1)\)
  • Two-sided, \(H_A: \mu \ne \mu_0\): \(2\left(1 - F_t(|t|; n-1)\right)\)
  • One-sided, \(H_A: \mu > \mu_0\): \(1 - F_t(t; n-1)\)

Here \(F_t(\cdot\,; n-1)\) is the CDF of the \(t_{(n-1)}\) distribution; in R, \(F_t(t; n-1)\) = pt(t, df = n - 1)
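The three p-value formulas in R, as a sketch with a hypothetical helper t_p_value() and illustrative data (not from the lecture). The results should match the p-values base R's t.test() reports for the same alternative.

```r
# P-values from an observed t, following the three cases above
t_p_value <- function(t, n, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- n - 1
  switch(alternative,
    less      = pt(t, df),
    greater   = 1 - pt(t, df),
    two.sided = 2 * (1 - pt(abs(t), df))
  )
}

y     <- c(98, 104, 110, 95, 113)   # illustrative data, not from the lecture
n     <- length(y)
t_obs <- (mean(y) - 100) / sqrt(var(y) / n)

t_p_value(t_obs, n, "greater")
t.test(y, mu = 100, alternative = "greater")$p.value   # should agree
```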

Confidence interval for \(\mu\): \((1-\alpha)100\%\)

\[ \left( \overline{Y} - t_{(n-1)1 - \alpha/2} \, \sqrt{\frac{s^2}{n}}, \, \overline{Y} + t_{(n-1)1 - \alpha/2} \, \sqrt{\frac{s^2}{n}} \right) \]
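A sketch of the interval in R, with a hypothetical helper t_conf_int() and illustrative data, checked against t.test(). The last lines apply the formula to a hypothetical version of the warm-up in which \(s = 15\) is estimated from the data rather than \(\sigma\) being known.

```r
# (1 - alpha)100% t confidence interval for mu
t_conf_int <- function(y, conf = 0.95) {
  n    <- length(y)
  mult <- qt(1 - (1 - conf) / 2, df = n - 1)   # t_{(n-1) 1 - alpha/2}
  mean(y) + c(-1, 1) * mult * sqrt(var(y) / n)
}

y <- c(98, 104, 110, 95, 113)      # illustrative data, not from the lecture
t_conf_int(y)
as.numeric(t.test(y, conf.level = 0.95)$conf.int)   # same interval from base R

# Hypothetical: warm-up numbers (n = 25, mean 104) with s = 15 estimated, not known
hyp_ci <- 104 + c(-1, 1) * qt(0.975, df = 24) * 15 / sqrt(25)
hyp_ci                             # a bit wider than the Z interval
```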

Standard error

\[ Var(\overline{Y}) = \frac{\sigma^2}{n} \]

We estimated \(\sigma^2\) with \(s^2\), so we estimate \(Var(\overline{Y})\) with

\[ \widehat{Var}(\overline{Y}) = \frac{s^2}{n} \]

Its square root is often called the standard error of the mean:

\[ \text{SE}(\overline{Y}) = \frac{s}{\sqrt{n}} \]

In general, the standard error of an estimator is its estimated standard deviation.
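A simulation sketch of what the standard error estimates, under assumed settings (Normal population with \(\sigma = 15\), \(n = 25\), arbitrary seed): the spread of many sample means is close to \(\sigma/\sqrt{n} = 3\), and a single sample's \(s/\sqrt{n}\) is an estimate of that same quantity.

```r
# The SD of many sample means is close to sigma / sqrt(n);
# one sample's s / sqrt(n) estimates that same quantity
set.seed(3)                        # arbitrary seed for reproducibility
sigma <- 15
n     <- 25

means <- replicate(10000, mean(rnorm(n, mean = 100, sd = sigma)))
sd(means)                          # close to 15 / 5 = 3

y  <- rnorm(n, mean = 100, sd = sigma)
se <- sd(y) / sqrt(n)              # standard error of the mean from one sample
se
```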

Next time

Population proportions (a special case of population means)