Review of last week’s t-tests
Setting
Two independent samples
\(Y_1, \ldots, Y_n\) i.i.d. from a population with c.d.f. \(F_Y\), and
\(X_1, \ldots, X_m\) i.i.d. from a population with c.d.f. \(F_X\)
Parameter: Difference in population means \(\mu_Y - \mu_X\)
Equal variance two sample t-test
Assume \(\sigma_X^2 = \sigma_Y^2\).
\[ t(\delta_0) = \frac{(\overline{Y} - \overline{X}) - \delta_0}{\sqrt{s_p^2 \left(\frac{1}{n} + \frac{1}{m}\right)}} \]
Compare to \(t_{(n+m-2)}\).
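A quick numerical sketch of the pooled statistic (the data here is simulated for illustration; it is not from the lecture). Computing \(t(\delta_0)\) by hand and comparing it to `scipy.stats.ttest_ind` with `equal_var=True` shows they agree:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=30)   # Y sample, n = 30
x = rng.normal(loc=4.0, scale=2.0, size=25)   # X sample, m = 25
n, m = len(y), len(x)

# Pooled variance estimate (valid under the assumption sigma_Y^2 == sigma_X^2)
sp2 = ((n - 1) * y.var(ddof=1) + (m - 1) * x.var(ddof=1)) / (n + m - 2)

# Test statistic for H_0: mu_Y - mu_X = 0
t_stat = (y.mean() - x.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

# scipy's equal-variance two-sample t-test computes the same statistic
t_scipy, p_scipy = stats.ttest_ind(y, x, equal_var=True)
```
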
Welch’s t-test
\(\sigma_X^2\) not necessarily equal to \(\sigma_Y^2\).
\[ t(\delta_0) = \frac{(\overline{Y} - \overline{X}) - \delta_0}{\sqrt{\frac{s_Y^2}{n} + \frac{s^2_X}{m}}} \]
Compare to \(t_{v}\), where
\[ v = \frac{(s_Y^2/n + s_X^2/m)^2}{\frac{s_Y^4}{n^2(n-1)} + \frac{s_X^4}{m^2(m-1)} } \]
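The same kind of sketch for Welch's test, again on simulated data: computing the statistic and the Welch–Satterthwaite degrees of freedom \(v\) by hand matches `scipy.stats.ttest_ind` with `equal_var=False`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(5.0, 3.0, size=40)   # deliberately larger spread than x
x = rng.normal(5.0, 1.0, size=25)
n, m = len(y), len(x)
sy2, sx2 = y.var(ddof=1), x.var(ddof=1)

se2 = sy2 / n + sx2 / m
t_stat = (y.mean() - x.mean()) / np.sqrt(se2)

# Welch–Satterthwaite degrees of freedom
v = se2**2 / (sy2**2 / (n**2 * (n - 1)) + sx2**2 / (m**2 * (m - 1)))

# Two-sided p-value from the t_v reference distribution
p_manual = 2 * stats.t.sf(abs(t_stat), df=v)

t_scipy, p_scipy = stats.ttest_ind(y, x, equal_var=False)
```
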
In both cases
Let \(\textit{df}\) denote the appropriate degrees of freedom for the test: \(n + m - 2\) for the equal-variance test, or \(v\) for Welch's test.
Rejection regions:
- \(H_A: \mu_Y - \mu_X > 0\): Reject \(H_0\) for \(t(\delta_0) > t_{(df), 1 - \alpha}\)
- \(H_A: \mu_Y - \mu_X < 0\): Reject \(H_0\) for \(t(\delta_0) < t_{(df), \alpha}\)
- \(H_A: \mu_Y - \mu_X \ne 0\): Reject \(H_0\) for \(|t(\delta_0)| > t_{(df), 1 - \alpha/2}\)
Confidence intervals:
\[ \overline{Y} - \overline{X} \pm t_{(df), 1-\alpha/2} \text{SE}_{\overline{Y} - \overline{X}} \]
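The confidence interval formula can be sketched numerically for the Welch case (simulated data; \(\alpha = 0.05\) is chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(10.0, 2.0, size=35)
x = rng.normal(9.0, 2.0, size=30)
n, m = len(y), len(x)

sy2, sx2 = y.var(ddof=1), x.var(ddof=1)
se = np.sqrt(sy2 / n + sx2 / m)                      # SE of Ybar - Xbar
df = (sy2 / n + sx2 / m)**2 / (
    sy2**2 / (n**2 * (n - 1)) + sx2**2 / (m**2 * (m - 1)))

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df)               # t_{(df), 1 - alpha/2}
diff = y.mean() - x.mean()
ci = (diff - tcrit * se, diff + tcrit * se)          # point estimate +/- margin
```
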
Paired Data
Setting
Two dependent samples
\(Y_1, \ldots, Y_n\) i.i.d. from a population with c.d.f. \(F_Y\), and
\(X_1, \ldots, X_n\) i.i.d. from a population with c.d.f. \(F_X\)
Observations come in pairs: \[ (Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n) \]
with joint distribution \(F_{YX}\). The observations within each pair are matched in some way (for example, two measurements on the same subject).
\(Cov(Y_i, X_i) = \sigma_{YX}\) and \(Cov(Y_i, X_j) = 0\) for all \(i \ne j\).
Parameter: Difference in population means \(\mu_Y - \mu_X\)
Examples
We’ve already seen examples like this:
- midterm Mother’s IQ (\(Y_i\)) and Father’s IQ (\(X_i\))
- homework Current weight (\(Y_i\)) and desired weight (\(X_i\))
We did one sample t-tests on the differences \(Y_i - X_i\). This works, but why?
Sampling Distribution of the difference in sample means
Consider \[ D_i = Y_i - X_i \]
CLT says \[ \frac{\overline{D} - E(D_i)}{\sqrt{Var(D_i)/n}} \, \dot \sim \, N(0, 1) \]
What are \(\overline{D}\), \(E(D_i)\) and \(Var(D_i)\)?
\(\overline{D} = \overline{Y} - \overline{X}\), \(E(D_i) = \mu_Y - \mu_X\), and \(Var(D_i) = \sigma_Y^2 + \sigma_X^2 - 2\sigma_{YX}\).
A Z-test for paired data
Suppose \(\sigma^2_Y\), \(\sigma^2_X\), and \(\sigma_{YX}\) are known, and let \(\delta = \mu_Y - \mu_X\).
Hypothesis Test for \(H_0: \delta = \delta_0\)
Test Statistic: \[ Z(\delta_0) = \frac{(\overline{Y} - \overline{X}) - \delta_0}{\sqrt{\frac{\sigma^2_Y}{n} + \frac{\sigma^2_X}{n} - 2 \frac{\sigma_{YX}}{n}}} \]
Reference Distribution: Under \(H_0\), \(Z(\delta_0) \dot \sim N(0, 1)\)
Rejection Region:
- \(H_A: \delta > \delta_0\): Reject \(H_0\) for \(z(\delta_0) > z_{1-\alpha}\)
- \(H_A: \delta < \delta_0\): Reject \(H_0\) for \(z(\delta_0) < z_{\alpha}\)
- \(H_A: \delta \ne \delta_0\): Reject \(H_0\) for \(|z(\delta_0)| > z_{1-\alpha/2}\)
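The Z-test above can be sketched on simulated paired data where the population (co)variances are known by construction (all numbers below are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50
sigma_y2, sigma_x2, sigma_yx = 4.0, 4.0, 3.0   # known variances and covariance
cov = [[sigma_y2, sigma_yx], [sigma_yx, sigma_x2]]

# Draw n correlated (Y_i, X_i) pairs
yx = rng.multivariate_normal([5.0, 5.0], cov, size=n)
y, x = yx[:, 0], yx[:, 1]

delta0 = 0.0
se = np.sqrt(sigma_y2 / n + sigma_x2 / n - 2 * sigma_yx / n)
z = (y.mean() - x.mean() - delta0) / se

# Two-sided test at alpha = 0.05: reject when |z| > z_{1 - alpha/2}
reject = abs(z) > stats.norm.ppf(0.975)
```

Note how much the covariance term shrinks the denominator here: \(Var(D_i) = 4 + 4 - 2 \cdot 3 = 2\), versus 8 if the pairs were treated as independent.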
A Z-test for paired data
\[ Z(\delta_0) = \frac{(\overline{Y} - \overline{X}) - \delta_0}{\sqrt{\frac{\sigma^2_Y}{n} + \frac{\sigma^2_X}{n} - 2 \frac{\sigma_{YX}}{n}}} \]
Notice the test statistic is just like a two sample Z-test, but with a correction to \(Var(\overline{Y} - \overline{X})\) for the correlation between \(Y_i\) and \(X_i\).
What if population variances and covariances aren’t known?
Plug in estimates for \(\sigma_Y^2\), \(\sigma_X^2\) and \(\sigma_{YX}\).
Sample covariance: \[ \hat{\sigma}_{YX} = s_{YX} = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \overline{Y}) (X_i - \overline{X}) \] is an unbiased estimate of \(\sigma_{YX}\).
Plugging in the estimates gives the estimated variance of \(\overline{Y} - \overline{X}\): \[ \widehat{Var}\left(\overline{D}\right) = \frac{s^2_Y}{n} + \frac{s^2_X}{n} - 2 \frac{s_{YX}}{n} \]
Compare to the estimated \(Var(D_i)\):
\[ s_D^2 = \frac{1}{n-1} \sum_{i=1}^{n} (D_i - \overline{D})^2 = s_Y^2 + s_X^2 - 2 s_{YX} \]
so \(\widehat{Var}(\overline{D}) = s_D^2 / n\).
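The identity \(s_D^2 = s_Y^2 + s_X^2 - 2 s_{YX}\) is exact algebra, not an approximation, so a numerical check on any simulated paired sample agrees to machine precision:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
y = rng.normal(size=n)
x = 0.6 * y + rng.normal(size=n)     # construct correlated pairs

s_yx = np.cov(y, x, ddof=1)[0, 1]    # sample covariance s_YX
s_d2 = np.var(y - x, ddof=1)         # sample variance of the differences D_i

# s_D^2 equals s_Y^2 + s_X^2 - 2 s_YX exactly
rhs = y.var(ddof=1) + x.var(ddof=1) - 2 * s_yx
```
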
Paired data t-test
\[ t(\delta_0) = \frac{(\overline{Y} - \overline{X}) - \delta_0}{\sqrt{\frac{s^2_Y}{n} + \frac{s^2_X}{n} - 2 \frac{s_{YX}}{n}}} = \frac{\overline{D}- \delta_0}{\sqrt{\frac{s_D^2}{n}}} \]
If differences are Normal, \(t(\delta_0)\) has exactly a t-distribution with \(n-1\) degrees of freedom when the null hypothesis is true.
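This equality is exactly why the one-sample approach works: `scipy.stats.ttest_rel` (the paired t-test) and a one-sample t-test on the differences give identical statistics and p-values. A sketch on simulated paired data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 30
x = rng.normal(100, 15, size=n)       # one member of each pair
y = x + rng.normal(2, 5, size=n)      # matched partner, hence correlated

t_rel, p_rel = stats.ttest_rel(y, x)            # paired t-test
t_one, p_one = stats.ttest_1samp(y - x, 0.0)    # one-sample t on D_i = Y_i - X_i
```
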
Summary
For paired samples:
- Take differences \(D_i = Y_i - X_i\)
- Perform a one-sample hypothesis test for the population mean difference \(\mu_D = \mu_Y - \mu_X\)
That is, do a one-sample t-test on the differences
This is equivalent to estimating the population covariance and appropriately adjusting the denominator of the two-sample t-test to take this covariance into account