Paired Binary Data ST551 Lecture 23

Finish last time’s slides

Paired Binary Data


Imagine now that our two samples from Bernoulli populations aren’t independent, but paired in some way.

\(Y_1, \ldots, Y_n \sim \text{Bernoulli}(p_Y)\)
\(X_1, \ldots, X_n \sim \text{Bernoulli}(p_X)\)

but \((Y_i, X_i)\) are paired.

Examples:

  • \(n\) subjects with a disease and \(n\) without are sampled, then matched (based on demographic factors); the response is the presence of some risk factor.
  • Sibling (or twin) studies: \(n\) pairs of related people where one falls in one group and the other falls in the other group; some binary response is observed on every person.
  • Binary before and after measurements on the same person

Paired Binary Data

Gather a sample of \(n = 40\) voters.

Before debate: Will you vote for candidate A?
After debate: Will you vote for candidate A?

First six subjects (1 = yes, 0 = no):

subject  before  after
1        1       1
2        1       0
3        1       0
4        1       1
5        1       1
6        1       0

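Just a \(2\times2\) table?

Marginal counts (the pairing is lost):

        0   1
after   21  19
before  23  17

Paired counts (rows: before, columns: after):

          after 0  after 1
before 0  12       11
before 1  9        8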
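To make the difference between the two tables concrete, here is a minimal Python sketch. It assumes the rows of the second table are "before" and the columns are "after" (the reading consistent with the marginal totals). It rebuilds one (before, after) pair per voter from the four cell counts; the names `cell_counts`, `pairs` and `paired_table` are illustrative and not from the lecture.

```python
from collections import Counter

# Cell counts from the slide, keyed as (before, after).
# Assumption: rows of the paired table are "before", columns are "after".
cell_counts = {(0, 0): 12, (0, 1): 11, (1, 0): 9, (1, 1): 8}

# Rebuild one (before, after) pair per voter. Only the counts are known,
# so the order of voters here is arbitrary.
pairs = [pair for pair, count in cell_counts.items() for _ in range(count)]

# Marginal counts: each question summarised on its own, pairing ignored.
before_counts = Counter(before for before, _ in pairs)
after_counts = Counter(after for _, after in pairs)

# Paired table: each voter's two answers are kept together.
paired_table = Counter(pairs)

print("before:", dict(before_counts))
print("after: ", dict(after_counts))
for before in (0, 1):
    print(before, [paired_table[(before, after)] for after in (0, 1)])
```

The marginal counts can be recovered from the paired table, but not the other way round, which is why the paired table is the one to analyse.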

How to analyse?

Option 1: Treat the data like paired two-sample data and do a paired t-test

Option 2: McNemar’s test

Paired t-test

Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)

Look at (per voter) differences:

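First six subjects, with diff = after − before:

subject  before  after  diff
1        1       1       0
2        1       0      -1
3        1       0      -1
4        1       1       0
5        1       1       0
6        1       0      -1

Counts of each difference over all 40 voters:

diff   -1   0   1
count   9  20  11

Write \(b = 11\) for the number of voters with diff \(= 1\) (before 0, after 1) and \(c = 9\) for the number with diff \(= -1\) (before 1, after 0).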

Paired t-test calculations

\[ \overline{D} = \frac{1}{n}\left((-1\times 9) + (0 \times 20) + (1 \times 11)\right) = \frac{b - c}{n} = \frac{2}{40} = 0.05 \]

\[ \begin{aligned} s_D^2 &= \frac{1}{n-1}\left( 9(-1 - \overline{D})^2 + 20(0 - \overline{D})^2+ 11(1 - \overline{D})^2 \right) \\ &= \frac{1}{n-1} \left(c + b - \frac{(b-c)^2}{n} \right) \\ &= \frac{1}{40-1} \left(9 + 11 - \frac{(11-9)^2}{40} \right) \\ &\approx 0.51 \end{aligned} \]
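As a quick arithmetic check of the two calculations above, the following Python sketch computes \(\overline{D}\) and \(s_D^2\) both from the definition and from the shortcut in terms of \(b\) and \(c\). It takes \(b = 11\) as the number of differences equal to \(+1\) and \(c = 9\) as the number equal to \(-1\); the variable names are illustrative.

```python
n, b, c = 40, 11, 9  # b: number of +1 differences, c: number of -1 differences
zeros = n - b - c    # number of voters whose answer did not change (20)

# Mean difference from the counts.
d_bar = (b - c) / n

# Sample variance straight from the definition ...
s2_direct = (c * (-1 - d_bar) ** 2
             + zeros * (0 - d_bar) ** 2
             + b * (1 - d_bar) ** 2) / (n - 1)

# ... and from the shortcut formula in b and c.
s2_shortcut = (b + c - (b - c) ** 2 / n) / (n - 1)

print(d_bar, round(s2_direct, 4), round(s2_shortcut, 4))
```

Both routes should give the same variance, which rounds to 0.51.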

Paired t-test calculations

## 
##  One Sample t-test
## 
## data:  df$diff
## t = 0.4427, df = 39, p-value = 0.6604
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.1784514  0.2784514
## sample estimates:
## mean of x 
##      0.05
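The R output above can be cross-checked by hand from the counts alone. The sketch below uses only the Python standard library, so the t critical value is hard-coded rather than computed: `t_crit = 2.022691` is assumed to be the 0.975 quantile of a t distribution with 39 degrees of freedom, taken from a table. The p-value is not reproduced because that needs the t distribution function.

```python
from math import sqrt

n, b, c = 40, 11, 9  # b: number of +1 differences, c: number of -1 differences

d_bar = (b - c) / n
s2 = (b + c - (b - c) ** 2 / n) / (n - 1)

# Standard error of the mean difference and the one-sample t statistic.
se = sqrt(s2 / n)
t_stat = d_bar / se

# Assumption: 0.975 quantile of t with 39 df, taken from a table (not computed here).
t_crit = 2.022691
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"t = {t_stat:.4f}, df = {n - 1}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The statistic and interval should agree with the R output to the printed precision.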

McNemar’s test

Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)

McNemar’s test conditions on the number of discordant pairs, \(b + c\): the voters whose answer changed.

          after 0   after 1
before 0  12        11 (b)
before 1  9 (c)     8

Under the null hypothesis, we expect the discordant pairs (e.g. people who changed their minds during the debate) to be split equally between \(b\) and \(c\).

McNemar’s test

Conditional on \(b + c\), \[ b \sim \text{Binomial}(b+c, 0.5) \]

A one-sample Z-test of whether the proportion \(\hat{p} = b/(b+c)\) equals \(0.5\) leads to \[ Z = \frac{\hat{p} - 0.5}{\sqrt{0.25/(b+c)}} = \frac{b - c}{\sqrt{b + c}} \dot \sim N(0, 1) \quad \text{under null hypothesis} \] (sometimes people square this statistic and compare it to \(\chi^2_1\))
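The Z statistic is a normal approximation to the conditional binomial distribution. As an alternative that is not covered on the slide, the binomial can be used directly, which is often called an exact version of McNemar’s test. A minimal Python sketch, assuming \(b = 11\) and \(c = 9\) from the voter example and a two-sided p-value defined as the probability of a split at least as uneven as the one observed:

```python
from math import comb

b, c = 11, 9  # discordant counts from the voter example
m = b + c     # number of discordant pairs, treated as fixed


def binom_half_pmf(k, m):
    """P(B = k) for B ~ Binomial(m, 0.5)."""
    return comb(m, k) / 2 ** m


# Two-sided p-value: total probability of outcomes at least as far
# from the expected value m / 2 as the observed b.
observed_dev = abs(b - m / 2)
p_exact = sum(
    binom_half_pmf(k, m)
    for k in range(m + 1)
    if abs(k - m / 2) >= observed_dev
)

print(round(p_exact, 4))
```

With an 11 versus 9 split, every outcome except an exact 10 versus 10 tie is at least as extreme, so the p-value is large.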

Example: McNemar’s

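          after 0   after 1
before 0  12        11 (b)
before 1  9 (c)     8

\[ Z = \frac{b-c}{\sqrt{b+c}} = \frac{11 - 9}{\sqrt{11 + 9}} \approx 0.45 \]

Compare to N(0,1): \(|Z| = 0.45\) is well below 1.96, so we do not reject \(H_0\) at the 5% level, in agreement with the paired t-test (p-value 0.66).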
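The comparison to N(0,1) can be carried out numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+) to get a two-sided p-value, and also forms the squared statistic that would be compared to \(\chi^2_1\). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

b, c = 11, 9  # discordant counts from the voter example

# McNemar's Z statistic.
z = (b - c) / sqrt(b + c)

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# Squared form of the statistic, (b - c)^2 / (b + c).
chi2_stat = z ** 2

print(f"Z = {z:.4f}, chi-squared = {chi2_stat:.4f}, p = {p_two_sided:.4f}")
```

Comparing \(Z^2\) to \(\chi^2_1\) gives the same p-value as the two-sided normal comparison, since a \(\chi^2_1\) variable is the square of a standard normal.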

Final points

  • McNemar’s test is equivalent to the paired t-test, in the sense that, for a fixed sample size \(n\), the two test statistics are monotone transformations of each other.

  • For large sample sizes, the two test statistics get closer and closer to the same value: asymptotically equivalent.
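Both points can be seen from a short derivation. It uses only the formulas for \(\overline{D}\) and \(s_D^2\) given earlier and assumes \(s_D^2 > 0\). Substituting into \(t = \overline{D} / \sqrt{s_D^2 / n}\):

\[ \begin{aligned} t^2 &= \frac{\overline{D}^2}{s_D^2 / n} = \frac{(b-c)^2 / n^2}{\dfrac{n(b+c) - (b-c)^2}{n^2 (n-1)}} = \frac{(n-1)(b-c)^2}{n(b+c) - (b-c)^2} \\ &= \frac{(n-1) Z^2}{n - Z^2}, \qquad \text{where } Z^2 = \frac{(b-c)^2}{b+c} \end{aligned} \]

Since \(t\) and \(Z\) have the same sign, \[ t = Z \sqrt{\frac{n-1}{n - Z^2}} \] which is increasing in \(Z\) for fixed \(n\). In the voter example, \(Z^2 = 0.2\) and \(n = 40\) give \(t = 0.4472 \times \sqrt{39/39.8} \approx 0.4427\), matching the t-test output. Under the null hypothesis \(Z^2\) stays bounded as \(n\) grows, so the factor \(\sqrt{(n-1)/(n - Z^2)}\) tends to 1.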