Paired Binary Data

Imagine now that our two samples from Bernoulli populations aren’t independent, but paired in some way.

$$Y_1, \ldots, Y_n \sim \text{Bernoulli}(p_Y)$$
$$X_1, \ldots, X_n \sim \text{Bernoulli}(p_X)$$

but $$(Y_i, X_i)$$ are paired.

Examples:

• Matched case-control studies: $$n$$ subjects with a disease and $$n$$ without the disease are sampled, then matched (based on demographic factors). The response is the presence of some risk factor.
• Sibling (or twin) studies: $$n$$ pairs of related people where one falls in one group and the other falls in the other group. Some binary response is observed on every person.
• Binary before and after measurements on the same person

Paired Binary Data

Gather a sample of $$n = 40$$ voters.

Before debate: Will you vote for candidate A?
After debate: Will you vote for candidate A?

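First six voters:

| subject | before | after |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 1 | 1 |
| 5 | 1 | 1 |
| 6 | 1 | 0 |

Marginal counts of each answer:

| | 0 | 1 |
|---|---|---|
| after | 21 | 19 |
| before | 23 | 17 |

Cross-tabulation of the paired answers:

| | after = 0 | after = 1 |
|---|---|---|
| before = 0 | 12 | 11 |
| before = 1 | 9 | 8 |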
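As a rough illustration of how the tables relate, the sketch below rebuilds one (before, after) pair per voter from the four cell counts and recovers the marginal counts. The variable names (`cells`, `pairs`, `p_before`, `p_after`) are hypothetical and chosen only for this example.

```python
from collections import Counter

# Cell counts from the 2x2 table: keys are (before, after) answers.
cells = {(0, 0): 12, (0, 1): 11, (1, 0): 9, (1, 1): 8}

# Expand the counts into one (before, after) pair per voter.
pairs = [pair for pair, count in cells.items() for _ in range(count)]
n = len(pairs)

# Marginal counts of each answer before and after the debate.
before_counts = Counter(before for before, _ in pairs)
after_counts = Counter(after for _, after in pairs)

# Sample proportions answering 1 (will vote for candidate A).
p_before = before_counts[1] / n
p_after = after_counts[1] / n

print(n, dict(before_counts), dict(after_counts))
print(p_before, p_after)
```

The marginal counts alone do not determine the 2x2 table, which is why the paired structure has to be kept for the analysis.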

How to analyse?

Option 1: Treat the data like paired two-sample data and do a paired t-test

Option 2: McNemar’s test

Paired t-test

Null hypothesis: $$H_0: p_{\text{before}} = p_{\text{after}}$$

Look at (per voter) differences:

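| subject | before | after | diff (after − before) |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 2 | 1 | 0 | -1 |
| 3 | 1 | 0 | -1 |
| 4 | 1 | 1 | 0 |
| 5 | 1 | 1 | 0 |
| 6 | 1 | 0 | -1 |

Counts of each difference across all 40 voters:

| diff | -1 | 0 | 1 |
|---|---|---|---|
| count | 9 | 20 | 11 |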

Paired t-test calculations

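$\overline{D} = \frac{1}{n}\left((-1\times 9) + (0 \times 20) + (1 \times 11)\right) = \frac{b - c}{n} = \frac{2}{40} = 0.05$

Here $$b = 11$$ is the number of voters who switched from 0 to 1, and $$c = 9$$ is the number who switched from 1 to 0.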

$ \begin{aligned} s_D^2 &= \frac{1}{n-1}\left( 9(-1 - \overline{D})^2 + 20(0 - \overline{D})^2+ 11(1 - \overline{D})^2 \right) \\ &= \frac{1}{n-1} \left(c + b - \frac{(b-c)^2}{n} \right) \\ &= \frac{1}{40-1} \left(9 + 11 - \frac{(11-9)^2}{40} \right) \\ &\approx 0.51 \end{aligned} $
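The same arithmetic can be checked numerically. The following is a minimal sketch in Python (standard library only), working directly from the counts of differences; the names `diff_counts`, `d_bar`, `s2`, and `t_stat` are illustrative and not part of the original analysis.

```python
import math

# Counts of per-voter differences (after - before), taken from the table above.
diff_counts = {-1: 9, 0: 20, 1: 11}

n = sum(diff_counts.values())

# Sample mean of the differences.
d_bar = sum(d * k for d, k in diff_counts.items()) / n

# Sample variance of the differences (divisor n - 1).
s2 = sum(k * (d - d_bar) ** 2 for d, k in diff_counts.items()) / (n - 1)

# One-sample t statistic for H0: mean difference = 0, with n - 1 degrees of freedom.
t_stat = d_bar / math.sqrt(s2 / n)

print(round(d_bar, 4), round(s2, 4), round(t_stat, 4))
```

The p-value needs the t distribution with $$n - 1 = 39$$ degrees of freedom, which the Python standard library does not provide, so this sketch stops at the statistic.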

Paired t-test calculations

##
##  One Sample t-test
##
## data:  df\$diff
## t = 0.4427, df = 39, p-value = 0.6604
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.1784514  0.2784514
## sample estimates:
## mean of x
##      0.05

McNemar’s test

Null hypothesis: $$H_0: p_{\text{before}} = p_{\text{after}}$$

McNemar’s test conditions on the number of discordant pairs, $$b + c$$ (the off-diagonal cells of the table).

| | after = 0 | after = 1 |
|---|---|---|
| before = 0 | 12 | 11 ($$b$$) |
| before = 1 | 9 ($$c$$) | 8 |

Under the null hypothesis, we expect the discordant pairs (i.e. people who change their minds during the debate) to be equally split between $$b$$ and $$c$$.

McNemar’s test

Conditional on $$b + c$$, and under the null hypothesis, $b \sim \text{Binomial}(b+c, 0.5)$

A one-sample Z-test for the proportion $$b / (b + c)$$ against 0.5 leads to $Z = \frac{b - c}{\sqrt{b + c}} \dot \sim N(0, 1) \quad \text{under null hypothesis}$ (Sometimes people square this statistic and compare it to $$\chi^2_1$$.)
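One way to see where this form comes from is to write out the Z-test with the null standard error. Here $$\hat{p} = b/(b+c)$$ is notation introduced only for this sketch:

$ \begin{aligned} Z &= \frac{\hat{p} - 0.5}{\sqrt{0.5(1 - 0.5)/(b+c)}} \\ &= \frac{\dfrac{b - c}{2(b+c)}}{\dfrac{1}{2\sqrt{b+c}}} \\ &= \frac{b-c}{\sqrt{b+c}} \end{aligned} $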

Example: McNemar’s

| | after = 0 | after = 1 |
|---|---|---|
| before = 0 | 12 | 11 |
| before = 1 | 9 | 8 |

$Z = \frac{b-c}{\sqrt{b+c}} = \frac{11 - 9}{\sqrt{11 + 9}} = 0.45$

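Compare to $$N(0, 1)$$: $$|Z| = 0.45$$ is well below 1.96, so there is no evidence at the 5% level that support for candidate A changed after the debate.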
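The comparison can be sketched numerically in Python (standard library only). The normal-approximation part follows the statistic above. The exact binomial p-value is an extra added here for illustration, based on the conditional $$\text{Binomial}(b+c, 0.5)$$ distribution, and is not part of the original slides. All variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Discordant pairs: b = (before 0, after 1), c = (before 1, after 0).
b, c = 11, 9

# Normal-approximation McNemar statistic and its two-sided p-value.
z = (b - c) / math.sqrt(b + c)
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

# Exact alternative: under H0, b ~ Binomial(b + c, 0.5).
# Two-sided p-value as twice the upper tail at max(b, c), capped at 1.
m = b + c
upper_tail = sum(math.comb(m, k) for k in range(max(b, c), m + 1)) / 2 ** m
p_exact = min(1.0, 2 * upper_tail)

print(round(z, 4), round(p_normal, 4), round(p_exact, 4))
```

Both p-values are large here, in line with the paired t-test result.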

Final points

• McNemar’s test is equivalent to the paired t-test, in the sense that, for a fixed sample size $$n$$, the two test statistics are monotone transformations of each other.

• For large sample sizes, the two test statistics converge to the same value, so the tests are asymptotically equivalent.
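Both points can be checked with a short calculation from the formulas for $$\overline{D}$$ and $$s_D^2$$. Writing $$T$$ for the paired t statistic (notation introduced only for this sketch):

$ \begin{aligned} T^2 = \frac{\overline{D}^2}{s_D^2 / n} = \frac{(n-1)(b-c)^2}{n(b+c) - (b-c)^2} = \frac{(n-1)Z^2}{n - Z^2} \end{aligned} $

For fixed $$n$$, $$T$$ is an increasing function of $$Z$$, and the factor $$(n-1)/(n - Z^2)$$ tends to 1 as $$n$$ grows. With the debate data, $$Z^2 = 0.2$$ gives $$T^2 = 39 \times 0.2 / 39.8 \approx 0.196$$, so $$T \approx 0.44$$, in agreement with the t-test output.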