Finish last time’s slides
Paired Binary Data
Imagine now that our two samples from Bernoulli populations aren’t independent, but paired in some way.
\(Y_1, \ldots, Y_n \sim \text{Bernoulli}(p_Y)\)
\(X_1, \ldots, X_n \sim \text{Bernoulli}(p_X)\)
but \((Y_i, X_i)\) are paired.
Examples:
- \(n\) subjects with a disease and \(n\) without the disease are sampled, then matched (based on demographic factors); the response is the presence of some risk factor.
- Sibling (or twin) studies: \(n\) pairs of related people where one falls in one group and the other falls in the other group; some binary response is observed on every person.
- Binary before and after measurements on the same person
Paired Binary Data
Gather a sample of \(n = 40\) voters.
Before debate: Will you vote for candidate A?
After debate: Will you vote for candidate A?
subject | before | after |
---|---|---|
1 | 1 | 1 |
2 | 1 | 0 |
3 | 1 | 0 |
4 | 1 | 1 |
5 | 1 | 1 |
6 | 1 | 0 |
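Just a \(2\times2\) table?
Marginal counts (pairing ignored):
time | 0 | 1 |
---|---|---|
after | 21 | 19 |
before | 23 | 17 |
Cross-classified counts (rows: before, columns: after), which keep the pairing:
before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 |
1 | 9 | 8 |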
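As a minimal sketch of how the two tables relate, the Python below rebuilds both from paired observations. The slides show only the first six voters, so the 40 pairs are a hypothetical reconstruction from the cross-classified cell counts; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical reconstruction of the 40 (before, after) pairs from the
# cross-classified counts on the slide; the raw data are not shown in full.
cell_counts = {(0, 0): 12, (0, 1): 11, (1, 0): 9, (1, 1): 8}
pairs = [pair for pair, k in cell_counts.items() for _ in range(k)]

# Marginal counts: one row per time point, pairing ignored.
before_margin = Counter(before for before, _ in pairs)
after_margin = Counter(after for _, after in pairs)

# Cross-classified counts: rows are "before", columns are "after".
cross = Counter(pairs)

print("before:", before_margin[0], before_margin[1])
print("after: ", after_margin[0], after_margin[1])
print("cross: ", [[cross[(r, c)] for c in (0, 1)] for r in (0, 1)])
```

The marginal table can be computed from the cross-classified one, but not the other way round: many different cross-classified tables share the same margins, which is why the marginal table alone cannot be used for a paired analysis.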
How to analyse?
Option 1: Treat like paired two sample data and do a paired t-test
Option 2: McNemar’s test
Paired t-test
Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)
Look at per-voter differences, \(D_i = \text{after}_i - \text{before}_i\):
subject | before | after | diff |
---|---|---|---|
1 | 1 | 1 | 0 |
2 | 1 | 0 | -1 |
3 | 1 | 0 | -1 |
4 | 1 | 1 | 0 |
5 | 1 | 1 | 0 |
6 | 1 | 0 | -1 |
diff | -1 | 0 | 1 |
---|---|---|---|
count | 9 | 20 | 11 |
Paired t-test calculations
\[ \overline{D} = \frac{1}{n}\left((-1\times 9) + (0 \times 20) + (1 \times 11)\right) = \frac{b - c}{n} = \frac{2}{40} = 0.05 \]
where \(b = 11\) is the number of voters with difference \(+1\) (0 before, 1 after) and \(c = 9\) is the number with difference \(-1\) (1 before, 0 after).
\[ \begin{aligned} s_D^2 &= \frac{1}{n-1}\left( 9(-1 - \overline{D})^2 + 20(0 - \overline{D})^2+ 11(1 - \overline{D})^2 \right) \\ &= \frac{1}{n-1} \left(c + b - \frac{(b-c)^2}{n} \right) \\ &= \frac{1}{40-1} \left(9 + 11 - \frac{(11-9)^2}{40} \right) \\ &\approx 0.51 \end{aligned} \]
Paired t-test calculations
##
## One Sample t-test
##
## data: df$diff
## t = 0.4427, df = 39, p-value = 0.6604
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.1784514 0.2784514
## sample estimates:
## mean of x
## 0.05
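The t statistic in the output above can be reproduced by hand from the counts \(b = 11\), \(c = 9\), \(n = 40\). The following is a sketch in Python using only the standard library (the slides themselves use R); it computes the statistic only, not the p-value or confidence interval, since those need t-distribution quantiles.

```python
import math

n, b, c = 40, 11, 9  # counts taken from the slides

# Shortcut formulas from the paired t-test calculations.
d_bar = (b - c) / n
s2_d = (b + c - (b - c) ** 2 / n) / (n - 1)
t_stat = d_bar / math.sqrt(s2_d / n)

# Cross-check against the expanded list of per-voter differences.
diffs = [-1] * c + [0] * (n - b - c) + [1] * b
mean_check = sum(diffs) / n
var_check = sum((d - mean_check) ** 2 for d in diffs) / (n - 1)

print(f"D-bar = {d_bar:.2f}, s_D^2 = {s2_d:.4f}, t = {t_stat:.4f}")
```

The printed t value should agree with the `t = 0.4427` reported by the one-sample t-test on the differences.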
McNemar’s test
Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)
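McNemar’s test conditions on the number of discordant pairs, \(b + c\), where \(b\) counts pairs that go from 0 to 1 and \(c\) counts pairs that go from 1 to 0.
before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 (\(b\)) |
1 | 9 (\(c\)) | 8 |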
Under the null hypothesis, we expect the discordant pairs (e.g. people who change their minds during the debate) to be equally split between \(b\) and \(c\).
McNemar’s test
Conditional on \(b + c\), and under the null hypothesis, \[ b \sim \text{Binomial}(b+c, 0.5) \]
A one-sample Z-test of the proportion \(b/(b+c)\) against 0.5 leads to \[ Z = \frac{b - c}{\sqrt{b + c}} \dot \sim N(0, 1) \quad \text{under null hypothesis} \] (sometimes people square this statistic and compare it to \(\chi^2_1\))
Example: McNemar’s
before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 |
1 | 9 | 8 |
\[ Z = \frac{b-c}{\sqrt{b+c}} = \frac{11 - 9}{\sqrt{11 + 9}} = 0.45 \]
Compare to \(N(0,1)\): \(|Z| = 0.45\) is well below 1.96, so we do not reject \(H_0\) at the 5% level.
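The comparison to \(N(0,1)\) can be turned into a two-sided p-value. The sketch below uses Python's standard library only. The exact binomial p-value at the end is an addition that is not on the slides; it uses the conditional distribution \(b \sim \text{Binomial}(b+c, 0.5)\) directly instead of the normal approximation.

```python
import math

b, c = 11, 9  # discordant counts from the slides

# McNemar Z statistic.
z = (b - c) / math.sqrt(b + c)

# Two-sided p-value under N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_normal = math.erfc(abs(z) / math.sqrt(2))

# Exact alternative (not on the slides): double the upper tail of
# Binomial(b + c, 0.5) at max(b, c), capped at 1.
m = b + c
upper_tail = sum(math.comb(m, k) for k in range(max(b, c), m + 1)) / 2 ** m
p_exact = min(1.0, 2 * upper_tail)

print(f"Z = {z:.4f}, normal p = {p_normal:.4f}, exact p = {p_exact:.4f}")
```

Squaring `z` gives the \(\chi^2_1\) form of the statistic, which leads to the same p-value as the two-sided normal comparison.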
Final points
McNemar’s test is equivalent to the paired t-test, in the sense that, for a fixed sample size \(n\), the two test statistics are monotone transformations of each other.
For large sample sizes, the two test statistics converge to the same value: the tests are asymptotically equivalent.
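One way to see both points is to write the t statistic in terms of \(Z\), using the expressions for \(\overline{D}\) and \(s_D^2\) from the paired t-test calculations. This short derivation is added here as a sketch and is not taken from the slides:
\[ t^2 = \frac{\overline{D}^2}{s_D^2 / n} = \frac{(n-1)(b-c)^2}{n(b+c) - (b-c)^2} = \frac{(n-1)Z^2}{n - Z^2}, \quad \text{so} \quad t = Z\sqrt{\frac{n-1}{n - Z^2}} \]
For fixed \(n\), this is an increasing function of \(Z\) on \(|Z| < \sqrt{n}\), and the factor \(\sqrt{(n-1)/(n - Z^2)}\) tends to 1 as \(n\) grows with \(Z\) held fixed. In the voter example, \(0.4472 \times \sqrt{39/39.8} \approx 0.4427\), which matches the t statistic reported earlier.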