# Finish last time’s slides

# Paired Binary Data

## Paired Binary Data

Imagine now that our two samples from Bernoulli populations aren't independent but are paired in some way.

\(Y_1, \ldots, Y_n \sim \text{Bernoulli}(p_Y)\)

\(X_1, \ldots, X_n \sim \text{Bernoulli}(p_X)\)

but \((Y_i, X_i)\) are paired.

Examples:

- \(n\) subjects with a disease and \(n\) subjects without the disease are sampled, then matched in pairs on demographic factors; the response is the presence of some risk factor.
- Sibling (or twin) studies: \(n\) pairs of related people, where one member of each pair is in one group and the other member is in the other group; a binary response is observed on every person.
- Binary before and after measurements on the same person

## Paired Binary Data

Gather a sample of \(n = 40\) voters.

Before debate: Will you vote for candidate A?

After debate: Will you vote for candidate A?

First six subjects (1 = yes, 0 = no):

subject | before | after |
---|---|---|
1 | 1 | 1 |
2 | 1 | 0 |
3 | 1 | 0 |
4 | 1 | 1 |
5 | 1 | 1 |
6 | 1 | 0 |

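## Just a \(2\times2\) table?

Marginal counts (pairing ignored):

time | 0 | 1 |
---|---|---|
after | 21 | 19 |
before | 23 | 17 |

Joint counts (rows: before, columns: after):

before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 |
1 | 9 | 8 |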
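A minimal Python sketch of how the two tables arise from the paired responses. The 40 (before, after) pairs below are a hypothetical reconstruction that only reproduces the joint counts above (12, 11, 9, 8); the order of subjects is arbitrary and not taken from the real data.

```python
from collections import Counter

# Hypothetical reconstruction: 40 (before, after) pairs matching the joint counts
# 12 x (0,0), 11 x (0,1), 9 x (1,0), 8 x (1,1). Subject order is arbitrary.
pairs = [(0, 0)] * 12 + [(0, 1)] * 11 + [(1, 0)] * 9 + [(1, 1)] * 8

# Marginal tables: each time point tabulated on its own (pairing ignored)
before_counts = Counter(before for before, _ in pairs)
after_counts = Counter(after for _, after in pairs)

# Joint table: rows = before, columns = after (pairing kept)
joint = Counter(pairs)

print("before:", before_counts[0], before_counts[1])
print("after: ", after_counts[0], after_counts[1])
for row in (0, 1):
    print(f"before={row}:", joint[(row, 0)], joint[(row, 1)])
```

The marginal tables throw the pairing away; only the joint table records who changed their answer, which is what both analyses below rely on.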

## How to analyse?

**Option 1**: Treat the data like paired two-sample data and do a paired t-test

**Option 2**: McNemar’s test

## Paired t-test

Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)

Look at the per-voter differences, diff = after − before:

subject | before | after | diff |
---|---|---|---|
1 | 1 | 1 | 0 |
2 | 1 | 0 | -1 |
3 | 1 | 0 | -1 |
4 | 1 | 1 | 0 |
5 | 1 | 1 | 0 |
6 | 1 | 0 | -1 |

Tally of all 40 differences:

diff | -1 | 0 | 1 |
---|---|---|---|
count | 9 | 20 | 11 |

## Paired t-test calculations

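Write \(b\) for the number of voters who switch from 0 to 1 (here 11) and \(c\) for the number who switch from 1 to 0 (here 9):

\[ \overline{D} = \frac{1}{n}\left((-1\times 9) + (0 \times 20) + (1 \times 11)\right) = \frac{b - c}{n} = \frac{2}{40} = 0.05 \]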

\[ \begin{aligned} s_D^2 &= \frac{1}{n-1}\left( 9(-1 - \overline{D})^2 + 20(0 - \overline{D})^2+ 11(1 - \overline{D})^2 \right) \\ &= \frac{1}{n-1} \left(c + b - \frac{(b-c)^2}{n} \right) \\ &= \frac{1}{40-1} \left(9 + 11 - \frac{(11-9)^2}{40} \right) \\ &\approx 0.51 \end{aligned} \]
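The arithmetic above can be checked with a short Python sketch that evaluates both the direct sums and the closed forms in \(b\) and \(c\), using only the counts from the tally:

```python
# Counts from the joint table: b = voters who switched 0 -> 1, c = 1 -> 0
n, b, c = 40, 11, 9
unchanged = n - b - c  # voters with diff = 0

# Mean difference, direct sum and closed form (b - c) / n
d_bar_direct = (-1 * c + 0 * unchanged + 1 * b) / n
d_bar_closed = (b - c) / n

# Sample variance of the differences (n - 1 denominator), direct and closed form
s2_direct = (c * (-1 - d_bar_direct) ** 2
             + unchanged * (0 - d_bar_direct) ** 2
             + b * (1 - d_bar_direct) ** 2) / (n - 1)
s2_closed = (c + b - (b - c) ** 2 / n) / (n - 1)

print(d_bar_direct, d_bar_closed)
print(round(s2_direct, 4), round(s2_closed, 4))
```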

## Paired t-test calculations

R output for a one-sample t-test on the differences, `df$diff`:

```
##
## One Sample t-test
##
## data: df$diff
## t = 0.4427, df = 39, p-value = 0.6604
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.1784514 0.2784514
## sample estimates:
## mean of x
## 0.05
```
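A Python sketch that reproduces the t statistic from the differences. The R data frame `df` is not shown, so the list of differences here is rebuilt from the tally (9 × −1, 20 × 0, 11 × +1); that is enough because the statistic depends only on those counts. The p-value and confidence interval need the \(t_{39}\) distribution, which the Python standard library does not provide, so this sketch stops at the statistic.

```python
import math
import statistics

# Differences (after - before) rebuilt from the tally: 9 x -1, 20 x 0, 11 x +1
diffs = [-1] * 9 + [0] * 20 + [1] * 11
n = len(diffs)

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)          # sample standard deviation, n - 1 denominator
t_stat = d_bar / (s_d / math.sqrt(n))  # one-sample t statistic for H0: mean difference = 0

print(n, d_bar, round(t_stat, 4))
```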

## McNemar’s test

Null hypothesis: \(H_0: p_{\text{before}} = p_{\text{after}}\)

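The test conditions on the number of discordant pairs, \(b + c\) (voters whose before and after answers differ).

before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 (\(b\)) |
1 | 9 (\(c\)) | 8 |

Under the null hypothesis, each discordant pair (e.g. a voter who changes their mind during the debate) is equally likely to fall in \(b\) or in \(c\), so we expect the discordant pairs to be split roughly equally between the two cells.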

## McNemar’s test

Under the null hypothesis, conditional on \(b + c\), \[ b \sim \text{Binomial}(b+c, 0.5) \]

A one-sample Z-test of whether the proportion \(b/(b+c)\) equals 0.5 then gives \[ Z = \frac{b - (b+c)/2}{\sqrt{(b+c)/4}} = \frac{b - c}{\sqrt{b + c}} \overset{\cdot}{\sim} N(0, 1) \quad \text{under the null hypothesis} \] (sometimes people square this statistic and compare it to \(\chi^2_1\)).
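The binomial model can also be used directly, without the normal approximation. This exact conditional version is an alternative to the Z-test above, not part of it. A minimal sketch, assuming the common convention that the two-sided p-value is twice the smaller tail probability, capped at 1 (`exact_mcnemar_p` is a hypothetical helper name):

```python
from math import comb

def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact p-value, using b ~ Binomial(b + c, 0.5) given b + c under H0."""
    m = b + c
    lower = sum(comb(m, k) for k in range(0, b + 1)) / 2 ** m  # P(B <= b)
    upper = sum(comb(m, k) for k in range(b, m + 1)) / 2 ** m  # P(B >= b)
    return min(1.0, 2 * min(lower, upper))

print(round(exact_mcnemar_p(11, 9), 4))
```

Only the discordant counts \(b\) and \(c\) enter; the 20 voters who did not change their answer play no role.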

## Example: McNemar’s

before / after | 0 | 1 |
---|---|---|
0 | 12 | 11 |
1 | 9 | 8 |

\[ Z = \frac{b-c}{\sqrt{b+c}} = \frac{11 - 9}{\sqrt{11 + 9}} = 0.45 \]

Compare to \(N(0,1)\): \(|Z| = 0.45\) is well below 1.96, so we do not reject \(H_0\) at the 5% level.
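A Python sketch of the same calculation, including the two-sided p-value from \(N(0,1)\). It uses the identity \(P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})\) so that only the standard library is needed; `mcnemar_z` is a hypothetical helper name.

```python
import math

def mcnemar_z(b: int, c: int) -> float:
    """McNemar's statistic Z = (b - c) / sqrt(b + c), without continuity correction."""
    return (b - c) / math.sqrt(b + c)

z = mcnemar_z(11, 9)
# Two-sided p-value from N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_normal = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 4), round(z ** 2, 4), round(p_normal, 4))
```

Squaring gives \(Z^2 = 0.2\), and comparing that to \(\chi^2_1\) gives the same p-value. In R, `mcnemar.test()` reports this chi-squared form but applies a continuity correction unless `correct = FALSE`.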

## Final points

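McNemar's test and the paired t-test are closely related: the two test statistics are monotone transformations of each other, so they rank possible outcomes in the same order. They differ in the reference distribution (\(t_{n-1}\) versus \(N(0,1)\)), which is why the p-values are close but not identical.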
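The monotone relationship can be made explicit by substituting the closed forms \(\overline{D} = (b-c)/n\) and \(s_D^2 = \left(b + c - (b-c)^2/n\right)/(n-1)\) into the t statistic and writing \(b - c = Z\sqrt{b+c}\):

\[ \begin{aligned} t = \frac{\overline{D}}{s_D/\sqrt{n}} &= \frac{(b-c)/n}{\sqrt{\dfrac{b + c - (b-c)^2/n}{n(n-1)}}} = \frac{(b-c)\sqrt{n-1}}{\sqrt{n(b+c) - (b-c)^2}} \\ &= \frac{Z\sqrt{b+c}\,\sqrt{n-1}}{\sqrt{(b+c)(n - Z^2)}} = Z\sqrt{\frac{n-1}{n-Z^2}} \end{aligned} \]

For \(Z^2 < n\) this is an increasing function of \(Z\). With the debate data, \(Z \approx 0.447\) and \(n = 40\) give \(t \approx 0.447\sqrt{39/39.8} \approx 0.4427\), matching the R output.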

Under the null hypothesis, as the sample size grows the two test statistics get closer and closer to the same value, so the two tests are asymptotically equivalent.
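A small Python sketch of this convergence, assuming the relation \(t = Z\sqrt{(n-1)/(n-Z^2)}\), which follows from the closed forms for \(\overline{D}\) and \(s_D^2\). Under the null hypothesis \(Z\) stays of order 1 while \(n\) grows, so the sketch holds \(Z\) at the debate value and increases \(n\) (`t_from_z` is a hypothetical helper name):

```python
import math

def t_from_z(z: float, n: int) -> float:
    """Paired t statistic written in terms of McNemar's Z (valid when z**2 < n)."""
    return z * math.sqrt((n - 1) / (n - z ** 2))

z = (11 - 9) / math.sqrt(11 + 9)  # McNemar's Z for the debate data
for n in (40, 400, 4000, 40000):
    # Same Z, larger n: the ratio t / Z approaches 1
    print(n, round(t_from_z(z, n) / z, 5))
```

Under an alternative with fixed switching proportions, \(Z^2\) grows in proportion to \(n\), so the factor need not approach 1; the equivalence is a statement about behavior near the null.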