Two sample: Binary Response

Setting: two independent samples

$Y_{1}, \dots, Y_{n}$ i.i.d from Bernoulli $(p_{Y})$
$X_{1}, \dots, X_{m}$ i.i.d from Bernoulli $(p_{X})$

Parameter: Difference in population proportions $p_{Y} - p_{X}$

$p_{Y} = E (Y_{i}) = P (Y_{i} = 1)$
$p_{X} = E (X_{i}) = P (X_{i} = 1)$

As a contingency table

Represent resulting data in a 2 x 2 contingency table:

	0	1	Total
$Y_{i}$	a	b	n = a+b
$X_{i}$	c	d	m = c+d
Total	a+c	b+d	m + n

Two sample: Binary Response - Alternate view

Setting: two independent samples

$\begin{aligned} (Y_{1}, G_{1}), (Y_{2}, G_{2}), \dots, (Y_{n}, G_{n}), (Y_{n + 1}, G_{n + 1}), \dots, (Y_{n + m}, G_{n + m}) \end{aligned}$

where $G$ is a binary grouping variable which indicates which population the observation came from: $G_{i} = \begin{cases} 0, & \text{observation from } Y \\ 1, & \text{observation from } X \end{cases}$

As a contingency table - Alternate view

Represent resulting data in a 2 x 2 contingency table:

	$Y_{i} = 0$	$Y_{i} = 1$	Total
$G_{i} = 0$	$n_{11} = a$	$n_{12} = b$	n = a+b = $R_{1}$
$G_{i} = 1$	$n_{21} = c$	$n_{22} = d$	m = c+d = $R_{2}$
Total	a+c = $C_{1}$	b+d= $C_{2}$	a + b + c + d = N
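To make the cell labels concrete, here is a minimal sketch of how the counts a, b, c, d and the margins could be tallied from paired (response, group) data. The function name `two_by_two` and the toy data are made up for illustration and are not part of the lecture.

```python
from collections import Counter

def two_by_two(y, g):
    """Cross-tabulate binary responses y against binary group labels g.

    Returns (a, b, c, d) laid out as in the table above:
    a = #(G=0, Y=0), b = #(G=0, Y=1), c = #(G=1, Y=0), d = #(G=1, Y=1).
    """
    if len(y) != len(g):
        raise ValueError("y and g must have the same length")
    counts = Counter(zip(g, y))  # keys are (group, response) pairs
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]

# Hypothetical toy data, made up for illustration only.
y = [1, 0, 1, 1, 0, 1, 0]
g = [0, 0, 0, 1, 1, 1, 1]

a, b, c, d = two_by_two(y, g)
R1, R2 = a + b, c + d   # row totals (group sizes n and m)
C1, C2 = a + c, b + d   # column totals
N = R1 + R2             # grand total
print(a, b, c, d, R1, R2, C1, C2, N)
```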

Two views are equivalent

The two views lead to the same inference if we are interested in the response variable given the group.

First view: I sample 40 OSU graduate students and 20 OSU undergraduate students:
- $Y_{i}$ = did graduate student $i$ vote in 2016? $i = 1, \dots, 40$
- $X_{i}$ = did undergraduate student $i$ vote in 2016? $i = 1, \dots, 20$
Second view: I sample 60 OSU students and record:
- $Y_{i}$ = did student $i$ vote in 2016? $i = 1, \dots, 60$
- $G_{i}$ = student’s level (0 = graduate, 1 = undergraduate), $i = 1, \dots, 60$

Inference focuses on:

Comparing $P (Y_{i} = 1)$ and $P (X_{i} = 1)$ - first view
Comparing $P (Y_{i} = 1 | G_{i} = 0)$ and $P (Y_{i} = 1 | G_{i} = 1)$ - second view
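As an illustration of why the two views give the same estimates, the sketch below stacks two separate samples into one response vector plus a grouping vector, then recovers the group-wise proportions. The helper names and the 0/1 data are hypothetical.

```python
def stack_samples(y_sample, x_sample):
    """Turn the first view (two separate samples) into the second view:
    one response vector plus a grouping vector (0 = Y sample, 1 = X sample)."""
    response = list(y_sample) + list(x_sample)
    group = [0] * len(y_sample) + [1] * len(x_sample)
    return response, group

def conditional_proportion(response, group, level):
    """Sample estimate of P(response = 1 | group = level)."""
    selected = [r for r, g in zip(response, group) if g == level]
    return sum(selected) / len(selected)

# Hypothetical 0/1 responses, made up for illustration only.
y_sample = [1, 1, 0, 1]
x_sample = [0, 1, 0, 0, 1]

response, group = stack_samples(y_sample, x_sample)
p_hat_y = conditional_proportion(response, group, 0)  # second-view estimate
p_hat_x = conditional_proportion(response, group, 1)

# The second-view estimates equal the first-view sample means.
print(p_hat_y == sum(y_sample) / len(y_sample))
print(p_hat_x == sum(x_sample) / len(x_sample))
```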

Ways to compare two proportions

$Y_{1}, \dots, Y_{n}$ i.i.d from Bernoulli $(p_{Y})$
$X_{1}, \dots, X_{m}$ i.i.d from Bernoulli $(p_{X})$

Typical null hypothesis: $H_{0} : p_{Y} = p_{X}$

Difference in population proportions: $p_{Y} - p_{X}$

$H_{0} : p_{Y} - p_{X} = 0$

Relative risk: $p_{Y} / p_{X}$

$H_{0} : p_{Y} / p_{X} = 1$

Odds ratio: $\frac{p_{Y}}{1 - p_{Y}} / \frac{p_{X}}{1 - p_{X}}$

$H_{0} : \frac{p_{Y}}{1 - p_{Y}} / \frac{p_{X}}{1 - p_{X}} = 1$
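
A short worked step showing that the three null hypotheses describe the same situation, assuming $0 < p_{Y} < 1$ and $0 < p_{X} < 1$ so that every ratio is defined:

$\begin{aligned} \frac{p_{Y}}{p_{X}} = 1 & \iff p_{Y} = p_{X} \\ \frac{p_{Y}}{1 - p_{Y}} / \frac{p_{X}}{1 - p_{X}} = 1 & \iff p_{Y} (1 - p_{X}) = p_{X} (1 - p_{Y}) \\ & \iff p_{Y} - p_{Y} p_{X} = p_{X} - p_{X} p_{Y} \\ & \iff p_{Y} = p_{X} \end{aligned}$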

Example

(From class_data, assuming the class behaves like a random sample from a larger population)

$G_{i}$ : Do you prefer cats or dogs? $Y_{i}$ : Did you eat breakfast this morning?

##        ate_breakfast
## cat_dog no yes
##    cats  6   9
##    dogs  6  14

Your turn: Fill in the table margins

Estimates

Probability of eating breakfast, given you prefer cats: $p_{Y} = P (Y_{i} = 1 | G_{i} = 0) = \frac{P (Y_{i} = 1 \text{ and } G_{i} = 0)}{P (G_{i} = 0)}$

Estimate ${\hat{p}}_{Y} = \frac{b / N}{R_{1} / N} = \frac{9}{15} = 0.6$

Probability of eating breakfast, given you prefer dogs: $p_{X} = P (Y_{i} = 1 | G_{i} = 1) = \frac{P (Y_{i} = 1 \text{ and } G_{i} = 1)}{P (G_{i} = 1)}$

Estimate ${\hat{p}}_{X} = \frac{d / N}{R_{2} / N} = \frac{14}{20} = 0.7$

Estimates

Difference in proportions ${\hat{p}}_{Y} - {\hat{p}}_{X} = 0.6 - 0.7 = - 0.1$

Relative Risk $\frac{{\hat{p}}_{Y}}{{\hat{p}}_{X}} = \frac{0.6}{0.7} = 0.86$

Odds Ratio $\frac{{\hat{p}}_{Y}}{1 - {\hat{p}}_{Y}} / \frac{{\hat{p}}_{X}}{1 - {\hat{p}}_{X}} = \frac{0.6}{1 - 0.6} / \frac{0.7}{1 - 0.7} = 0.64 = \frac{b c}{a d}$
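
The arithmetic above can be reproduced with a small sketch that takes the four cell counts of the breakfast table. The function name `compare_proportions` is made up for illustration, and the sketch assumes no cell count makes a denominator zero.

```python
def compare_proportions(a, b, c, d):
    """Estimate the three comparison measures from a 2 x 2 table.

    Rows are groups (first row: a, b; second row: c, d);
    the second column counts the 'yes' responses.
    """
    n, m = a + b, c + d
    p_y, p_x = b / n, d / m
    return {
        "p_y": p_y,
        "p_x": p_x,
        "difference": p_y - p_x,
        "relative_risk": p_y / p_x,
        "odds_ratio": (p_y / (1 - p_y)) / (p_x / (1 - p_x)),
    }

# Breakfast table: cats (6 no, 9 yes), dogs (6 no, 14 yes).
est = compare_proportions(6, 9, 6, 14)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

The odds ratio computed this way agrees with the cross-product form $\frac{b c}{a d}$.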

Two sample Z-test of proportions

(Comes from treating each proportion as a mean and applying the two-sample Z-test)

Null hypothesis: $H_{0} : p_{Y} = p_{X}$

$Z = \frac{{\hat{p}}_{Y} - {\hat{p}}_{X}}{\sqrt{{\hat{p}}_{c} (1 - {\hat{p}}_{c}) (\frac{1}{n} + \frac{1}{m})}}$

where ${\hat{p}}_{c} = \frac{n {\hat{p}}_{Y} + m {\hat{p}}_{X}}{n + m} = \frac{b + d}{N}$ is the pooled sample proportion.

When the null is true, $Z$ has approximately a N(0, 1) distribution in large samples.
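
A minimal sketch of the pooled Z-test, using only the Python standard library. The function name `two_sample_z_test` and its argument order are assumptions for illustration; `b` and `d` are the numbers of successes in samples of size `n` and `m`.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(b, n, d, m):
    """Two-sided Z-test of H0: p_Y = p_X using the pooled proportion.

    b successes out of n in the Y sample, d successes out of m in the X sample.
    Returns (z, p_value).
    """
    p_y, p_x = b / n, d / m
    p_c = (b + d) / (n + m)  # pooled estimate, valid under H0
    se = sqrt(p_c * (1 - p_c) * (1 / n + 1 / m))
    z = (p_y - p_x) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal tail area
    return z, p_value

# Breakfast example: 9 of 15 cat people and 14 of 20 dog people ate breakfast.
z, p_value = two_sample_z_test(9, 15, 14, 20)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```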

Confidence interval for difference in proportions

$100 (1 - α) \%$ CI: ${\hat{p}}_{Y} - {\hat{p}}_{X} \pm z_{1 - α / 2} \sqrt{\frac{{\hat{p}}_{Y} (1 - {\hat{p}}_{Y})}{n} + \frac{{\hat{p}}_{X} (1 - {\hat{p}}_{X})}{m}}$

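As in the one-sample case, the test and the CI may not agree, because they use different estimates of the variance of the difference in sample proportions: the test pools the two samples under the null, while the CI estimates each proportion separately.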
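
The interval formula can be sketched as follows, again with the standard library only. The function name `diff_proportion_ci` is hypothetical; note that the standard error here is unpooled, unlike the one in the Z statistic.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(b, n, d, m, alpha=0.05):
    """Large-sample CI for p_Y - p_X with an unpooled standard error.

    b successes out of n in the Y sample, d successes out of m in the X sample.
    Returns (lower, upper).
    """
    p_y, p_x = b / n, d / m
    se = sqrt(p_y * (1 - p_y) / n + p_x * (1 - p_x) / m)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    diff = p_y - p_x
    return diff - z_crit * se, diff + z_crit * se

# Breakfast example: 9 of 15 cat people and 14 of 20 dog people ate breakfast.
lower, upper = diff_proportion_ci(9, 15, 14, 20)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```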

Your Turn

${\hat{p}}_{c} = \frac{n {\hat{p}}_{Y} + m {\hat{p}}_{X}}{n + m} = \frac{b + d}{N}$

What is ${\hat{p}}_{c}$ for our table?

##        ate_breakfast
## cat_dog no yes Sum
##    cats  6   9  15
##    dogs  6  14  20
##    Sum  12  23  35

Example: Z-stat

$Z = \frac{{\hat{p}}_{Y} - {\hat{p}}_{X}}{\sqrt{{\hat{p}}_{c} (1 - {\hat{p}}_{c}) (\frac{1}{n} + \frac{1}{m})}} = \frac{- 0.1}{\sqrt{0.66 (1 - 0.66) (\frac{1}{15} + \frac{1}{20})}} = - 0.62$

Compare to $z_{1 - α / 2} = 1.96$

p-value (for two-sided alternative) = 0.54

95% confidence interval: $\begin{aligned} & {\hat{p}}_{Y} - {\hat{p}}_{X} \pm z_{1 - α / 2} \sqrt{\frac{{\hat{p}}_{Y} (1 - {\hat{p}}_{Y})}{n} + \frac{{\hat{p}}_{X} (1 - {\hat{p}}_{X})}{m}} \\ & = - 0.1 \pm 1.96 \sqrt{\frac{0.6 (1 - 0.6)}{15} + \frac{0.7 (1 - 0.7)}{20}} \\ & = - 0.1 \pm 1.96 \times 0.163 \\ & = (- 0.42, 0.22) \end{aligned}$

Pearson’s Chi-squared Test

$H_{0} : p_{Y} - p_{X} = 0$

$X = \sum_{j, k = 1, 2} \frac{(O_{j k} - E_{j k})^{2}}{E_{j k}}$

$O_{j k} = n_{j k}$

$E_{j k} = \frac{R_{j} C_{k}}{N}$

If the null is true, $X$ has approximately a $χ_{1}^{2}$ distribution in large samples
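
A minimal sketch of the statistic for a 2 x 2 table, without any continuity correction. The function name `pearson_chi_squared` is made up for illustration. The p-value line relies on the fact that a $χ_{1}^{2}$ variable is a squared standard normal, so its upper tail can be written with the complementary error function.

```python
from math import erfc, sqrt

def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a 2 x 2 table of observed counts.

    Returns (statistic, expected_counts, p_value); the p-value uses the
    chi-squared distribution with 1 degree of freedom, so it is only
    meaningful for 2 x 2 tables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # E_jk = R_j * C_k / N
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # P(chi2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, expected, p_value

# Breakfast table: rows are cats, dogs; columns are no, yes.
stat, expected, p_value = pearson_chi_squared([[6, 9], [6, 14]])
print(f"X = {stat:.2f}, p-value = {p_value:.2f}")
```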

Example: Chi-squared test

##        ate_breakfast
## cat_dog no yes Sum
##    cats  6   9  15
##    dogs  6  14  20
##    Sum  12  23  35

##        no   yes
## cats 5.14  9.86
## dogs 6.86 13.14

E.g. $E_{11} = \frac{R_{1} C_{1}}{N} = \frac{15 \times 12}{35} = 5.14$

Example: Chi-squared test

$\begin{aligned} X & = \frac{(6 - 5.14)^{2}}{5.14} + \frac{(9 - 9.86)^{2}}{9.86} + \frac{(6 - 6.86)^{2}}{6.86} + \frac{(14 - 13.14)^{2}}{13.14} \\ & = 0.38 \end{aligned}$

Compare to $χ_{1}^{2} (1 - α) = 3.84$

p-value: $P (χ_{1}^{2} > 0.38) = 0.54$

Summary

Pearson’s Chi-squared test for homogeneity of proportions across groups is equivalent (i.e. results in the same p-value) to the Z-test for proportions (when there are two groups).

$X = Z^{2}$
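
The identity can be checked numerically on the breakfast table. This is only a sanity check on one table, not a proof, and the variable names are chosen here for illustration.

```python
from math import isclose, sqrt

# Breakfast table cells: cats (a no, b yes), dogs (c no, d yes).
a, b, c, d = 6, 9, 6, 14
n, m = a + b, c + d
N = n + m

# Pooled two-sample Z statistic.
p_y, p_x = b / n, d / m
p_c = (b + d) / N
z = (p_y - p_x) / sqrt(p_c * (1 - p_c) * (1 / n + 1 / m))

# Pearson chi-squared statistic with E_jk = R_j * C_k / N.
observed = [[a, b], [c, d]]
rows, cols = [n, m], [a + c, b + d]
x = sum(
    (observed[j][k] - rows[j] * cols[k] / N) ** 2 / (rows[j] * cols[k] / N)
    for j in range(2)
    for k in range(2)
)

print(f"Z^2 = {z ** 2:.4f}, X = {x:.4f}")
print(isclose(x, z ** 2))
```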

Two sample: Binary Response ST551 Lecture 21
