Two sample: Binary Response
Setting: two independent samples
$Y_1, \ldots, Y_n$ i.i.d. from Bernoulli$(p_1)$
$X_1, \ldots, X_m$ i.i.d. from Bernoulli$(p_2)$
Parameter: Difference in population proportions, $p_1 - p_2$
As a contingency table
Represent resulting data in a 2 x 2 contingency table:
| | 0 | 1 | Total |
|---|---|---|---|
| Sample 1 | a | b | n = a+b |
| Sample 2 | c | d | m = c+d |
| Total | a+c | b+d | m + n |
Two sample: Binary Response - Alternate view
Setting: two independent samples
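$(Y_i, G_i)$, $i = 1, \ldots, N$, where $Y_i$ is the binary response and $G_i$ is a binary grouping variable which indicates which population observation $i$ came from:
$Y_i \mid G_i = 0 \sim$ Bernoulli$(p_1)$
$Y_i \mid G_i = 1 \sim$ Bernoulli$(p_2)$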
As a contingency table - Alternate view
Represent resulting data in a 2 x 2 contingency table:
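| | $Y = 0$ | $Y = 1$ | Total |
|---|---|---|---|
| $G = 0$ | a | b | n = a+b = $\sum_{i=1}^{N} \mathbf{1}\{G_i = 0\}$ |
| $G = 1$ | c | d | m = c+d = $\sum_{i=1}^{N} \mathbf{1}\{G_i = 1\}$ |
| Total | a+c = $\sum_{i=1}^{N} \mathbf{1}\{Y_i = 0\}$ | b+d = $\sum_{i=1}^{N} \mathbf{1}\{Y_i = 1\}$ | a + b + c + d = N |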
Two views are equivalent
The two views are equivalent if we are interested in the response variable given the group.
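A minimal sketch of why the two views give the same table. The response values below are invented purely for illustration, and the variable names (`y`, `x`, `response`, `group`) are assumptions, not part of the course data.

```r
# Hypothetical responses, invented for illustration (1 = yes, 0 = no)
y <- c(1, 0, 1, 1, 0)   # first sample, n = 5
x <- c(0, 0, 1)         # second sample, m = 3

# First view: tabulate each sample separately and stack the rows
first_view <- rbind(
  table(factor(y, levels = 0:1)),
  table(factor(x, levels = 0:1))
)

# Second view: one response vector plus a grouping variable
response <- c(y, x)
group <- rep(c(0, 1), times = c(length(y), length(x)))
second_view <- table(group, response)

# Both views contain the same four counts
all(as.vector(first_view) == as.vector(second_view))
```

The only difference is bookkeeping: the first view keeps the samples in separate vectors, the second stacks them and records group membership alongside.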
I sample 40 OSU graduate students and 20 OSU undergraduate students:
- $Y_i$ = graduate student $i$: did you vote in 2016?
- $X_j$ = undergraduate student $j$: did you vote in 2016?
I sample 60 OSU students and record:
- $Y_i$ = did you vote in 2016?
- $G_i$ = student’s level (0 = graduate, 1 = undergraduate)
Inference focuses on:
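Comparing $p_1$ and $p_2$, the success probabilities of the two samples - first view
Comparing $P(Y = 1 \mid G = 0)$ and $P(Y = 1 \mid G = 1)$ - second view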
Ways to compare two proportions
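$Y_1, \ldots, Y_n$ i.i.d. from Bernoulli$(p_1)$
$X_1, \ldots, X_m$ i.i.d. from Bernoulli$(p_2)$
Typical null hypothesis: $H_0: p_1 = p_2$
Difference in population proportions: $p_1 - p_2$ (equals 0 under $H_0$)
Relative risk: $p_1 / p_2$ (equals 1 under $H_0$)
Odds ratio: $\dfrac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$ (equals 1 under $H_0$)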
Example
(From `class_data`, assuming you are like some random sample from a larger population)
$G$: Do you prefer cats or dogs? $Y$: Did you eat breakfast this morning?
## ate_breakfast
## cat_dog no yes
## cats 6 9
## dogs 6 14
Your turn: Fill in the table margins
Estimates
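Probability of eating breakfast, given you prefer cats: $P(\text{yes} \mid \text{cats})$. Estimate: $\hat p_{\text{cats}} = 9/15 = 0.6$
Probability of eating breakfast, given you prefer dogs: $P(\text{yes} \mid \text{dogs})$. Estimate: $\hat p_{\text{dogs}} = 14/20 = 0.7$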
Estimates
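Difference in proportions: $\hat p_{\text{cats}} - \hat p_{\text{dogs}} = 9/15 - 14/20 = -0.1$
Relative Risk: $\hat p_{\text{cats}} / \hat p_{\text{dogs}} = 0.6 / 0.7 \approx 0.86$
Odds Ratio: $\dfrac{0.6 / 0.4}{0.7 / 0.3} = \dfrac{9/6}{14/6} \approx 0.64$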
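The three estimates can be computed directly from the cats/dogs table. This is a sketch: the object names (`tab`, `p_hat`, and so on) are assumptions, and comparing cats to dogs (rather than dogs to cats) is a choice made here for illustration.

```r
# Observed counts from the example table (rows: cats, dogs; columns: no, yes)
tab <- matrix(c(6, 6, 9, 14), nrow = 2,
              dimnames = list(cat_dog = c("cats", "dogs"),
                              ate_breakfast = c("no", "yes")))

# Estimated probability of eating breakfast within each group
p_hat <- tab[, "yes"] / rowSums(tab)

diff_prop  <- p_hat[["cats"]] - p_hat[["dogs"]]   # difference in proportions
rel_risk   <- p_hat[["cats"]] / p_hat[["dogs"]]   # relative risk
odds       <- p_hat / (1 - p_hat)                 # odds within each group
odds_ratio <- odds[["cats"]] / odds[["dogs"]]     # odds ratio

round(c(diff = diff_prop, rr = rel_risk, or = odds_ratio), 3)
```

Swapping the roles of cats and dogs flips the sign of the difference and inverts both ratios.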
Two sample Z-test of proportions
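(Comes from treating a proportion as a mean and applying the two sample Z-test)
Null hypothesis: $H_0: p_1 - p_2 = 0$
$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p_c (1 - \hat p_c)\left(\frac{1}{n} + \frac{1}{m}\right)}}$$
where $\hat p_1 = b/n$ and $\hat p_2 = d/m$ are the sample proportions and $\hat p_c = \frac{b + d}{n + m}$ is the pooled proportion.
When the null is true, $Z$ has approximately a N(0, 1) distribution for large samples.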
Confidence interval for difference in proportions
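CI: $(\hat p_1 - \hat p_2) \pm z_{1 - \alpha/2} \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n} + \frac{\hat p_2 (1 - \hat p_2)}{m}}$
As in the one sample case, the test and the CI may not agree, because they use different estimates of the variance of the difference in sample proportions: the test pools the two samples under the null, the CI does not.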
Your Turn
What is the pooled proportion $\hat p_c$ for our table?
## ate_breakfast
## cat_dog no yes Sum
## cats 6 9 15
## dogs 6 14 20
## Sum 12 23 35
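Example: Z-stat
$$Z = \frac{0.6 - 0.7}{\sqrt{\frac{23}{35} \cdot \frac{12}{35} \left(\frac{1}{15} + \frac{1}{20}\right)}} = -0.62$$
Compare to N(0, 1)
p-value (for two sided alternative) = 0.54
95% confidence interval: $-0.1 \pm 1.96 \sqrt{\frac{0.6 \times 0.4}{15} + \frac{0.7 \times 0.3}{20}} = (-0.42, 0.22)$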
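The Z statistic, p-value, and interval for the cats/dogs example can be reproduced by hand. A sketch under assumptions: the variable names are hypothetical, and the difference is taken as cats minus dogs. It also shows where the two variance estimates differ.

```r
# Counts of "yes" and group sizes from the example table
yes  <- c(cats = 9, dogs = 14)
size <- c(cats = 15, dogs = 20)

p_hat    <- yes / size
diff_hat <- p_hat[["cats"]] - p_hat[["dogs"]]

# Test: standard error uses the pooled proportion (appropriate under the null)
p_pool    <- sum(yes) / sum(size)
se_pooled <- sqrt(p_pool * (1 - p_pool) * sum(1 / size))
z         <- diff_hat / se_pooled
p_value   <- 2 * pnorm(-abs(z))          # two sided alternative

# CI: standard error uses each group's own sample proportion
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / size))
ci <- diff_hat + c(-1, 1) * qnorm(0.975) * se_unpooled

round(c(z = z, p_value = p_value), 2)    # z = -0.62, p_value = 0.54
round(ci, 2)                             # -0.42  0.22
```

Because `se_pooled` and `se_unpooled` are not the same number, a test at level 0.05 and a 95% interval can occasionally disagree about whether 0 is plausible.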
Pearson’s Chi-squared Test
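$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed count in row $i$, column $j$, and $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}$ is the count expected under the null.
If the null is true, $X^2$ has approximately a $\chi^2_1$ distribution.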
Example: Chi-squared test
## ate_breakfast
## cat_dog no yes Sum
## cats 6 9 15
## dogs 6 14 20
## Sum 12 23 35
## no yes
## cats 5.14 9.86
## dogs 6.86 13.14
E.g. expected count for (cats, no): $15 \times 12 / 35 = 5.14$
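The full table of expected counts is row total times column total divided by the grand total. A short sketch, with `tab` and `expected` as assumed names:

```r
# Observed counts from the example table (rows: cats, dogs; columns: no, yes)
tab <- matrix(c(6, 6, 9, 14), nrow = 2,
              dimnames = list(cat_dog = c("cats", "dogs"),
                              ate_breakfast = c("no", "yes")))

# Expected count in each cell under the null of equal proportions
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)
```

The expected counts keep the same row and column totals as the observed table.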
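Example: Chi-squared test
$$X^2 = \frac{(6 - 5.14)^2}{5.14} + \frac{(9 - 9.86)^2}{9.86} + \frac{(6 - 6.86)^2}{6.86} + \frac{(14 - 13.14)^2}{13.14} = 0.38$$
Compare to $\chi^2_1$
p-value: $P(\chi^2_1 \geq 0.38) = 0.54$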
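The statistic and p-value can be checked by summing over the four cells. A sketch with assumed variable names, using unrounded expected counts and no continuity correction:

```r
# Observed and expected counts, ordered: cats-no, cats-yes, dogs-no, dogs-yes
observed <- c(6, 9, 6, 14)
expected <- c(15 * 12, 15 * 23, 20 * 12, 20 * 23) / 35

# Pearson's statistic and its upper-tail probability on 1 degree of freedom
x2      <- sum((observed - expected)^2 / expected)
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

round(c(x2 = x2, p_value = p_value), 2)   # x2 = 0.38, p_value = 0.54
```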
Summary
Pearson’s Chi-squared test for homogeneity of proportions across groups is equivalent (i.e. results in the same p-value) to the Z-test for proportions (when there are two groups).
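The equivalence can be checked numerically on the cats/dogs table: the squared pooled Z statistic equals Pearson's statistic, so the p-values match. This sketch uses base R's `chisq.test()` and `prop.test()` with `correct = FALSE`; turning off the continuity correction is an assumption made here so that the built-in results line up with the hand calculation.

```r
# Observed counts from the example table (rows: cats, dogs; columns: no, yes)
tab <- matrix(c(6, 6, 9, 14), nrow = 2,
              dimnames = list(cat_dog = c("cats", "dogs"),
                              ate_breakfast = c("no", "yes")))

# Built-in tests, without continuity correction
chi  <- chisq.test(tab, correct = FALSE)
prop <- prop.test(x = tab[, "yes"], n = rowSums(tab), correct = FALSE)

# Pooled two sample Z statistic computed by hand
p_hat  <- tab[, "yes"] / rowSums(tab)
p_pool <- sum(tab[, "yes"]) / sum(tab)
z <- (p_hat[["cats"]] - p_hat[["dogs"]]) /
  sqrt(p_pool * (1 - p_pool) * sum(1 / rowSums(tab)))

all.equal(z^2, unname(chi$statistic))       # Z squared equals X squared
all.equal(2 * pnorm(-abs(z)), chi$p.value)  # same two sided p-value
all.equal(prop$p.value, chi$p.value)        # prop.test agrees as well
```

With the default `correct = TRUE`, both built-in functions apply a continuity correction to 2 x 2 tables and would report a larger p-value than the hand-computed Z-test.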