Two sample: Binary Response ST551 Lecture 21

Two sample: Binary Response

Setting: two independent samples

$Y_1, \ldots, Y_n$ i.i.d. from Bernoulli($p_Y$)
$X_1, \ldots, X_m$ i.i.d. from Bernoulli($p_X$)

Parameter: Difference in population proportions $p_Y - p_X$

$p_Y = E(Y_i) = P(Y_i = 1)$
$p_X = E(X_i) = P(X_i = 1)$

As a contingency table

Represent resulting data in a 2 x 2 contingency table:

         0      1      Total
$Y_i$    a      b      n = a+b
$X_i$    c      d      m = c+d
Total    a+c    b+d    m + n

Two sample: Binary Response - Alternate view

Setting: two independent samples

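$(Y_1, G_1), (Y_2, G_2), \ldots, (Y_n, G_n), (Y_{n+1}, G_{n+1}), \ldots, (Y_{n+m}, G_{n+m})$

where $G$ is a binary grouping variable which indicates which population the observation came from: $G_i = \begin{cases} 0, & \text{observation from } Y \\ 1, & \text{observation from } X \end{cases}$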

As a contingency table - Alternate view

Represent resulting data in a 2 x 2 contingency table:

           $Y_i = 0$        $Y_i = 1$        Total
$G_i = 0$  $n_{11} = a$     $n_{12} = b$     $n = a+b = R_1$
$G_i = 1$  $n_{21} = c$     $n_{22} = d$     $m = c+d = R_2$
Total      $a+c = C_1$      $b+d = C_2$      $a+b+c+d = N$
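As a rough illustration of the alternate view, the sketch below cross-tabulates paired observations $(Y_i, G_i)$ into the 2 x 2 table and its margins. The function name `contingency_table` and the toy data are hypothetical, chosen only for this example.

```python
from collections import Counter

def contingency_table(y, g):
    """Cross-tabulate binary responses y against binary group labels g.

    Returns [[n11, n12], [n21, n22]] with rows indexed by g (0, 1)
    and columns indexed by y (0, 1).
    """
    if len(y) != len(g):
        raise ValueError("y and g must have the same length")
    counts = Counter(zip(g, y))
    return [[counts[(j, k)] for k in (0, 1)] for j in (0, 1)]

# Hypothetical toy data: the first 3 observations are group 0, the last 4 group 1.
y = [1, 0, 1, 1, 1, 0, 0]
g = [0, 0, 0, 1, 1, 1, 1]

table = contingency_table(y, g)
row_totals = [sum(row) for row in table]        # R1, R2
col_totals = [sum(col) for col in zip(*table)]  # C1, C2
n_total = sum(row_totals)                       # N
```

Splitting `y` by the value of `g` recovers the two separate samples of the first view, so the same table results either way.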

Two views are equivalent

The two views are equivalent when we are interested in the response variable given the group.

  • I sample 40 OSU graduate students and 20 OSU undergraduate students:

    • $Y_i$ = graduate student, did you vote in 2016?, $i = 1, \ldots, 40$
    • $X_i$ = undergraduate student, did you vote in 2016?, $i = 1, \ldots, 20$
  • I sample 60 OSU students and record:

    • $Y_i$ = did you vote in 2016?, $i = 1, \ldots, 60$
    • $G_i$ = student’s level (0 = graduate, 1 = undergraduate), $i = 1, \ldots, 60$

Inference focuses on:

Comparing $P(Y_i = 1)$ and $P(X_i = 1)$ - first view
Comparing $P(Y_i = 1 \mid G_i = 0)$ and $P(Y_i = 1 \mid G_i = 1)$ - second view

Ways to compare two proportions

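$Y_1, \ldots, Y_n$ i.i.d. from Bernoulli($p_Y$)
$X_1, \ldots, X_m$ i.i.d. from Bernoulli($p_X$)

Typical null hypothesis: $H_0: p_Y = p_X$

Difference in population proportions: $p_Y - p_X$

  • $H_0: p_Y - p_X = 0$

Relative risk: $p_Y / p_X$

  • $H_0: p_Y / p_X = 1$

Odds ratio: $\frac{p_Y}{1 - p_Y} \Big/ \frac{p_X}{1 - p_X}$

  • $H_0: \frac{p_Y}{1 - p_Y} \Big/ \frac{p_X}{1 - p_X} = 1$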
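The three measures express the same null hypothesis on different scales. A minimal sketch, assuming both proportions lie strictly between 0 and 1 (the function name `compare_proportions` and the input values are hypothetical):

```python
def compare_proportions(p_y, p_x):
    """Return the difference, relative risk, and odds ratio for two proportions."""
    difference = p_y - p_x
    relative_risk = p_y / p_x
    odds_ratio = (p_y / (1 - p_y)) / (p_x / (1 - p_x))
    return difference, relative_risk, odds_ratio

# Hypothetical values: equal proportions give the null value of every measure
# (difference 0, relative risk 1, odds ratio 1).
null_case = compare_proportions(0.25, 0.25)

# Hypothetical values: unequal proportions move all three away from the null.
alt_case = compare_proportions(0.5, 0.25)
```

Away from the null the three measures generally take different values, so it matters which one is reported.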

Example

(From class_data, treating the class as a random sample from a larger population)

$G_i$: Do you prefer cats or dogs? $Y_i$: Did you eat breakfast this morning?

##        ate_breakfast
## cat_dog no yes
##    cats  6   9
##    dogs  6  14

Your turn: Fill in the table margins

Estimates

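Probability of eating breakfast, given you prefer cats: $p_Y = P(Y_i = 1 \mid G_i = 0) = \frac{P(Y_i = 1 \,\&\, G_i = 0)}{P(G_i = 0)}$

Estimate $\hat{p}_Y = \frac{b/N}{R_1/N} = \frac{9}{15} = 0.6$

Probability of eating breakfast, given you prefer dogs: $p_X = P(Y_i = 1 \mid G_i = 1) = \frac{P(Y_i = 1 \,\&\, G_i = 1)}{P(G_i = 1)}$

Estimate $\hat{p}_X = \frac{d/N}{R_2/N} = \frac{14}{20} = 0.7$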

Estimates

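Difference in proportions $\hat{p}_Y - \hat{p}_X = 0.6 - 0.7 = -0.1$

Relative Risk $\frac{\hat{p}_Y}{\hat{p}_X} = \frac{0.6}{0.7} = 0.86$

Odds Ratio $\frac{\hat{p}_Y}{1 - \hat{p}_Y} \Big/ \frac{\hat{p}_X}{1 - \hat{p}_X} = \frac{0.6}{1 - 0.6} \Big/ \frac{0.7}{1 - 0.7} = 0.64 = \frac{bc}{ad}$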
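These estimates can be checked directly from the cell counts. The sketch below uses the breakfast table and also confirms that the odds ratio equals $bc/(ad)$; the variable names are chosen for illustration.

```python
# Counts from the example table (rows: cats, dogs; columns: no, yes).
a, b = 6, 9     # cats: no, yes
c, d = 6, 14    # dogs: no, yes

p_y_hat = b / (a + b)   # estimated P(ate breakfast | prefers cats)
p_x_hat = d / (c + d)   # estimated P(ate breakfast | prefers dogs)

difference = p_y_hat - p_x_hat
relative_risk = p_y_hat / p_x_hat
odds_ratio = (p_y_hat / (1 - p_y_hat)) / (p_x_hat / (1 - p_x_hat))

# The odds ratio can also be computed directly from the cell counts.
odds_ratio_from_counts = (b * c) / (a * d)

print(round(difference, 2), round(relative_risk, 2), round(odds_ratio, 2))
```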

Two sample Z-test of proportions

(Comes from considering proportion as mean and looking at two sample Z-test)

Null hypothesis: H0:pY=pX

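$Z = \frac{\hat{p}_Y - \hat{p}_X}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n} + \frac{1}{m} \right)}}$

where $\hat{p}_c = \frac{n \hat{p}_Y + m \hat{p}_X}{n + m} = \frac{b + d}{N}$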

When the null is true, $Z$ has approximately a N(0, 1) distribution (for large samples).
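A minimal sketch of this test using only the Python standard library. The function name `two_sample_z_test` is an assumption for illustration, and the two-sided p-value relies on the normal approximation.

```python
import math

def two_sample_z_test(successes_y, n, successes_x, m):
    """Pooled two-sample Z-test of H0: p_Y = p_X.

    Returns the Z statistic and the two-sided p-value from the N(0, 1)
    reference distribution.
    """
    p_y_hat = successes_y / n
    p_x_hat = successes_x / m
    p_c_hat = (successes_y + successes_x) / (n + m)   # pooled estimate
    se = math.sqrt(p_c_hat * (1 - p_c_hat) * (1 / n + 1 / m))
    z = (p_y_hat - p_x_hat) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Breakfast example: 9 of 15 cat people and 14 of 20 dog people ate breakfast.
z, p_value = two_sample_z_test(9, 15, 14, 20)
print(round(z, 2), round(p_value, 2))
```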

Confidence interval for difference in proportions

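$(1 - \alpha)100\%$ CI: $\hat{p}_Y - \hat{p}_X \pm z_{1 - \alpha/2} \sqrt{\frac{\hat{p}_Y (1 - \hat{p}_Y)}{n} + \frac{\hat{p}_X (1 - \hat{p}_X)}{m}}$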

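As in the one-sample case, the test and the CI may not agree because they use different estimates of the variance of the difference in sample proportions: the test uses the pooled estimate $\hat{p}_c$, while the CI uses $\hat{p}_Y$ and $\hat{p}_X$ separately.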
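To make the two variance estimates concrete, the sketch below computes the interval with the unpooled standard error and, for comparison, the pooled standard error used by the Z-test. The function name `diff_proportion_ci` is hypothetical, and the default `z_crit=1.96` assumes a 95% interval.

```python
import math

def diff_proportion_ci(successes_y, n, successes_x, m, z_crit=1.96):
    """Wald confidence interval for p_Y - p_X.

    z_crit is the standard normal quantile z_{1 - alpha/2};
    the default 1.96 corresponds to a 95% interval.
    """
    p_y_hat = successes_y / n
    p_x_hat = successes_x / m
    # Unpooled standard error: each group's variance is estimated separately.
    se = math.sqrt(p_y_hat * (1 - p_y_hat) / n + p_x_hat * (1 - p_x_hat) / m)
    estimate = p_y_hat - p_x_hat
    return estimate - z_crit * se, estimate + z_crit * se

# Breakfast example: 9 of 15 cat people and 14 of 20 dog people ate breakfast.
lower, upper = diff_proportion_ci(9, 15, 14, 20)

# Pooled standard error used by the Z-test, shown for comparison.
p_c_hat = (9 + 14) / (15 + 20)
pooled_se = math.sqrt(p_c_hat * (1 - p_c_hat) * (1 / 15 + 1 / 20))
unpooled_se = (upper - lower) / (2 * 1.96)

print(round(lower, 2), round(upper, 2))
print(round(pooled_se, 4), round(unpooled_se, 4))
```

In this example the two standard errors are close, so the test and the interval lead to the same conclusion, but that need not hold in general.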

Your Turn

$\hat{p}_c = \frac{n \hat{p}_Y + m \hat{p}_X}{n + m} = \frac{b + d}{N}$

What is $\hat{p}_c$ for our table?

##        ate_breakfast
## cat_dog no yes Sum
##    cats  6   9  15
##    dogs  6  14  20
##    Sum  12  23  35

Example: Z-stat

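$Z = \frac{\hat{p}_Y - \hat{p}_X}{\sqrt{\hat{p}_c (1 - \hat{p}_c) \left( \frac{1}{n} + \frac{1}{m} \right)}} = \frac{-0.1}{\sqrt{0.66 (1 - 0.66) \left( \frac{1}{15} + \frac{1}{20} \right)}} = -0.62$

Compare to $z_{1 - \alpha/2} = 1.96$

p-value (for two-sided alternative) $= 0.54$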

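95% confidence interval: $\hat{p}_Y - \hat{p}_X \pm z_{1 - \alpha/2} \sqrt{\frac{\hat{p}_Y (1 - \hat{p}_Y)}{n} + \frac{\hat{p}_X (1 - \hat{p}_X)}{m}} = -0.1 \pm 1.96 \sqrt{\frac{0.6 (1 - 0.6)}{15} + \frac{0.7 (1 - 0.7)}{20}} = (-0.42, 0.22)$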

Pearson’s Chi-squared Test

$H_0: p_Y - p_X = 0$

$X = \sum_{j,k = 1,2} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$

$O_{jk} = n_{jk}$

$E_{jk} = \frac{R_j C_k}{N}$

If the null is true, $X$ has approximately a $\chi^2_1$ distribution.

Example: Chi-squared test

##        ate_breakfast
## cat_dog no yes Sum
##    cats  6   9  15
##    dogs  6  14  20
##    Sum  12  23  35
##        no   yes
## cats 5.14  9.86
## dogs 6.86 13.14

E.g. $E_{11} = \frac{15 \times 12}{35} = 5.14$

Example: Chi-squared test

$X = \frac{(6 - 5.14)^2}{5.14} + \frac{(9 - 9.86)^2}{9.86} + \frac{(6 - 6.86)^2}{6.86} + \frac{(14 - 13.14)^2}{13.14} = 0.38$

Compare to $\chi^2_1(1 - \alpha) = 3.84$

p-value: $= 0.54$
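The calculation above can be reproduced with a short sketch that builds the expected counts from the margins. The function name `pearson_chi_squared` is hypothetical, and no continuity correction is applied, which matches the formula as stated.

```python
import math

def pearson_chi_squared(table):
    """Pearson's chi-squared statistic for a 2 x 2 table of observed counts.

    Returns the statistic, the expected counts, and the p-value from the
    chi-squared distribution with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    expected = [[r * c / n_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (table[j][k] - expected[j][k]) ** 2 / expected[j][k]
        for j in range(2)
        for k in range(2)
    )
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, expected, p_value

# Breakfast example (rows: cats, dogs; columns: no, yes).
observed = [[6, 9], [6, 14]]
statistic, expected, p_value = pearson_chi_squared(observed)
print(round(statistic, 2), round(p_value, 2))
```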

Summary

Pearson’s Chi-squared test for homogeneity of proportions across groups is equivalent (i.e. results in the same p-value) to the Z-test for proportions (when there are two groups).

$X = Z^2$
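One way to see why the identity holds is to write both statistics in terms of the table entries $a, b, c, d$, with $n = R_1$ and $m = R_2$. The intermediate steps below are a reconstruction for illustration, not taken from the lecture.

```latex
\begin{align*}
O_{jk} - E_{jk} &= \pm \frac{ad - bc}{N}
  \quad \text{for every cell, e.g. } a - \frac{R_1 C_1}{N} = \frac{ad - bc}{N}, \\[4pt]
X &= \frac{(ad - bc)^2}{N^2} \sum_{j,k = 1,2} \frac{N}{R_j C_k}
   = \frac{(ad - bc)^2}{N} \cdot \frac{(R_1 + R_2)(C_1 + C_2)}{R_1 R_2 C_1 C_2}
   = \frac{N (ad - bc)^2}{R_1 R_2 C_1 C_2}, \\[4pt]
\hat{p}_Y - \hat{p}_X &= \frac{b}{R_1} - \frac{d}{R_2} = \frac{bc - ad}{R_1 R_2},
  \qquad
  \hat{p}_c (1 - \hat{p}_c) = \frac{C_1 C_2}{N^2},
  \qquad
  \frac{1}{n} + \frac{1}{m} = \frac{N}{R_1 R_2}, \\[4pt]
Z^2 &= \frac{(bc - ad)^2}{R_1^2 R_2^2} \cdot \frac{N^2}{C_1 C_2} \cdot \frac{R_1 R_2}{N}
     = \frac{N (ad - bc)^2}{R_1 R_2 C_1 C_2} = X.
\end{align*}
```

For the breakfast table, $ad - bc = 84 - 54 = 30$, so both statistics equal $\frac{35 \times 30^2}{15 \times 20 \times 12 \times 23} = 0.38$.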