Two sample inference ST551 Lecture 18

Finish last time’s slides

Two sample inference

Two sample setting

Setting: two independent samples

$Y_1, \ldots, Y_n$ i.i.d. from a population with c.d.f. $F_Y$, and
$X_1, \ldots, X_m$ i.i.d. from a population with c.d.f. $F_X$

Parameter: now focus on some comparison between the two populations $F_Y$ and $F_X$

Alternative view

Setting: two independent samples

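$(Y_1, G_1), (Y_2, G_2), \ldots, (Y_n, G_n), (Y_{n+1}, G_{n+1}), \ldots, (Y_{n+m}, G_{n+m})$

where $G$ is a binary grouping variable that indicates which population the observation came from:

$$G_i = \begin{cases} 0, & \text{observation from } Y \\ 1, & \text{observation from } X \end{cases}$$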

Two views are equivalent

Depending on the sampling scheme, one view may seem more natural:

  • I sample 40 OSU graduate students and 20 OSU undergraduate students:

    • $Y_i$ = graduate student time to complete 1 mile run, $i = 1, \ldots, 40$
    • $X_i$ = undergraduate student time to complete 1 mile run, $i = 1, \ldots, 20$
  • I sample 60 OSU students and record:

    • $Y_i$ = time to complete 1 mile run, $i = 1, \ldots, 60$
    • $G_i$ = student’s level (0 = graduate, 1 = undergraduate), $i = 1, \ldots, 60$

In the second view, if we condition on the counts in each group, inference is the same as in the first view.
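The equivalence of the two views can be sketched in a few lines of Python. This is only an illustration: the data values and the helper names `split_by_group` and `combine_samples` are hypothetical, and the coding 0 = graduate, 1 = undergraduate follows the example above.

```python
# Hypothetical mile-run times (minutes) paired with a group label:
# 0 = graduate (the Y population), 1 = undergraduate (the X population).
# The numbers are made up purely for illustration.
pairs = [(8.2, 0), (7.5, 1), (9.1, 0), (6.9, 1), (8.8, 0)]


def split_by_group(pairs):
    """Convert the (Y_i, G_i) view into the two-sample view."""
    y = [value for value, g in pairs if g == 0]  # observations from Y
    x = [value for value, g in pairs if g == 1]  # observations from X
    return y, x


def combine_samples(y, x):
    """Convert the two-sample view back into the (Y_i, G_i) view."""
    return [(value, 0) for value in y] + [(value, 1) for value in x]


y, x = split_by_group(pairs)
n, m = len(y), len(x)  # the group counts we condition on
print(n, m)
```

Conditioning on the counts `n` and `m` is what lets the grouped data be treated as two samples of fixed size.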

Two sample inference for difference in population means

To compare population means $\mu_Y = E(Y_i)$ and $\mu_X = E(X_i)$, we might look at their difference:

$$\delta = \mu_Y - \mu_X$$

(In the alternative view: equivalent to $\delta = E(Y_i \mid G_i = 0) - E(Y_i \mid G_i = 1)$)

  • Estimate for $\delta$
  • Test for $H_0: \delta = \delta_0$
  • Confidence interval for $\delta$

Difference in sample means

It seems reasonable to use:

$\hat{\delta} = \bar{Y} - \bar{X}$ as a good starting point for inference on $\delta = \mu_Y - \mu_X$.

Complete worksheet (Charlotte will provide)
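As a rough sketch of the properties the worksheet is likely to explore (an assumption, since the worksheet itself is not reproduced here), the mean and variance of the difference in sample means follow from independence of the two samples. Writing $\sigma_Y^2 = \mathrm{Var}(Y_i)$ and $\sigma_X^2 = \mathrm{Var}(X_i)$, both assumed finite:

$$
\begin{aligned}
E(\bar{Y} - \bar{X}) &= E(\bar{Y}) - E(\bar{X}) = \mu_Y - \mu_X = \delta \\
\mathrm{Var}(\bar{Y} - \bar{X}) &= \mathrm{Var}(\bar{Y}) + \mathrm{Var}(\bar{X}) = \frac{\sigma_Y^2}{n} + \frac{\sigma_X^2}{m}
\end{aligned}
$$

So the difference in sample means is unbiased for $\delta$, and by the Central Limit Theorem applied to each sample mean it is approximately Normal when $n$ and $m$ are both large.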

Leads to two sample Z-test and intervals

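Assume known population variances: $\mathrm{Var}(Y_i) = \sigma_Y^2$, $\mathrm{Var}(X_i) = \sigma_X^2$.

$$Z(\delta_0) = \frac{(\bar{Y} - \bar{X}) - \delta_0}{\sqrt{\sigma_Y^2/n + \sigma_X^2/m}}$$

Reference Distribution: If null hypothesis $H_0: \delta = \delta_0$ is true, then $Z(\delta_0) \,\dot\sim\, N(0, 1)$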

Rejection Regions:

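  • $H_A: \delta > \delta_0$, reject $H_0$ for $Z(\delta_0) > z_{1-\alpha}$
  • $H_A: \delta < \delta_0$, reject $H_0$ for $Z(\delta_0) < z_{\alpha}$
  • $H_A: \delta \ne \delta_0$, reject $H_0$ for $|Z(\delta_0)| > z_{1-\alpha/2}$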
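The test statistic and the three rejection regions can be sketched in Python using only the standard library. The function name `two_sample_z_test`, its argument names, and the example data are hypothetical choices for illustration. The population variances are passed in as known values, as the test assumes.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_test(y, x, var_y, var_x, delta0=0.0, alpha=0.05,
                      alternative="two-sided"):
    """Two-sample Z-test of H0: mu_Y - mu_X = delta0 with known variances.

    Returns the statistic Z(delta0) and whether H0 is rejected at level alpha.
    """
    n, m = len(y), len(x)
    se = sqrt(var_y / n + var_x / m)        # sd of Ybar - Xbar
    z = (mean(y) - mean(x) - delta0) / se   # Z(delta0)
    std_normal = NormalDist()               # N(0, 1) reference distribution
    if alternative == "greater":            # H_A: delta > delta0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":             # H_A: delta < delta0
        reject = z < std_normal.inv_cdf(alpha)
    elif alternative == "two-sided":        # H_A: delta != delta0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


# Made-up data with assumed known variances of 1.0 in each population.
y = [8.0, 9.0, 10.0, 9.0]
x = [7.0, 8.0, 7.0, 6.0]
z, reject = two_sample_z_test(y, x, var_y=1.0, var_x=1.0)
print(round(z, 3), reject)
```

With samples this small the Normal reference distribution is only trustworthy if the populations themselves are close to Normal. The tiny data set is used here just to keep the arithmetic easy to check by hand.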

Leads to two sample Z-test and intervals

$(1-\alpha)100\%$ Confidence interval for $\delta = \mu_Y - \mu_X$

$$(\bar{Y} - \bar{X}) \pm z_{1-\alpha/2} \sqrt{\frac{\sigma_Y^2}{n} + \frac{\sigma_X^2}{m}}$$
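A minimal Python sketch of this interval, again using only the standard library. The function name `two_sample_z_interval` and the example data are hypothetical, and the variances are supplied as known values.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_interval(y, x, var_y, var_x, alpha=0.05):
    """(1 - alpha)100% interval for mu_Y - mu_X with known variances."""
    se = sqrt(var_y / len(y) + var_x / len(x))   # sd of Ybar - Xbar
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    estimate = mean(y) - mean(x)                  # Ybar - Xbar
    return estimate - z_crit * se, estimate + z_crit * se


# Made-up data with assumed known variances of 1.0 in each population.
y = [8.0, 9.0, 10.0, 9.0]
x = [7.0, 8.0, 7.0, 6.0]
lower, upper = two_sample_z_interval(y, x, var_y=1.0, var_x=1.0)
print(round(lower, 3), round(upper, 3))
```

The interval contains exactly those values of $\delta_0$ that the two-sided level $\alpha$ Z-test would not reject.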

Next time…

What if population variances aren’t known?