More complicated tables ST551 Lecture 24

Homeworks

Where to look for comments.

Your Turn: Find all the problems in this statistical summary

(These will all be point deductions!)

“There is strong evidence the variance of US males is 9 (t-test of variance, p-value = 0.27191099). It is estimated the variance of the height of US males is 9.467606. With 95% confidence the variance in height of US males is between 8.633154 and 10.30206.”

Two extensions to the \(2 \times 2\) contingency tables

  • More than two categories: Chi-square test

  • More than two tables: Mantel-Haenszel Test

More than two categories

More than two categories

We might cross-classify our sample of \(N\) units on two variables, at least one of which has more than two categories.

  • Is eating breakfast associated with your commute method? \(5 \times 2\) table

    \(Y_i\) = {Ate breakfast, Didn’t eat breakfast},
    \(G_i\) = {Walk, Bike, Drove alone, Drove with others, Other}

  • Is your favorite sport associated with your favorite ice cream flavor? \(3 \times 5\) table

    \(Y_i\) = {Baseball, Basketball, Football, Soccer, Hockey},
    \(G_i\) = {Chocolate, Strawberry, Vanilla}

Chi-square test for \((r \times c)\) tables

As in the \(2 \times 2\) case, we can do a chi-square test.

\(H_0:\) No association between Variable 1 and Variable 2

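\(O_{ij}\): observed count in row \(i\), column \(j\)

\(E_{ij}\): expected count in row \(i\), column \(j\) under \(H_0\): \[ E_{ij} = \frac{R_iC_j}{N} \] where \(R_i\) is the total of row \(i\), \(C_j\) is the total of column \(j\), and \(N\) is the overall total.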

\[ X = \sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}} \]

Under the null hypothesis, \(X \dot\sim \chi^2_{(r - 1)(c - 1)}\), i.e. \(X\) is approximately chi-square with \((r - 1)(c - 1)\) degrees of freedom. Reject \(H_0\) for large \(X\).
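A minimal R sketch of this computation, assuming the counts are stored in a numeric matrix. The helper name `chisq_rc` is hypothetical (base R's `chisq.test()` does the same job). It builds \(E_{ij} = R_iC_j/N\) with `outer()`, sums the cell contributions, and takes the upper-tail p-value from `pchisq()`, since we reject for large \(X\).

```r
# Hypothetical helper: Pearson chi-square test of no association for an r x c table
chisq_rc <- function(O) {
  N <- sum(O)
  # Expected counts under H0: E_ij = R_i * C_j / N
  E <- outer(rowSums(O), colSums(O)) / N
  X <- sum((O - E)^2 / E)
  dof <- (nrow(O) - 1) * (ncol(O) - 1)
  # Large X is evidence against H0, so use the upper tail
  p <- pchisq(X, df = dof, lower.tail = FALSE)
  list(expected = E, statistic = X, df = dof, p.value = p)
}

# Gender by party identification counts from the GSS example (Agresti 2007)
gss <- matrix(c(762, 327, 468,
                484, 239, 477),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("F", "M"),
                              party = c("Democrat", "Independent", "Republican")))
res <- chisq_rc(gss)
round(res$statistic, 2)  # 30.07, on res$df = 2 degrees of freedom
```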

Example

“Table 2.5, from the 2000 General Social Survey, cross classifies gender and political party identification. Subjects indicated whether they identified more strongly with the Democratic or Republican party or as Independents.”

(Agresti 2007)

     Democrat  Independent  Republican   Sum
F         762          327         468  1557
M         484          239         477  1200
Sum      1246          566         945  2757

Example: Expected counts

     Democrat  Independent  Republican   Sum
F       703.7        319.6       533.7  1557
M       542.3        246.4       411.3  1200
Sum      1246          566         945  2757

Example: Cell contributions

     Democrat  Independent  Republican
F       4.835       0.1692       8.084
M       6.273       0.2196       10.49
## X-squared 
##  30.07015
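The same numbers can be obtained from base R's `chisq.test()`; a short sketch, assuming the table is entered as a matrix. The squared Pearson residuals, \((O_{ij} - E_{ij})/\sqrt{E_{ij}}\) squared, are exactly the cell contributions above. R applies a continuity correction only to \(2 \times 2\) tables, so none is used here.

```r
gss <- matrix(c(762, 327, 468,
                484, 239, 477),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("F", "M"),
                              party = c("Democrat", "Independent", "Republican")))
test <- chisq.test(gss)
round(test$expected, 1)      # expected counts E_ij
round(test$residuals^2, 3)   # cell contributions (O_ij - E_ij)^2 / E_ij
test$statistic               # X-squared: the sum of the cell contributions
test$parameter               # df = (2 - 1) * (3 - 1) = 2
```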

Chi-squared test comments

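The \(\chi^2\) reference distribution is only asymptotically exact: it is an approximation that improves as the sample size grows.

As in the \(2 \times 2\) case, a common rule of thumb is that the approximation is adequate when \(E_{ij} > 5\) for all \(i, j\).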
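One way to check the rule of thumb in R, as a sketch: look at the expected counts returned by `chisq.test()`. R itself warns ("Chi-squared approximation may be incorrect") when any expected count is below 5; in that situation `chisq.test(..., simulate.p.value = TRUE)` gives a Monte Carlo p-value that does not rely on the \(\chi^2\) approximation.

```r
gss <- matrix(c(762, 327, 468,
                484, 239, 477),
              nrow = 2, byrow = TRUE)
expected <- chisq.test(gss)$expected
all(expected > 5)   # rule of thumb: every E_ij should exceed 5
min(expected)       # the smallest expected count
```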

More than two tables

Your Turn

Is party preference associated with level of education?

Find the sample odds ratio for each of these two states.

State 1
education    democrat  republican
college             3          27
no college          7          63
State 2
education    democrat  republican
college            63           7
no college         27           3
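As a sketch, the sample odds ratio of a \(2 \times 2\) table is \(ad/(bc)\), where \(a, b\) are the first-row counts and \(c, d\) the second-row counts. The helper name `odds_ratio` is hypothetical. The two state tables are stored here as layers of a \(2 \times 2 \times 2\) array, a layout that `mantelhaen.test()` also accepts.

```r
# Hypothetical helper: sample odds ratio a*d / (b*c) of a 2 x 2 table
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Rows: education; columns: party; layers: state (array is filled column by column)
states <- array(c(3, 7, 27, 63,     # State 1
                  63, 27, 7, 3),    # State 2
                dim = c(2, 2, 2),
                dimnames = list(education = c("college", "no college"),
                                party = c("democrat", "republican"),
                                state = c("State 1", "State 2")))
apply(states, 3, odds_ratio)   # one odds ratio per state
```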

Your turn

Now combine the two tables into a single table and find its odds ratio.

Combined
education    democrat  republican
college            66          34
no college         34          66
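Collapsing over state means adding the two tables cell by cell. A sketch in R, with the state tables stored as a \(2 \times 2 \times 2\) array (rows education, columns party, layers state) and summed over the third dimension:

```r
states <- array(c(3, 7, 27, 63,     # State 1, filled column by column
                  63, 27, 7, 3),    # State 2
                dim = c(2, 2, 2),
                dimnames = list(education = c("college", "no college"),
                                party = c("democrat", "republican"),
                                state = c("State 1", "State 2")))
# Collapse: add the state tables cell by cell
combined <- apply(states, c(1, 2), sum)
combined
# Odds ratio of the collapsed table, a*d / (b*c)
or_combined <- (combined[1, 1] * combined[2, 2]) / (combined[1, 2] * combined[2, 1])
or_combined
```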

Simpson’s paradox

“…in which a trend appears in different groups of data but disappears or reverses when these groups are combined.”

https://en.wikipedia.org/wiki/Simpson%27s_paradox

The Mantel-Haenszel procedure attempts to avoid the paradox by combining the individual table odds ratios, rather than collapsing the tables and computing a single odds ratio.

Mantel-Haenszel odds ratio

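\(k\) tables, indexed by \(j = 1, \ldots, k\). In table \(j\), \(a_j\) and \(b_j\) are the counts in the first row, \(c_j\) and \(d_j\) are the counts in the second row, and \(N_j = a_j + b_j + c_j + d_j\).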

Individual table odds ratio estimates: \[ \hat{\omega}_j = \frac{a_j d_j}{b_j c_j} \]

Combine in a weighted average: \[ \hat{\omega}_{MH} = \sum_{j = 1}^k \text{weight}^*_j \times \hat{\omega}_j \] where \[ \text{weight}^*_j = \frac{\text{weight}_j}{\sum_{l = 1}^{k} \text{weight}_l} \quad \text{and} \quad \text{weight}_j = \frac{b_jc_j}{N_j} \]
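A sketch of this weighted average for a \(2 \times 2 \times k\) array of counts; the function name `mh_odds_ratio` is hypothetical. Since \(\text{weight}_j \times \hat\omega_j = a_jd_j/N_j\), the estimate simplifies to \(\sum_j (a_jd_j/N_j) \big/ \sum_j (b_jc_j/N_j)\), which also stays defined when some \(b_jc_j = 0\). The code computes both forms as a check.

```r
# Hypothetical helper: Mantel-Haenszel common odds ratio for a 2 x 2 x k array
mh_odds_ratio <- function(tabs) {
  a  <- tabs[1, 1, ]; b <- tabs[1, 2, ]
  cc <- tabs[2, 1, ]; d <- tabs[2, 2, ]   # cc holds c_j (named cc to avoid masking base::c)
  N  <- apply(tabs, 3, sum)               # N_j, total count in table j
  w  <- b * cc / N                        # weight_j
  omega <- (a * d) / (b * cc)             # individual odds ratios
  weighted   <- sum(w / sum(w) * omega)   # weighted-average form
  simplified <- sum(a * d / N) / sum(b * cc / N)
  c(weighted = weighted, simplified = simplified)
}

states <- array(c(3, 7, 27, 63,     # State 1, filled column by column
                  63, 27, 7, 3),    # State 2
                dim = c(2, 2, 2))
mh_odds_ratio(states)
```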

Your Turn

Find \(\hat{\omega}_{MH}\) for the two tables:

State 1
education    democrat  republican
college             3          27
no college          7          63
State 2
education    democrat  republican
college            63           7
no college         27           3

Mantel-Haenszel test

\(H_0: \omega_j = 1\) for all \(j = 1, \ldots, k\)

\[ X = \frac{\left(\sum_{j = 1}^{k}( a_j - E(a_j)) \right)^2}{\sum_{j = 1}^{k} Var(a_j)} \]

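\[ E(a_j) = \frac{R_{1j}C_{1j}}{N_j} \]

\[ Var(a_j) = \frac{R_{1j}C_{1j}R_{2j}C_{2j}}{N_j^2(N_j - 1)} \] where \(R_{1j}, R_{2j}\) are the row totals and \(C_{1j}, C_{2j}\) are the column totals of table \(j\).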

Under the null hypothesis \(X \dot \sim \chi^2_1\). Reject \(H_0\) for large values of \(X\).
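A sketch of this statistic for a \(2 \times 2 \times k\) array, following the formulas for \(E(a_j)\) and \(Var(a_j)\) above. The function name `mh_test` is hypothetical; it should agree with base R's `mantelhaen.test()` when the continuity correction is turned off (`correct = FALSE`).

```r
# Hypothetical helper: Mantel-Haenszel chi-square statistic (no continuity correction)
mh_test <- function(tabs) {
  a  <- tabs[1, 1, ]
  R1 <- tabs[1, 1, ] + tabs[1, 2, ]   # row 1 totals
  R2 <- tabs[2, 1, ] + tabs[2, 2, ]   # row 2 totals
  C1 <- tabs[1, 1, ] + tabs[2, 1, ]   # column 1 totals
  C2 <- tabs[1, 2, ] + tabs[2, 2, ]   # column 2 totals
  N  <- R1 + R2
  Ea <- R1 * C1 / N                                # E(a_j) under H0
  Va <- R1 * C1 * R2 * C2 / (N^2 * (N - 1))        # Var(a_j) under H0
  X  <- (sum(a - Ea))^2 / sum(Va)
  c(statistic = X, p.value = pchisq(X, df = 1, lower.tail = FALSE))
}

states <- array(c(3, 7, 27, 63,     # State 1, filled column by column
                  63, 27, 7, 3),    # State 2
                dim = c(2, 2, 2))
mh_test(states)
mantelhaen.test(states, correct = FALSE)   # built-in version for comparison
```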

In R

mantelhaen.test(df$education, df$party, z = df$state)
## 
##  Mantel-Haenszel chi-squared test without
##  continuity correction
## 
## data:  df$education and df$party and df$state
## Mantel-Haenszel X-squared = 0, df = 1,
## p-value = 1
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3649129 2.7403803
## sample estimates:
## common odds ratio 
##                 1

Mantel-Haenszel Cautions

The test assumes the odds ratio is the same in all \(k\) tables.

  • If this assumption is not met, it’s difficult to interpret the p-value, and it doesn’t make sense to estimate a common odds ratio.
  • The test may fail to reject the null if the odds ratios are different from 1 but in opposite directions.
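To see the second caution concretely, here is a sketch with two made-up \(2 \times 2\) tables (hypothetical counts, chosen only for illustration): the first has odds ratio 4 and the second 1/4. Each table on its own gives a small p-value, but the deviations \(a_j - E(a_j)\) cancel, so the Mantel-Haenszel statistic is 0 and the common odds ratio estimate is 1.

```r
# Hypothetical strata whose odds ratios differ from 1 in opposite directions
tabs <- array(c(20, 10, 10, 20,    # stratum 1: OR = (20 * 20) / (10 * 10) = 4
                10, 20, 20, 10),   # stratum 2: OR = (10 * 10) / (20 * 20) = 1/4
              dim = c(2, 2, 2))
# Each stratum separately: chi-square test without continuity correction
apply(tabs, 3, function(tab) chisq.test(tab, correct = FALSE)$p.value)
# Pooled across strata: the deviations cancel
mantelhaen.test(tabs, correct = FALSE)
```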