Homework
Where to look for comments.
Your Turn: Find all the problems in this statistical summary
(These will all be point deductions!)
“There is strong evidence the variance of US males is 9 (t-test of variance, p-value = 0.27191099). It is estimated the variance of the height of US males is 9.467606. With 95% confidence the variance in height of US males is between 8.633154 and 10.30206.”
Two extensions to the \(2 \times 2\) contingency tables
More than two categories: Chi-square test
More than two tables: Mantel-Haenszel Test
More than two categories
We might consider cross classifying our sample of \(N\) units on two variables that have more than two categories.
Is eating breakfast associated with your commute method? \(5 \times 2\) table
\(Y_i\) = {Ate breakfast, Didn’t eat breakfast },
\(G_i\) = {Walk, Bike, Drove alone, Drove with others, Other }
Is your favorite sport associated with your favorite ice cream flavor? \(3 \times 5\) table
\(Y_i\) = {Baseball, Basketball, Football, Soccer, Hockey },
\(G_i\) = {Chocolate, Strawberry, Vanilla }
Chi-square test for \((r \times c)\) tables
As in the \(2 \times 2\) case, we can use a Chi-square test.
\(H_0:\) No association between Variable 1 and Variable 2
\(O_{ij}\): observed count in row \(i\), column \(j\)
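\(E_{ij}\): expected count in row \(i\), column \(j\) \[ E_{ij} = \frac{R_iC_j}{N} \] where \(R_i\) is the total of row \(i\), \(C_j\) is the total of column \(j\), and \(N\) is the overall total.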
\[ X = \sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}} \]
Under the null hypothesis \(X \dot \sim \chi^2_{(r - 1)(c-1)}\). Reject \(H_0\) for large values of \(X\).
Example
“Table 2.5, from the 2000 General Social Survey, cross classifies gender and political party identification. Subjects indicated whether they identified more strongly with the Democratic or Republican party or as Independents.”
(Agresti 2007)
Gender | Democrat | Independent | Republican | Sum |
---|---|---|---|---|
F | 762 | 327 | 468 | 1557 |
M | 484 | 239 | 477 | 1200 |
Sum | 1246 | 566 | 945 | 2757 |
Example: Expected counts
Gender | Democrat | Independent | Republican | Sum |
---|---|---|---|---|
F | 703.7 | 319.6 | 533.7 | 1557 |
M | 542.3 | 246.4 | 411.3 | 1200 |
Sum | 1246 | 566 | 945 | 2757 |
Example: Cell contributions
Gender | Democrat | Independent | Republican |
---|---|---|---|
F | 4.835 | 0.1692 | 8.084 |
M | 6.273 | 0.2196 | 10.49 |
## X-squared
## 30.07015
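The expected counts, cell contributions, and test statistic above can be reproduced with a few lines of base R. This is a minimal sketch: the object names (`observed`, `expected`, `contributions`, `dof`) are chosen here for illustration and are not from the original slides.

```r
# Observed counts from the gender-by-party table (Agresti 2007, Table 2.5)
observed <- matrix(
  c(762, 327, 468,
    484, 239, 477),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  party = c("Democrat", "Independent", "Republican"))
)

N <- sum(observed)

# E_ij = R_i * C_j / N, built as an outer product of row and column totals
expected <- outer(rowSums(observed), colSums(observed)) / N

# Per-cell contributions (O_ij - E_ij)^2 / E_ij and their sum X
contributions <- (observed - expected)^2 / expected
X <- sum(contributions)

# Degrees of freedom (r - 1)(c - 1) and the upper-tail p-value
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(X, df = dof, lower.tail = FALSE)

round(expected, 1)
round(contributions, 3)
X

# The built-in test should report the same statistic
chisq.test(observed)$statistic
```

The table has \(r = 2\) rows and \(c = 3\) columns, so the reference distribution is \(\chi^2_2\).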
Chi-squared test comments
The \(\chi^2\) reference distribution is an approximation that becomes exact only as the sample size grows.
As in the \(2\times 2\) case, a general rule of thumb is to require \(E_{ij} >5\) for all \(i,j\).
More than two tables
Your Turn
Is party preference associated with level of education?
Find the sample odds ratio for each of these two states.
education | democrat | republican |
---|---|---|
college | 3 | 27 |
no college | 7 | 63 |
education | democrat | republican |
---|---|---|
college | 63 | 7 |
no college | 27 | 3 |
Your Turn
Now combine the two tables and find the odds ratio.
education | democrat | republican |
---|---|---|
college | 66 | 34 |
no college | 34 | 66 |
Simpson’s paradox
“…in which a trend appears in different groups of data but disappears or reverses when these groups are combined.”
https://en.wikipedia.org/wiki/Simpson%27s_paradox
The Mantel-Haenszel procedure attempts to avoid the paradox by combining the individual odds ratios, rather than collapsing the tables and computing a single odds ratio.
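To check your answers to the two exercises above, a small helper can compute the sample odds ratio within each state and after collapsing. This is a sketch only: `odds_ratio`, `state1`, and `state2` are names introduced here, and the tables are assumed to be laid out with education in rows and party in columns.

```r
# Sample odds ratio (a * d) / (b * c) for a 2 x 2 matrix of counts
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Rows: college, no college. Columns: democrat, republican.
state1 <- matrix(c(3, 27,
                   7, 63), nrow = 2, byrow = TRUE)
state2 <- matrix(c(63, 7,
                   27, 3), nrow = 2, byrow = TRUE)

odds_ratio(state1)           # within the first state
odds_ratio(state2)           # within the second state
odds_ratio(state1 + state2)  # after collapsing over state
```

Comparing the last value with the first two shows whether collapsing changes the apparent association.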
Mantel-Haenszel odds ratio
\(k\) tables, indexed by \(j = 1, \ldots, k\).
Individual table odds ratio estimates, where table \(j\) has counts \(a_j, b_j\) in its first row, \(c_j, d_j\) in its second row, and total \(N_j\): \[ \hat{\omega}_j = \frac{a_j d_j}{b_j c_j} \]
Combine in a weighted average: \[ \hat{\omega}_{MH} = \sum_{j= 1}^k \text{weight}^*_j \times \hat{\omega}_j \] where \[ \text{weight}^*_j = \frac{\text{weight}_j}{\sum_{l = 1}^k \text{weight}_l} \quad \text{ and } \quad \text{weight}_j = \frac{b_jc_j}{N_j} \]
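The weighted average can be sketched directly in R. The function name `mh_odds_ratio` and the counts below are hypothetical, chosen only to illustrate the formula; each table is assumed to be a \(2 \times 2\) matrix with \(a, b\) in the first row and \(c, d\) in the second.

```r
# Mantel-Haenszel odds ratio for a list of 2 x 2 matrices laid out as
#   a b
#   c d
mh_odds_ratio <- function(tables) {
  a  <- sapply(tables, function(tab) tab[1, 1])
  b  <- sapply(tables, function(tab) tab[1, 2])
  cc <- sapply(tables, function(tab) tab[2, 1])
  d  <- sapply(tables, function(tab) tab[2, 2])
  N  <- sapply(tables, sum)

  omega  <- (a * d) / (b * cc)       # per-table odds ratios
  weight <- (b * cc) / N             # unnormalised weights
  sum(weight / sum(weight) * omega)  # weighted average
}

# Hypothetical counts, for illustration only
tables <- list(
  matrix(c(10, 20,
            5, 40), nrow = 2, byrow = TRUE),
  matrix(c(30, 15,
           20, 25), nrow = 2, byrow = TRUE)
)

mh_odds_ratio(tables)
```

Algebraically the weighted average simplifies to \(\sum_j (a_j d_j / N_j) \big/ \sum_j (b_j c_j / N_j)\), which also stays defined when some \(b_j c_j = 0\).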
Your Turn
Find \(\hat{\omega}_{MH}\) for the two tables:
education | democrat | republican |
---|---|---|
college | 3 | 27 |
no college | 7 | 63 |
education | democrat | republican |
---|---|---|
college | 63 | 7 |
no college | 27 | 3 |
Mantel-Haenszel test
\(H_0: \omega_j = 1\) for all \(j = 1, \ldots, k\)
\[ X = \frac{\left(\sum_{j = 1}^{k}( a_j - E(a_j)) \right)^2}{\sum_{j = 1}^{k} Var(a_j)} \]
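\[ E(a_j) = \frac{(R_{1j})(C_{1j})}{N_j} \]
\[ Var(a_j) = \frac{R_{1j}C_{1j}R_{2j}C_{2j}}{N_j^2(N_j - 1)} \]
Here \(R_{1j}, R_{2j}\) are the row totals and \(C_{1j}, C_{2j}\) the column totals of table \(j\).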
Under the null hypothesis \(X \dot \sim \chi^2_1\). Reject \(H_0\) for large values of \(X\).
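The statistic can be computed by hand from the formulas above. This is a sketch under the assumption that each table is a \(2 \times 2\) matrix with \(a_j\) in the top-left cell; the function name `mh_test` and the counts are hypothetical.

```r
# Mantel-Haenszel chi-squared statistic (no continuity correction)
# for a list of 2 x 2 matrices
mh_test <- function(tables) {
  a  <- sapply(tables, function(tab) tab[1, 1])
  R1 <- sapply(tables, function(tab) sum(tab[1, ]))
  R2 <- sapply(tables, function(tab) sum(tab[2, ]))
  C1 <- sapply(tables, function(tab) sum(tab[, 1]))
  C2 <- sapply(tables, function(tab) sum(tab[, 2]))
  N  <- sapply(tables, sum)

  expected <- R1 * C1 / N                          # E(a_j)
  variance <- R1 * C1 * R2 * C2 / (N^2 * (N - 1))  # Var(a_j)

  X <- sum(a - expected)^2 / sum(variance)
  c(statistic = X, p_value = pchisq(X, df = 1, lower.tail = FALSE))
}

# Hypothetical counts, for illustration only
tables <- list(
  matrix(c(10, 20,
            5, 40), nrow = 2, byrow = TRUE),
  matrix(c(30, 15,
           20, 25), nrow = 2, byrow = TRUE)
)

mh_test(tables)
```

The differences \(a_j - E(a_j)\) are summed before squaring, so tables that deviate in opposite directions cancel each other.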
In R
mantelhaen.test(df$education, df$party, z = df$state)
##
## Mantel-Haenszel chi-squared test without
## continuity correction
##
## data: df$education and df$party and df$state
## Mantel-Haenszel X-squared = 0, df = 1,
## p-value = 1
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3649129 2.7403803
## sample estimates:
## common odds ratio
## 1
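The slides do not show how `df` was built. One possible construction, assuming one row per person and using the two state tables from the exercises, is sketched below; the state labels `"A"` and `"B"` and the helper object `counts` are hypothetical.

```r
# Cell counts from the two state tables; state labels are hypothetical
counts <- data.frame(
  state     = rep(c("A", "B"), each = 4),
  education = rep(rep(c("college", "no college"), each = 2), times = 2),
  party     = rep(c("democrat", "republican"), times = 4),
  n         = c(3, 27, 7, 63,
                63, 7, 27, 3)
)

# Expand to one row per person, the layout the call above expects
df <- counts[rep(seq_len(nrow(counts)), counts$n),
             c("state", "education", "party")]

result <- mantelhaen.test(df$education, df$party, z = df$state)
result$statistic
result$estimate
```

Passing a three-dimensional table, for example `table(df$education, df$party, df$state)`, as the first argument is an equivalent way to call the test.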
Mantel-Haenszel Cautions
The test assumes the odds ratio is the same in all \(k\) tables.
- If this assumption is not met, it’s difficult to interpret the p-value, and it doesn’t make sense to estimate a common odds ratio.
- The test may fail to reject the null if the odds ratios are different from 1 but in opposite directions.
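The second caution can be illustrated with two hypothetical strata whose odds ratios are 4 and 1/4. The counts are invented for this sketch and do not come from the slides.

```r
# Hypothetical strata with strong associations in opposite directions
stratum1 <- matrix(c(20, 10,
                     10, 20), nrow = 2, byrow = TRUE)  # odds ratio 4
stratum2 <- matrix(c(10, 20,
                     20, 10), nrow = 2, byrow = TRUE)  # odds ratio 1/4
tables <- array(c(stratum1, stratum2), dim = c(2, 2, 2))

# Per-stratum sample odds ratios
apply(tables, 3, function(tab) tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1]))

# The deviations a_j - E(a_j) are +5 and -5, so they cancel in the numerator
result <- mantelhaen.test(tables, correct = FALSE)
result$statistic
result$p.value
```

Neither stratum has an odds ratio near 1, yet the test statistic is 0, so checking the individual odds ratios before pooling is worthwhile.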