Homeworks

Where to look for comments.

Your Turn: Find all the problems in this statistical summary

(These will all be point deductions!)

“There is strong evidence the variance of US males is 9 (t-test of variance, p-value = 0.27191099). It is estimated the variance of the height of US males is 9.467606. With 95% confidence the variance in height of US males is between 8.633154 and 10.30206.”

Two extensions to the \(2 \times 2\) contingency tables

More than two categories: Chi-square test
More than two tables: Mantel-Haenszel Test

More than two categories

We might consider cross classifying our sample of \(N\) units on two variables that have more than two categories.

Is eating breakfast associated with your commute method? \(5 \times 2\) table

\(Y_i\) = {Ate breakfast, Didn’t eat breakfast },
\(G_i\) = {Walk, Bike, Drove alone, Drove with others, Other }

Is your favorite sport associated with your favorite ice cream flavor? \(3 \times 5\) table

\(Y_i\) = {Baseball, Basketball, Football, Soccer, Hockey },
\(G_i\) = {Chocolate, Strawberry, Vanilla }

Chi-square test for \((r \times c)\) tables

As in the \(2 \times 2\) case, we can use a Chi-square test.

\(H_0:\) No association between Variable 1 and Variable 2

\(O_{ij}\): observed count in row \(i\), column \(j\)

\(E_{ij}\): expected count in row \(i\), column \(j\), where \(R_i\) is the total for row \(i\) and \(C_j\) is the total for column \(j\): \[ E_{ij} = \frac{R_iC_j}{N} \]

\[ X = \sum_{i = 1}^{r} \sum_{j = 1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}} \]

Under the null hypothesis \(X \dot \sim \chi^2_{(r - 1)\times(c-1)}\). Reject \(H_0\) for large values of \(X\).

Example

“Table 2.5, from the 2000 General Social Survey, cross classifies gender and political party identification. Subjects indicated whether they identified more strongly with the Democratic or Republican party or as Independents.”

(Agresti 2007)

	Democrat	Independent	Republican	Sum
F	762	327	468	1557
M	484	239	477	1200
Sum	1246	566	945	2757

Example: Expected counts

	Democrat	Independent	Republican	Sum
F	703.7	319.6	533.7	1557
M	542.3	246.4	411.3	1200
Sum	1246	566	945	2757

Example: Cell contributions

	Democrat	Independent	Republican
F	4.835	0.1692	8.084
M	6.273	0.2196	10.49

## X-squared 
##  30.07015
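The expected counts, cell contributions, and test statistic above can be reproduced from the observed table. This is a minimal base R sketch; the object names (`tab`, `E`, `contrib`) are chosen here for illustration and do not come from the lecture.

```r
# Observed counts from the gender by party table above
tab <- matrix(
  c(762, 327, 468,
    484, 239, 477),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("F", "M"),
    party  = c("Democrat", "Independent", "Republican")
  )
)

N <- sum(tab)

# Expected counts under no association: E_ij = R_i * C_j / N
E <- outer(rowSums(tab), colSums(tab)) / N

# Contribution of each cell to the statistic: (O_ij - E_ij)^2 / E_ij
contrib <- (tab - E)^2 / E

# Test statistic and its degrees of freedom, (r - 1) * (c - 1)
X  <- sum(contrib)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Upper-tail probability from the chi-square reference distribution
p_value <- pchisq(X, df = df, lower.tail = FALSE)

round(E, 1)
signif(contrib, 4)
X
p_value
```

Here `X` should agree with the X-squared value printed above, on \((2 - 1)\times(3 - 1) = 2\) degrees of freedom. The built-in `chisq.test(tab)` performs the same calculation.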

Chi-squared test comments

The \(\chi^2\) reference distribution is only asymptotically exact, so p-values are approximations that improve as the sample size grows.

As in the \(2\times 2\) case, a general rule of thumb is \(E_{ij} > 5\) for all \(i,j\).
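One way to check the rule of thumb in R is to inspect the `expected` component returned by `chisq.test()`. The sketch below reuses the gender by party counts from the example; the name `tab` is an assumption made here.

```r
# Gender by party counts from the example above
tab <- matrix(
  c(762, 327, 468,
    484, 239, 477),
  nrow = 2, byrow = TRUE
)

# chisq.test() stores the expected counts under the null hypothesis
E <- chisq.test(tab)$expected

min(E)       # smallest expected count
all(E > 5)   # TRUE when every cell meets the rule of thumb
```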

More than two tables

Your Turn

Is party preference associated with level of education?

Find the sample odds ratio for each of these two states.

State 1
education	democrat	republican
college	3	27
no college	7	63

State 2
education	democrat	republican
college	63	7
no college	27	3

Your Turn

Now combine the two tables and find the odds ratio.

Combined
education	democrat	republican
college	66	34
no college	34	66

Simpson’s paradox

“…in which a trend appears in different groups of data but disappears or reverses when these groups are combined.”

https://en.wikipedia.org/wiki/Simpson%27s_paradox

The Mantel-Haenszel procedure attempts to avoid the paradox by combining the individual odds ratios, rather than collapsing the tables and computing a single odds ratio.
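The State 1, State 2, and combined tables from the exercises illustrate the issue. The sketch below is in base R; the helper `odds_ratio()` and the object names are defined here for illustration only.

```r
# Sample odds ratio for a 2 x 2 table laid out as
#   a b
#   c d
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

labels <- list(
  education = c("college", "no college"),
  party     = c("democrat", "republican")
)

state1 <- matrix(c(3, 27, 7, 63), nrow = 2, byrow = TRUE, dimnames = labels)
state2 <- matrix(c(63, 7, 27, 3), nrow = 2, byrow = TRUE, dimnames = labels)

# Collapsing over state adds the two tables cell by cell
combined <- state1 + state2

odds_ratio(state1)
odds_ratio(state2)
odds_ratio(combined)
```

Within each state the sample odds ratio is 1, yet the collapsed table gives an odds ratio of about 3.77. The apparent association comes entirely from combining states with very different party and education mixes.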

Mantel-Haenszel odds ratio

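\(k\) tables, indexed by \(j = 1, \ldots, k\). In table \(j\), \(a_j\) and \(b_j\) are the counts in the first row, \(c_j\) and \(d_j\) are the counts in the second row, and \(N_j = a_j + b_j + c_j + d_j\).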

Individual table odds ratio estimates: \[ \hat{\omega}_j = \frac{a_j d_j}{b_j c_j} \]

Combine in a weighted average: \[ \hat{\omega}_{MH} = \sum_{j= 1}^k \text{weight}^*_j \times \hat{\omega}_j \] where \[ \text{weight}^*_j = \frac{\text{weight}_j}{\sum_{l = 1}^{k} \text{weight}_l} \quad \text{ and } \quad \text{weight}_j = \frac{b_jc_j}{N_j} \]
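Each weight contains \(b_j c_j\), which cancels the denominator of \(\hat{\omega}_j\), so the weighted average reduces to a ratio of sums. This is only algebra on the definitions above:

\[ \hat{\omega}_{MH} = \frac{\sum_{j = 1}^{k} \frac{b_j c_j}{N_j} \times \frac{a_j d_j}{b_j c_j}}{\sum_{j = 1}^{k} \frac{b_j c_j}{N_j}} = \frac{\sum_{j = 1}^{k} a_j d_j / N_j}{\sum_{j = 1}^{k} b_j c_j / N_j} \]

The ratio-of-sums form stays defined when an individual table has \(b_j c_j = 0\), as long as at least one table has \(b_j c_j > 0\).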

Your Turn

Find \(\hat{\omega}_{MH}\) for the two tables:

State 1
education	democrat	republican
college	3	27
no college	7	63

State 2
education	democrat	republican
college	63	7
no college	27	3
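Once you have worked the exercise by hand, the weighted-average formula can be checked in R. This is a minimal sketch, assuming each table is stored as a \(2 \times 2\) matrix with \(a_j, b_j\) in the first row and \(c_j, d_j\) in the second. The function name `mh_odds_ratio()` is made up here and is not part of base R.

```r
# Mantel-Haenszel odds ratio as a weighted average of per-table odds ratios.
# `tables` is a list of 2 x 2 matrices laid out as
#   a b
#   c d
mh_odds_ratio <- function(tables) {
  a  <- sapply(tables, function(tab) tab[1, 1])
  b  <- sapply(tables, function(tab) tab[1, 2])
  cc <- sapply(tables, function(tab) tab[2, 1])  # `cc` avoids masking c()
  d  <- sapply(tables, function(tab) tab[2, 2])
  N  <- sapply(tables, sum)

  omega_hat   <- (a * d) / (b * cc)       # individual odds ratios
  weight      <- (b * cc) / N             # unnormalized weights
  weight_star <- weight / sum(weight)     # weights rescaled to sum to 1

  sum(weight_star * omega_hat)
}

state1 <- matrix(c(3, 27, 7, 63), nrow = 2, byrow = TRUE)
state2 <- matrix(c(63, 7, 27, 3), nrow = 2, byrow = TRUE)

omega_mh <- mh_odds_ratio(list(state1, state2))
omega_mh
```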

Mantel-Haenszel test

\(H_0: \omega_j = 1\) for all \(j = 1, \ldots, k\)

\[ X = \frac{\left(\sum_{j = 1}^{k}( a_j - E(a_j)) \right)^2}{\sum_{j = 1}^{k} Var(a_j)} \]

\[ E(a_j) = \frac{(R_{1j})(C_{1j})}{N_j} \]

\[ Var(a_j) = \frac{R_{1j}C_{1j}R_{2j}C_{2j}}{N_j^2(N_j - 1)} \]

Under the null hypothesis \(X \dot \sim \chi^2_1\). Reject \(H_0\) for large values of \(X\).
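The statistic can be computed directly from these formulas. The sketch below applies them to the State 1 and State 2 tables from the exercise; `mh_statistic()` is a hypothetical helper written for illustration, with each table stored as a \(2 \times 2\) matrix whose top-left cell is \(a_j\).

```r
# Mantel-Haenszel test statistic (no continuity correction) computed by hand.
# `tables` is a list of 2 x 2 matrices; a_j is the [1, 1] cell of table j.
mh_statistic <- function(tables) {
  a  <- sapply(tables, function(tab) tab[1, 1])
  R1 <- sapply(tables, function(tab) sum(tab[1, ]))  # row 1 totals
  R2 <- sapply(tables, function(tab) sum(tab[2, ]))  # row 2 totals
  C1 <- sapply(tables, function(tab) sum(tab[, 1]))  # column 1 totals
  C2 <- sapply(tables, function(tab) sum(tab[, 2]))  # column 2 totals
  N  <- sapply(tables, sum)

  E_a <- R1 * C1 / N                             # E(a_j)
  V_a <- R1 * C1 * R2 * C2 / (N^2 * (N - 1))     # Var(a_j)

  X <- sum(a - E_a)^2 / sum(V_a)
  c(X = X, p_value = pchisq(X, df = 1, lower.tail = FALSE))
}

state1 <- matrix(c(3, 27, 7, 63), nrow = 2, byrow = TRUE)
state2 <- matrix(c(63, 7, 27, 3), nrow = 2, byrow = TRUE)

result <- mh_statistic(list(state1, state2))
result
```

In both tables the observed \(a_j\) equals its expected value (3 and 63), so the numerator is zero and there is no evidence against \(H_0\).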

In R

mantelhaen.test(df$education, df$party, z = df$state)

## 
##  Mantel-Haenszel chi-squared test without
##  continuity correction
## 
## data:  df$education and df$party and df$state
## Mantel-Haenszel X-squared = 0, df = 1,
## p-value = 1
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3649129 2.7403803
## sample estimates:
## common odds ratio 
##                 1
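The lecture does not show how `df` was built. One possible reconstruction, assuming `df` has one row per person with columns `education`, `party`, and `state`, expands the State 1 and State 2 counts as follows. The intermediate names `counts` and `cells` are made up here.

```r
# Assumed reconstruction of `df` from the State 1 and State 2 tables.
# array() fills column by column within each 2 x 2 table.
counts <- array(
  c( 3,  7, 27, 63,    # State 1
    63, 27,  7,  3),   # State 2
  dim = c(2, 2, 2),
  dimnames = list(
    education = c("college", "no college"),
    party     = c("democrat", "republican"),
    state     = c("State 1", "State 2")
  )
)

# One row per cell with its count in `Freq`, then one row per person
cells <- as.data.frame(as.table(counts))
df <- cells[rep(seq_len(nrow(cells)), cells$Freq),
            c("education", "party", "state")]

res <- mantelhaen.test(df$education, df$party, z = df$state)
res
```

`mantelhaen.test()` also accepts the three-dimensional array directly, so `mantelhaen.test(counts)` is an alternative that skips the expansion step.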

Mantel-Haenszel Cautions

The test assumes the odds ratio is the same in all \(k\) tables.

If this assumption is not met, it’s difficult to interpret the p-value, and it doesn’t make sense to estimate a common odds ratio.
The test may fail to reject the null if the odds ratios are different from 1 but in opposite directions.
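The last caution can be seen with a small made-up example. The counts below are hypothetical, invented here for illustration and not taken from the lecture: one table has an odds ratio of 4 and the other an odds ratio of 0.25.

```r
# Hypothetical counts (not from the lecture): odds ratios in opposite directions.
# array() fills column by column within each 2 x 2 table.
opposite <- array(
  c(20, 10, 10, 20,    # table 1: odds ratio (20 * 20) / (10 * 10) = 4
    10, 20, 20, 10),   # table 2: odds ratio (10 * 10) / (20 * 20) = 0.25
  dim = c(2, 2, 2),
  dimnames = list(
    row   = c("row 1", "row 2"),
    col   = c("col 1", "col 2"),
    table = c("table 1", "table 2")
  )
)

res_opposite <- mantelhaen.test(opposite)
res_opposite$statistic
res_opposite$p.value
```

The deviations \(a_j - E(a_j)\) are \(+5\) and \(-5\), so they cancel in the numerator and the test does not reject, even though neither table has an odds ratio near 1.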

More complicated tables ST551 Lecture 24
