Submit your homework as a compiled Rmarkdown document. Submit both the .pdf (either generated directly from the Rmarkdown, or saved from the Word version generated from the Rmarkdown) and the Rmarkdown file itself (the .Rmd). If you do your simulations in a separate R file, please also submit that, but the summary of the simulations must be in the .pdf, or you will receive no credit for the problem.
Submit your answers on Canvas.
1. Properties of two-sample t-tests
Show algebraically that Welch’s t-test and the two-sample equal-variance t-test have the same test statistic when the sample sizes are equal (i.e. \(n = m\)).
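A quick numerical sanity check can confirm the claim before you prove it, although it is not a proof. The sketch below uses R's `t.test()` on simulated data. The seed, the sample size, and the unequal standard deviations are arbitrary choices made only for illustration.

```r
# Numerical sanity check (not a proof): with equal sample sizes (n = m),
# Welch's t statistic and the pooled (equal-variance) t statistic agree.
set.seed(551)                        # arbitrary seed so the check is reproducible
n <- 20                              # n = m, an arbitrary choice
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 0.5, sd = 3)    # deliberately unequal spread

welch  <- t.test(y, x, var.equal = FALSE)
pooled <- t.test(y, x, var.equal = TRUE)

# The test statistics match...
c(welch = unname(welch$statistic), pooled = unname(pooled$statistic))
# ...but the degrees of freedom (and so the p-values) generally do not
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```

Equal statistics do not imply equal p-values. Welch's degrees of freedom are at most \(2(n-1)\), and they equal \(2(n-1)\) only when the two sample variances are equal.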
Recall that when \(n = m\), Welch’s test estimates the variance of \(\overline{Y} - \overline{X}\) with \[ \frac{s_Y^2}{n} + \frac{s_X^2}{n}, \] whereas the paired t-test estimates the variance of \(\overline{Y} - \overline{X}\) with \[ \frac{s_Y^2}{n} + \frac{s_X^2}{n} - 2\frac{s_{YX}}{n}, \] where \(s_{YX}\) is the sample covariance between \(Y\) and \(X\).
(i) If the data are truly paired, i.e. \(\sigma_{XY} \ne 0\), what will each of the two tests estimate, on average, for the variance of \(\overline{Y} - \overline{X}\)?
(ii) Use your answer to (i) to argue that when the data are truly paired, using the unpaired test (Welch’s) may be misleading.
(iii) If the data are not paired, i.e. \(\sigma_{XY} = 0\), what will each of the two tests estimate, on average, for the variance of \(\overline{Y} - \overline{X}\)?
(iv) Use your answer to (iii) to argue that if the data are not paired, using the paired test is not misleading.
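You can compute the two variance estimates recalled above directly from a single sample. The sketch below builds correlated pairs by adding noise to `x`; that construction is an assumption made only for illustration. It checks that the paired estimate is exactly the sample variance of the differences divided by \(n\).

```r
# Compute both estimates of Var(Ybar - Xbar) on one simulated paired sample.
set.seed(551)                       # arbitrary seed
n <- 30
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)         # y is built from x, so the pairs are correlated

welch_var  <- var(y) / n + var(x) / n
paired_var <- var(y) / n + var(x) / n - 2 * cov(y, x) / n

# The paired estimate equals the sample variance of the differences, divided by n
all.equal(paired_var, var(y - x) / n)
c(welch = welch_var, paired = paired_var)
```

For a single sample, the gap between the two estimates is exactly \(2 s_{YX}/n\). Parts (i) to (iv) ask what happens to that gap on average.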
In the case where the data are unpaired, although the paired test is not misleading, it has less power than the two-sample t-test. Design and conduct a simulation to demonstrate this fact.
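One possible skeleton for such a simulation is sketched below. The design generates independent \(X\) and \(Y\) samples with a true mean difference, runs both tests on each replicate, and records the proportion of rejections. Every setting (`n`, `delta`, the standard deviations, `n_sims`, `alpha`) is an arbitrary assumption. A full answer would vary these settings and report the results.

```r
# Sketch: estimate the power of the two-sample and paired t-tests when X and Y
# are independent (truly unpaired) and the true difference in means is delta.
# All settings below are arbitrary assumptions for illustration.
set.seed(551)
n_sims <- 2000
n      <- 15
delta  <- 0.8
alpha  <- 0.05

one_sim <- function() {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = delta, sd = 1)   # independent of x: no true pairing
  c(two_sample = t.test(y, x, var.equal = TRUE)$p.value,
    paired     = t.test(y, x, paired = TRUE)$p.value)
}

p_values <- replicate(n_sims, one_sim())  # 2 x n_sims matrix of p-values
power    <- rowMeans(p_values < alpha)    # proportion of rejections for each test
power
```

Both tests see the same simulated data in each replicate, so the comparison is paired across tests. The paired test uses only \(n - 1\) degrees of freedom instead of \(2n - 2\), which is one source of its lower power here.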
2. Data Analysis
For each of the following, perform an appropriate t-test to answer the question of interest. In each case your answer should:
- Provide a plot that summarizes the data
- Justify the choice of t-test
- Include a statistical summary
Using the BRFSS data from previous homeworks: Is the mean desired weight loss the same for male and female US residents?
Using the ACS data described below: Do Washington households have the same mean total household income as Oregon households?
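A minimal sketch of the test call follows, assuming the `couples_wide` data frame read in under ACS data. The helper name `compare_income_by_state` is hypothetical. The sketch also assumes that "total household income" means `husband_income + wife_income`. The data provided contain no income for other household members, so check whether that definition is appropriate. The sketch uses Welch's test because the two states are independent samples and nothing suggests equal variances. The required plot is not included.

```r
# Hypothetical helper: Welch two-sample t-test of mean household income by state.
# ASSUMPTION: total household income = husband_income + wife_income.
compare_income_by_state <- function(couples_wide) {
  couples_wide$household_income <-
    couples_wide$husband_income + couples_wide$wife_income
  # Formula interface: `state` must take exactly two values (Oregon, Washington)
  t.test(household_income ~ state, data = couples_wide, var.equal = FALSE)
}
```

Usage: `compare_income_by_state(couples_wide)`.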
Using the ACS data described below: How much older are husbands than their wives in Oregon opposite-sex married couples?
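Each household supplies one husband and one wife, so one natural approach is a paired analysis of the within-household age differences. The sketch below assumes the `couples_wide` data frame read in under ACS data. The helper name `husband_wife_age_gap` is hypothetical. Restricting to Oregon households is left to you, and so is the plot.

```r
# Hypothetical helper: paired t-test on husband_age - wife_age, one pair per household.
husband_wife_age_gap <- function(couples_wide) {
  # paired = TRUE analyses the within-household differences husband_age - wife_age
  t.test(couples_wide$husband_age, couples_wide$wife_age, paired = TRUE)
}
```

The returned object's `estimate` is the mean age difference and `conf.int` is a 95% confidence interval for it. Together they answer "how much older" directly.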
ACS data
The American Community Survey (ACS) is a large survey undertaken by the US Census Bureau in the years between decennial censuses. For this homework, you are given a subset of the 2016 Public Use Microdata Sample (PUMS) for Oregon and Washington. It is restricted to households that contain opposite-sex married couples. You may assume this is a simple random sample of such households in each state; it’s not quite, but we’ll pretend it is.
To get the data, with one row per household:
```r
library(tidyverse)

# Download the wide data (one row per household) and read it in
download.file("http://st551.cwick.co.nz/data/couples_wide.csv",
  "couples_wide.csv")
couples_wide <- read_csv("couples_wide.csv")
couples_wide
```
You may find some tasks easier with a reshaped dataset that has one row for each person:
```r
# Download the long data (one row per person) and read it in
download.file("http://st551.cwick.co.nz/data/couples_long.csv",
  "couples_long.csv")
couples_long <- read_csv("couples_long.csv")
couples_long
```
The variables I am providing are a subset of those available. You can find a summary of them below.
Variables in couples_wide
| Column name | Description |
|---|---|
| household_id | A unique ID number for each household |
| husband_age | Age of the husband, in years |
| husband_income | Total annual income of the husband; can include wages, retirement income, interest, Social Security, and self-employment income |
| wife_age | Age of the wife, in years |
| wife_income | Total annual income of the wife |
| state | State the household is in |