Homework 6

Due 2017/11/16

Submit your homework as a compiled Rmarkdown document. Submit both a .pdf (either generated directly from the Rmarkdown, or saved from the Word document generated from the Rmarkdown) and the Rmarkdown file itself (the .Rmd). If you do your simulations in a separate R file, please submit that too, but the summary of the simulations must be in the .pdf, or you will receive no credit for the problem.

Submit your answers on Canvas.

1. Properties of two-sample t-tests

  1. Show algebraically that Welch’s t-test and the two-sample equal-variance t-test have the same test statistic when the sample sizes are equal (i.e., \(n = m\)).

  2. Recall that, when \(n = m\), Welch’s test estimates the variance of \(\overline{Y} - \overline{X}\) with \[ \frac{s_Y^2}{n} + \frac{s_X^2}{n}, \] whereas the paired t-test estimates it with \[ \frac{s_Y^2}{n} + \frac{s_X^2}{n} - 2\frac{s_{YX}}{n}, \] where \(s_{YX}\) is the sample covariance between the paired observations.

    1. If the data are truly paired, i.e., the population covariance \(\sigma_{XY} \ne 0\), what will each test estimate, on average, for the variance of \(\overline{Y} - \overline{X}\)?

    2. Use your answer to part 1 to argue that, when the data are truly paired, using the unpaired (Welch’s) test may be misleading.

    3. If the data are not paired, i.e., the population covariance \(\sigma_{XY} = 0\), what will each test estimate, on average, for the variance of \(\overline{Y} - \overline{X}\)?

    4. Use your answer to part 3 to argue that, if the data are not paired, using the paired test is not misleading.

    5. In the case where the data are unpaired, the paired test, although not misleading, has less power than the two-sample t-test. Design and conduct a simulation to demonstrate this fact.
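
The variance formulas recalled above can be checked numerically in R. This is only a sanity check, not a substitute for the algebraic argument or the simulation requested above; the seed, sample size, and normal distributions are arbitrary assumptions chosen for illustration. It computes both variance estimates with var() and cov() and confirms that the statistics reported by t.test() are built from them.

```r
# Illustrative data only: seed, n, and distributions are arbitrary choices
set.seed(551)
n <- 10
x <- rnorm(n, mean = 5, sd = 2)
# y is constructed from x, so the pairs are correlated (truly paired data)
y <- x + rnorm(n, mean = 1, sd = 1)

# Welch's estimate of Var(Ybar - Xbar) when n = m
var_welch <- var(y) / n + var(x) / n
# Paired estimate: also subtracts 2 * s_YX / n
var_paired <- var(y) / n + var(x) / n - 2 * cov(y, x) / n

ybar_minus_xbar <- mean(y) - mean(x)

# Test statistics reported by t.test(); unname() drops the "t" label
t_welch  <- unname(t.test(y, x, var.equal = FALSE)$statistic)
t_pooled <- unname(t.test(y, x, var.equal = TRUE)$statistic)
t_paired <- unname(t.test(y, x, paired = TRUE)$statistic)

# Each statistic is (Ybar - Xbar) divided by the square root of its variance estimate
all.equal(t_welch,  ybar_minus_xbar / sqrt(var_welch))   # TRUE
all.equal(t_paired, ybar_minus_xbar / sqrt(var_paired))  # TRUE
# With n = m, the Welch and pooled (equal-variance) statistics coincide
all.equal(t_welch, t_pooled)                             # TRUE
```

Changing `n` for only one of the two samples (so that \(n \ne m\)) is an easy way to see the Welch and pooled statistics separate.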

2. Data Analysis

For each of the following, perform an appropriate t-test to answer the question of interest. In each case your answer should:

  • Provide a plot that summarizes the data
  • Justify the choice of t-test
  • Include a statistical summary

  1. Using the BRFSS data from previous homework assignments: Is the mean desired weight loss the same for male and female US residents?

  2. Using the ACS data described below: Do Washington households have the same mean total household income as Oregon households?

  3. Using the ACS data described below: How much older are husbands than their wives in Oregon opposite-sex married couples?

ACS data

The American Community Survey (ACS) is a large survey undertaken by the US Census Bureau in the years between decennial censuses. For this homework, you are given a subset of the 2016 Public Use Microdata Sample (PUMS) for Oregon and Washington, restricted to households that contain opposite-gender married couples (you may assume this is a simple random sample of such households in Oregon and Washington; it’s not quite, but we’ll pretend it is).

To get the data, with one row per household:

library(tidyverse)
download.file("http://st551.cwick.co.nz/data/couples_wide.csv", 
  "couples_wide.csv")
couples_wide <- read_csv("couples_wide.csv")
couples_wide

You may find some tasks easier with a reshaped dataset that has one row for each person:

download.file("http://st551.cwick.co.nz/data/couples_long.csv", 
  "couples_long.csv")
couples_long <- read_csv("couples_long.csv")
couples_long
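
One way a long dataset like this can be built from the wide one is sketched below with tidyr's pivot_longer() (available in tidyr 1.0.0 and later; this is an assumption about approach, not necessarily how couples_long.csv was made). The two-household data frame is made up, and the person/age/income column names it produces may differ from the columns in the provided couples_long.csv.

```r
library(tidyr)

# Hypothetical two-household example with the same columns as couples_wide
# (values and the "OR"/"WA" state coding are made up for illustration)
wide <- data.frame(
  household_id   = c(1, 2),
  husband_age    = c(40, 62),
  husband_income = c(50000, 30000),
  wife_age       = c(38, 60),
  wife_income    = c(45000, 20000),
  state          = c("OR", "WA")
)

# Split names like "husband_age" at "_": the first part becomes the value of
# a new `person` column; the second part ("age" or "income") stays a column
# name because of the special ".value" sentinel
long <- pivot_longer(
  wide,
  cols      = c(husband_age, husband_income, wife_age, wife_income),
  names_to  = c("person", ".value"),
  names_sep = "_"
)

# One row per person, with columns household_id, state, person, age, income
long
```

Household-level columns such as household_id and state are repeated on both rows of a couple, so the two spouses can still be matched up after reshaping.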

The variables I am providing are a subset of those available in the ACS. They are summarized below.

Variables in couples_wide

Column name      Description
household_id     A unique ID number for each household
husband_age      Age of the husband, in years
husband_income   Total annual income of the husband; can include wages, retirement income, interest, Social Security, and self-employment income
wife_age         Age of the wife, in years
wife_income      Total annual income of the wife
state            State the household is in