Homework 8

Due 2017/12/01

Submit your homework as a compiled Rmarkdown document. Submit both a .pdf (either generated directly from the Rmarkdown or saved from the Word version generated from the Rmarkdown) and the Rmarkdown file itself (the .Rmd). If you do your simulations in a separate R file, please also submit that, but the summary of the simulations must be in the .pdf, or you will receive no credit for the problem.

Submit your answers on Canvas.

1. Simulation

Investigate a property of one of the methods we’ve seen this quarter, of your choice (that you haven’t already investigated in a homework).

You should clearly state what you are investigating and how you structured your simulations. You should also include a brief summary of your conclusions, including relevant tables or figures to support your summary.

Some ideas:

  • The performance of the Signed Rank test with asymmetric distributions
  • Compare the performance of the Chi-square test for variance with the t-test for variance when the population isn’t Normal
  • The exactness (or power) of the K-S test for small samples
  • The performance of F-test for variances (maybe compare with Levene’s test) with non-Normal populations
  • The performance of the Wilcoxon Rank Sum test for different hypotheses under the location shift (or not) assumption
  • The performance of the two-step procedure that uses Levene’s test to decide between the equal variance and Welch’s two sample t-tests for a test of equal means.
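As a rough illustration of how a simulation like these might be structured, the sketch below estimates how often a one-sample t-test rejects a true null hypothesis when the population is Exponential. The choice of test, the population, the sample sizes, and the helper name estimate_rejection_rate are all assumptions made for the example, not one of the suggested topics or a required design.

```r
# Hypothetical helper: estimate how often a level-alpha one-sample t-test
# rejects a TRUE null hypothesis when the population is Exponential(1).
estimate_rejection_rate <- function(n, n_sims = 2000, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    x <- rexp(n, rate = 1)              # true mean is 1, so H0: mu = 1 holds
    t.test(x, mu = 1)$p.value < alpha   # TRUE when the test rejects
  })
  mean(rejections)                      # proportion of simulations rejecting
}

set.seed(551)                           # arbitrary seed, for reproducibility
n_sims <- 2000
sample_sizes <- c(10, 30, 100)          # illustrative sample sizes
rates <- sapply(sample_sizes, estimate_rejection_rate, n_sims = n_sims)

# Report each estimate with its Monte Carlo standard error
results <- data.frame(
  n = sample_sizes,
  rejection_rate = rates,
  mc_se = sqrt(rates * (1 - rates) / n_sims)
)
results
```

Reporting the Monte Carlo standard error alongside each estimate makes it easier to judge whether a departure from the nominal level is larger than simulation noise.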

2. Data Analysis

For each of the following, perform an appropriate procedure to answer the question of interest. In each case your answer should:

  • Provide a plot that summarizes the data
  • Justify the choice of procedure
  • Include a statistical summary
  1. Using the acs_couples data (see below): Is there an association between health insurance and gender? Include estimates and confidence intervals for the probability of coverage for each gender in your summary.

  2. Using the acs_respondents data (see below): Is the median income of Oregon residents the same as the median income of Washington residents? (You don’t need to include a confidence interval for the difference in medians, but you should include estimates and confidence intervals for the individual medians).

  3. Using the acs_respondents data (see below): Do incomes of Oregon residents tend to be about the same as the incomes of Washington residents? (Hint: “tend to be about the same” might be a way to express \(P(Y > X) = 0.5\)). No need for a point estimate or confidence interval.

  4. Using the acs_respondents data (see below): Are the modes of transport to work used in the same proportions in Oregon and Washington? No need for a point estimate or confidence interval.
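To make the hint in question 3 concrete, here is a small sketch of what \(P(Y > X)\) means in terms of two samples. It uses made-up log-normal values rather than the ACS data, and the comparison with the rank sum statistic is included only as a check on the arithmetic, not as a statement about which procedure to use.

```r
set.seed(551)                                  # arbitrary seed
x <- rlnorm(200, meanlog = 10.0, sdlog = 1)    # made-up values for group X
y <- rlnorm(250, meanlog = 10.2, sdlog = 1)    # made-up values for group Y

# Proportion of all (Y, X) pairs in which the Y value is the larger one
p_hat <- mean(outer(y, x, ">"))

# The same number from the rank sum statistic: with no ties,
# W from wilcox.test(y, x) counts the pairs with y > x
w <- unname(wilcox.test(y, x)$statistic)
p_from_w <- w / (length(x) * length(y))

c(p_hat = p_hat, p_from_w = p_from_w)
```

A value near 0.5 says that a randomly chosen Y is about as likely to exceed a randomly chosen X as the reverse.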

ACS data

The American Community Survey (ACS) is a large survey undertaken by the US Census Bureau in the years between decennial censuses.

For this homework, you are given two different subsets of the Public Use Microdata Sample for Oregon and Washington from 2016:

  • acs_couples corresponds to both the husband and wife in Oregon households that contain opposite-gender married couples

  • acs_respondents corresponds to individuals that answered the survey (all from different households). (You may assume these are like a random sample of adult residents in Oregon and Washington).

library(tidyverse)
download.file("http://st551.cwick.co.nz/data/acs_couples.csv", 
  "acs_couples.csv")
acs_couples <- read_csv("acs_couples.csv")
acs_couples

download.file("http://st551.cwick.co.nz/data/acs_respondents.csv", 
  "acs_respondents.csv")
acs_respondents <- read_csv("acs_respondents.csv")
acs_respondents

Variables in acs_couples

column name                Variable
household_id               A unique ID number for each household
state                      State the household is in
husband_age                Age in years of the husband
husband_income             Total annual income of the husband, can include wages, retirement, interest, social security, self employment income.
husband_health_insurance   Either with or without health insurance coverage
wife_age                   Age in years of the wife
wife_income                Total annual income of the wife
wife_health_insurance      Either with or without health insurance coverage
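Each row of acs_couples describes one household, so the husband and wife values sit side by side in the same row. The sketch below builds a tiny made-up data frame with the same column names and cross-tabulates the two insurance columns. The values "with" and "without" are an assumption about how the column is coded; check the real file before relying on them.

```r
# Made-up stand-in for acs_couples: four hypothetical households
couples <- data.frame(
  household_id = 1:4,
  husband_health_insurance = c("with", "with", "without", "with"),
  wife_health_insurance = c("with", "without", "without", "with")
)

# Cross-tabulate the two columns; each household contributes one count
paired_counts <- table(
  husband = couples$husband_health_insurance,
  wife = couples$wife_health_insurance
)
paired_counts
```

The table counts households rather than individuals, which is worth keeping in mind when deciding how to summarize these data.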

Variables in acs_respondents

column name        Variable
household_id       A unique ID number for each household
state              State the household is in
age                Age of respondent
sex                Sex of respondent
total_income       Total income of respondent
health_insurance   Is respondent with or without health insurance coverage?
transport          Means of transportation to work (missing if not a worker)
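Because transport is missing for respondents who are not workers, it can help to see how many values are missing before tabulating. The sketch below uses a small made-up data frame; the state codes and transport categories are hypothetical and may not match the values in acs_respondents.

```r
# Made-up stand-in for acs_respondents: six hypothetical respondents
respondents <- data.frame(
  state = c("OR", "OR", "OR", "WA", "WA", "WA"),
  transport = c("car", NA, "bus", "car", "car", NA)
)

# Show the missing values explicitly alongside the observed categories
table(respondents$state, respondents$transport, useNA = "ifany")

# Keep only respondents with a recorded mode of transport (workers)
workers <- respondents[!is.na(respondents$transport), ]

# Counts, then row proportions, of transport mode within each state
transport_counts <- table(state = workers$state, transport = workers$transport)
prop.table(transport_counts, margin = 1)
```

With margin = 1 the proportions in each row sum to one, so the rows can be compared directly across states.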