Submit your homework as a compiled Rmarkdown document. Submit both the .pdf (either generated directly from the Rmarkdown, or saved from the Word version generated from the Rmarkdown) and the Rmarkdown file itself (the .Rmd). If you do your simulations in a separate R file, please also submit that, but the summary of the simulations must be in the .pdf, or you will receive no credit for the problem.
Submit your answers on Canvas.
1. Simulation
Investigate a property of one of the methods we’ve seen this quarter, of your choice (that you haven’t already investigated in a homework).
You should clearly state what you are investigating and how you structured your simulations. You should also include a brief summary of your conclusions, including relevant tables or figures to support your summary.
Some ideas:
- The performance of the Signed Rank test with asymmetric distributions
- Compare the performance of the Chi-square test for variance with the t-test for variance when the population isn’t Normal
- The exactness (or power) of the K-S test for small samples
- The performance of the F-test for variances (maybe compare with Levene’s test) with non-Normal populations
- The performance of the Wilcoxon Rank Sum test for different hypotheses under the location shift (or not) assumption
- The performance of the two-step procedure that uses Levene’s test to decide between the equal variance and Welch’s two sample t-tests for a test of equal means.
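Whichever idea you pick, most of these simulations share the same skeleton: generate many samples from a known population, apply the procedure to each, and record how often it rejects. The following is a minimal sketch of that skeleton, not a required structure. The helper `rejection_rate()`, its arguments, the seed, the number of simulations and the example setting (a one-sample t-test on Exponential samples) are all assumptions chosen for illustration.

```r
set.seed(551)  # arbitrary seed, only so the run is reproducible

# Hypothetical helper: estimate how often a level-alpha test rejects.
#   generate: function with no arguments that draws one sample
#   p_value:  function that takes a sample and returns the test's p-value
rejection_rate <- function(generate, p_value, n_sim = 2000, alpha = 0.05) {
  p <- replicate(n_sim, p_value(generate()))
  mean(p < alpha)
}

# Example setting (an assumption, not one of the suggested topics):
# t-test of mu = 1 on samples of size 10 from an Exponential(1) population.
# The null hypothesis is true here, so this estimates the actual
# rejection rate of a nominal 5% test under a skewed population.
rate_n10 <- rejection_rate(
  generate = function() rexp(10, rate = 1),
  p_value  = function(x) t.test(x, mu = 1)$p.value
)
rate_n10

# Monte Carlo standard error of the estimated rate (binomial formula)
se_n10 <- sqrt(rate_n10 * (1 - rate_n10) / 2000)
se_n10
```

Swapping in a different `generate` or `p_value` function changes the population or the test without touching the rest of the code, and reporting the standard error alongside each rate helps a reader judge whether differences between settings are larger than simulation noise.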
2. Data Analysis
For each of the following, perform an appropriate procedure to answer the question of interest. In each case your answer should:
- Provide a plot that summarizes the data
- Justify the choice of procedure
- Include a statistical summary
- Using the acs_couples data (see below): Is there an association between health insurance and gender? Include estimates and confidence intervals for the probability of coverage for each gender in your summary.
- Using the acs_respondents data (see below): Is the median income of Oregon residents the same as the median income of Washington residents? (You don’t need to include a confidence interval for the difference in medians, but you should include estimates and confidence intervals for the individual medians).
- Using the acs_respondents data (see below): Do incomes of Oregon residents tend to be about the same as the incomes of Washington residents? (Hint: “tend to be about the same” might be a way to express \(P(Y > X) = 0.5\)). No need for a point estimate or confidence interval.
- Using the acs_respondents data (see below): Are the modes of transport to work used in the same proportions in Oregon and Washington? No need for a point estimate or confidence interval.
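To make the quantity in the hint concrete, \(P(Y > X)\) can be estimated from two samples as the proportion of all (x, y) pairs in which the y value is larger. The sketch below uses simulated stand-in data, not the ACS incomes; the distributions, sample sizes and seed are assumptions for illustration only, and the code is not a substitute for choosing and justifying a test.

```r
set.seed(1)  # arbitrary seed

# Simulated stand-ins (NOT the ACS data):
x <- rexp(200, rate = 1)        # plays the role of X
y <- 1.5 * rexp(300, rate = 1)  # plays the role of Y, which tends to be larger

# Compare every y with every x: outer() builds a 300 x 200 logical matrix,
# and its mean is the proportion of pairs with y > x.
# For these two exponential populations the true P(Y > X) is 0.6.
p_hat <- mean(outer(y, x, ">"))
p_hat
```

With real income data, ties are likely (many people report round numbers), so you would need to decide how pairs with y equal to x should count.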
ACS data
The American Community Survey (ACS) is a large survey undertaken by the US Census Bureau in the years between decennial censuses.
For this homework, you are given two different subsets of the Public Use Micro Data sample for Oregon and Washington from 2016:
- acs_couples corresponds to both the husband and wife in Oregon households that contain opposite gender married couples
- acs_respondents corresponds to individuals that answered the survey (all from different households). (You may assume these are like a random sample of adult residents in Oregon and Washington).
library(tidyverse)
download.file("http://st551.cwick.co.nz/data/acs_couples.csv",
"acs_couples.csv")
acs_couples <- read_csv("acs_couples.csv")
acs_couples
download.file("http://st551.cwick.co.nz/data/acs_respondents.csv",
"acs_respondents.csv")
acs_respondents <- read_csv("acs_respondents.csv")
acs_respondents
Variables in acs_couples
column name | Variable |
---|---|
household_id | A unique ID number for each household |
state | State the household is in |
husband_age | Age in years of the husband |
husband_income | Total annual income of the husband, can include wages, retirement, interest, social security, self employment income. |
husband_health_insurance | Either with or without health insurance coverage |
wife_age | Age in years of the wife |
wife_income | Total annual income of the wife |
wife_health_insurance | Either with or without health insurance coverage |
Variables in acs_respondents
column name | Variable |
---|---|
household_id | A unique ID number for each household |
state | State the household is in |
age | Age of respondent |
sex | Sex of respondent |
total_income | Total income of respondent |
health_insurance | Is respondent with or without health insurance coverage? |
transport | Means of transportation to work (missing if not a worker) |
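Because transport is missing for anyone who is not a worker, it is worth checking how many rows are affected before tabulating. The following is a small sketch on a made-up data frame; the state codes and transport labels are invented for illustration and may not match the values in the real file.

```r
# Made-up rows for illustration only; real labels may differ.
toy <- data.frame(
  state     = c("OR", "OR", "OR", "WA", "WA", "WA", "WA"),
  transport = c("Car", "Bus", NA, "Car", "Car", NA, "Bicycle")
)

# How many respondents have no recorded mode of transport?
n_missing <- sum(is.na(toy$transport))
n_missing

# table() leaves out NA values by default, so only workers are counted.
counts <- table(toy$state, toy$transport)
counts

# Row proportions: the share of each mode within each state.
props <- prop.table(counts, margin = 1)
props
```

Passing `useNA = "ifany"` to `table()` would instead show the missing values as their own column, which can be a useful check that the non-workers were dropped on purpose.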