Homework 3

Due 2017/10/19

Submit your answers on Canvas.

1. p-values

Download the ASA’s Statement on p-values (see the Wasserstein and Lazar 2016 reference below).

Skip the “Context, Process and Purpose” section and read the section “ASA Statement on Statistical Significance and P-values” starting on page 3.

Answer the following questions:

  1. We usually think about a small p-value providing evidence against the null hypothesis. What else does the article imply a small p-value may cast doubt on?

  2. What is the primary argument for not basing scientific conclusions or policy decisions solely on whether the p-value is below some threshold?

  3. What is p-hacking?

  4. Can a p-value measure the size of an effect? What can measure the size of an effect?

  5. Skim through the references in the “A brief p-Values and Statistical Significance Reference List”, and shortlist three article titles that interest you. (You may be required to read one of these in a future homework.)

2. Data analysis

The Behavioral Risk Factor Surveillance System (BRFSS) is a nationwide health-related survey of U.S. residents. For this question you can get a sample of responses from the 2003 survey by downloading an R data file from the class website:

library(tidyverse)
download.file("http://st551.cwick.co.nz/data/brfss.rds", 
  "brfss.rds", mode = "wb")

Then load it into the variable brfss with:

brfss <- read_rds("brfss.rds")
brfss

The variables weight_kg and wtdesire_kg correspond to the responses to the questions:

  • About how much do you weigh without shoes?
  • How much would you like to weigh?

respectively, converted to kilograms.

You can create a variable to represent the amount of weight a respondent would like to lose with:

brfss <- mutate(brfss, desired_loss = weight_kg - wtdesire_kg)

  1. Find summary statistics (mean, standard deviation and number of observations) for desired_loss for both males and females in the sample.

  2. Produce histograms of desired_loss for both males and females.

  3. Do U.S. resident females, on average, want to lose weight (i.e. is the mean desired loss greater than zero)? Conduct the appropriate analyses and write a statistical summary of your findings.

  4. Do U.S. resident males, on average, want to lose weight (i.e. is the mean desired loss greater than zero)? Conduct the appropriate analyses and write a statistical summary of your findings.
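
As a rough illustration of the mechanics only, the sketch below computes grouped summary statistics and runs a one-sample t-test on a small made-up data set. The tibble toy, its values, and the grouping column name sex are assumptions for illustration; check the actual column names in brfss before adapting it. The one-sided alternative is one reasonable reading of the questions above, not the only defensible choice.

```r
library(dplyr)

# Toy data standing in for brfss. The column name `sex` and all values are
# made up for illustration and are not taken from the real data set.
toy <- tibble(
  sex         = c("Female", "Female", "Female", "Male", "Male", "Male"),
  weight_kg   = c(70, 65, 80, 90, 85, 78),
  wtdesire_kg = c(62, 65, 70, 85, 86, 75)
)
toy <- mutate(toy, desired_loss = weight_kg - wtdesire_kg)

# Mean, standard deviation and number of observations within each group
loss_summary <- toy %>%
  group_by(sex) %>%
  summarise(
    mean_loss = mean(desired_loss),
    sd_loss   = sd(desired_loss),
    n         = n()
  )
loss_summary

# One-sample t-test of H0: mean = 0 against HA: mean > 0 for one group
female_loss <- filter(toy, sex == "Female")$desired_loss
female_test <- t.test(female_loss, mu = 0, alternative = "greater")
female_test
```

The printed t-test output includes the test statistic, degrees of freedom, p-value and a confidence interval, which are the usual ingredients of a statistical summary. Histograms by group could be drawn from the same data with ggplot2, for example by facetting on the grouping column.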

3. Performance of t-test

Explore the Type I error rate of the t-test for a two-sided level \(\alpha = 0.05\) test, for samples of size \(n = 5, 10, 25, 50\), for one of the following population distributions:

  • Uniform(0, 1)
  • Chi-squared(1)
  • Beta(.5, .5)
  • Exponential(1)

Use at least 10,000 simulations for each scenario.

  1. Provide a table of the estimated Type I error rate by sample size.

  2. Write a short (3-5 sentence) summary of how the t-test performs: is it close enough to exact that you would be comfortable using it even when the underlying distribution is as far from normal as these distributions?
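
One possible shape for such a simulation is sketched below, using a Normal(0, 1) population as a sanity check, since the t-test is exact there and the estimated rate should land near 0.05. The helper estimate_type1 and its arguments rpop and true_mean are hypothetical names introduced for this sketch, not part of any package.

```r
set.seed(551)  # arbitrary seed, only so the sketch is reproducible

# Estimate how often the two-sided one-sample t-test rejects a true null.
# `rpop` draws a sample of size n from the population (hypothetical argument),
# and `true_mean` is that population's mean, the value tested under H0.
estimate_type1 <- function(n, rpop, true_mean, n_sim = 10000, alpha = 0.05) {
  p_values <- replicate(n_sim, t.test(rpop(n), mu = true_mean)$p.value)
  mean(p_values < alpha)  # proportion of simulations that reject H0
}

sample_sizes <- c(5, 10, 25, 50)

# Sanity check with a normal population, where the t-test is exact
rates <- sapply(sample_sizes, estimate_type1, rpop = rnorm, true_mean = 0)

type1_table <- data.frame(n = sample_sizes, type1_rate = rates)
type1_table
```

To study one of the listed distributions, pass a generator for that distribution as rpop together with its true mean as true_mean, so that the null hypothesis being tested is actually true.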

References

Wasserstein, Ronald L, and Nicole A Lazar. 2016. “The ASA’s Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. doi:10.1080/00031305.2016.1154108.