Homework 5

Due 2017/11/09

Remember that, starting with this homework, you are required to submit your homework as a compiled Rmarkdown document. Submit both a .pdf (either generated directly from the Rmarkdown, or saved from the Word version generated from the Rmarkdown) and the Rmarkdown file itself (the .Rmd).

A basic template for this homework is provided at homework-5.Rmd.

Submit your answers on Canvas.

1. Chi-square goodness of fit with estimated parameters

Cornhole is a popular lawn game in the US, where players throw a bean bag at a wooden platform with a hole in it. A bag in the hole scores 3 points, while one on the platform scores 1 point.

An avid cornhole analyst has observed n = 100 experienced players and recorded, for each player, the number of misses before they got a bag in the hole. These 100 observations can be read into R:

Y <- c(0L, 1L, 0L, 9L, 0L, 0L, 4L, 0L, 0L, 1L, 0L, 2L, 2L, 1L, 10L, 
3L, 0L, 13L, 0L, 0L, 0L, 43L, 3L, 0L, 6L, 1L, 11L, 0L, 0L, 0L, 
3L, 0L, 3L, 1L, 0L, 0L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 0L, 0L, 
0L, 0L, 0L, 16L, 9L, 2L, 15L, 1L, 4L, 3L, 0L, 18L, 1L, 3L, 0L, 
0L, 4L, 3L, 9L, 0L, 1L, 19L, 1L, 2L, 0L, 2L, 6L, 0L, 0L, 0L, 
2L, 3L, 0L, 8L, 41L, 2L, 1L, 2L, 22L, 0L, 6L, 17L, 17L, 0L, 0L, 
6L, 7L, 0L, 0L, 9L, 1L, 0L, 15L, 1L)
Y

(FYI: the L just forces these numbers to be of integer type; it’s not essential for this problem.)
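As a small illustration of the note above, the following sketch compares a vector written with and without the L suffix. The vectors x and y are made up for the illustration and are not part of the homework data.

```r
x <- c(0, 1, 2)      # plain numeric literals are stored as doubles
y <- c(0L, 1L, 2L)   # the L suffix stores them as integers

typeof(x)            # "double"
typeof(y)            # "integer"

# The values compare equal, so summaries such as the mean agree
all(x == y)
mean(x) == mean(y)
```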

The analyst is curious if these values are consistent with being drawn from a Geometric distribution.

  1. The geometric distribution has probability mass function: \[ P(Y = y) = (1-p)^yp, \quad y = 0, 1, 2, \ldots, \] where \(p\) is an unknown parameter of the distribution. Under this parameterization \(E(Y) = (1-p)/p\), so if \(Y_1, \ldots, Y_n\) is an i.i.d. sample from a Geometric(p) distribution, a good estimate of \(p\) is \[ \hat{p} = \frac{1}{1 + \overline{Y}} \] Use the data to estimate \(p\).

  2. Tabulate the observed numbers of misses into the categories: 0, 1, 2, 3, 4, 5, 6+

  3. Find probabilities for each category above using the Geometric distribution with the estimated parameter (you can use dgeom()).

  4. Find the expected counts for each category using the probabilities found above.

  5. Check the condition for the Chi-square approximation to be appropriate.

  6. Calculate the value of the Chi-square statistic.

  7. Check your calculation of the test statistic by running chisq.test(x = O, p = probs), where O comes from part 2 and probs from part 3.

  8. What distribution should this statistic be compared to? Find the p-value for the test that the numbers of misses are consistent with a Geometric distribution. What would you conclude?

  9. Why does the chisq.test() run in part 7 return the wrong p-value?
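The mechanics of tabulating counts with a pooled top category, and of building matching probabilities and expected counts, can be sketched on a different model. The example below is only an illustration under stated assumptions: it uses simulated Poisson counts (not the homework data), a 3+ top category, and made-up names such as x, lambda_hat and cats.

```r
set.seed(1)                     # arbitrary seed so the sketch is reproducible
x <- rpois(200, lambda = 1.5)   # simulated counts, NOT the homework data

lambda_hat <- mean(x)           # estimate the Poisson mean from the data

# Tabulate into 0, 1, 2, 3+; factor() keeps a category even if its count is 0
cats <- c("0", "1", "2", "3+")
binned <- ifelse(x >= 3, "3+", as.character(x))
O <- as.vector(table(factor(binned, levels = cats)))

# Category probabilities under the fitted model; the pooled top category
# gets the upper tail, P(X >= 3) = P(X > 2)
probs <- c(dpois(0:2, lambda_hat),
           ppois(2, lambda_hat, lower.tail = FALSE))
sum(probs)                      # the categories cover every outcome

E <- length(x) * probs          # expected counts
X2 <- sum((O - E)^2 / E)        # Pearson chi-square statistic
```

The same pattern should carry over to other count models by swapping in the matching density and distribution functions.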

2. Data Analysis

This problem uses the same BRFSS data as HW #3.

I found the following as the start of an example in a textbook: “The heights of male adults between the ages 20 and 62 in the US are nearly normal with mean 70.0 inches and standard deviation 3.0 inches.”

  1. Create a histogram of the heights (in inches) of the male respondents. Describe the distribution. Is there anything unusual about it?

  2. Conduct a t-test of variance, with the null hypothesis \(H_0: \sigma^2 = 3^2\), where \(\sigma^2\) is the population variance of heights in inches for male respondents to the BRFSS survey.

    Write your conclusion in the form of a statistical summary including a point estimate and confidence interval.

  3. Test the hypothesis that the heights of the male respondents in the BRFSS survey come from a Normal(70, \(3^2\)) distribution.

    Write your conclusion in the form of a statistical summary (here there is no need for a point estimate or confidence interval).
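One way to test a fully specified Normal null in R is a one-sample Kolmogorov-Smirnov test via ks.test(). The sketch below assumes that is the kind of test intended, and it uses simulated heights (the vector h is made up, not the BRFSS data). Note that pnorm() is parameterized by the standard deviation, so a Normal(70, \(3^2\)) null is passed as sd = 3, not 9.

```r
set.seed(42)                            # arbitrary seed for reproducibility
h <- rnorm(50, mean = 70, sd = 3)       # simulated heights, NOT the BRFSS data

# Null hypothesis: h is drawn from Normal(mean = 70, sd = 3)
res <- ks.test(h, "pnorm", mean = 70, sd = 3)

res$statistic   # largest vertical gap between the ECDF and the null CDF
res$p.value
```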

3. One-sided K-S tests

In lecture we saw an example where two one-sided K-S tests gave conflicting results.

The setup of that example was:

True population: Y ~ N(0, 1)

Null hypothesis: Y ~ N(0, 100)

Replicate the example:

  1. Draw a sample of size 20 from the true population.

  2. Conduct tests of the null hypothesis with one-sided lesser, one-sided greater, and two-sided alternatives.

  3. Plot the ECDF of the sample, along with the CDF of the hypothesized distribution. Indicate on your plot where the test statistic for each test comes from.

  4. What properties of the true and hypothesized distributions lead to the contradiction?

  5. In your own words, describe why this suggests one-sided K-S tests are hard to interpret.
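To see where each K-S statistic comes from, the one-sided statistics can be computed by hand from the sorted sample and compared with ks.test(). The sketch below deliberately uses a different setup from the lecture example (an Exponential sample tested against an Exponential null with another rate), and the names x, F0, D_plus and D_minus are made up for the illustration.

```r
set.seed(2)                   # arbitrary seed for reproducibility
n <- 20
x <- sort(rexp(n, rate = 1))  # simulated sample, sorted
F0 <- pexp(x, rate = 2)       # hypothesized CDF evaluated at the sample points

# The ECDF jumps from (i - 1)/n to i/n at the i-th sorted observation
i <- seq_len(n)
D_plus  <- max(i / n - F0)        # largest amount the ECDF sits above the null CDF
D_minus <- max(F0 - (i - 1) / n)  # largest amount the ECDF sits below the null CDF
D <- max(D_plus, D_minus)         # two-sided statistic

# ks.test() reports D_plus for "greater" and D_minus for "less"
greater <- ks.test(x, "pexp", rate = 2, alternative = "greater")$statistic
less    <- ks.test(x, "pexp", rate = 2, alternative = "less")$statistic
two     <- ks.test(x, "pexp", rate = 2)$statistic
all.equal(unname(c(greater, less, two)), c(D_plus, D_minus, D))

# ECDF of the sample with the hypothesized CDF overlaid
plot(ecdf(x), main = "ECDF and hypothesized CDF")
curve(pexp(x, rate = 2), add = TRUE, lty = 2)
```

The points where D_plus and D_minus are attained (for example x[which.max(i / n - F0)]) are natural places to mark on the plot.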