Homework 1 - ST551 Fall 2017

Submit your answers on Canvas.

(2 pts) Give one-sentence answers for the following:
1. What is the difference between a population parameter and a statistic?
2. Describe what we mean by the ‘sampling distribution’ for a statistic.
I’ve made some updates to class_data, so make sure you re-download it to get the latest version before starting this section.

Recall that in lecture on Fri Sep 29 we simulated the sampling distribution of the sample mean for samples of size 5 from our population of commute times from the first day of class:
```
library(tidyverse)
# Read in class data
class_data <- read_csv("class_data.csv")

# Simulation settings
n <- 5
n_sim <- 1000

# Generate many samples
samples <- rerun(.n = n_sim, 
  sample(class_data$commute_time, size = n))

# Find sample mean of each sample
sample_means <- map_dbl(samples, ~ mean(.x))
```
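One way to examine a simulated sampling distribution is to plot a histogram and summarise the simulated values. The following is only a minimal sketch under stated assumptions: `population` is a made-up stand-in vector (not the class data), and base R's `replicate()` and `vapply()` are swapped in for `rerun()` and `map_dbl()` so the block runs without the tidyverse or the CSV file.

```r
set.seed(551)  # make the simulation reproducible

# Hypothetical stand-in population (an assumption for illustration);
# in the homework this would be class_data$commute_time
population <- c(5, 10, 10, 15, 20, 25, 30, 45, 60, 90)

# Simulation settings
n <- 5
n_sim <- 1000

# Generate many samples; sample() draws without replacement by default
samples <- replicate(n_sim, sample(population, size = n), simplify = FALSE)

# Find the sample mean of each sample
sample_means <- vapply(samples, mean, numeric(1))

# Plot the simulated sampling distribution
hist(sample_means, breaks = 30,
     main = "Simulated sampling distribution of the sample mean",
     xlab = "Sample mean")

# Numerical summaries of the simulated distribution
mean(sample_means)  # centre
sd(sample_means)    # spread
```

The same plotting and summary steps apply to any vector of simulated statistics, whichever function is used to compute the statistic for each sample.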
The following questions ask you to make adjustments to this code to investigate some other sampling distributions. Ensure you include your R code for each part, any plots you use to obtain your answers, as well as your answers to the questions.
1. Simulate the sampling distribution for the sample mean for samples of size n = 10 from the commute times.
  1. (1 pt) Describe how the distribution differs from that based on samples of size 5.
  2. (2 pts) Justify your observations based on the properties we derived in lecture on Fri Sep 29.
2. Simulate the sampling distribution for the sample mean for samples of size n = 37 from the commute times.
  1. (1 pt) Examine the distribution. Are you surprised?
  2. (1 pt) Describe why the distribution looks the way it does.
3. Simulate the sampling distribution for the sample median for samples of size 5 from the commute times.
  1. (1 pt) Describe the distribution.
  2. (1 pt) Use your simulated sampling distribution to estimate the probability that a sample of size 5 from this population results in a median less than 10.
  3. (1 pt) Do any of the properties of sampling distributions we derived in the lectures from Mon Sep 25 to Fri Sep 29 justify the properties of this sampling distribution?
4. Simulate the sampling distribution for the sample mean for samples of size 5 from the prefers_cats column in class_data.
  1. (1 pt) Describe the distribution. (A histogram might not be the most appropriate plot here.)
  2. (1 pt) Do any of the properties of sampling distributions we derived in the lectures from Mon Sep 25 to Fri Sep 29 justify the properties of this sampling distribution?
5. Instead of sampling from class_data, you could sample from a population with a standard Normal distribution by replacing sample(class_data$commute_time, size = n) with rnorm(n).
  1. (2 pts) Derive the sampling distribution of the sample mean for samples of size n = 2 and n = 10 from a standard Normal population.
  2. (2 pts) Simulate the sampling distribution of the sample mean for samples of size n = 2 and n = 10 from a standard Normal population. Are your simulated distributions consistent with your derived distributions?
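The substitution described in question 5 can be sketched as follows. This is an illustration only, not a solution: the sample size `n <- 5` is chosen arbitrarily and is not one of the sizes asked for above, and base R's `replicate()` and `vapply()` stand in for `rerun()` and `map_dbl()` so the block is self-contained.

```r
set.seed(551)  # make the simulation reproducible

# Simulation settings (n = 5 is an illustrative choice only)
n <- 5
n_sim <- 1000

# Generate many samples from a standard Normal population:
# rnorm(n) takes the place of sample(class_data$commute_time, size = n)
samples <- replicate(n_sim, rnorm(n), simplify = FALSE)

# Find the sample mean of each sample
sample_means <- vapply(samples, mean, numeric(1))

# Summaries to compare against a derived distribution
mean(sample_means)
sd(sample_means)
```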