---
title: Lab 4
author: Charlotte Wickham
date: '2017-10-17'
slug: lab-4
draft: false
output:
  blogdown::html_page:
    toc: true
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, results = "hide", message = FALSE,
  fig.keep = "none")
set.seed(1918)
```

```{r}
library(tidyverse)
library(broom)
library(openintro)
```

## t-tests in R

The [`t.test()`](https://www.rdocumentation.org/packages/stats/topics/t.test) function in R will conduct a t-test and produce a corresponding confidence interval. It's a bit of a multi-purpose function, as it handles one-sample t-tests (which we saw last week), paired t-tests (a special case of a one-sample t-test), and two-sample t-tests (which we will see soon).

To do a one-sample t-test you simply pass in the sample values as the first argument `x`. The arguments `alternative` (default is `"two.sided"`), `mu` (default is `0`), and `conf.level` (default is `0.95`) control the direction of the alternative hypothesis, the null hypothesized value, and the confidence level for the CI, respectively; take a look at `?t.test` for more info.

As an example, let's look at a data set, `friday` from the package `openintro`, that compares traffic accidents on Friday the 13th to those on Friday the 6th of the same month:

```{r, eval = FALSE}
?friday
```

A hypothesis might be that accidents on the two Fridays should, on average, be the same. That is, the differences between the number of accidents on the 13th and those on the 6th have a population mean of zero. Let's grab just those rows that correspond to the traffic accidents and pull out the column of differences (more on `filter()` and `pull()` next lab):

```{r}
diffs <- friday %>%
  filter(type == "accident") %>%
  pull(diff)
```

Then conduct a t-test with the null hypothesis that the mean difference is zero, $H_0: \mu = 0$, versus the two-sided alternative:

```{r}
t.test(diffs)
```

* **Try writing a statistical summary based on these results.** (In Homework 3 you are asked to do a t-test; you may either do the calculations yourself from the summary statistics, or use `t.test()`. I'd recommend doing it both ways so you get practice doing it from summary stats and can check your work with `t.test()`.)

The resulting `t.test()` object is a list; if you store it:

```{r}
test_friday <- t.test(diffs)
str(test_friday)
```

You can then extract specific components by name:

```{r}
test_friday$statistic
test_friday$p.value
```

* **Pull out the confidence interval. Check (with code) that $\mu = 0$ is outside the interval. Would $\mu = 0$ be outside a 99% confidence interval?**

## The behaviour of p-values

We've talked about the performance of tests in terms of their rejection rates:

* If the null hypothesis is true, we want the rejection rate to be close to the stated significance level $\alpha$.
* If the alternative is true, we want the rejection rate to be high (i.e. high power) and to increase with sample size.

You've investigated these by simulating samples, calculating the test statistic, and comparing the simulated test statistics to a critical value for a specific level, $\alpha$. Alternatively, you could simulate samples, get the p-value for each test, and then compare the simulated p-values to any significance level $\alpha$. Let's explore with the t-test.
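To make the process concrete, here's a minimal sketch of a single run: draw one sample from the null population and extract the p-value from its t-test. (The object name `one_sample` is just for illustration; the population and sample size match the simulation below.)

```{r}
# One sample from the null population, Normal(0, 1), with n = 30,
# and the p-value from the corresponding one-sample t-test:
one_sample <- rnorm(30)
t.test(one_sample, mu = 0)$p.value
```

Repeating this many times gives a whole distribution of simulated p-values.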
We'll simulate our samples for a case where we know the t-test is exact: a Normal(0, 1) population and a sample size of 30:

```{r}
n_sim <- 2000
samples <- rerun(n_sim, rnorm(30))
```

We can conduct a t-test on each sample and extract the p-value:

```{r}
p_values <- map_dbl(samples, ~ t.test(.x, mu = 0)$p.value)
```

Then examine the distribution of the p-values:

```{r}
ggplot() +
  geom_histogram(aes(x = p_values), binwidth = 0.05, center = 0.05/2) +
  xlim(c(0, 1))
```

* **What proportion of the simulated samples would you expect to have a p-value less than 0.05? Why?**
* **What proportion of the simulated samples would you expect to have a p-value less than 0.10? Why?**
* **What proportion of the simulated samples would you expect to have a p-value less than $\alpha$? Why?**

For an exact test, under the null hypothesis the distribution of p-values should be Uniform(0, 1).

* **Try simulating p-values when the samples come from an Exponential(1) population and the sample size is small, $n = 5$. Does the test look exact?**

## Power and sample size

Recall that the power of the Z-test depends on the sample size, $n$, the difference between the hypothesized mean and the true mean (under the alternative), $\mu_0 - \mu_A$, the population variance, $\sigma^2$, and the significance level, $\alpha$. The power of the t-test depends on the same quantities, but in a slightly more complicated way; luckily, there's an R function that will calculate it for you: [`power.t.test()`](https://www.rdocumentation.org/packages/stats/topics/power.t.test).

For example, to find the power for a specific alternative, $\mu = \mu_A$, for a two-sided t-test of $H_0: \mu = \mu_0$, we could use:

```{r}
power.t.test(n = 25, delta = 0.1, sd = sqrt(1.96), sig.level = 0.1,
  type = "one.sample", alternative = "two.sided")
```

where `n` is the sample size, `delta` = $\mu_0 - \mu_A$, `sd` = $\sqrt{\sigma^2}$, and `sig.level` = $\alpha$. (The values here were chosen to mimic those in HW#2 3(c).)

You can pass in a vector of sample sizes to see how power depends on sample size:

```{r}
n <- (2:70)^2
powers <- power.t.test(n = n, delta = 0.1, sd = sqrt(1.96), sig.level = 0.1,
  type = "one.sample", alternative = "two.sided")
ggplot(tidy(powers)) +
  geom_line(aes(x = n, y = power))
```

(The `tidy()` function comes from the `broom` package, and will convert a lot of statistical objects into a 'tidy' data frame.)

The power increases to an asymptote of 1.

* **For $n = 30$, explore how the power depends on `delta`.**
* **For $n = 30$, explore how the power depends on the population standard deviation.**
* **For $n = 30$, explore how the power depends on the significance level.**

In `power.t.test()`, if you specify the `power` argument as a desired power and leave one of the other arguments unspecified, it will return the value required to achieve the desired power. For example, if we want to detect a difference of 0.1 with power 0.8, with the same population standard deviation and level of test:

```{r}
power.t.test(power = 0.8, delta = 0.1, sd = sqrt(1.96), sig.level = 0.1,
  type = "one.sample", alternative = "two.sided")
```

We'd need a sample size of about $n = 1214$. Specifying the desired power for a specific minimum difference of interest is the usual procedure for determining the sample size required for a study.

* **How big a sample is needed to detect a difference of 1 unit with power 0.8, if the population standard deviation is 15, and we are conducting a level $\alpha = 0.05$ test?**
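If you want to check the shape of the call for that last question (the values below are just the ones stated in the question, plugged into the same pattern as the example above), leaving `n` unspecified asks `power.t.test()` to solve for the sample size:

```{r, eval = FALSE}
# Leave n unspecified and power.t.test() returns the sample size
# needed to achieve the requested power for this delta, sd, and level:
power.t.test(power = 0.8, delta = 1, sd = 15, sig.level = 0.05,
  type = "one.sample", alternative = "two.sided")
```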