`library(tidyverse)`

## Using simulations to evaluate tests

Since the performance of a test depends on the sampling distribution of the test statistic we can evaluate tests via simulation the same way we’ve been simulating sampling distributions.

To do so, we make some assumptions about the population, generate the sampling distribution of the test statistic and then estimate its Type I error rate or power.

To illustrate the technique we’ll explore the power and Type I error rate of the Z-test, when the population is Poisson.

### Z-test Review

Let’s start with a single sample, to review the steps in a Z-test. For now, let’s assume the population is Poisson\((\lambda = 2)\), we are taking a sample of size n = 10 and will conduct a level \(\alpha = 0.05\) test:

```
lambda <- 2
n <- 10
alpha <- 0.05
```

We can generate a single sample with:

```
# A single sample from Poisson dist.
(y <- rpois(n = n, lambda = lambda))
```

`## [1] 1 3 3 1 2 1 2 2 1 1`

Recall the Z-statistic for the null hypothesis \(H_0: \mu = \mu_0\) is, \[ Z(\mu_0) = \frac{\overline{Y} - \mu_0}{\sigma/\sqrt{n}} \]

If we are interested in testing the null hypothesis \(\mu = 2\), we can calculate the Z-statistic:

```
mu_0 <- 2
# Find the z-statistic
(z <- (mean(y) - mu_0)/(sqrt(lambda/n)))
```

`## [1] -0.6708204`

**Why is \(\sigma^2 = \lambda\)?**

Our rejection region depends on the direction of our alternative, let’s assume we are interested in the two sided alternative, in which case we reject if \(|Z(\mu_0)| > z_{1-\alpha/2}\), and we do the comparison:

```
# Reject the null?
abs(z) > qnorm(1 - alpha/2)
```

`## [1] FALSE`

Where `TRUE`

would indicate we should reject the null, and `FALSE`

fail to reject the null. In this case we “Fail to reject the Null”.

To make things easy in the following sections, let’s wrap the Z-statistic calculation into a function:

```
# small function for calculating the z-statistic
# x: vector of sample values
# mu0: null hypothesized mean
# sig2: known population variance
z_stat <- function(x, mu0, sig2){
n <- length(x)
(mean(x) - mu0) / sqrt(sig2/n)
}
```

So, we can in future do it all in one step

`abs(z_stat(y, mu0 = 2, sig2 = lambda)) > qnorm(1 - alpha/2)`

You can learn more about writing functions in the Writing Your Own Functions section of Hands on Programming in R.

**You might want to take the time now to review how to find a p-value, and a 95% confidence interval for this particular sample**

### Evaluating the test

OK that was for one sample, to evaluate the performance of the test, we need to repeat that for many samples, then take a look at how often we are rejecting the null hypothesis.

Let’s start with evaluating the Type I error rate. Remember the Type I error rate is the probaility of rejecting the null when the null is true, so we need to set up our test to be testing the true population mean \(\lambda\).

We can generate a large number of samples, just as we have before:

```
n_sim <- 2000
samples_poisson <- rerun(.n = n_sim,
rpois(n, lambda = lambda))
```

But, now rather than find the sample mean of each sample we find the Z-statistic, makin sure \(\mu_0 = \lambda\):

```
sample_z <- map_dbl(samples_poisson,
~ z_stat(.x, mu0 = lambda, sig2 = lambda))
```

`sample_z`

now contains 2000 simulated test statistics, we can take a look:

```
ggplot() +
geom_histogram(aes(x = sample_z))
```

Or directly estimate the rejection rate with by finding the proportion that were in the rejection region

`mean(abs(sample_z) > qnorm(1 - alpha/2))`

`## [1] 0.059`

We estimate that \(P_{H_0}(\text{Reject Null}) = 0.06\).

**What should this number be close to?**

If we were interested instead in *power*, we need to pick specific value for the true population mean that falls in our alternative hypothesis region. Let’s say \(\mu_A = 3\). We need to regenerate our samples with this true mean:

```
mu_A <- 3
samples_poisson_A <- rerun(.n = n_sim,
rpois(n, lambda = mu_A))
```

But continue to do our tests on the hypothesis that \(\mu = \lambda = 2\)

```
sample_z_A <- map_dbl(samples_poisson_A,
~ z_stat(.x, mu0 = lambda, sig2 = lambda))
```

Then we can estimate the rejection probability:

`mean(abs(sample_z_A) > qnorm(1 - alpha/2))`

We estimate the power for the alternative \(\mu = \mu_A = 3\), is 0.59.

**Would you expect the power to be higher or lower for the alternative \(\mu = \mu_A = 5\)? Why? Simulate it and see.**

**Would you expect the power to be higher or lower for a larger sample size, n = 100. Why? Simulate it and see.**

**Would you expect the Type I error rate to be closer or further from \(\alpha = 0.05\) if n = 100. Why?**

**In HW#2 you have to calculate the power of the Z-test for the null hypothesis \(H_0: \mu = 2.6\), in a situation where the population mean is really \(\mu = \mu_A = 2.7\), the population variance is \(\sigma^2 = 1.96\), and n = 25. You could verify your calculations are in the right ballpark, by assuming a Normal population, and simulating to evaluate the power. Try it.**