## Announcements

Lectures this week:

- Today's lecture: Delta method and bootstrap
- Wednesday's lecture: Randomization & permutation
- Friday's lecture: **Cancelled**; office hours instead

**Formula Sheet**: The final is closed book with no note sheet, but I am willing to provide some of the harder (less common) formulae.

**Lab**: No set material; I'll encourage Chuan to lead a formula strategy session.

# Delta Method

## Delta Method

If the sampling distribution of a statistic converges to a Normal distribution, the **Delta method** provides a way to approximate the sampling distribution of a function of that statistic.

*Univariate* Delta Method

If \[ \sqrt{n}\left(\hat{\theta} - \theta \right) \rightarrow_D N(0, \sigma^2) \]

then \[ \sqrt{n}\left(g(\hat{\theta}) - g(\theta) \right) \rightarrow_D N(0, \sigma^2[g'(\theta)]^2) \]

(As long as \(g'(\theta)\) exists and is non-zero.)

## Another way of saying it

If we know, \[ \hat{\theta} \, \dot \sim \, N(\theta, \sigma^2) \]

then,

\[ g(\hat{\theta}) \, \dot \sim \, N(g(\theta), \sigma^2[g'(\theta)]^2) \]

The approximation can be pretty rough, i.e. just because the sample is large enough that the original statistic is reasonably Normal doesn't mean the transformed statistic will be.

## Example: Log Odds

Let \(Y_1, \ldots, Y_n \sim\) Bernoulli\((p)\), and \(X = \sum_{i=1}^n Y_i\).

We know \(\hat{p} = \frac{X}{n} \, \dot \sim \, N(p, \frac{p(1 - p)}{n})\).

We might estimate the log odds with: \[ \log\left(\frac{\hat{p}}{1-\hat{p}}\right) \]

**What is the asymptotic distribution of the estimated log odds?**

## Example: Log Odds cont.

\[ g(p) = \log\left(\frac{p}{1-p}\right) = \log(p) - \log{(1-p)} \]
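Filling in the derivative step this slide sets up:

\[ g'(p) = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)} \]

so, by the delta method,

\[ \log\left(\frac{\hat{p}}{1-\hat{p}}\right) \, \dot \sim \, N\left(\log\left(\frac{p}{1-p}\right), \, \frac{p(1-p)}{n} \cdot \frac{1}{[p(1-p)]^2}\right) = N\left(\log\left(\frac{p}{1-p}\right), \, \frac{1}{np(1-p)}\right) \]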

## Other comments on delta method

Derived using a first-order Taylor expansion of \(g(\hat{\theta})\) around \(\theta\): \[ g(\hat{\theta}) \approx g(\theta) + g'(\theta)(\hat{\theta} - \theta) \]

There is also a multivariate version (useful if you need some function of two statistics, e.g. ratio of sample means)

# Bootstrap

## Bootstrap

A method to approximate the sampling distribution of a statistic

**Idea:**

- Recall, one way to approximate the sampling distribution of a statistic was by **simulation**, but you have to assume a population distribution.
- The bootstrap uses the *empirical distribution function* as an estimate for the population distribution, i.e. it relies on \[ \hat{F}(y) \approx F(y) \]

## Example - Sampling distribution of Median by simulation

Assume a population distribution, i.e. \(Y \sim N(\mu, \sigma^2)\)

**Repeat for \(k = 1, \ldots, B\)**

- Sample \(n\) observations from \(N(\mu, \sigma^2)\)
- Find sample median, \(m^{(k)}\)

Then the simulated sample medians, \(m^{(k)}, k = 1, \ldots, B\) approximate the sampling distribution of the sample median.
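The steps above can be sketched in code (Python here; the values of `mu`, `sigma`, `n`, and `B` are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, B = 10.0, 2.0, 25, 5000  # assumed population parameters

# Repeat B times: draw n observations from N(mu, sigma^2), record the sample median
medians = np.array([np.median(rng.normal(mu, sigma, size=n)) for _ in range(B)])

# The B simulated medians approximate the sampling distribution of the sample median
print(medians.mean(), medians.std())
```

The mean of the simulated medians should sit near the population median \(\mu\), and their spread approximates the standard error of the sample median.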

## Example - Sampling distribution of Median by bootstrap

Estimate the population distribution from the sample, i.e. \(\hat{F}(y)\)

**Repeat for \(k = 1, \ldots, B\)**

- Sample \(n\) observations from a population with c.d.f \(\hat{F}(y)\)
- Find sample median, \(m^{(k)}\)

Then the bootstrapped sample medians, \(m^{(k)}, k = 1, \ldots, B\) approximate the sampling distribution of the sample median.

## Sampling from a c.d.f

You can sample from any c.d.f. by sampling from a Uniform(0, 1), then transforming with the (generalized) inverse c.d.f.

I.e. sample \(u_1, \ldots, u_n\) i.i.d from Uniform(0,1), then

\[ y_i = F^{-1}(u_i) \quad i = 1, \ldots, n \] are distributed with c.d.f \(F(y)\).
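As a sketch of the transformation, here is the recipe applied to an Exponential(1) distribution, whose inverse c.d.f. has a closed form (the choice of distribution is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Sample u_1, ..., u_n i.i.d. from Uniform(0, 1)
u = rng.uniform(size=n)

# Transform with the inverse c.d.f. of an Exponential(1):
# F(y) = 1 - exp(-y)  =>  F^{-1}(u) = -log(1 - u)
y = -np.log(1 - u)

# The y_i now have c.d.f. F; their mean should be close to the Exponential(1) mean of 1
print(y.mean())
```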

## In the empirical case

Sampling from the ECDF is equivalent to sampling with replacement from the original sample.
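A sketch of why these are equivalent: the ECDF jumps by \(1/n\) at each sorted observation, so its generalized inverse maps \(u\) to the \(\lceil nu \rceil\)-th order statistic, which picks each observation with probability \(1/n\) — the same as resampling with replacement (the sample values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9.0, 9.5])  # illustrative sample

# Inverse-ECDF sampling: F^{-1}(u) is the ceil(n*u)-th order statistic
u = rng.uniform(size=5)
y_sorted = np.sort(y)
via_inverse_cdf = y_sorted[np.ceil(u * y.size).astype(int) - 1]

# Equivalent shortcut: sample with replacement from the original values
via_resampling = rng.choice(y, size=5, replace=True)

print(via_inverse_cdf)
print(via_resampling)
```

Both draws are samples of size 5 from the same (empirical) distribution; each value in either draw is one of the original observations.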

## Example - Sampling distribution of Median by bootstrap

**Repeat for \(k = 1, \ldots, B\)**

- Sample \(n\) observations with replacement from \(Y_1, \ldots, Y_n\)
- Find sample median, \(m^{(k)}\)

Then the bootstrapped sample medians, \(m^{(k)}, k = 1, \ldots, B\) approximate the sampling distribution of the sample median.

A little more subtly, writing \(m\) for the population median, \(\hat{m}\) for the sample median, and \(\tilde{m}\) for a resample median, the bootstrap relies on \[ \hat{m} - m \, \dot \sim \, \tilde{m} - \hat{m} \] i.e. the resampling error approximating the sampling error.

## Example

Sample values: *1.8*, *2.2*, *2.7*, *5.7*, *6.9*, *7.4*, *8.1*, *8.7*, *9* and *9.5*

Sample median: 7.15

A bootstrap resample: *1.8*, *2.7*, *2.7*, *5.7*, *6.9*, *7.4*, *8.1*, *8.1*, *8.7* and *9.5*

Resample median: 7.15

## Many resamples
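Generating many resamples can be sketched in code, using the sample values from the previous slide:

```python
import numpy as np

rng = np.random.default_rng(0)
# The sample from the previous slide
y = np.array([1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9.0, 9.5])

B = 10_000
# Each resample: n draws with replacement from the original sample
boot_medians = np.array(
    [np.median(rng.choice(y, size=y.size, replace=True)) for _ in range(B)]
)

print(np.median(y))         # original sample median: 7.15
print(boot_medians.mean())  # centre of the bootstrap distribution
```

A histogram of `boot_medians` approximates the sampling distribution of the sample median.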

## Bootstrap confidence intervals

Many methods exist.

A common one:

**Quantile:** take the \(100(\alpha/2)\)-th and \(100(1 - \alpha/2)\)-th percentiles of the resampled statistic values as the interval endpoints.
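A minimal sketch of the quantile interval, reusing the example sample (values from the earlier slide; \(\alpha = 0.05\) assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9.0, 9.5])  # example sample

B, alpha = 10_000, 0.05
boot_medians = np.array(
    [np.median(rng.choice(y, size=y.size, replace=True)) for _ in range(B)]
)

# Quantile interval: the 100(alpha/2)-th and 100(1 - alpha/2)-th percentiles
lo, hi = np.percentile(boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
print(lo, hi)
```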

## Comments on the bootstrap

Relies on \(\hat{F}(y)\) being a good estimate of \(F(y)\); it doesn't necessarily solve small-sample problems.

Resampling should generally mimic the original study design, e.g. if pairs of observations are sampled from a population, pairs should be resampled.