Sampling Distributions ST551 Lecture 4

Finish up Lecture 3 slides

Sampling distributions

Options for finding the sampling distribution:

  • Derive it mathematically
  • Can’t derive the distribution?
    • Derive properties of the distribution
    • Simulate
    • Approximate

Deriving the sampling distribution

Normal population: setup

Population distribution: \(Y \sim N(\mu, \sigma^2)\)

Sample: \(Y_1, \ldots, Y_n\) i.i.d. from the population

Sample statistic: Sample mean = \(\overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\)

What is the sampling distribution of the sample mean?

Normal population: derivation

A sum of independent normal random variables is normal, with means and variances adding:

\(Y_1 + Y_2 \sim N(2\mu, 2\sigma^2)\)

\(Y_1 + Y_2 + Y_3 \sim N(3\mu, 3\sigma^2)\)

\(Y_1 + Y_2 + \ldots + Y_n \sim N(n\mu, n\sigma^2)\)

\(\overline{Y} = \frac{Y_1 + Y_2 + \ldots + Y_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\)
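A quick way to check this result is by simulation. A minimal sketch (the values \(\mu = 10\), \(\sigma = 2\), \(n = 25\) are arbitrary illustration choices, not from the slides):

```python
import numpy as np

# Hypothetical illustration values (not from the slides)
mu, sigma, n = 10.0, 2.0, 25
reps = 100_000

rng = np.random.default_rng(551)
# Each row is one sample of size n from N(mu, sigma^2)
samples = rng.normal(mu, sigma, size=(reps, n))
ybar = samples.mean(axis=1)  # one sample mean per simulated sample

print(ybar.mean())       # close to mu = 10
print(ybar.std(ddof=1))  # close to sigma / sqrt(n) = 0.4
```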

Bernoulli population

Population distribution: \(Y \sim \textit{Bernoulli}(p)\)

E.g., US voters, where \[ Y = \begin{cases} 1, & \text{Supports single payer health care} \\ 0, & \text{Does not support single payer health care} \end{cases} \]

Sample: \(Y_1, \ldots, Y_n\) i.i.d. from the population

Sample statistic: Sample mean = \(\overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\), which here is the sample proportion

What is the sampling distribution of the sample proportion? Since \(\sum_{i=1}^n Y_i \sim \text{Binomial}(n, p)\), \(\overline{Y}\) takes the values \(0, \tfrac{1}{n}, \tfrac{2}{n}, \ldots, 1\) with the corresponding \(\text{Binomial}(n, p)\) probabilities.

Bernoulli population

E.g. \(p = 0.56\), \(n = 10\)
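For this example the sampling distribution of \(\overline{Y}\) can be tabulated exactly. A minimal sketch using scipy.stats.binom:

```python
from scipy.stats import binom

p, n = 0.56, 10  # values from the slide

# The sum of the Y_i is Binomial(n, p), so the sample proportion
# Ybar = sum / n puts Binomial(n, p) probabilities on 0, 0.1, ..., 1.
for k in range(n + 1):
    print(f"P(Ybar = {k / n:.1f}) = {binom.pmf(k, n, p):.4f}")
```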


Can’t derive in these situations

  • Population: \(Y \sim \text{Uniform}(a, b)\)
    • Sample: size \(n\), i.i.d.
    • Statistic: sample mean or sample variance
    • No closed-form solution
  • Population: \(Y \sim\) unknown
    • Sample: size \(n\), i.i.d.
    • Statistic: anything
    • Can’t derive because we don’t know the population distribution

What to do?

  1. Derive parameters of sampling distribution
  2. Simulate the sampling distribution (see the sketch after this list)
  3. Approximate the sampling distribution
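As an illustration of option 2, here is a minimal simulation sketch for one of the “can’t derive” cases above: the sample variance of an i.i.d. Uniform sample (the values \(a = 0\), \(b = 1\), \(n = 30\) are arbitrary illustration choices):

```python
import numpy as np

# Option 2: simulate the sampling distribution of the sample variance
# for a Uniform(a, b) population. a = 0, b = 1, n = 30 are arbitrary
# illustration values, not from the slides.
a, b, n, reps = 0.0, 1.0, 30, 100_000

rng = np.random.default_rng(551)
samples = rng.uniform(a, b, size=(reps, n))   # reps samples of size n
s2 = samples.var(axis=1, ddof=1)              # sample variance of each

# A histogram of s2 approximates the sampling distribution; its mean
# should be near the population variance (b - a)^2 / 12 = 0.0833...
print(s2.mean())
```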

Some more probability review

Cumulative Distribution Function

The cumulative distribution function of a random variable \(X\) is \[ F(x) = P(X \le x) \]

Probability Density/Mass Function

For continuous distributions we can define the probability density function:

\[ f(x) = \frac{d}{dx}F(x) \approx \frac{P\left(X \in (x - \Delta,\, x + \Delta) \right)}{2\Delta} \quad \text{for small } \Delta \] For discrete distributions we have the probability mass function:

\[ p(x) = P(X = x) \]
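For example, for the Bernoulli population above with success probability \(p\):

\[ p(0) = 1 - p, \qquad p(1) = p, \qquad F(y) = \begin{cases} 0, & y < 0 \\ 1 - p, & 0 \le y < 1 \\ 1, & y \ge 1 \end{cases} \]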


Expectation (Mean)

The expectation (or mean) of a random variable, \(X\), is

\[ \begin{aligned} E(X) &= \int\limits_{-\infty}^{\infty}xf(x) dx \quad &\text{for continuous distributions}\\ E(X) &= \sum\limits_{x: p(x) > 0} x p(x) \quad &\text{for discrete distributions} \end{aligned} \]

Expectation Properties

For any random variables \(X\) and \(Y\) (independence is not required):

\(E(X + Y) = E(X) + E(Y)\)

More generally, for any constants \(a_1, \ldots, a_n\) and random variables \(X_1, \ldots, X_n\):

\(E(a_1X_1 + \ldots + a_nX_n) = a_1E(X_1) + \ldots + a_nE(X_n)\)

This is known as the linearity property.
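For instance, if \(E(X) = 2\) and \(E(Y) = 5\), then regardless of any dependence between \(X\) and \(Y\), \[ E(3X - Y) = 3E(X) - E(Y) = 3(2) - 5 = 1 \]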

Variance and Covariance

The variance of r.v. \(X\) is \[ Var(X) = E\bigl[ \left(X - E(X)\right)^2 \bigr] = E[X^2] - \left(E[X]\right)^2 \]

The covariance between r.v.’s \(X\) and \(Y\) is \[ Cov(X, Y) = E\left[\bigl(X- E(X) \bigr)\bigl(Y - E(Y) \bigr)\right] \] If \(X\) and \(Y\) are independent, then \(Cov(X, Y) = 0\) (the converse is not true).

\(Cov(X, X) = Var(X)\)

Variance Properties

For any random variables \(X\) and \(Y\) (independence is not required):

\(Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)\)

\(Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y)\)

For random variables \(X_1, \ldots, X_n\) and constants \(a_1, \ldots, a_n\): \[ \begin{aligned} Var& \left(a_1 X_1 + \ldots + a_n X_n \right)= a_1^2 Var(X_1) + \ldots + a_n^2 Var(X_n) \, +\\ & a_1a_2 Cov(X_1, X_2) + a_1a_3 Cov(X_1, X_3) + \ldots + a_1a_n Cov(X_1, X_n) \, + \\ & \vdots \\ & a_na_1 Cov(X_n, X_1) + a_na_2 Cov(X_n, X_2) + \ldots + a_na_{n-1} Cov(X_n, X_{n-1}) \end{aligned} \] i.e., \(Var\left(\sum_{i} a_i X_i\right) = \sum_{i} a_i^2 Var(X_i) + \sum_{i \neq j} a_i a_j Cov(X_i, X_j)\), where the covariance sum runs over all ordered pairs \(i \neq j\).
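These identities are easy to verify by simulation. A minimal sketch checking \(Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)\) for correlated normal \(X\) and \(Y\) (the covariance matrix is an arbitrary illustration choice):

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) by simulation.
# The covariance matrix below is an arbitrary illustration.
rng = np.random.default_rng(551)
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])  # Var(X) = 2, Var(Y) = 1, Cov(X, Y) = 0.8
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

print(np.var(x + y, ddof=1))  # empirical Var(X + Y)
print(2.0 + 1.0 + 2 * 0.8)    # theoretical value: 4.6
```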

Next time…

We will use these properties to derive the mean and variance of sampling distributions.