Sign test
Data Setting
Population: \(Y \sim\) something with c.d.f \(F_Y(y) = P(Y \le y)\)
Parameter: \(M = F^{-1}_Y(0.5)\), the population median
Sample: \(n\) i.i.d. observations from the population: \(Y_1, \ldots, Y_n\)
Null hypothesis: \(H_0: M = M_0\)
Your Turn:
Consider the hypothesis \(H_0: M = M_0\)
Imagine transforming the \(Y_i, \, i = 1, \ldots, n\) to \[ X_i = \begin{cases} 1, & Y_i \le M_0 \\ 0, & Y_i > M_0 \end{cases} \]
If the null hypothesis is true (\(M = M_0\)), what is \(P(X_i = 1)\)?
(You can assume \(Y\) is a continuous distribution)
Sign test
To test \(H_0: M = M_0\), perform a Binomial test on \(X_i = \pmb{1}\{Y_i \le M_0 \}\) with \(H_0: p = 0.5\).
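As a rough illustration of the reduction to a Binomial test, here is a minimal Python sketch that uses exact Binomial(\(n\), 0.5) probabilities. The function name `sign_test_exact` and the toy data are made up for illustration, and the two-sided p-value is taken as twice the smaller tail, which is one common convention rather than the only one.

```python
from math import comb

def sign_test_exact(y, m0):
    """Two-sided sign test of H0: median = m0 (hypothetical helper).

    Returns (x, p_value), where x is the number of observations <= m0.
    """
    n = len(y)
    x = sum(1 for yi in y if yi <= m0)  # X = sum of the indicators X_i
    # Under H0, X ~ Binomial(n, 0.5), which is symmetric about n/2,
    # so double the smaller tail probability and cap the result at 1.
    k = min(x, n - x)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return x, min(1.0, 2 * tail)

# Toy data, invented for this sketch
print(sign_test_exact([5, 6, 7, 8], 4))  # (0, 0.125)
```

With all four toy values above 4, \(X = 0\) and the two-sided p-value is \(2 \times (1/2)^4 = 0.125\).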
Example
Consider a sample, \(n = 12\), with the sample values (sorted): 0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.2, 8.2, 9.3, 10.1, 10.9, 12.1
Consider testing \(H_0: M = 4\) versus a two-sided alternative \(H_A: M \ne 4\) (at the \(\alpha = 0.05\) level).
\(X_i = \pmb{1}\{Y_i \le 4 \}\)
\(\hat{p}_{M_0} = \frac{1}{n}\sum_{i = 1}^{n}X_i = 0.25\)
\(Z(p_0 = 0.5) = \frac{\hat{p}_{M_0} - p_0}{\sqrt{p_0(1-p_0)/n}} = -1.73\)
Compare \(|Z| = 1.73\) to \(z_{1-\alpha/2} = 1.96\)
Since \(1.73 < 1.96\), we fail to reject the null hypothesis.
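The arithmetic in this example can be checked with a short Python sketch. It assumes the sample consists of the twelve values listed in the \(m_0\) table later in these slides, which is consistent with \(\hat{p}_{M_0} = 0.25\) but is an assumption nonetheless.

```python
from math import sqrt
from statistics import NormalDist

# Assumed sample: the twelve values from the m_0 table in these slides
y = [0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.2, 8.2, 9.3, 10.1, 10.9, 12.1]
m0, p0, alpha = 4, 0.5, 0.05

n = len(y)
p_hat = sum(yi <= m0 for yi in y) / n           # proportion of Y_i <= M_0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # normal-approximation Z statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
reject = abs(z) > z_crit

print(round(p_hat, 2), round(z, 2), round(z_crit, 2), reject)
# 0.25 -1.73 1.96 False
```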
Your turn: 95% Confidence Interval
We can invert the test by considering all \(M_0\) for which we would fail to reject the null hypothesis \(H_0: M = M_0\).
Would you reject for the value on your slip of paper?
Why do we only need to consider the actual sample values?
| | \(m_0\) |
|---|---|
| 1 | 0.8 |
| 2 | 2.1 |
| 3 | 2.8 |
| 4 | 4.3 |
| 5 | 5.3 |
| 6 | 6.1 |
| 7 | 7.2 |
| 8 | 8.2 |
| 9 | 9.3 |
| 10 | 10.1 |
| 11 | 10.9 |
| 12 | 12.1 |
95% confidence interval for \(M\) is \((\qquad, \qquad)\)
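One way to carry out the inversion by brute force is sketched below in Python: run the approximate test once for each candidate \(M_0\) and keep the candidates that are not rejected. The sketch assumes the sample is the twelve values in the \(m_0\) table above; the helper name `rejects` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumed sample: the twelve values from the m_0 table
y = [0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.2, 8.2, 9.3, 10.1, 10.9, 12.1]
alpha = 0.05
n = len(y)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def rejects(m0):
    """True if the approximate two-sided sign test rejects H0: M = m0."""
    x = sum(yi <= m0 for yi in y)            # number of observations <= m0
    z = (x / n - 0.5) / (0.5 / sqrt(n))
    return abs(z) > z_crit

not_rejected = [m0 for m0 in y if not rejects(m0)]
print(not_rejected)
# [2.8, 4.3, 5.3, 6.1, 7.2, 8.2, 9.3]
```

Any \(M_0\) strictly between two adjacent observations gives the same count \(X\) as the smaller of the two, so checking the sample values covers every possible outcome of the test.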
Confidence interval in general
Solve for the values of \(M_0\) that are not in the rejection region, i.e. that satisfy
\[ \left| \frac{X/n - 0.5}{0.5 / \sqrt{n}} \right| < z_{1-\alpha/2}, \quad \text{where } X = \text{number of observations } \le M_0 \]
\[ -z_{1-\alpha/2} < \frac{X/n - 0.5}{0.5 / \sqrt{n}} < z_{1-\alpha/2} \]
\[ \begin{aligned} n(0.5 -z_{1-\alpha/2}\frac{0.5}{\sqrt{n}}) &< X < n(0.5 + z_{1-\alpha/2}\frac{0.5}{\sqrt{n}}) \\ \frac{1}{2}(n -z_{1-\alpha/2}\sqrt{n}) &< X < \frac{1}{2}(n + z_{1-\alpha/2}\sqrt{n}) \end{aligned} \]
So, \(M_0\) is in the interval if the number of observations less than or equal to \(M_0\) is between: \[ \frac{1}{2}(n -z_{1-\alpha/2}\sqrt{n}) \quad \text{ and } \quad \frac{1}{2}(n + z_{1-\alpha/2}\sqrt{n}) \]
The smallest value that satisfies this is the \[ \left(\frac{1}{2}(n -z_{1-\alpha/2}\sqrt{n})\right)^{\text{th}} \text{ smallest observation} \]
The largest value that satisfies this is the \[ \left(\frac{1}{2}(n + z_{1-\alpha/2}\sqrt{n}) + 1\right)^{\text{th}} \text{ smallest observation} \]
Confidence interval for median
Approximate (based on approximate Binomial test) confidence interval for the median:
\[ \begin{aligned} \Biggl( \left(\frac{n - z_{1-\alpha/2}\sqrt{n}}{2} \right)^{\text{th}} \text{ smallest observation}, \\ \left(\frac{n + z_{1-\alpha/2}\sqrt{n}}{2} + 1 \right)^{\text{th}} \text{ smallest observation}\Biggr) \end{aligned} \]
May need to round each \((.)^\text{th}\) to the nearest integer
Example, continued
\(n = 12, \alpha = 0.05 \implies\)
\[ \begin{aligned} &\Biggl(\left(\frac{12 - 1.96\sqrt{12}}{2} \right)^{\text{th}} \text{ smallest observation}, \\ &\qquad \left(\frac{12 +1.96\sqrt{12}}{2} + 1 \right)^{\text{th}} \text{ smallest observation} \Biggr) \\ &\left(\left(2.61\right)^{\text{th}} \text{ smallest observation}, \left(10.39\right)^{\text{th}} \text{ smallest observation} \right)\\ &\left(3^{\text{rd}} \text{ smallest observation}, 10^{\text{th}} \text{ smallest observation} \right) \\ &\left(2.8, 10.1 \right) \end{aligned} \]
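The order-statistic formula can be wrapped in a small function. This is a minimal Python sketch, not a definitive implementation: the name `median_ci` is hypothetical, ranks are rounded to the nearest integer as suggested above, and clamping the ranks to \(1, \ldots, n\) is an added assumption for small samples. The data are assumed to be the twelve values from the \(m_0\) table.

```python
from math import sqrt
from statistics import NormalDist

def median_ci(y, alpha=0.05):
    """Approximate CI for the median from order statistics (hypothetical helper)."""
    ys = sorted(y)
    n = len(ys)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo_rank = round((n - z * sqrt(n)) / 2)       # rank of the lower endpoint
    hi_rank = round((n + z * sqrt(n)) / 2 + 1)   # rank of the upper endpoint
    # Assumption: keep the ranks inside 1..n when n is small
    lo_rank = max(lo_rank, 1)
    hi_rank = min(hi_rank, n)
    return ys[lo_rank - 1], ys[hi_rank - 1]      # convert 1-based ranks to indices

# Assumed sample: the twelve values from the m_0 table
y = [0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.2, 8.2, 9.3, 10.1, 10.9, 12.1]
print(median_ci(y))
# (2.8, 10.1)
```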
Sign test for discrete distributions/data
- Remove all values exactly equal to \(M_0\)
- Proceed with test as usual (with a reduced sample size \(n\))
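The two steps above can be sketched in Python as follows. The function name `sign_test_z_drop_ties` and the data are invented for illustration, and the test statistic is the normal-approximation \(Z\) used in the earlier example.

```python
from math import sqrt

def sign_test_z_drop_ties(y, m0):
    """Drop values equal to m0, then compute the sign-test Z statistic.

    Hypothetical helper; returns (reduced n, X, Z).
    """
    kept = [yi for yi in y if yi != m0]   # remove values exactly equal to M_0
    n = len(kept)
    if n == 0:
        raise ValueError("no observations left after removing ties")
    x = sum(yi <= m0 for yi in kept)      # with ties gone, this counts values < m0
    z = (x / n - 0.5) / (0.5 / sqrt(n))
    return n, x, z

# Made-up discrete data with three values tied at M_0 = 3
n_used, x, z = sign_test_z_drop_ties([1, 2, 2, 3, 3, 3, 4, 5, 5, 6], 3)
print(n_used, x, round(z, 3))
# 7 3 -0.378
```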
Sign test: exactness
Finite sample exact? No
- Discrete nature of data means we can’t achieve a lot of significance levels
- Normal approximation is only an approximation…
Asymptotically exact? Yes
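To make the point about unattainable significance levels concrete, the Python sketch below lists the levels an exact two-sided Binomial test can achieve. It assumes symmetric rejection regions of the form \(\{X \le k\} \cup \{X \ge n - k\}\), which is one natural choice but not the only one; the function name `achievable_levels` is made up.

```python
from math import comb

def achievable_levels(n):
    """Sizes of the regions {X <= k} or {X >= n - k} under X ~ Binomial(n, 0.5)."""
    levels = []
    cumulative = 0
    for k in range((n + 1) // 2):         # need k < n - k so the two tails do not overlap
        cumulative += comb(n, k)          # running count of outcomes with X <= k
        levels.append(2 * cumulative / 2 ** n)
    return levels

levels = achievable_levels(12)
print([round(level, 4) for level in levels])
```

For \(n = 12\) the attainable levels jump from \(158/4096 \approx 0.039\) to \(598/4096 \approx 0.146\), so an exact test at exactly \(\alpha = 0.05\) is not available under this choice of rejection region.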
Sign test: consistency
The sign test is consistent. This follows from the Binomial test being consistent (which in turn follows from the Z-test being consistent).
Next time…
Signed Rank test