Wilcoxon Signed Rank Test
Usual setting
Population: \(Y \sim\) some population distribution
Sample: \(n\) i.i.d. observations from the population: \(Y_1, \ldots, Y_n\)
Parameter: ?
Null hypothesis: the population ‘center’ is \(c_0\).
Let’s talk about the procedure first, then come back to why it’s hard to be specific here.
Wilcoxon Signed Rank Test Procedure
- Find the distance of each observed value from the hypothesized center, \(c_0\).
- Assign a rank to each observation based on its distance from \(c_0\): from 1 for closest, to \(n\) for furthest from \(c_0\).
- Test statistic: \(S =\) Sum of the ranks for the values that were larger than \(c_0\).
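The three steps above can be sketched as a small R function. This is a minimal sketch, assuming no ties among the distances and no observation exactly equal to \(c_0\); `signed_rank_stat` is a hypothetical helper name (not part of base R), and the toy data are made up for illustration.

```r
# Signed rank statistic: sum of the ranks (by distance from c0)
# of the observations that lie above c0.
# `signed_rank_stat` is a hypothetical helper, not a base R function.
signed_rank_stat <- function(y, c0) {
  d <- y - c0          # signed distance from the hypothesized center
  r <- rank(abs(d))    # rank 1 = closest to c0, n = furthest from c0
  sum(r[d > 0])        # add up the ranks of observations larger than c0
}

# Toy data, made up for illustration
toy_S <- signed_rank_stat(c(1.5, 3.2, 4.9, 7.0), c0 = 3)
toy_S
```

Here the distances are 1.5, 0.2, 1.9 and 4.0, so the ranks are 2, 1, 3 and 4, and the three observations above 3 contribute 1 + 3 + 4.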
Example: test statistic calculation
\(H_0: c = 4\)
- Find distance to \(c_0 = 4\).
- Assign ranks.
- Test statistic: \(S =\) sum of ranks for \(Y_i > 4 = 66\)
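The calculation can be reproduced step by step in R. This sketch assumes the example uses the 12 observations listed in the "In R" part of these notes; the column names in `steps` are chosen here for illustration.

```r
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
       9.3, 10.1, 10.9, 12.1)
c0 <- 4

# One row per observation: distance from c0, its rank, and its side of c0
steps <- data.frame(
  y        = y,
  distance = abs(y - c0),
  rank     = rank(abs(y - c0)),
  above    = y > c0
)
steps

# Test statistic: sum of the ranks of the observations above c0
S <- sum(steps$rank[steps$above])
S
```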
Reference distribution
Either:
- Use an exact p-value, by assuming each rank has the same chance of being assigned above or below \(c_0\), or
- Use the Normal approximation to the null distribution of \(S\)
Reference distribution: Exact p-values
If the population distribution were symmetric about \(c_0\), each rank \(1,\ldots, n\) independently has probability 0.5 of being assigned to an observation above \(c_0\).
We can consider all possible ways of assigning the ranks \(1,\ldots, n\) above and below \(c_0\) to work out the exact reference distribution (this is what the R function wilcox.test() does if you use the argument exact = TRUE).
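For small \(n\) the enumeration can be done by brute force. The sketch below is an illustration of the idea, not how wilcox.test() is implemented internally; it uses \(n = 12\) and \(S = 66\) from the running example, and the variable names are chosen here.

```r
n <- 12
s_obs <- 66

# Each row is one way of putting ranks 1..n above (1) or below (0) c0.
# Under the symmetry assumption all 2^n rows are equally likely.
signs <- as.matrix(expand.grid(rep(list(0:1), n)))
S_null <- as.vector(signs %*% (1:n))   # value of S for each assignment

# Two-sided exact p-value: proportion of assignments whose S is at
# least as far from the null mean n(n+1)/4 as the observed S.
center <- n * (n + 1) / 4
p_exact <- mean(abs(S_null - center) >= abs(s_obs - center))
p_exact
```

Enumerating all \(2^n\) assignments is only practical for small samples, since the number of rows doubles with every extra observation.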
Reference distribution: Normal approximation p-values
If the population distribution were symmetric about \(c_0\),
\[ E(S) = \frac{n(n+1)}{4}, \quad Var(S) = \frac{n(n+1)(2n+1)}{24} \]
(This can be proved by writing \(S\) as a sum of products of independent Bernoulli(0.5) random variables and the integers \(1, \ldots, n\).)
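A sketch of that argument, with notation introduced here for illustration: let \(B_i = 1\) if the observation holding rank \(i\) lies above \(c_0\), and \(B_i = 0\) otherwise. Under symmetry the \(B_i\) are independent Bernoulli(0.5) and \(S = \sum_{i=1}^{n} i B_i\), so

\[ E(S) = \sum_{i=1}^{n} i \, E(B_i) = \frac{1}{2} \cdot \frac{n(n+1)}{2} = \frac{n(n+1)}{4} \]
\[ Var(S) = \sum_{i=1}^{n} i^2 \, Var(B_i) = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24} \]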
So, we can construct a Z-statistic
\[ Z = \frac{S - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \]
and compare it to a N(0, 1) distribution.
Example: continued
\[ E(S) = \frac{n(n+1)}{4} = \frac{12(13)}{4} = 39 \]
\[ Var(S) = \frac{n(n+1)(2n+1)}{24} = \frac{12(13)(25)}{24} = 162.5 \]
\[ Z = \frac{66 - 39}{\sqrt{162.5}} = 2.12 \]
z <- (66 - 39) / sqrt(162.5)
2 * (1 - pnorm(abs(z)))
## [1] 0.03417047
Why is it hard to say what it tests?
Your turn: Sketch worksheet
Performance of the Wilcoxon Signed Rank Test
With no additional assumptions
As a test of the population mean:
- The Wilcoxon Signed Rank test is not asymptotically exact
- The Wilcoxon Signed Rank test is not consistent
As a test of the population median:
- The Wilcoxon Signed Rank test is not asymptotically exact
- The Wilcoxon Signed Rank test is not consistent
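A small simulation can illustrate the lack of asymptotic exactness. This is a sketch under assumed settings: the Exponential(1) population (mean 1, median \(\log 2\), not symmetric), the sample size, the number of replications, the seed, and the helper name `reject_rate` are all choices made here for illustration.

```r
set.seed(1)     # arbitrary seed, for reproducibility
n    <- 300     # sample size (illustrative choice)
reps <- 500     # number of simulated samples (illustrative choice)

# Proportion of simulated Exponential(1) samples in which the
# Wilcoxon signed rank test rejects H0: center = c0 at the 5% level.
# `reject_rate` is a hypothetical helper name.
reject_rate <- function(c0) {
  p <- replicate(reps, wilcox.test(rexp(n), mu = c0)$p.value)
  mean(p < 0.05)
}

rate_mean   <- reject_rate(1)        # the null about the mean is true
rate_median <- reject_rate(log(2))   # the null about the median is true
c(mean = rate_mean, median = rate_median)
```

Both null hypotheses are true for this population, so a test with the right level would reject in about 5% of samples. Here both rejection rates should come out well above 0.05.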
Performance of the Wilcoxon Signed Rank Test
If you add an assumption: the population distribution is symmetric.
The Wilcoxon Signed Rank test is asymptotically exact
The Wilcoxon Signed Rank test is consistent
Null hypothesis: \(\mu = M = c_0\)
We learn about the mean/median. Of course we could learn more about these parameters directly with a t-test or sign test without the additional symmetry assumption.
Performance of the Wilcoxon Signed Rank Test
Often presented as:
“The nonparametric Wilcoxon signed rank test compares the median of a single column of numbers against a hypothetical median.” Incorrect without the symmetry assumption; with it, the test is equally a test of the mean.
“This is another test that is a non-parametric equivalent of a 1-Sample t-test”. Incorrect without the symmetry assumption.
“The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under this assumption, the mean equals the median. The null hypothesis is \(H_0: \mu = \mu_0\)” Correct.
In R
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
9.3, 10.1, 10.9, 12.1)
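Exact p-values with exact = TRUE (also the default behavior here: when exact is not specified, R computes an exact p-value for samples with fewer than 50 values and no ties)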
wilcox.test(y, mu = 4, exact = TRUE)
##
## Wilcoxon signed rank test
##
## data: y
## V = 66, p-value = 0.03418
## alternative hypothesis: true location is not equal to 4
In R
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
9.3, 10.1, 10.9, 12.1)
Approximate p-values with exact = FALSE and no continuity correction
wilcox.test(y, mu = 4, exact = FALSE, correct = FALSE)
##
## Wilcoxon signed rank test
##
## data: y
## V = 66, p-value = 0.03417
## alternative hypothesis: true location is not equal to 4
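As a rough check, the normal-approximation p-value can be rebuilt by hand from the statistic R reports as V and compared with the value returned by wilcox.test(). The variable names below are chosen here for illustration, and the data have no ties, so no tie correction is needed.

```r
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
       9.3, 10.1, 10.9, 12.1)
fit <- wilcox.test(y, mu = 4, exact = FALSE, correct = FALSE)

n <- length(y)
S <- unname(fit$statistic)   # R labels the signed rank statistic "V"

# Z-statistic built from the null mean and variance of S
z <- (S - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_manual <- 2 * pnorm(-abs(z))

c(manual = p_manual, wilcox.test = fit$p.value)
```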