Wilcoxon Signed Rank Test

Usual setting

Population: $Y \sim$ some population distribution

Sample: $n$ i.i.d. observations from the population: $Y_{1}, \dots, Y_{n}$

Parameter: ?

Null hypothesis: the population ‘center’ is $c_{0}$.

Let’s talk about the procedure first, then come back to why it’s hard to be specific here.

Wilcoxon Signed Rank Test Procedure

Find the distance of each observed value from the hypothesized center, $c_{0}$.
Assign a rank to each observation based on its distance from $c_{0}$: from 1 for the closest, to $n$ for the furthest from $c_{0}$.
Test statistic: $S =$ sum of the ranks for the values that were larger than $c_{0}$.
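
One way to write the statistic compactly, using notation that is not in the slides ($R_{i}$ for the rank of $|Y_{i} - c_{0}|$ among the $n$ distances, and $\mathbf{1}\{\cdot\}$ for an indicator), and assuming no ties among the distances and no observation exactly equal to $c_{0}$:

$S = \sum_{i = 1}^{n} R_{i} \, \mathbf{1}\{Y_{i} > c_{0}\}$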

Example: test statistic calculation

$H_{0} : c = 4$

Find distance to $c_{0} = 4$.

Assign ranks
Test statistic: $S =$ sum of ranks for $Y_{i} > 4 =$
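
As a sketch, the three steps can be carried out in R on the twelve observations that appear in the In R slides of this lecture. The names `c0`, `d`, `ranks` and `S` are illustrative choices, and the code assumes no ties among the distances and no observation equal to $c_{0}$.

```r
# Data from the "In R" slides; c0 is the hypothesized center
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
  9.3, 10.1, 10.9, 12.1)
c0 <- 4

d <- abs(y - c0)         # step 1: distance of each value from c0
ranks <- rank(d)         # step 2: rank 1 = closest, n = furthest
S <- sum(ranks[y > c0])  # step 3: sum of the ranks for values above c0
S
```

The result matches the statistic that wilcox.test() reports as V in the In R slides.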

Reference distribution

Either:

Use an exact p-value, by assuming each rank has the same chance of being assigned above or below $c_{0}$, or
Use the Normal approximation to the null distribution of $S$.

Reference distribution: Exact p-values

If the population distribution were symmetric about $c_{0}$, each rank $1, \dots, n$ would independently have probability 0.5 of being assigned to an observation above $c_{0}$.

We can consider all $2^{n}$ possible ways of assigning the ranks $1, \dots, n$ above and below $c_{0}$ to work out the exact reference distribution (this is what the R function wilcox.test() does if you use the argument exact = TRUE).
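
A brute-force sketch of that enumeration, for the lecture's example ($n = 12$, observed $S = 66$, both taken from the later slides). The variable names are illustrative, and doubling the smaller tail is one common two-sided convention, assumed here. This is not how wilcox.test() is implemented internally, only the same idea written out directly.

```r
n <- 12      # sample size in the lecture's example
s_obs <- 66  # observed statistic in the lecture's example

# Each row marks every rank 1..n as above (1) or below (0) c0
above <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))

# Value of S for each of the 2^n equally likely assignments
s_null <- as.vector(above %*% seq_len(n))

# Two-sided p-value: double the smaller tail probability, capped at 1
p_upper <- mean(s_null >= s_obs)
p_lower <- mean(s_null <= s_obs)
p_exact <- min(1, 2 * min(p_upper, p_lower))
p_exact
```

The enumeration has $2^{n}$ rows, so it is only practical for small $n$; R's psignrank() gives the same tail probabilities without listing every assignment.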

Reference distribution: Normal approximation p-values

If the population distribution were symmetric about $c_{0}$,

$E (S) = \frac{n (n + 1)}{4}, \quad \mathrm{Var} (S) = \frac{n (n + 1) (2 n + 1)}{24}$

(This can be proved by writing $S$ as a sum of products of independent Bernoulli(0.5) random variables and the integers $1, \dots, n$.)
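
A sketch of that argument, writing $B_{k}$ (notation not in the slides) for the indicator that rank $k$ is assigned to an observation above $c_{0}$. Under symmetry the $B_{k}$ are independent Bernoulli(0.5), so $E (B_{k}) = \frac{1}{2}$ and $\mathrm{Var} (B_{k}) = \frac{1}{4}$:

$S = \sum_{k = 1}^{n} k B_{k}$

$E (S) = \sum_{k = 1}^{n} k \cdot \frac{1}{2} = \frac{1}{2} \cdot \frac{n (n + 1)}{2} = \frac{n (n + 1)}{4}$

$\mathrm{Var} (S) = \sum_{k = 1}^{n} k^{2} \cdot \frac{1}{4} = \frac{1}{4} \cdot \frac{n (n + 1) (2 n + 1)}{6} = \frac{n (n + 1) (2 n + 1)}{24}$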

So, we can construct a Z-statistic

$Z = \frac{S - \frac{n (n + 1)}{4}}{\sqrt{\frac{n (n + 1) (2 n + 1)}{24}}}$

and compare it to a $N(0, 1)$ distribution.
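
The moment formulas can also be checked numerically against dsignrank(), R's exact null distribution for this statistic when there are no ties. The helper names `null_moments` and `z_stat` are hypothetical, introduced only for this sketch.

```r
# Mean and variance of S under the null, from the exact pmf dsignrank()
null_moments <- function(n) {
  s <- 0:(n * (n + 1) / 2)  # all possible values of S
  p <- dsignrank(s, n)      # exact null probabilities
  m <- sum(s * p)
  c(mean = m, var = sum((s - m)^2 * p))
}

# Z-statistic built from the closed-form mean and variance
z_stat <- function(S, n) {
  (S - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
}

exact_10 <- null_moments(10)                              # from the pmf
formula_10 <- c(mean = 10 * 11 / 4, var = 10 * 11 * 21 / 24)  # closed form
rbind(exact_10, formula_10)
```

For any observed statistic, `z_stat(S, n)` then gives the value to compare with the standard Normal.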

Example: continued

$E (S) = \frac{n (n + 1)}{4} = \frac{12 (13)}{4} = 39$

$\mathrm{Var} (S) = \frac{n (n + 1) (2 n + 1)}{24} = \frac{12 (13) (25)}{24} = 162.5$

$Z = \frac{66 - 39}{\sqrt{162.5}} = 2.12$

z <- (66 - 39) / sqrt(162.5)
2 * (1 - pnorm(abs(z)))

## [1] 0.03417047

Why is it hard to say what it tests?

Your turn: Sketch worksheet

Performance of the Wilcoxon Signed Rank Test

With no additional assumptions

As a test of the population mean:

The Wilcoxon Signed Rank test is not asymptotically exact
The Wilcoxon Signed Rank test is not consistent

As a test of the population median:

The Wilcoxon Signed Rank test is not asymptotically exact
The Wilcoxon Signed Rank test is not consistent
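
A small simulation can illustrate the claims about asymptotic exactness. It is a sketch under assumptions that are not from the lecture: the population is taken to be Exponential(1), which is skewed with mean 1 and median $\log 2$, and `reject_rate` is a hypothetical helper. In both runs the null hypothesis about the mean or the median is true, so an asymptotically exact level 0.05 test would reject about 5% of the time for large $n$.

```r
set.seed(551)

# Proportion of simulated Exponential(1) samples for which the Wilcoxon
# signed rank test rejects H0: center = c0 at level alpha
reject_rate <- function(c0, n, reps = 500, alpha = 0.05) {
  p <- replicate(reps, wilcox.test(rexp(n), mu = c0, exact = FALSE)$p.value)
  mean(p < alpha)
}

rate_mean <- reject_rate(c0 = 1, n = 200)         # true mean is 1
rate_median <- reject_rate(c0 = log(2), n = 200)  # true median is log(2)
c(mean = rate_mean, median = rate_median)
```

Under these assumptions both rejection rates should come out well above 0.05, and increasing `n` should push them further from 0.05 rather than closer to it.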

Performance of the Wilcoxon Signed Rank Test

If you add an assumption: the population distribution is symmetric.

The Wilcoxon Signed Rank test is asymptotically exact

The Wilcoxon Signed Rank test is consistent

Null hypothesis: $μ = M = c_{0}$

We learn about the mean/median. Of course we could learn more about these parameters directly with a t-test or sign test without the additional symmetry assumption.
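
A simulation sketch of the symmetric case, again with assumed ingredients: a Normal population with mean 4 and standard deviation 3, chosen only because it is symmetric, and a hypothetical helper `reject_rate_sym`. With $c_{0}$ equal to the true center the rejection rate should be close to the nominal 0.05, and with $c_{0}$ away from it the rejection rate should approach 1 as $n$ grows.

```r
set.seed(551)

# Rejection rate of the Wilcoxon signed rank test for H0: center = c0
# when sampling from a symmetric Normal(mean = 4, sd = 3) population
reject_rate_sym <- function(c0, n, reps = 500, alpha = 0.05) {
  p <- replicate(reps,
    wilcox.test(rnorm(n, mean = 4, sd = 3), mu = c0, exact = FALSE)$p.value)
  mean(p < alpha)
}

level <- reject_rate_sym(c0 = 4, n = 200)  # null true: mean = median = 4
power <- reject_rate_sym(c0 = 5, n = 200)  # null false: true center is 4
c(level = level, power = power)
```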

Performance of the Wilcoxon Signed Rank Test

Often presented as:

“The nonparametric Wilcoxon signed rank test compares the median of a single column of numbers against a hypothetical median.” Incorrect without the symmetry assumption; with symmetry it holds, but then the test is equally a test of the mean.

“This is another test that is a non-parametric equivalent of a 1-Sample t-test”. Incorrect without the symmetry assumption.

“The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under this assumption, the mean equals the median. The null hypothesis is $H_{0} : μ = μ_{0}$” Correct.

In R

y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2, 
  9.3, 10.1, 10.9, 12.1)

Exact p-values with exact = TRUE (what R uses by default here, since $n < 50$ and there are no ties)

wilcox.test(y, mu = 4, exact = TRUE)

## 
##  Wilcoxon signed rank test
## 
## data:  y
## V = 66, p-value = 0.03418
## alternative hypothesis: true location is not equal to 4

In R

y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2, 
  9.3, 10.1, 10.9, 12.1)

Approximate p-values with exact = FALSE and no continuity correction

wilcox.test(y, mu = 4, exact = FALSE, correct = FALSE)

## 
##  Wilcoxon signed rank test
## 
## data:  y
## V = 66, p-value = 0.03417
## alternative hypothesis: true location is not equal to 4
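
As a cross-check, the Normal approximation computed by hand from the formulas earlier in the lecture can be compared with the p-value stored in the object that wilcox.test() returns. The names `fit`, `V`, `z` and `p_hand` are illustrative.

```r
y <- c(0.8, 2.1, 2.8, 4.3, 5.3, 6.1, 7.3, 8.2,
  9.3, 10.1, 10.9, 12.1)
n <- length(y)

fit <- wilcox.test(y, mu = 4, exact = FALSE, correct = FALSE)
V <- unname(fit$statistic)  # the statistic called S in these slides

# Normal approximation by hand, with no continuity correction
z <- (V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_hand <- 2 * (1 - pnorm(abs(z)))

c(by_hand = p_hand, wilcox_test = fit$p.value)
```

The two values agree here because the data have no ties and the continuity correction is switched off.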

Sign-Rank Test ST551 Lecture 14
