Sign Test and Signed-Rank Wilcoxon

Stanford University, Spring 2016, STATS 205

Location Model

\(X_1,X_2,\dots,X_n\) denotes a random sample following the model \[ X_i = \theta + e_i, \] where the random errors \(e_i\) are independent and identically distributed with a continuous probability density function \(f(t)\) that is symmetric around \(0\).
Hypothesis testing: \[ H_0: \theta = 0 \text{ versus } H_A: \theta > 0 \]
A null hypothesis is associated with a contradiction to a theory one would like to prove
An alternative hypothesis is associated with the actual theory one would like to prove

Sign Test

To every statistical test such as the sign test, we associate a test statistic
A test statistic is a function of the data
The sign test has test statistic: \[ S = \sum_{i=1}^n \operatorname{sign}(X_i) \] with \(\operatorname{sign}(t) = \begin{cases} -1 & \text{if } t < 0 \\ 0 & \text{if } t = 0 \\ 1 & \text{if } t > 0 \end{cases}\)

Sign Test

Looking just at the positive observations: \[ S^+ = \#_i \{ X_i > 0 \} \]
Observations equal to \(0\) are ignored and the sample size \(n\) is reduced accordingly
Under \(H_0\), observing a \(-1\) or \(1\) for each \(X_i\) is a fair coin flip, so \(S^+\) follows a binomial distribution with \(n\) trials and success probability \(1/2\)
This is the null distribution of the test statistic
It doesn't depend on the error distribution
This property is called distribution-free
Compare null distribution with observed test statistic

Sign Test Example

Comparison of ice cream brands A and B
Theory: Brand A is preferred over B
Null hypothesis: No difference between A and B
Alternative: A is preferred over B
Blindfolded taster gives preference of one ice cream over the other
Experiment with 12 tasters
- 10 tasters prefer brand A over brand B
- 1 taster prefers brand B over A
- 1 no preference
Pretty convincing evidence in favor of brand A
But how likely is such a result due to chance if the null hypothesis is true (no preference in brands)?

Sign Test Example

For our sign test, the test statistic \(S^+\) is the number of positive signs
Evaluated on our data, the observed test statistic is \(s^+ = 10\)
We compare the observed test statistic with the binomial distribution with \(n = 11\) trials (the taster with no preference is dropped) and success probability \(1/2\)

Sign Test Example

The \(p\)-value is \(P_{H_0}(S^+ \ge s^+) = 1 - F_B(s^+ - 1,n,\frac{1}{2})\)

## [1] 0.005859375
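
The value above can be reproduced with R's binomial distribution function `pbinom`. The following is a minimal sketch; the variable names `s_plus` and `n_eff` are chosen here for illustration, and `binom.test` is used only as a cross-check.

```r
# Sign test p-value for the ice cream example
s_plus <- 10   # tasters preferring brand A
n_eff  <- 11   # 12 tasters minus the 1 with no preference

# P(S+ >= s_plus) under H0, where S+ ~ Binomial(n_eff, 1/2)
p_manual <- 1 - pbinom(s_plus - 1, size = n_eff, prob = 0.5)

# Cross-check with the built-in exact binomial test (one-sided)
p_builtin <- binom.test(s_plus, n_eff, p = 0.5,
                        alternative = "greater")$p.value

print(c(manual = p_manual, builtin = p_builtin))
```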

Sign Test Example

Significance level: the probability of rejecting the null when it is true

We can reject the null hypothesis at significance level \(\alpha = 0.05\) in favor of the alternative that Brand A is tastier than Brand B.

Traditional \(t\)-Test

The \(t\)-test depends on the population pdf \(f(t)\)
Thus it is not distribution-free
The usual test statistic is \[ t = \frac{\bar{X}}{s/\sqrt{n}} \] where \(\bar{X}\) is the sample mean and \(s\) is the sample standard deviation
Usually the population standard deviation is not known and has to be estimated by \(s\); then, if the population is normal, \(t\) has a Student \(t\)-distribution with \(n-1\) degrees of freedom under \(H_0\)
The \(p\)-value is \(P_{H_0}(t \ge t_0) = 1 - F_T(t_0,n-1)\), where \(t_0\) is the observed value of \(t\)
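
As an illustration of the formula, the sketch below computes the one-sided \(p\)-value by hand and compares it with `t.test`. The data vector `x` is hypothetical and serves only to make the example runnable.

```r
# Hypothetical sample, used only to illustrate the formula
x <- c(1.2, -0.4, 0.8, 2.1, 0.3, 1.5, -0.2, 0.9)
n <- length(x)

# Test statistic for H0: theta = 0
t0 <- mean(x) / (sd(x) / sqrt(n))

# One-sided p-value: 1 - F_T(t0, n - 1)
p_manual <- 1 - pt(t0, df = n - 1)

# Cross-check against the built-in one-sample t-test
res <- t.test(x, mu = 0, alternative = "greater")
print(c(manual = p_manual, builtin = res$p.value))
```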

Signed-Rank Wilcoxon Test

In contrast to the \(t\)-test, the signed-rank Wilcoxon test uses only the signs of the observations and the ranks of their distances from \(0\) \[ W = \sum_{i=1}^n \operatorname{sign}(X_i) \operatorname{R} |X_i|, \] where \(\operatorname{R} |X_i|\) is the rank of \(|X_i|\) among \(|X_1|,\dots,|X_n|\)
The null distribution has no closed form; it is computed with iterative algorithms, e.g. psignrank in R
Our test statistic considers the positive items only \[ W^+ = \sum_{X_i > 0} \operatorname{R} |X_i| = \frac{1}{2} W + \frac{n(n+1)}{4} \]
Under \(H_0\), the distribution of \(W^+\) is symmetric around \(\frac{n(n+1)}{4}\)
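
The relation between \(W\) and \(W^+\) can be checked numerically. The sketch below uses a small hypothetical sample without zeros or tied absolute values; the variable names are chosen here for illustration.

```r
# Hypothetical sample (no zeros, no ties among absolute values)
x <- c(0.7, -1.3, 2.4, 0.2, -0.5, 1.9)
n <- length(x)

r <- rank(abs(x))            # ranks of the distances from 0

W      <- sum(sign(x) * r)   # signed-rank statistic
W_plus <- sum(r[x > 0])      # sum of ranks of the positive observations

# Both sides of W+ = W/2 + n(n+1)/4
print(c(W_plus = W_plus, identity = W / 2 + n * (n + 1) / 4))
```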

Signed-Rank Wilcoxon Test

The distribution of the test statistic cannot be obtained in closed form.

Enumerate all \(2^n\) possible outcomes for sample size \(n = 3\):

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0         NaN
## 2      0      0      1         NaN
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN

Signed-Rank Wilcoxon Test

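# sign, rank and outcome are not defined in the code shown here; presumably
# sign holds the 0/1 indicators above and rank the ranks 1, 2, 3 in each row
SumRankPlus = apply(sign*rank,1,sum)
outcome$SumRankPlus[1] = SumRankPlus[1]; outcome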

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1         NaN
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN

Signed-Rank Wilcoxon Test

outcome$SumRankPlus[2] = SumRankPlus[2]
outcome

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1           3
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN

Signed-Rank Wilcoxon Test

outcome$SumRankPlus = SumRankPlus; outcome

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1           3
## 3      0      1      0           2
## 4      0      1      1           5
## 5      1      0      0           1
## 6      1      0      1           4
## 7      1      1      0           3
## 8      1      1      1           6

table(outcome$SumRankPlus)

## 
## 0 1 2 3 4 5 6 
## 1 1 1 2 1 1 1
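
The completed table can be reproduced along the following lines. This is a sketch, not necessarily the code used for the slides: the names `signs` and `ranks` are chosen here, and it assumes that the observation in column k has rank k, which loses no generality under \(H_0\) because signs and ranks are independent.

```r
n <- 3

# All 2^n sign patterns (1 = positive observation, 0 = negative).
# Columns are reversed so that sign.3 varies fastest, as in the table above.
grid  <- expand.grid(rep(list(0:1), n))
signs <- as.matrix(grid[, n:1])
colnames(signs) <- paste0("sign.", 1:n)

# Assumption of this sketch: the observation in column k has rank k
ranks <- matrix(1:n, nrow = nrow(signs), ncol = n, byrow = TRUE)

# W+ for each pattern: sum of the ranks carrying a positive sign
outcome <- data.frame(signs, SumRankPlus = rowSums(signs * ranks))
print(outcome)

# Each pattern has probability 1 / 2^n under H0
print(table(outcome$SumRankPlus) / 2^n)
```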

Signed-Rank Wilcoxon Test

Or through Monte Carlo simulations:

n = 3; nsim = 10000
X = matrix(rnorm(n*nsim),ncol=n)
WplusSim = apply(X,1, function(x) { sum( rank(abs(x)) * (x>0) ) })
table(WplusSim)/nsim

## WplusSim
##      0      1      2      3      4      5      6 
## 0.1231 0.1235 0.1223 0.2535 0.1225 0.1270 0.1281

and compare with theoretical result:

##     0     1     2     3     4     5     6 
## 0.125 0.125 0.125 0.250 0.125 0.125 0.125
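
One way to obtain these exact probabilities is R's `dsignrank`, the probability mass function of the signed-rank statistic. A minimal sketch, with variable names chosen here:

```r
n <- 3
support <- 0:(n * (n + 1) / 2)   # possible values of W+: 0, 1, ..., 6

# Exact null probabilities P(W+ = w) for each w in the support
exact <- dsignrank(support, n)
names(exact) <- support
print(exact)
```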

Signed-Rank Wilcoxon Test Example

Comparison of social awareness in kids going to school versus home schooling
Theory: Sending kids to school improves social awareness
Null hypothesis: No difference between kids going to school and staying at home
Alternative: Kids going to school have improved social awareness
Eight pairs of identical twins who are of nursery school age.
For each pair, one is randomly selected to attend nursery school while the other remains at home.
At the end of the study period, all 16 children are given the same social awareness test.

Signed-Rank Wilcoxon Test Example

Compute the paired differences (school minus home) and the ranks of their absolute values:

school = c(82,69,73,43,58,56,76,65); home = c(63,42,74,37,51,43,80,62)
response = school - home
response

## [1] 19 27 -1  6  7 13 -4  3

rank(abs(response))

## [1] 7 8 1 4 5 6 3 2

The statistic is the sum of the ranks of positive items:

## [1] 32

Signed-Rank Wilcoxon Test Example

We can find the \(p\)-value using \(P_{H_0}(W^+ \ge w^+) = 1 - F_{W^+}(w^+ - 1,n)\):

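n = length(response)                            # number of twin pairs (8)
wplus = sum(rank(abs(response))[response > 0])  # observed statistic (32)
pvalue = 1-psignrank(wplus-1,n,lower.tail = TRUE)
pvalue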

## [1] 0.02734375
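
The same result can be cross-checked with the built-in `wilcox.test`, which reports the signed-rank statistic as `V`. A minimal sketch using the data from the slides:

```r
school <- c(82, 69, 73, 43, 58, 56, 76, 65)
home   <- c(63, 42, 74, 37, 51, 43, 80, 62)

# Paired, one-sided signed-rank test of H0: theta = 0 vs HA: theta > 0,
# applied to the differences school - home
res <- wilcox.test(school, home, paired = TRUE, alternative = "greater")

print(res$statistic)  # V: sum of the ranks of the positive differences
print(res$p.value)    # exact p-value (no ties or zero differences here)
```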

Signed-Rank Wilcoxon Test Example

Significance level: the probability of rejecting the null when it is true

We can reject the null hypothesis at significance level \(\alpha = 0.05\) in favor of the alternative that sending kids to school improves their social awareness.

Estimation and Confidence Intervals

All three tests (sign test, signed-rank Wilcoxon, and \(t\)-test) have an associated estimate and confidence interval for the location parameter \(\theta\)
Recall the model for our sample \(X_1,X_2,\dots,X_n\) \[ X_i = \theta + e_i \]
Some definitions:
order statistics: \(X_{(1)}\) minimum observation, \(X_{(n)}\) largest observation
in increasing order \(X_{(1)} < X_{(2)} < \dots < X_{(n)}\)
quantiles: cut points that split the range of a probability distribution (or an ordered sample) into intervals with equal probabilities

Estimation and Confidence Intervals of Sign Test

Estimator is the median: \[ \hat{\theta} = \operatorname{median}\{ X_1,X_2,\dots,X_n \} \]

Confidence interval \((1-\alpha)100\%\): \[ \left(X_{(c_1+1)},X_{(n-c_1)} \right), \] where \(c_1\) is the \(\frac{\alpha}{2}\) quantile of the binomial distribution
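
A sketch of the estimate and interval for the twin-study differences follows. Because the binomial distribution is discrete, an exact \(\frac{\alpha}{2}\) quantile usually does not exist; this sketch assumes the conservative choice, the largest \(c_1\) with \(P(S^+ \le c_1) \le \frac{\alpha}{2}\), and reports the coverage actually achieved. Variable names are chosen here for illustration.

```r
# Differences from the twin study (school minus home)
x <- c(19, 27, -1, 6, 7, 13, -4, 3)
n <- length(x)
alpha <- 0.05

theta_hat <- median(x)   # point estimate associated with the sign test

# Conservative c1 (assumption of this sketch):
# largest c1 with P(S+ <= c1) <= alpha/2 under Binomial(n, 1/2)
c1 <- qbinom(alpha / 2, n, 0.5)
if (pbinom(c1, n, 0.5) > alpha / 2) c1 <- c1 - 1
stopifnot(c1 >= 0)       # otherwise n is too small for this level

xs <- sort(x)            # order statistics X_(1) <= ... <= X_(n)
ci <- c(xs[c1 + 1], xs[n - c1])

# Coverage actually achieved by (X_(c1+1), X_(n-c1))
coverage <- 1 - 2 * pbinom(c1, n, 0.5)

print(list(estimate = theta_hat, interval = ci, coverage = coverage))
```

With only eight observations the achievable confidence levels are coarse, so the achieved coverage can differ noticeably from the nominal \(1-\alpha\).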

Estimation and Confidence Intervals of Signed-Rank Wilcoxon

Hodges–Lehmann estimator: \[ \hat{\theta} = \operatorname{median}_{i\le j} \left\{ \frac{X_i+X_j}{2} \right\} \]

\(A_{ij} = \frac{(X_i +X_j)}{2}, i \le j\) are called the Walsh averages

and are used to form the \((1-\alpha)100\%\) confidence interval: \[ \left(A_{(c_2+1)},A_{(n(n+1)/2-c_2)} \right), \] where \(c_2\) is the \(\frac{\alpha}{2}\) quantile of the signed-rank Wilcoxon distribution
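
The Walsh averages, the Hodges-Lehmann estimate, and the interval can be sketched in R for the twin-study differences. As with the sign test, the signed-rank distribution is discrete, so this sketch assumes the conservative choice of \(c_2\), the largest value with \(P(W^+ \le c_2) \le \frac{\alpha}{2}\). Variable names are chosen here; the point estimate is compared with the one reported by `wilcox.test`.

```r
# Differences from the twin study (school minus home)
x <- c(19, 27, -1, 6, 7, 13, -4, 3)
n <- length(x)
alpha <- 0.05

# Walsh averages A_ij = (X_i + X_j) / 2 for i <= j, sorted
sums  <- outer(x, x, "+")
walsh <- sort(sums[upper.tri(sums, diag = TRUE)] / 2)

hl <- median(walsh)      # Hodges-Lehmann estimate

# Conservative c2 (assumption of this sketch):
# largest c2 with P(W+ <= c2) <= alpha/2 under H0
c2 <- qsignrank(alpha / 2, n)
if (psignrank(c2, n) > alpha / 2) c2 <- c2 - 1
stopifnot(c2 >= 0)       # otherwise n is too small for this level

N  <- n * (n + 1) / 2    # number of Walsh averages
ci <- c(walsh[c2 + 1], walsh[N - c2])

# Cross-check of the point estimate with the built-in test
res <- wilcox.test(x, conf.int = TRUE, conf.level = 1 - alpha)

print(list(estimate = hl, interval = ci, builtin = unname(res$estimate)))
```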

How Are Walsh Averages and Wilcoxon Related?
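
One standard connection, stated here as background because the slide gives only the question: when no Walsh average equals zero and the absolute values have no ties, \(W^+\) equals the number of positive Walsh averages. A minimal numerical check on the twin-study differences, with variable names chosen here:

```r
# Differences from the twin study (school minus home)
x <- c(19, 27, -1, 6, 7, 13, -4, 3)

# Signed-rank statistic: sum of the ranks of the positive differences
w_plus <- sum(rank(abs(x))[x > 0])

# Walsh averages (X_i + X_j) / 2 for i <= j
sums  <- outer(x, x, "+")
walsh <- sums[upper.tri(sums, diag = TRUE)] / 2

# Count of positive Walsh averages
n_pos_walsh <- sum(walsh > 0)

print(c(w_plus = w_plus, positive_walsh = n_pos_walsh))
```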

Some Comparisons

Power of a statistical test:
the probability of rejecting the null hypothesis when it is false
The power of the sign test can be low relative to \(t\)-test
The power of signed-rank Wilcoxon test is nearly that of the \(t\)-test for normal distributions and generally greater than that of the \(t\)-test for distributions with heavier tails than the normal distribution

Power Simulation \(\theta = 0\)

n = 30; df = 2; nsims = 10000; mu = 0; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
  x = rt(n,df) + mu
  wil = wilcox.test(x) 
  collwil[i] = wil$p.value 
  ttest = t.test(x) 
  collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil

## [1] 0.0477

powert

## [1] 0.0365

Power Simulation \(\theta = 0.5\)

n = 30; df = 2; nsims = 10000; mu = 0.5; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
  x = rt(n,df) + mu
  wil = wilcox.test(x) 
  collwil[i] = wil$p.value 
  ttest = t.test(x) 
  collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil

## [1] 0.4655

powert

## [1] 0.2914

Power Simulation \(\theta = 1\)

n = 30; df = 2; nsims = 10000; mu = 1; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
  x = rt(n,df) + mu
  wil = wilcox.test(x) 
  collwil[i] = wil$p.value 
  ttest = t.test(x) 
  collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil

## [1] 0.9245

powert

## [1] 0.698
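
The three simulations differ only in the shift `mu`, so they can be wrapped in one function. The sketch below is a restructuring under assumptions, not the code used for the slides: the function name `simulate_power`, the seed, and the smaller number of simulations are choices made here, so the numbers will differ slightly from those above.

```r
# Estimate rejection rates of the signed-rank Wilcoxon test and the t-test
# for t-distributed errors (df degrees of freedom) shifted by mu
simulate_power <- function(mu, n = 30, df = 2, nsims = 10000, alpha = 0.05) {
  pvals <- replicate(nsims, {
    x <- rt(n, df) + mu
    c(wilcoxon = wilcox.test(x)$p.value, t = t.test(x)$p.value)
  })
  # Fraction of simulated samples in which each test rejects at level alpha
  rowMeans(pvals <= alpha)
}

set.seed(205)  # arbitrary seed, only for reproducibility of this sketch
power_table <- sapply(c(0, 0.5, 1), simulate_power, nsims = 2000)
colnames(power_table) <- c("theta = 0", "theta = 0.5", "theta = 1")
print(power_table)
```

For \(\theta = 0\) the null hypothesis is true, so the rejection rate in that column estimates the type I error rather than the power.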

Summary

Assumptions on \(f(t)\):

Sign Test: any continuous distribution
Signed-Rank Test: any symmetric continuous distribution
\(t\)-test: any normal distribution
The continuity assumption assures that ties are impossible: With probability one we have \(X_i \ne X_j\) when \(i \ne j\)
The continuity assumption is only necessary for exact hypothesis tests not for estimates and confidence intervals

Assumptions on \(e_1,e_2,\dots,e_n\) and thus \(X_1,X_2,\dots,X_n\):

independent and identically distributed