Stanford University, Spring 2016, STATS 205

## Location Model

• $$X_1,X_2,\dots,X_n$$ denote a random sample following the model $X_i = \theta + e_i,$ where the random errors $$e_i$$ are independent and identically distributed with a continuous probability density function $$f(t)$$ symmetric about $$0$$.

• Hypothesis testing: $H_0: \theta = 0 \text{ versus } H_A: \theta > 0$

• A null hypothesis is associated with a contradiction to a theory one would like to prove
• An alternative hypothesis is associated with the actual theory one would like to prove

## Sign Test

• To every statistical test such as the sign test, we associate a test statistic
• A test statistic is a function of the data

• The sign test has test statistic: $S = \sum_{i=1}^n \operatorname{sign}(X_i)$ with $$\operatorname{sign}(t) = \begin{cases} -1 & \text{if } t < 0 \\ 0 & \text{if } t = 0 \\ 1 & \text{if } t > 0 \end{cases}$$

## Sign Test

• Looking just at the positive observations: $S^+ = \#\{i : X_i > 0\}$
• Observations equal to $$0$$ are ignored and the sample size is reduced accordingly
• If observing a $$-1$$ or $$1$$ for each $$X_i$$ is a coin flip then
• $$S^+$$ follows a binomial distribution with $$n$$ trials and success probability $$1/2$$
• This is the null distribution of the test statistic
• It doesn't depend on the error distribution
• This property is called distribution-free
• Compare null distribution with observed test statistic
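The distribution-free property can be checked by simulation; a quick sketch in R, where the error distribution is deliberately non-normal (standard Cauchy) yet $$S^+$$ still matches the binomial pmf:

```r
# Under H0, S+ = #{X_i > 0} is Binomial(n, 1/2) regardless of the
# (continuous, symmetric about 0) error distribution; here Cauchy errors.
set.seed(205)                       # arbitrary seed for reproducibility
n = 10; nsim = 10000
Splus = replicate(nsim, sum(rcauchy(n) > 0))
# empirical pmf of S+ vs. the Binomial(n, 1/2) pmf
round(rbind(empirical   = table(factor(Splus, levels = 0:n)) / nsim,
            theoretical = dbinom(0:n, n, 0.5)), 3)
```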

## Sign Test Example

• Comparison of ice cream brands A and B
• Theory: Brand A is preferred over B
• Null hypothesis: No difference between A and B
• Alternative: A is preferred over B

• Blindfolded taster gives preference of one ice cream over the other
• Experiment with 12 tasters
• 10 tasters prefer brand A over brand B
• 1 taster prefers brand B over A
• 1 no preference
• Pretty convincing evidence in favor of brand A
• But how likely is such a result due to chance if the null hypothesis is true (no preference in brands)?

## Sign Test Example

• For our sign test, the test statistic $$S^+$$ is the number of positive signs
• Evaluated on our data the observed test statistic is $$s^+ = 10$$
• We compare the observed test statistic with the binomial distribution with $$n = 11$$ trials (the taster with no preference is dropped) and success probability $$1/2$$

## Sign Test Example

The $$p$$-value is $$P_{H_0}(S^+ \ge s^+) = 1 - F_B(s^+ - 1,n,\frac{1}{2})$$

1 - pbinom(10 - 1, 11, 0.5)
## [1] 0.005859375

## Sign Test Example

Significance level: the probability of rejecting the null hypothesis when it is true

We can reject the null hypothesis at significance level $$\alpha = 0.05$$ in favor of the alternative that Brand A is tastier than Brand B.

## Traditional $$t$$-Test

• The $$t$$-test depends on the population pdf $$f(t)$$
• Thus it is not distribution free
• The usual test statistic is $t = \frac{\bar{X}}{s/\sqrt{n}},$ where $$\bar{X}$$ is the mean and $$s$$ is the standard deviation of the sample
• The population standard deviation is usually not known and has to be estimated by $$s$$; then, if the population is normal, $$t$$ has a Student $$t$$-distribution with $$n-1$$ degrees of freedom
• And the $$p$$-value $$P_{H_0}(t \ge t_0) = 1 - F_T(t_0,n-1)$$
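As a sketch in R, the formula above can be computed directly (the data here are hypothetical, chosen only to illustrate the calculation):

```r
# Hypothetical sample (illustrative values, not from the lecture)
x = c(1.2, 0.4, -0.3, 0.8, 1.5, 0.7)
n = length(x)
t0 = mean(x) / (sd(x) / sqrt(n))    # observed test statistic
pvalue = 1 - pt(t0, df = n - 1)     # P(t >= t0) under H0: theta = 0
pvalue
```

The same one-sided $$p$$-value is returned by the built-in `t.test(x, alternative = "greater")`.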

## Signed-Rank Wilcoxon Test

• In contrast to the $$t$$-test, the signed-rank Wilcoxon test uses only the signs and the ranks of the distances from $$0$$: $W = \sum_{i=1}^n \operatorname{sign}(X_i) \operatorname{R} |X_i|$
• Its null distribution has no closed form but can be computed recursively, e.g. with psignrank in R
• Our test statistic considers the positive items only: $W^+ = \sum_{X_i > 0} \operatorname{R} |X_i| = \frac{1}{2} W + \frac{n(n+1)}{4}$
• The distribution is symmetric around $$\frac{n(n+1)}{4}$$
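The identity relating $$W$$ and $$W^+$$ can be checked numerically; a quick sketch in R with simulated data:

```r
# Numerical check of the identity W+ = W/2 + n(n+1)/4
set.seed(1)
n = 10
x = rnorm(n)                        # continuous, so no zeros or ties
r = rank(abs(x))
W     = sum(sign(x) * r)            # signed-rank statistic
Wplus = sum(r[x > 0])               # sum of ranks of positive items
c(Wplus, W / 2 + n * (n + 1) / 4)   # the two values agree
```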

## Signed-Rank Wilcoxon Test

The distribution of test statistic cannot be obtained in closed-form.

Enumerate all $$2^n$$ possible outcomes for sample size $$n = 3$$:

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0         NaN
## 2      0      0      1         NaN
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN
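The slide does not show how the objects `sign`, `rank`, and `outcome` were built; one plausible reconstruction in R (the row order of the sign patterns may differ from the table above, and the names deliberately shadow the base functions `sign` and `rank` to match the slide's code):

```r
n = 3
# all 2^n sign patterns (1 = positive observation, 0 = non-positive)
sign = as.matrix(expand.grid(rep(list(0:1), n)))
colnames(sign) = paste0("sign.", 1:n)
# under H0 we may fix the ranks of |X_1|, |X_2|, |X_3| as 1, 2, 3
rank = matrix(1:n, nrow = 2^n, ncol = n, byrow = TRUE)
outcome = data.frame(sign, SumRankPlus = NaN)  # filled in on later slides
SumRankPlus = apply(sign * rank, 1, sum)
table(SumRankPlus)
```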

## Signed-Rank Wilcoxon Test

SumRankPlus = apply(sign*rank,1,sum)
outcome$SumRankPlus[1] = SumRankPlus[1]
outcome

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1         NaN
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN

## Signed-Rank Wilcoxon Test

outcome$SumRankPlus[2] = SumRankPlus[2]
outcome
##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1           3
## 3      0      1      0         NaN
## 4      0      1      1         NaN
## 5      1      0      0         NaN
## 6      1      0      1         NaN
## 7      1      1      0         NaN
## 8      1      1      1         NaN

## Signed-Rank Wilcoxon Test

outcome$SumRankPlus = SumRankPlus
outcome

##   sign.1 sign.2 sign.3 SumRankPlus
## 1      0      0      0           0
## 2      0      0      1           3
## 3      0      1      0           2
## 4      0      1      1           5
## 5      1      0      0           1
## 6      1      0      1           4
## 7      1      1      0           3
## 8      1      1      1           6

table(outcome$SumRankPlus)
##
## 0 1 2 3 4 5 6
## 1 1 1 2 1 1 1

## Signed-Rank Wilcoxon Test

Or through Monte Carlo simulations:

n = 3; nsim = 10000
X = matrix(rnorm(n*nsim),ncol=n)
WplusSim = apply(X,1, function(x) { sum( rank(abs(x)) * (x>0) ) })
table(WplusSim)/nsim
## WplusSim
##      0      1      2      3      4      5      6
## 0.1231 0.1235 0.1223 0.2535 0.1225 0.1270 0.1281

and compare with the theoretical result:

setNames(dsignrank(0:6, n), 0:6)
##     0     1     2     3     4     5     6
## 0.125 0.125 0.125 0.250 0.125 0.125 0.125

## Signed-Rank Wilcoxon Test Example

• Comparison of social awareness in kids going to school versus home schooling
• Theory: Sending kids to school improves social awareness
• Null hypothesis: No difference between kids going to school and staying at home
• Alternative: Kids going to school have improved social awareness

• Eight pairs of identical twins who are of nursery school age.
• For each pair, one is randomly selected to attend nursery school while the other remains at home.
• At the end of the study period, all 16 children are given the same social awareness test.

## Signed-Rank Wilcoxon Test Example

First compute the paired differences and the ranks of their absolute values:

school = c(82,69,73,43,58,56,76,65); home = c(63,42,74,37,51,43,80,62)
response = school - home
response
## [1] 19 27 -1  6  7 13 -4  3
rank(abs(response))
## [1] 7 8 1 4 5 6 3 2

## Signed-Rank Wilcoxon Test Example

Calculate ranks of absolute values:

response
## [1] 19 27 -1  6  7 13 -4  3
rank(abs(response))
## [1] 7 8 1 4 5 6 3 2

The statistic is the sum of the ranks of the positive items:

wplus = sum(rank(abs(response))[response > 0])
wplus
## [1] 32

## Signed-Rank Wilcoxon Test Example

We can find the $$p$$-value using $$P_{H_0}(W^+ \ge w^+) = 1 - F_{W^+}(w^+ - 1,n)$$:

n = 8
pvalue = 1 - psignrank(wplus - 1, n, lower.tail = TRUE)
pvalue
## [1] 0.02734375

## Signed-Rank Wilcoxon Test Example

Significance level: the probability of rejecting the null hypothesis when it is true

We can reject the null hypothesis at significance level $$\alpha = 0.05$$ in favor of the alternative that sending kids to school improves their social awareness.

## Estimation and Confidence Intervals

• All three tests (sign test, signed-rank Wilcoxon, and $$t$$-test) have an associated estimate and confidence interval for the location parameter $$\theta$$
• Recall the model for our sample $$X_1,X_2,\dots,X_n$$: $X_i = \theta + e_i$

• Some definitions:
• order statistics: $$X_{(1)}$$ minimum observation, $$X_{(n)}$$ largest observation
• in increasing order $$X_{(1)} < X_{(2)} < \dots < X_{(n)}$$
• quantiles: points that divide a continuous distribution (or the ordered sample) into intervals of equal probability

## Estimation and Confidence Intervals of Sign Test

Estimator is the median: $\hat{\theta} = \operatorname{median}\{ X_1,X_2,\dots,X_n \}$

Confidence interval $$(1-\alpha)100\%$$: $\left(X_{(c_1+1)},X_{(n-c_1)} \right),$ where $$c_1$$ is the $$\frac{\alpha}{2}$$ quantile of the binomial distribution with $$n$$ trials and success probability $$1/2$$
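A sketch in R, borrowing the school-minus-home differences from the twin example purely for illustration; note that because the binomial distribution is discrete the achieved confidence level is only approximately $$95\%$$, and texts differ slightly in how $$c_1$$ is rounded:

```r
# Sign-test estimate and confidence interval for the location theta
x = c(19, 27, -1, 6, 7, 13, -4, 3)  # illustrative data (twin differences)
n = length(x)
median(x)                           # point estimate: the sample median
c1 = qbinom(0.025, n, 0.5)          # alpha/2 quantile of Binomial(n, 1/2)
sort(x)[c(c1 + 1, n - c1)]          # interval (X_(c1+1), X_(n-c1))
```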

## Estimation and Confidence Intervals of Sign-Rank Wilcoxon

Hodges–Lehmann estimator: $\hat{\theta} = \operatorname{median}_{i\le j} \left\{ \frac{X_i+X_j}{2} \right\}$

$$A_{ij} = \frac{X_i + X_j}{2}, i \le j$$ are called the Walsh averages

and are used to form the $$(1-\alpha)100\%$$ confidence interval: $\left(A_{(c_2 +1)},A_{([n(n+1)/2]-c_2 )} \right),$ where $$c_2$$ is the $$\frac{\alpha}{2}$$ quantile of the signed-rank Wilcoxon distribution
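A sketch in R of the Walsh averages and the Hodges–Lehmann estimate, again reusing the twin differences only as illustrative data:

```r
# Hodges-Lehmann estimate via Walsh averages
x = c(19, 27, -1, 6, 7, 13, -4, 3)     # illustrative data (twin differences)
A = outer(x, x, "+") / 2               # all pairwise averages (X_i + X_j)/2
walsh = A[upper.tri(A, diag = TRUE)]   # keep i <= j: n(n+1)/2 = 36 values
median(walsh)                          # Hodges-Lehmann estimate
```

For comparison, `wilcox.test(x, conf.int = TRUE)` reports a (pseudo)median estimate together with a confidence interval.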

## Some Comparisons

• Power of a statistical test:
the probability of rejecting the null hypothesis when it is false
• The power of the sign test can be low relative to $$t$$-test
• The power of signed-rank Wilcoxon test is nearly that of the $$t$$-test for normal distributions and generally greater than that of the $$t$$-test for distributions with heavier tails than the normal distribution

## Power Simulation $$\theta = 0$$

n = 30; df = 2; nsims = 10000; mu = 0; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
x = rt(n,df) + mu
wil = wilcox.test(x)
collwil[i] = wil$p.value
ttest = t.test(x)
collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil
## [1] 0.0477
powert
## [1] 0.0365

## Power Simulation $$\theta = 0.5$$

n = 30; df = 2; nsims = 10000; mu = 0.5; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
x = rt(n,df) + mu
wil = wilcox.test(x)
collwil[i] = wil$p.value
ttest = t.test(x)
collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil
## [1] 0.4655
powert
## [1] 0.2914

## Power Simulation $$\theta = 1$$

n = 30; df = 2; nsims = 10000; mu = 1; collwil = rep(0,nsims); collt = rep(0,nsims)
for(i in 1:nsims) {
x = rt(n,df) + mu
wil = wilcox.test(x)
collwil[i] = wil$p.value
ttest = t.test(x)
collt[i] = ttest$p.value
}
powwil = rep(0,nsims); powwil[collwil <= .05] = 1; powerwil = sum(powwil)/nsims
powt = rep(0,nsims); powt[collt <= .05] = 1; powert = sum(powt)/nsims
powerwil
## [1] 0.9245
powert
## [1] 0.698

## Summary

Assumptions on $$f(t)$$:

• Sign Test: any continuous distribution
• Signed-Rank Test: any symmetric continuous distribution
• $$t$$-test: any normal distribution

• The continuity assumption assures that ties are impossible: With probability one we have $$X_i \ne X_j$$ when $$i \ne j$$
• The continuity assumption is only necessary for exact hypothesis tests, not for estimates and confidence intervals

Assumptions on $$e_1,e_2,\dots,e_n$$ and thus $$X_1,X_2,\dots,X_n$$:

• independent and identically distributed