Stanford University, Spring 2016, STATS 205

The Lady and the Tea

  • Fisher opens the second chapter of his famous book "The Design of Experiments", first published in 1935, with:
  • Quote: A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup.
  • Fisher proposed the following experiment to test this claim:
  • Prepare eight cups of tea
  • In four, add tea first
  • In the other four, add milk first
  • Hold constant other features of the cups of tea (size, temperature, etc.)
  • Present cups in random order
  • The lady tastes them and judges each one
  • She knows that there are four of each type

The Lady and the Tea

  • Our theory is that the lady can taste the difference
  • Which translates into our null hypothesis:
    \(H_0:\) lady cannot taste difference
  • The test statistic is the number of correct judgements
  • Under the null all judgments are equally likely
  • What is the null distribution of correct judgments?
  • Once we have it, we compare it to the experiment and compute the probability of observing the experimental outcome or a more extreme one

The Lady and the Tea

  • Because of randomization, all \(8! = 40320\) permutations of the cups are equally likely, and each one has its own number of correct judgements
  • There are \({8 \choose 4} = 70\) ways of choosing 4 cups from 8 cups
  • Think of it as choosing 4 cups in succession: first from 8 cups, then from 7, 6, and 5, which gives \(8 \times 7 \times 6 \times 5 = 1680\) ordered choices
  • But by doing so, we have chosen not only every possible set of 4, but every possible set in every possible order
  • Since 4 cups can be rearranged in \(4 \times 3 \times 2 \times 1 = 24\) ways, we correct by \(1680/24 = 70\) (checked in R below)
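
This counting can be verified directly in base R:

factorial(8)            # 40320 orderings of the 8 cups
8 * 7 * 6 * 5           # 1680 ordered ways to pick 4 of the 8
(8 * 7 * 6 * 5) / 24    # 70 after correcting for the order of the 4
choose(8, 4)            # 70, via the binomial coefficient directly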

The Lady and the Tea

##           [,1]  [,2]   [,3]   [,4]   [,5]  [,6]  [,7]   [,8]  
## truth     "tea" "milk" "milk" "milk" "tea" "tea" "tea"  "milk"
## judgement "tea" "milk" "milk" "milk" "tea" "tea" "milk" "tea"
  • Under the null, the reasons for the lady's judgements are unknown, except that they have nothing to do with the truth
  • The judgements are what they are; they are fixed
  • All \(70\) ways to line up cups are equally likely
  • But only one line-up matches the lady's judgements exactly

The Lady and the Tea

Of the four cups where the tea was poured first, select \(t\) of them to say "tea" correctly, and \(4-t\) to say "milk" incorrectly

t = 0:4
# hypergeometric: choose t of the 4 tea-first cups and 4-t of the 4 milk-first cups
probability = (choose(4,t)*choose(4,4-t))/choose(8,4)
data.frame(t=t,probability=round(probability,digits=3))
##   t probability
## 1 0       0.014
## 2 1       0.229
## 3 2       0.514
## 4 3       0.229
## 5 4       0.014
  • The chance of a perfect match is \(\frac{1}{70} \approx 0.014 < 0.05\)
  • We need a perfect match!

The Lady and the Tea

Monte Carlo simulations

judgement = c("tea","milk","milk","milk","tea","tea","milk","tea")
nSim = 10000
# each column is one random re-ordering of the cup positions 1..8
permutations = replicate(nSim,sample(8,replace = FALSE))
# count the cups on which the permuted line-up agrees with the fixed judgements
matches = apply(permutations,2,function(i) sum(judgement[i] == judgement))
data.frame(t=t,
           probability=round(probability,digits=3),
           monte=round(table(matches)/nSim,digits=3))
##   t probability monte.matches monte.Freq
## 1 0       0.014             0      0.013
## 2 1       0.229             2      0.224
## 3 2       0.514             4      0.518
## 4 3       0.229             6      0.230
## 5 4       0.014             8      0.015

The Lady and the Tea

Now suppose she tastes 14 cups, 7 with tea poured first and 7 with milk poured first:

t = 0:7
probability = (choose(7,t)*choose(7,7-t))/choose(14,7)
data.frame(t=t,probability=round(probability,digits=4))
##   t probability
## 1 0      0.0003
## 2 1      0.0143
## 3 2      0.1285
## 4 3      0.3569
## 5 4      0.3569
## 6 5      0.1285
## 7 6      0.0143
## 8 7      0.0003
  • If she tasted 14 cups, it would be possible to reject \(H_0\) without requiring perfect judgement
  • We need either a perfect match or at most one miss to get a \(p\)-value below \(0.05\) (checked below)
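
Both claims can be checked by summing tails of the table above:

sum(probability[t >= 6])   # perfect match or one miss: about 0.015 < 0.05
sum(probability[t >= 5])   # allowing two misses: about 0.143, not significant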

Darwin and Fisher

  • In chapter 3 of Fisher's book, he then introduced the first nonparametric test
  • He illustrated his approach on data from Darwin
  • Darwin raised cross- and self-fertilized corn (Zea mays) plants
  • He planted equal numbers of crossed and self-fertilized plants in each of four pots, though the pots did not all contain the same number of pairs
  • Darwin measured the heights of the plants when they were between 12 and 24 inches tall

Darwin and Fisher

Original data (plant heights in inches) from Darwin, as listed in Fisher's book

##    pair pot  cross   self   diff
## 1     1   1 23.500 17.375  6.125
## 2     2   1 12.000 20.375 -8.375
## 3     3   1 21.000 20.000  1.000
## 4     4   2 22.000 20.000  2.000
## 5     5   2 19.125 18.375  0.750
## 6     6   2 21.500 18.625  2.875
## 7     7   3 22.125 18.625  3.500
## 8     8   3 20.375 15.250  5.125
## 9     9   3 18.250 16.500  1.750
## 10   10   3 21.625 18.000  3.625
## 11   11   3 23.250 16.250  7.000
## 12   12   4 21.000 18.000  3.000
## 13   13   4 22.125 12.750  9.375
## 14   14   4 23.000 15.500  7.500
## 15   15   4 12.000 18.000 -6.000
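
The chunks that follow use these data as a data frame called ZeaMays; one way to construct it from the values above:

ZeaMays = data.frame(
  pair  = 1:15,
  pot   = rep(1:4, times = c(3, 3, 5, 4)),
  cross = c(23.500, 12.000, 21.000, 22.000, 19.125, 21.500, 22.125,
            20.375, 18.250, 21.625, 23.250, 21.000, 22.125, 23.000, 12.000),
  self  = c(17.375, 20.375, 20.000, 20.000, 18.375, 18.625, 18.625,
            15.250, 16.500, 18.000, 16.250, 18.000, 12.750, 15.500, 18.000))
ZeaMays$diff = ZeaMays$cross - ZeaMays$self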

Darwin and Fisher

Fisher proceeds by testing the hypothesis
\(H_0:\) No difference between crossed and self-fertilized plants

He performs the paired-sample \(t\)-test
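
Using the ZeaMays data frame constructed above, the output below can be reproduced with:

t.test(ZeaMays$cross, ZeaMays$self, paired = TRUE)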

## 
##  Paired t-test
## 
## data:  ZeaMays$cross and ZeaMays$self
## t = 2.148, df = 14, p-value = 0.0497
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.003899165 5.229434169
## sample estimates:
## mean of the differences 
##                2.616667

Darwin and Fisher

ZeaMays$diff
##  [1]  6.125 -8.375  1.000  2.000  0.750  2.875  3.500  5.125  1.750  3.625
## [11]  7.000  3.000  9.375  7.500 -6.000
  • Then he invents the first permutation test
  • Under \(H_0\) that self- versus cross-fertilization does not matter, only chance determined whether a pair's difference is \(A-B\) or \(B-A\)
  • So the absolute value of the difference is what it is, but the plus or minus sign is by chance alone
  • Test statistic is sum of the differences
  • There are \(2^{15} = 32768\) ways to swap the plus and minus signs, all equally likely under \(H_0\)
  • Calculate the statistic for each one to get the null distribution (enumerated below)
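
With only \(2^{15} = 32768\) sign assignments, the full null distribution can be enumerated exactly; a minimal sketch in base R, assuming the ZeaMays data frame constructed earlier:

# one row per assignment of +/- signs to the 15 absolute differences
signs = as.matrix(expand.grid(rep(list(c(-1, 1)), 15)))
TNull = as.vector(signs %*% abs(ZeaMays$diff))  # statistic for each assignment
T0 = sum(ZeaMays$diff)                          # observed statistic
mean(TNull >= T0)                               # exact one-sided p-value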

Darwin and Fisher

  • Random samples \(X_1,\dots,X_n\) from \(X\)
  • Test \(H_0: E(X) = 0\)
  • Write the usual statistic as \[T = \sum_{i=1}^n X_i = \sum_{i=1}^n |X_i| \delta_i\]
  • and split samples into two parts \[(|X_1|,\dots,|X_n|) \hspace{1cm}\text{and}\hspace{1cm} (\delta_1,\dots,\delta_n)\]
  • Under \(H_0\) the two parts are independent and \(\delta_i\) is a fair coin flip \(\{-1,1\}\)
  • Unconditionally, the distribution of \(T\) under \(H_0\) is not determined: it depends on the unknown distribution of \(X\)

Darwin and Fisher

  • But if we fix \(|X_i|\) at their observed values and regard \(\delta_i\) as random
  • Then the conditional null distribution of \(T\) is well defined
  • The only thing left to do is to compute \[P\left(T \ge t \, \big| \, H_0, |X_1|, \dots, |X_n| \right)\] by listing all possible samples of type \[\left(\pm \, |X_1|, \dots, \pm \, |X_n|\right)\]
  • This is usually called Fisher's randomization test
  • Permutation tests describe a special case of randomization
  • In permutation tests, we permute group labels on the observations

Darwin and Fisher

Using Monte Carlo simulations

nSim = 10000
n = length(ZeaMays$diff)
absDiff = abs(ZeaMays$diff)
# one row of random +/- signs per simulated sample
signs = matrix(sample(c(-1,1),size = nSim*n,replace = TRUE),nrow = nSim)
TNull = apply(signs,1,function(row) sum(row * absDiff))
T0 = sum(ZeaMays$diff)
pvalue = 2*mean(TNull >= T0); pvalue
## [1] 0.054

Darwin and Fisher

  • We already looked at similar tests
  • The sign test:
    • Test statistic: \(S = \#\{i : X_i > 0\}\), the number of positive observations
  • The Wilcoxon signed-rank test:
    • Test statistic: \(W = \sum_{i=1}^n \operatorname{R}(|X_i|) \delta_i\)
  • In contrast to the randomization test, the Wilcoxon signed-rank test is not conditional on the data: the ranks of \(|X_i|\) are always \(1,\dots,n\), so its null distribution is the same for every sample

Permutation Test – General Recipe

  • Decide on a test statistic \(T\)
  • List the possible values of \(T\)
  • Under \(H_0\), all ways of re-arranging the data are equally likely
  • \(P(T = t)\) is proportional to the number of ways of getting the value \(t\)
  • The permutation \(p\)-value is the probability of getting a value of \(T\) as extreme as, or more extreme than, the value we observed, where "extreme" means in a direction inconsistent with \(H_0\) (a generic sketch follows below)
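
This recipe translates into a generic Monte Carlo sketch; the function and its argument names below are illustrative, not from the lecture:

# statistic() maps a data set to T; shuffle() produces one random
# re-arrangement of the data that is valid under H0
permTest = function(data, statistic, shuffle, nSim = 10000) {
  T0 = statistic(data)
  TNull = replicate(nSim, statistic(shuffle(data)))
  mean(TNull >= T0)  # one-sided p-value
}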

Permutation Test – One-Sample Problem

  • Location model: \(X_i = \theta + e_i\)
  • Assume the errors have a pdf \(f(t)\) symmetric about \(0\) (so \(X_i\) is symmetric about \(\theta\))
  • Hypothesis test: \(H_0: \theta = 0\) versus \(H_A: \theta > 0\)
  • Test statistic: the sum of the deviations of the \(X_i\) about \(0\), i.e. \(T = \sum_{i=1}^n X_i\)
  • Under null, positive and negative deviations cancel each other out, so sum should be close to zero
  • We reject the null in favor of alternative if sum is "large"
  • To find out what "large" means, we compare the observed sum to the null distribution obtained by permutation
  • What to permute?
  • We use the principle of sufficiency

Permutation Test – One-Sample Problem

  • Assume that we had lost the sign information
  • Under the null, we can recover the original distribution by randomly assigning signs to each deviation
  • In this case, we say that the deviations (without their signs) are sufficient
  • Under the alternative of a location parameter larger than zero, randomizing the signs of the deviations should reduce the sum from what it was originally
  • List all \(2^n\) possible reassignments of plus and minus signs
  • Compare the observed sum with all possible sums (see the sketch below)
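
A sketch of the one-sample test using the illustrative permTest function above, with the Zea mays differences as example data:

# re-randomize the signs of the deviations, keeping their absolute values fixed
permTest(ZeaMays$diff,
         statistic = sum,
         shuffle = function(x) abs(x) * sample(c(-1, 1), length(x), replace = TRUE))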

Permutation Test – Two-Sample Problem

  • Test statistic: Sum of observations from second population
  • Assume that we had lost the group assignments
  • Under the null, we can recover the original distribution by randomly assigning groups to each observation
  • In this case, we say that the observations (without group labels) are sufficient
  • Under the alternative one group will have a larger sum
  • List all possible group reassignments and compute sum of second population
  • Compare the observed sum to all possible sums (see the sketch below)
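
A two-sample sketch along the same lines; the data values here are made up for illustration:

x = c(1.2, 0.8, 1.5, 0.9)   # first sample (illustrative values)
y = c(1.9, 2.3, 1.7, 2.6)   # second sample (illustrative values)
combined = c(x, y)
T0 = sum(y)
# permuting group labels = drawing length(y) observations without replacement
TNull = replicate(10000, sum(sample(combined, length(y))))
mean(TNull >= T0)   # one-sided p-value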

Assumption: Exchangeable Observations

  • A sufficient condition for a permutation test is exchangeability of the observations
  • Consider a collection of random variables \[ X_1,\dots,X_n \]
  • If their joint distribution is invariant under every permutation \(\pi\), \[P_{X_1,\dots,X_n}(A) = P_{X_{\pi(1)},\dots,X_{\pi(n)}}(A)\]
  • then \(X_1,\dots,X_n\) are called exchangeable
  • This is a weaker assumption than independence of the observations in the combined sample

Assumption: Exchangeable Observations

  • Some examples:
  • Independent and identically distributed observations are exchangeable
  • Samples without replacement from a finite population are exchangeable:
    • An urn contains \(b\) black balls, \(r\) red balls, \(y\) yellow balls, and so forth
    • A series of balls are extracted from the urn
    • After the \(i\)th extraction, the color of the ball \(X_i\) is noted and \(k\) balls of the same color are added to the urn,
    • where \(k\) can be any integer, positive, negative, or zero
    • The set of random variables \(\{X_i\}\) forms an exchangeable sequence (simulated below)
  • Dependent normally distributed random variables \(\{X_i\}\) for which the variance of \(X_i\) is a constant independent of \(i\) and the covariance of \(X_i\) and \(X_j\) is a constant independent of \(i\) and \(j\)
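
The urn example can be checked by simulation; a small sketch with illustrative parameters (2 black balls, 3 red balls, \(k = 1\)) showing that both orderings of one black and one red draw are equally likely:

drawTwo = function(b = 2, r = 3, k = 1) {
  first = sample(c("black", "red"), 1, prob = c(b, r))
  if (first == "black") b = b + k else r = r + k   # add k balls of the drawn color
  second = sample(c("black", "red"), 1, prob = c(b, r))
  c(first, second)
}
draws = replicate(100000, drawTwo())
mean(draws[1, ] == "black" & draws[2, ] == "red")   # about (2/5)*(3/6) = 0.2
mean(draws[1, ] == "red" & draws[2, ] == "black")   # about (3/5)*(2/6) = 0.2, the same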

Assumption: Exchangeable Observations

  • Sometimes a simple transformation will ensure that observations are exchangeable
  • For example, if we know that \(X\) comes from a population with mean \(\mu\) and distribution \(F(x-\mu)\) and
  • an independent observation \(Y\) comes from a population with mean \(\nu\) and distribution \(F(x-\nu)\),
  • then the independent variables \(X'=X-\mu\) and \(Y'=Y-\nu\) are exchangeable

How to Compute It?

  • Small sample size:
    Enumeration of all permutations using Gray codes
    (takes exponential time in \(n\))
  • Medium sample size:
    Fast Fourier Transform
    (takes polynomial time in \(n\))
  • Large sample size:
    Monte Carlo simulations
    (time depends on accuracy requirements)

References

  • Good (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses
  • Basu (1980). Randomization Analysis of Experimental Data: The Fisher Randomization Test
  • Pagano and Tritchler (1983). On Obtaining Permutation Distributions in Polynomial Time
  • Diaconis and Holmes (1994). Gray Codes for Randomization Procedures