Stanford University, Spring 2016, STATS 205

## The Lady and the Tea

• Fisher opens the second chapter of his famous book "The Design of Experiments", first published in 1935, with the following example:
• Quote: A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup.
• Fisher proposed the following experiment to test this claim
• Prepare eight cups of tea
• In four, add tea first
• In the other four, add milk first
• Hold constant other features of the cups of tea (size, temperature, etc.)
• Present cups in random order
• Lady tastes them and judges
• She knows that there are four of each type

## The Lady and the Tea

• Our theory is that the lady can taste the difference
• Which translates into our null hypothesis:
$$H_0:$$ lady cannot taste difference
• The test statistic is the number of correct judgements
• Under the null, all judgements are equally likely
• What is the null distribution of the number of correct judgements?
• Once we have that, compare it to the experiment and compute the probability of having observed the experimental outcome or a more extreme one

## The Lady and the Tea

• Because of randomization, all $$8! = 40320$$ permutations of the cups are equally likely, and each one has its own number of correct judgements
• There are $${8 \choose 4} = 70$$ ways of choosing 4 cups from 8 cups
• Think of it as, choosing 4 cups in succession by first choosing from 8 cups, then 7, 6, 5, which gives $$8 \times 7 \times 6 \times 5 = 1680$$
• But by doing so, we have chosen not only every possible set of 4, but every possible set in every possible order
• Since 4 cups can be rearranged in $$4 \times 3 \times 2 \times 1 = 24$$ different orders, we correct by dividing: $$1680/24 = 70$$
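
The counting argument can be checked numerically. The following is a minimal sketch; the variable names (`ordered`, `orderings`, `unordered`, `sets`) are illustrative and do not come from the slides.

```r
# Ordered selections of 4 cups out of 8: 8 * 7 * 6 * 5
ordered <- prod(8:5)

# Each unordered set of 4 cups appears in 4! different orders
orderings <- factorial(4)
unordered <- ordered / orderings

# Cross-check against the binomial coefficient and an explicit listing;
# each column of `sets` is one set of 4 cup positions
sets <- combn(8, 4)

c(ordered = ordered, orderings = orderings, unordered = unordered,
  choose = choose(8, 4), listed = ncol(sets))
```

All three routes (division, `choose`, and explicit listing with `combn`) should agree on the number of sets.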

## The Lady and the Tea

##           [,1]  [,2]   [,3]   [,4]   [,5]  [,6]  [,7]   [,8]
## truth     "tea" "milk" "milk" "milk" "tea" "tea" "tea"  "milk"
## judgement "tea" "milk" "milk" "milk" "tea" "tea" "milk" "tea"
• Under the null, the reasons for the lady's judgements are unknown, except that they have nothing to do with the truth
• The judgements are what they are; they are fixed
• All $$70$$ ways to line up cups are equally likely
• But only one line up will match the lady's judgements
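
Because there are only 70 possible line-ups, the null distribution can be obtained by listing them all. This is a sketch under the assumption that the lady's judgements are held fixed at the values shown in the table; `saidTea`, `lineups`, `correctTea`, `counts`, and `nullDist` are names introduced here for illustration.

```r
# The lady's judgements, held fixed
judgement <- c("tea", "milk", "milk", "milk", "tea", "tea", "milk", "tea")
saidTea <- which(judgement == "tea")

# Every possible true line-up is a choice of 4 positions for "tea first"
lineups <- combn(8, 4)

# For each line-up, count the tea-first cups that she labelled "tea"
correctTea <- apply(lineups, 2, function(teaPos) sum(teaPos %in% saidTea))

# Number of line-ups giving 0, 1, ..., 4 correct "tea" judgements
counts <- table(factor(correctTea, levels = 0:4))
nullDist <- counts / ncol(lineups)
round(nullDist, 3)
```

Exactly one of the 70 columns of `lineups` reproduces her four "tea" positions, which is the single perfect match mentioned above.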

## The Lady and the Tea

Of the four cups where the tea was poured first, select $$t$$ of them to say "tea" correctly, and $$4−t$$ to say "milk" incorrectly

t = 0:4
probability = (choose(4,t)*choose(4,4-t))/choose(8,4)
data.frame(t=t,probability=round(probability,digits=3))
##   t probability
## 1 0       0.014
## 2 1       0.229
## 3 2       0.514
## 4 3       0.229
## 5 4       0.014
• The chance of a perfect match is $$\frac{1}{70} = 0.014 < 0.05$$
• To reject $$H_0$$ at the 5% level, we need a perfect match!
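
To see why anything short of a perfect match is not enough, one can compute the $$p$$-value for an imperfect outcome. The sketch below treats the truth/judgement line-up shown on the earlier slide as if it were the observed data (an assumption made for illustration) and uses the hypergeometric density `dhyper` from base R; `tObs`, `pExact`, and `pValue` are illustrative names.

```r
truth     <- c("tea", "milk", "milk", "milk", "tea", "tea", "tea", "milk")
judgement <- c("tea", "milk", "milk", "milk", "tea", "tea", "milk", "tea")

# Observed statistic: tea-first cups that were correctly called "tea"
tObs <- sum(truth == "tea" & judgement == "tea")

# Hypergeometric null: 4 tea-first cups, 4 milk-first cups, 4 cups called "tea"
pExact <- dhyper(0:4, m = 4, n = 4, k = 4)

# One-sided p-value: probability of tObs or more correct "tea" calls
pValue <- sum(pExact[(0:4) >= tObs])
c(tObs = tObs, pValue = pValue)
```

In this line-up she gets three of the four tea-first cups right, and the resulting $$p$$-value is far above $$0.05$$.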

## The Lady and the Tea

Monte Carlo simulations

judgement = c("tea","milk","milk","milk","tea","tea","milk","tea")
nSim = 10000
permutations = replicate(nSim,sample(8,replace = FALSE))
matches = apply(permutations,2,function(i) sum(judgement[i] == judgement))
data.frame(t=t,
probability=round(probability,digits=3),
monte=round(table(matches)/nSim,digits=3))
##   t probability monte.matches monte.Freq
## 1 0       0.014             0      0.013
## 2 1       0.229             2      0.224
## 3 2       0.514             4      0.518
## 4 3       0.229             6      0.230
## 5 4       0.014             8      0.015

## The Lady and the Tea

t = 0:7
probability = (choose(7,t)*choose(7,7-t))/choose(14,7)
data.frame(t=t,probability=round(probability,digits=4))
##   t probability
## 1 0      0.0003
## 2 1      0.0143
## 3 2      0.1285
## 4 3      0.3569
## 5 4      0.3569
## 6 5      0.1285
## 7 6      0.0143
## 8 7      0.0003
• If she tasted 14 cups (7 of each type), it would be possible to reject $$H_0$$ without requiring perfect judgement
• We need either a perfect match or one miss ($$t \ge 6$$) to get a $$p$$-value below $$0.05$$
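
The rejection threshold follows from the upper-tail probabilities of the table. A minimal sketch, where `cupsPerType`, `pValue`, and `tCritical` are names introduced here:

```r
cupsPerType <- 7
t <- 0:cupsPerType
probability <- choose(cupsPerType, t) * choose(cupsPerType, cupsPerType - t) /
  choose(2 * cupsPerType, cupsPerType)

# Upper-tail p-value P(T >= t) for each possible outcome t
pValue <- rev(cumsum(rev(probability)))
data.frame(t = t, pValue = round(pValue, 4))

# Smallest number of correct "tea" calls that still gives p < 0.05
tCritical <- min(t[pValue < 0.05])
tCritical
```

Changing `cupsPerType` shows how the threshold moves with the size of the experiment.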

## Darwin and Fisher

• In chapter 3 of Fisher's book, he then introduced the first nonparametric test
• He illustrated his approach on data from Darwin
• Darwin raised cross- and self-fertilized corn (Zea mays) plants
• He planted equal numbers of each in four different pots, but not the same number in each pot
• Darwin measured heights of plants when they were between 12 and 24 inches in height

## Darwin and Fisher

Original data (plant height in inches) from Darwin listed in Fisher's book

##    pair pot  cross   self   diff
## 1     1   1 23.500 17.375  6.125
## 2     2   1 12.000 20.375 -8.375
## 3     3   1 21.000 20.000  1.000
## 4     4   2 22.000 20.000  2.000
## 5     5   2 19.125 18.375  0.750
## 6     6   2 21.500 18.625  2.875
## 7     7   3 22.125 18.625  3.500
## 8     8   3 20.375 15.250  5.125
## 9     9   3 18.250 16.500  1.750
## 10   10   3 21.625 18.000  3.625
## 11   11   3 23.250 16.250  7.000
## 12   12   4 21.000 18.000  3.000
## 13   13   4 22.125 12.750  9.375
## 14   14   4 23.000 15.500  7.500
## 15   15   4 12.000 18.000 -6.000

## Darwin and Fisher

Fisher proceeds by testing the hypothesis
$$H_0:$$ No difference between crossed and self-fertilized plants

He performs the paired-sample $$t$$-test

##
##  Paired t-test
##
## data:  ZeaMays$cross and ZeaMays$self
## t = 2.148, df = 14, p-value = 0.0497
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.003899165 5.229434169
## sample estimates:
## mean of the differences
##                2.616667
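
The output above can be reproduced from the table of heights. The sketch below types the data in by hand instead of loading a `ZeaMays` data frame, so that it is self-contained; `d`, `tStat`, `pValue`, and `fit` are illustrative names.

```r
# Heights (inches) copied from the table of Darwin's data
cross <- c(23.500, 12.000, 21.000, 22.000, 19.125, 21.500, 22.125, 20.375,
           18.250, 21.625, 23.250, 21.000, 22.125, 23.000, 12.000)
self  <- c(17.375, 20.375, 20.000, 20.000, 18.375, 18.625, 18.625, 15.250,
           16.500, 18.000, 16.250, 18.000, 12.750, 15.500, 18.000)

# Paired t-test by hand: one-sample t-test on the within-pair differences
d <- cross - self
n <- length(d)
tStat <- mean(d) / (sd(d) / sqrt(n))
pValue <- 2 * pt(-abs(tStat), df = n - 1)

# The same test via the built-in function
fit <- t.test(cross, self, paired = TRUE)
c(tStat = tStat, pValue = pValue, builtin = unname(fit$statistic))
```

The hand computation makes explicit that the paired test depends on the data only through the differences `d`.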

ZeaMays$diff
##  [1]  6.125 -8.375  1.000  2.000  0.750  2.875  3.500  5.125  1.750  3.625
## [11]  7.000  3.000  9.375  7.500 -6.000
• Then he invents the first permutation test
• Under $$H_0$$ that self-fertilized versus cross-fertilized does not matter, only chance determined whether the difference within a pair came out as $$A-B$$ or $$B-A$$
• So the absolute value of the difference is what it is, but the plus or minus sign is by chance alone
• Test statistic is the sum of the differences
• There are $$2^{15} = 32768$$ ways to assign the plus and minus signs, all equally likely under $$H_0$$
• Calculate the statistic for each one to get the null distribution

## Darwin and Fisher

• Random samples $$X_1,\dots,X_n$$ from $$X$$
• Test $$H_0: E(X) = 0$$
• Write the usual statistic as
$T = \sum_{i=1}^n X_i = \sum_{i=1}^n |X_i| \delta_i$
• and split the samples into two parts
$(|X_1|,\dots,|X_n|) \hspace{1cm}\text{and}\hspace{1cm} (\delta_1,\dots,\delta_n)$
• Under $$H_0$$ the two parts are independent and each $$\delta_i$$ is a fair coin flip on $$\{-1,1\}$$
• The distribution of $$T$$ is not well defined under $$H_0$$, because $$H_0$$ does not specify the distribution of the $$|X_i|$$

## Darwin and Fisher

• But if we fix the $$|X_i|$$ at their observed values and regard the $$\delta_i$$ as random
• Then the conditional null distribution of $$T$$ is well defined
• The only thing left to do is to compute
$P\left(T \ge t \, \big| \, H_0, |X_1|, \dots, |X_n| \right)$
by listing all possible samples of type
$\left(\pm \, |X_1|, \dots, \pm \, |X_n|\right)$
• This is usually called Fisher's randomization test
• Permutation tests are a special case of randomization tests
• In permutation tests, we permute group labels on the observations

## Darwin and Fisher

Using Monte Carlo simulations

nSim = 10000
n = length(ZeaMays$diff)
absDiff = abs(ZeaMays$diff)
signs = matrix(replicate(nSim*n,sample(c(-1,1),size=1)),nrow = nSim)
TNull = apply(signs,1,function(row) sum(row * absDiff))
T0 = sum(ZeaMays$diff)
pvalue = 2*mean(TNull >= T0); pvalue
## [1] 0.054
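
With only $$2^{15} = 32768$$ sign patterns, the null distribution can also be enumerated exactly instead of sampled. The following is a sketch, not Fisher's own calculation: it types in the differences by hand and builds all sign patterns with `expand.grid`; `pOneSided` and `pTwoSided` are names introduced here.

```r
# Within-pair differences (cross minus self), copied from the data table
d <- c(6.125, -8.375, 1.000, 2.000, 0.750, 2.875, 3.500, 5.125,
       1.750, 3.625, 7.000, 3.000, 9.375, 7.500, -6.000)
n <- length(d)
absDiff <- abs(d)

# All 2^n sign patterns, one per row
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))

# Null distribution of T: one value per sign pattern
TNull <- as.vector(signs %*% absDiff)
T0 <- sum(d)

# Exact conditional p-values
pOneSided <- mean(TNull >= T0)
pTwoSided <- mean(abs(TNull) >= abs(T0))
c(oneSided = pOneSided, twoSided = pTwoSided)
```

The differences are multiples of 1/8 inch, so all sums are exact in floating point and the comparisons with `T0` need no tolerance. The exact two-sided value can be compared with the Monte Carlo estimate above.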

## Darwin and Fisher

• We already looked at similar tests
• The sign test:
• Test statistic: $$S = \#_i \{ X_i > 0 \}$$
• Wilcoxon signed-rank test: $$W = \sum_{i=1}^n \operatorname{R}(|X_i|) \delta_i$$
• In contrast to the randomization test, the null distribution of the Wilcoxon signed-rank statistic does not depend on the observed values $$|X_i|$$: without ties, the ranks are always $$1,\dots,n$$
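
Both statistics can be computed for Darwin's differences. The sketch below assumes the differences are typed in by hand and uses base R's `binom.test` and `wilcox.test`; note that `wilcox.test` reports $$V$$, the sum of the ranks of the positive differences, which relates to the signed sum by $$W = 2V - n(n+1)/2$$.

```r
# Within-pair differences (cross minus self), copied from the data table
d <- c(6.125, -8.375, 1.000, 2.000, 0.750, 2.875, 3.500, 5.125,
       1.750, 3.625, 7.000, 3.000, 9.375, 7.500, -6.000)
n <- length(d)
delta <- sign(d)

# Sign test statistic: number of positive differences
S <- sum(d > 0)
signTest <- binom.test(S, n, p = 0.5)

# Wilcoxon signed-rank statistic: signed sum of the ranks of |d|
W <- sum(rank(abs(d)) * delta)
wilcoxTest <- wilcox.test(d)

c(S = S, W = W, V = unname(wilcoxTest$statistic),
  pSign = signTest$p.value, pWilcoxon = wilcoxTest$p.value)
```

The sign test uses only the $$\delta_i$$, the signed-rank test uses the $$\delta_i$$ and the ranks of the $$|X_i|$$, and the randomization test uses the $$\delta_i$$ and the $$|X_i|$$ themselves.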

## Permutation Test – General Recipe

• Decide on a test statistic $$T$$
• List the possible values of $$T$$
• Under $$H_0$$, all ways of re-arranging the data are equally likely
• $$P(T = t)$$ is proportional to the number of ways of getting the value $$t$$
• The permutation $$p$$-value is the probability of getting a value of $$T$$ as extreme as or more extreme than the value we observed, "extreme" meaning in a direction inconsistent with $$H_0$$

## Permutation Test – One-Sample Problem

• Location model: $$X_i = \theta + e_i$$
• Assume the errors $$e_i$$ have a pdf $$f(t)$$ that is symmetric around $$0$$, so that $$X_i$$ is symmetric around $$\theta$$
• Hypothesis test: $$H_0: \theta = 0$$ versus $$H_A: \theta > 0$$
• Test statistic: Sum of deviations of the observations $$X_i$$ from $$0$$
• Under null, positive and negative deviations cancel each other out, so sum should be close to zero
• We reject the null in favor of alternative if sum is "large"
• To find out what "large" means, we compare to the null distribution obtained by permutation
• What to permute?
• We use the principle of sufficiency

## Permutation Test – One-Sample Problem

• Assume that we had lost the signs information
• Under the null, we can recover the original distribution by randomly assigning signs to each deviation
• In this case, we say that the deviations are sufficient
• Under the alternative of a location parameter larger than zero, randomizing the signs of the deviations should reduce the sum from what it was originally
• List all $$2^n$$ possible reassignments of plus and minus signs
• Compare observed sum with all possible sums

## Permutation Test – Two-Sample Problem

• Test statistic: Sum of observations from second population
• Assume that we had lost the group assignments
• Under the null, we can recover the original distribution by randomly assigning groups to each observation
• In this case, we say that the observations (without group labels) are sufficient
• Under the alternative one group will have a larger sum
• List all possible group reassignments and compute sum of second population
• Compare observed sum to all possible sums
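
The two-sample recipe can be sketched by enumerating all group reassignments with `combn`. The measurements `x` and `y` below are hypothetical values invented for illustration, and the one-sided alternative (second population larger) is an assumption of this example.

```r
# Hypothetical measurements, invented for illustration only
x <- c(4.1, 5.3, 3.8, 4.6)        # first population
y <- c(5.9, 6.4, 5.1, 7.0, 6.2)   # second population

pooled <- c(x, y)
nY <- length(y)

# Every way of labelling nY of the pooled observations as "second population"
labelings <- combn(length(pooled), nY)

# Test statistic for each relabelling: sum of the "second population"
sumNull <- apply(labelings, 2, function(idx) sum(pooled[idx]))
sumObs <- sum(y)

# One-sided p-value; the small tolerance guards against floating-point
# rounding when a relabelled sum ties with the observed sum
pValue <- mean(sumNull >= sumObs - 1e-12)
c(nLabelings = ncol(labelings), pValue = pValue)
```

For larger samples the number of relabellings, `choose(length(pooled), nY)`, grows quickly, and one would sample relabellings at random instead of listing them all.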

## Assumption: Exchangeable Observations

• A sufficient condition for a permutation test is exchangeability of the observations
• Consider a collection of random variables $$X_1,\dots,X_n$$
• If their joint distribution is invariant under every permutation $$\pi$$ of the indices,
$P_{X_1,\dots,X_n}(A) = P_{X_{\pi(1)},\dots,X_{\pi(n)}}(A)$
• then $$X_1,\dots,X_n$$ are called exchangeable
• This is a weaker assumption than independent and identically distributed observations in the combined sample

## Assumption: Exchangeable Observations

• Some examples:
• Independent and identically distributed observations are exchangeable
• Samples without replacement from a finite population are exchangeable:
• An urn contains $$b$$ black balls, $$r$$ red balls, $$y$$ yellow balls, and so forth
• A series of balls are extracted from the urn
• After the $$i$$th extraction, the color of the ball $$X_i$$ is noted and $$k$$ balls of the same color are added to the urn,
• where $$k$$ can be any integer, positive, negative, or zero
• The random variables $$\{X_i\}$$ form an exchangeable sequence
• Dependent normally distributed random variables $$\{X_i\}$$ for which the variance of $$X_i$$ is a constant independent of $$i$$ and the covariance of $$X_i$$ and $$X_j$$ is a constant independent of $$i$$ and $$j$$
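
The urn example can be checked directly: the probability of a colour sequence should not depend on the order in which the colours appear. The sketch below assumes that the drawn ball is returned to the urn before the $$k$$ extra balls are added (so $$k = -1$$ is sampling without replacement and $$k = 0$$ is sampling with replacement); `seqProb` is a hypothetical helper written for this illustration.

```r
# Probability of a given colour sequence when, after each draw, the ball is
# returned and k further balls of the same colour are added to the urn
seqProb <- function(colours, urn, k) {
  prob <- 1
  for (col in colours) {
    prob <- prob * urn[[col]] / sum(urn)
    urn[[col]] <- urn[[col]] + k
  }
  prob
}

urn <- c(black = 3, red = 2, yellow = 1)

# Three orderings of the same colours, with k = 2 balls added per draw
p1 <- seqProb(c("black", "red", "black"), urn, k = 2)
p2 <- seqProb(c("red", "black", "black"), urn, k = 2)
p3 <- seqProb(c("black", "black", "red"), urn, k = 2)

# Sampling without replacement corresponds to k = -1
q1 <- seqProb(c("black", "red", "black"), urn, k = -1)
q2 <- seqProb(c("red", "black", "black"), urn, k = -1)

c(p1 = p1, p2 = p2, p3 = p3, q1 = q1, q2 = q2)
```

The draws are clearly dependent (each draw changes the urn), yet reordering a sequence leaves its probability unchanged, which is what exchangeability requires.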

## Assumption: Exchangeable Observations

• Sometimes a simple transformation will ensure that observations are exchangeable
• For example, if we know that $$X$$ comes from a population with mean $$\mu$$ and distribution $$F(x−\mu)$$ and
• an independent observation $$Y$$ comes from a population with mean $$v$$ and distribution $$F(x−v)$$
• then the independent variables $$X'=X−\mu$$ and $$Y'=Y−v$$ are exchangeable

## How to Compute It?

• Small sample size:
Enumeration of all permutations using Gray codes
(takes exponential time in $$n$$)
• Medium sample size:
Fast Fourier Transform
(takes polynomial time in $$n$$)
• Large sample size:
Monte Carlo simulations
(time depends on accuracy requirements)
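
To give a feeling for how the exact null distribution can be obtained without listing all $$2^n$$ sign patterns, here is a sketch that builds it by repeated convolution on an integer grid. This is a plain dynamic-programming convolution, not the FFT-based algorithm of Pagano and Tritchler, and it assumes the data can be rescaled to integers (Darwin's differences are in eighths of an inch); `scale`, `a`, `counts`, and `Tvalues` are names introduced here.

```r
# Within-pair differences (cross minus self), copied from the data table
d <- c(6.125, -8.375, 1.000, 2.000, 0.750, 2.875, 3.500, 5.125,
       1.750, 3.625, 7.000, 3.000, 9.375, 7.500, -6.000)

scale <- 8                                  # data are multiples of 1/8 inch
a <- as.integer(round(abs(d) * scale))      # integer absolute differences
total <- sum(a)

# counts[s + 1] = number of sign patterns whose positive part sums to s
counts <- c(1, numeric(total))
for (ai in a) {
  # Giving ai a plus sign shifts the distribution by ai; a minus sign keeps it
  shifted <- c(numeric(ai), counts[seq_len(total + 1 - ai)])
  counts <- counts + shifted
}

# T = (positive part) - (negative part) = 2 * s - total, back on the inch scale
Tvalues <- (2 * (0:total) - total) / scale
T0 <- sum(d)
pOneSided <- sum(counts[Tvalues >= T0]) / 2^length(d)
pOneSided
```

The loop does work proportional to `length(d) * total` instead of `2^length(d)`, which is the idea behind the polynomial-time methods; the price is that the running time now depends on the resolution of the data.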

## References

• Good (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses
• Basu (1980). Randomization Analysis of Experimental Data: The Fisher Randomization Test
• Pagano and Tritchler (1983). On Obtaining Permutation Distributions in Polynomial Time
• Diaconis and Holmes (1994). Gray Codes for Randomization Procedures