Permutation Tests (Part 2)

Stanford University, Spring 2016, STATS 205

Exchangeability

The idea: before we observe 20 flips of a coin,
suppose we consider 17 heads and 3 tails a possible outcome
Exchangeability then means that we do not regard the positions that the 3 tails can occupy as special

Exchangeability

An infinite sequence $X_1,\dots,X_n,\dots$ of random variables is said to be exchangeable if for all $n = 2,3,\dots$, \[X_1,\dots,X_n \overset{d}{=} X_{\pi(1)},\dots,X_{\pi(n)} \text{ for all } \pi \in S(n)\] where $S(n)$ is the group of permutations of $\{1,\dots,n\}$
For example, for a binary sequence, we may have: \[p(1,1,0,0,0,1,1,0) = p(1,0,1,0,1,0,0,1)\]
If $X_1,\dots,X_n,\dots$ are independent and identically distributed, they are exchangeable, but not conversely

Pólya's Urn

Consider an urn with $b$ black balls and $w$ white balls
Draw a ball at random and note its colour
Replace the ball together with $a$ balls of the same colour
Repeat the procedure ad infinitum
Let $X_i =1$ if the $i$th draw yields a black ball and $X_i = 0$ otherwise
The sequence $X_1,\dots,X_n,\dots$ is exchangeable: \[ \begin{aligned} p(1,1,0,1) & = \frac{b}{b+w} \frac{b+a}{b+w+a} \frac{w}{b+w+2a} \frac{b+2a}{b+w+3a} \\ & = \frac{b}{b+w} \frac{w}{b+w+a} \frac{b+a}{b+w+2a} \frac{b+2a}{b+w+3a} = p(1,0,1,1) \end{aligned} \]
$X_1,\dots,X_n,\dots$ are not independent
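The urn probabilities above can be checked numerically. The following is a minimal R sketch; the function name `polya_prob` and the values of `b`, `w`, and `a` are illustrative assumptions, not part of the lecture. It multiplies the draw-by-draw probabilities, confirms that two orderings with the same number of black balls have the same probability, and shows that the draws are dependent.

```r
# Probability of a 0/1 draw sequence x from a Polya urn that starts with
# b black and w white balls and adds a balls of the drawn colour each time.
# (polya_prob is a hypothetical helper written for this illustration.)
polya_prob <- function(x, b, w, a) {
  prob <- 1
  for (xi in x) {
    p_black <- b / (b + w)
    if (xi == 1) {
      prob <- prob * p_black
      b <- b + a
    } else {
      prob <- prob * (1 - p_black)
      w <- w + a
    }
  }
  prob
}

b <- 2; w <- 3; a <- 1   # illustrative values, not from the lecture

# Exchangeability: both orderings have the same probability
p1101 <- polya_prob(c(1, 1, 0, 1), b, w, a)
p1011 <- polya_prob(c(1, 0, 1, 1), b, w, a)
all.equal(p1101, p1011)

# Dependence: P(X2 = 1 | X1 = 1) differs from P(X2 = 1)
p_x2_given_x1 <- polya_prob(c(1, 1), b, w, a) / polya_prob(1, b, w, a)
p_x2 <- polya_prob(c(1, 1), b, w, a) + polya_prob(c(0, 1), b, w, a)
c(conditional = p_x2_given_x1, marginal = p_x2)
```

With these values the conditional probability is $3/6$ while the marginal probability is $2/5$, so drawing a black ball first makes a second black ball more likely.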

Exchangeability and de Finetti's Theorem

Suppose $X_1,\dots,X_n$ are conditionally i.i.d. given some unknown parameter $\theta$
Then for any permutation $\pi$ of $\{1,\dots,n\}$ and any set of values $(x_1,\dots,x_n)$:
Because the $X_i$ are conditionally i.i.d.: \[f(x_1,\dots,x_n) = \int f(x_1,\dots,x_n | \theta) p(\theta) d\theta = \int \left( \prod_{i=1}^n f(x_i|\theta) \right) p(\theta) d\theta\]
Because the product does not depend on the order of its factors: \[= \int \left( \prod_{i=1}^n f(x_{\pi(i)} | \theta) \right) p(\theta) d\theta = f\left( x_{\pi(1)},\dots,x_{\pi(n)} \right)\]

Exchangeability and de Finetti's Theorem

This means: \[ X_1,\dots,X_n \text{ are exchangeable for all $n$ } \Leftarrow \begin{cases} X_1,\dots,X_n \, \big| \, \theta \text{ i.i.d. } \\ \theta \sim p(\theta) \end{cases} \]
In the other direction:
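- de Finetti showed in 1931 that every infinite exchangeable binary sequence is a mixture of i.i.d. Bernoulli sequences, i.e. conditionally i.i.d. Bernoulli given a success parameter $\theta$ with some prior $p(\theta)$; for Pólya's urn this prior is the beta distribution $\mathcal{B}(b/a,w/a)$
- Hewitt and Savage (1955) generalized it to arbitrary infinite exchangeable sequences
- Diaconis and Freedman (1980) gave an approximate version for finite exchangeable sequences, for which the exact representation can fail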
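For Pólya's urn, the beta distribution $\mathcal{B}(b/a,w/a)$ on the success parameter can be checked numerically. The R sketch below uses illustrative values of `b`, `w`, and `a` (assumptions, not from the lecture). It compares the probability of one draw sequence computed directly from the urn with the probability of the same sequence under i.i.d. Bernoulli draws mixed over the beta distribution.

```r
b <- 2; w <- 3; a <- 1            # illustrative urn, not from the lecture
x <- c(1, 1, 0, 1)                # a draw sequence (1 = black)
n <- length(x); s <- sum(x)

# Urn probability: multiply the draw-by-draw probabilities
black_before <- b + a * cumsum(c(0, x[-n]))     # black balls before each draw
total_before <- b + w + a * (seq_len(n) - 1)    # all balls before each draw
p_urn <- prod(ifelse(x == 1, black_before, total_before - black_before) /
                total_before)

# Mixture probability: integrate theta^s (1 - theta)^(n - s)
# against the Beta(b/a, w/a) density
p_mix <- integrate(function(theta) {
  theta^s * (1 - theta)^(n - s) * dbeta(theta, b / a, w / a)
}, lower = 0, upper = 1)$value

# Closed form of the same integral via the beta function
p_closed <- beta(b / a + s, w / a + n - s) / beta(b / a, w / a)

c(urn = p_urn, mixture = p_mix, closed_form = p_closed)
```

All three numbers agree, and the mixture probability depends on the sequence only through the number of black balls `s`, which is exactly what exchangeability requires.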

Example: Letter Sequences in Bible

Suppose we have a text written in a foreign language
We are asked whether the text is meaningful or meaningless
We have a very limited partial dictionary: e.g. umbrella and rain
Can we decide, using this limited dictionary, whether the text is meaningful?
Idea: Check whether conceptually related words appear in close proximity
If the text is meaningless, we don't expect this to happen very often
If the text is meaningful, we expect this to happen more often than by pure chance

Example: Letter Sequences in Bible

Write the Bible as a two-dimensional array of letters, ignoring spaces

Source: Witztum, Rips, and Rosenberg (1994)

Example: Letter Sequences in Bible

Define the distance between two words $w$ and $w'$ using Euclidean distances in the letter matrix (adjacent letters are at spacing $1$):
$d(w,w') =$ squared distance between consecutive letters of $w$
+ squared distance between consecutive letters of $w'$
+ minimal distance between a letter of $w$ and a letter of $w'$
Define a statistic: the average distance over a set of predefined word pairs, calculated over the entire text
Choose the word pair list from another book (Encyclopedia of Great Men in Israel): 32 Rabbi names and their birthdays
Compute the null distribution of the average distance over all word pairs using permutations

Example: Letter Sequences in Bible

They computed the $p$-value using $999\,999$ random name–birthday permutations (out of the $32!$ possible permutations that assign the $i$th Rabbi name to a randomly permuted birthday)
Rank the observed statistic among the permuted statistics
They found a $p$-value of $0.00002$
Thus, the test rejects the hypothesis that the proximity of equidistant letter sequences of Rabbi names and birthdays is due to chance
For more details see: Witztum, Rips, and Rosenberg (1994). Equidistant Letter Sequences in the Book of Genesis
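The permutation scheme for matched pairs can be sketched in R as follows. This is a toy illustration under stated assumptions: the matrix `D` of name-to-date distances is simulated rather than computed from any text, `avg_distance` is a hypothetical helper, and the number of pairs and permutations is much smaller than in the study.

```r
set.seed(1)
n_pairs <- 8                      # toy size; the study used 32 pairs

# Hypothetical distance matrix: D[i, j] = distance between name i and date j
D <- matrix(rexp(n_pairs^2), n_pairs, n_pairs)
diag(D) <- diag(D) / 4            # make true pairs closer, for illustration only

# Average distance when name i is matched with date perm[i]
avg_distance <- function(D, perm) mean(D[cbind(seq_len(nrow(D)), perm)])

observed <- avg_distance(D, seq_len(n_pairs))   # true matching
n_perm <- 9999
null_stats <- replicate(n_perm, avg_distance(D, sample(n_pairs)))

# Rank the observed statistic among the observed and permuted values;
# small average distances count as evidence against the null
p_value <- (1 + sum(null_stats <= observed)) / (n_perm + 1)
p_value
```

Counting the observed matching together with the random permutations means the smallest attainable $p$-value is $1/(\texttt{n\_perm}+1)$, which is why permutation counts such as $999\,999$ are convenient.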

Example: Final Project on Autistic Brain

We encounter two-sample problems in many neuroimaging studies
- Compare two populations, such as people who are autistic and healthy controls
- Find morphological, connectivity, and functional difference
- Inform new therapies and cures
We work on preprocessed neuroimaging data from the Autism Brain Imaging Data Exchange (ABIDE). The data are openly available on the ABIDE website
- ABIDE is a collaboration of 16 international imaging sites
- Openly sharing neuroimaging data from 539 individuals suffering from Autism Spectrum Disorder (ASD) and 573 typical controls
- We will focus on a subset of 39 participants (all acquired at Stanford)

Data: Cortical Thickness Measurements

Source: multiple-sclerosis-research.blogspot.com

Data: Cortical Thickness Measurements

Source: Das, Avants, Grossman, and Gee (2009). Registration based cortical thickness measurement

Data Preprocessing: Image Registration

Analyze geometric difference through deformation
Find deformations $\varphi_k$ from template $\boldsymbol{I}_0$ to participant images $\boldsymbol{I}_k$

Data Preprocessing: Image Registration

Parametrize $\varphi_k(x,\boldsymbol{q})$ with B-splines (coordinates $x_i \in \boldsymbol{I}_0$, parameters $\boldsymbol{q}$) and measure the fit by a similarity term plus a regularity term: \[\sum_{i=1}^N \left[ \left( \boldsymbol{I}_k(\varphi_k(x_i,\boldsymbol{q})) - \boldsymbol{I}_0(x_i) \right)^2 + \lambda \| \operatorname{Jac} \varphi_k(x_i,\boldsymbol{q}) \|_{\ell_2}^2 \right]\]
B-splines are piecewise polynomial functions and can be used as basis functions to describe more complicated nonlinear functions
Minimize this objective over $\boldsymbol{q}$ with fixed $\lambda$: \[\widehat{\boldsymbol{q}} = \operatorname*{arg\,min}_{\boldsymbol{q}} \Big\{ \operatorname{Similarity}(\boldsymbol{I}_0,\boldsymbol{I}_k,\boldsymbol{q}) + \lambda \operatorname{Regularity}(\boldsymbol{q}) \Big\}\]
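A one-dimensional toy version of this optimization can be sketched in R. Everything here is an assumption made for illustration: the "images" are Gaussian bumps, the deformation is the identity plus a B-spline displacement built with `splines::bs`, the Jacobian is approximated by finite differences, and the value of `lambda` is arbitrary. It is not the registration pipeline used for the ABIDE data.

```r
library(splines)   # ships with R; provides bs()

x  <- seq(0, 1, length.out = 200)              # template coordinates x_i
I0 <- function(u) exp(-(u - 0.50)^2 / 0.01)    # toy template "image"
Ik <- function(u) exp(-(u - 0.56)^2 / 0.01)    # toy participant "image"

B <- bs(x, df = 6)                             # cubic B-spline basis at x_i

# phi(x, q) = x + B(x) q: identity plus a smooth displacement
phi <- function(q) x + as.vector(B %*% q)

objective <- function(q, lambda = 1e-3) {
  similarity <- sum((Ik(phi(q)) - I0(x))^2)
  jac <- diff(phi(q)) / diff(x)                # finite-difference Jacobian
  regularity <- sum(jac^2)
  similarity + lambda * regularity
}

q0  <- rep(0, ncol(B))                         # start at the identity map
fit <- optim(q0, objective, method = "BFGS")
c(start = objective(q0), end = fit$value)
```

The fitted deformation `phi(fit$par)` shifts template coordinates near the bump towards the displaced bump in the participant image. Larger `lambda` values trade image agreement for a smoother deformation.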

Data Analysis: Voxelwise Hypothesis Testing

Split the data into two groups: autistic (1) and healthy participants (2)

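# Phenotypic table; fileNames, brainMatrix, and mask are assumed to be defined earlier
PaInfo <- read.csv("Phenotypic_V1_0b_preprocessed1.csv", header=TRUE)
# Keep the participants whose image files were copied (first 16 characters = FILE_ID)
copiedFilesInd = PaInfo$FILE_ID %in% substr(fileNames,1,16)
copiedFilesGroup = PaInfo$DX_GROUP[copiedFilesInd]
# Rows: participants by diagnosis group; columns: voxels inside the brain mask
ThicknessAutistic = brainMatrix[copiedFilesGroup==1,mask==1]
ThicknessHealthy = brainMatrix[copiedFilesGroup==2,mask==1]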

Perform a voxelwise nonparametric test: the Wilcoxon two-sample rank test

# One two-sided Wilcoxon rank-sum test per voxel (column)
uncorrectedPValues = sapply(seq_len(ncol(ThicknessAutistic)),
                            function(i)
                              wilcox.test(ThicknessAutistic[,i],
                                          ThicknessHealthy[,i],
                                          alternative = "two.sided")$p.value)

Data Analysis: Clusterwise Significance

One could now proceed and correct for multiple comparisons using Gaussian random field theory or by controlling the false discovery rate
An alternative option is to use permutation tests
The idea (do the following $N$ times):
- Permute group assignments
- Compute voxelwise $p$-values
- Threshold $p$-value image at fixed primary threshold (e.g. $0.001$)
- Note largest contiguous voxel cluster size
This forms our null distribution of largest contiguous voxel cluster sizes
Declare unpermuted observed clusters as significant when their size is larger than the $(\lfloor \alpha N \rfloor + 1)$-th largest value in the permutation distribution
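The procedure above can be sketched in R on simulated data. This is a toy under stated assumptions: the data are random numbers with a planted group difference, voxels are arranged on a line so that contiguity is one-dimensional (brain images need three-dimensional connected components), and `voxel_p`, `max_cluster`, the primary threshold, and `N` are illustrative choices.

```r
set.seed(205)
n1 <- 20; n2 <- 19; n_vox <- 40        # toy sizes; 39 participants in total
group <- rep(c(1, 2), c(n1, n2))
thickness <- matrix(rnorm((n1 + n2) * n_vox), n1 + n2, n_vox)
# Planted group difference in voxels 11 to 20, for illustration only
thickness[group == 1, 11:20] <- thickness[group == 1, 11:20] + 1.5

# Voxelwise two-sided Wilcoxon rank-sum p-values
voxel_p <- function(data, group) {
  vapply(seq_len(ncol(data)), function(i) {
    wilcox.test(data[group == 1, i], data[group == 2, i])$p.value
  }, numeric(1))
}

# Size of the largest run of neighbouring voxels below the primary threshold
max_cluster <- function(p, threshold) {
  runs <- rle(p < threshold)
  max(c(0, runs$lengths[runs$values]))
}

threshold <- 0.01; N <- 100; alpha <- 0.05
observed <- max_cluster(voxel_p(thickness, group), threshold)

# Null distribution: permute group labels, recompute the largest cluster
null_sizes <- replicate(N, max_cluster(voxel_p(thickness, sample(group)),
                                       threshold))

# Critical value: the (floor(alpha * N) + 1)-th largest permuted cluster size
critical <- sort(null_sizes, decreasing = TRUE)[floor(alpha * N) + 1]
c(observed = observed, critical = critical, significant = observed > critical)
```

Because only the largest cluster per permutation enters the null distribution, comparing every observed cluster against `critical` controls the familywise error rate over clusters.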

Data Analysis: Clusterwise Significance

This approach comes at the price of reduced localization power
The null hypotheses for voxels within a significant cluster are not tested, so individual voxels cannot be declared significant
Only the omnibus null hypothesis for the cluster can be rejected
Further, the choice of primary threshold dictates the power of the test in detecting different types of deviation from the omnibus null hypothesis
With a small primary threshold (e.g. $0.001$),
we will focus on many but small clusters
With a large primary threshold (e.g. $0.01$),
we will focus on few but large clusters

References

Diaconis (1977). Finite Forms of de Finetti's Theorem on Exchangeability
Diaconis and Freedman (1980). Finite Exchangeable Sequences
de Finetti (1972). Probability, Induction and Statistics
Hewitt and Savage (1955). Symmetric Measures on Cartesian Products
Hoff (2009). A First Course in Bayesian Statistical Methods
Good (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses
Witztum, Rips, and Rosenberg (1994). Equidistant Letter Sequences in the Book of Genesis
Nichols and Holmes (2001). Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples
Das, Avants, Grossman, and Gee (2009). Registration Based Cortical Thickness Measurement
Lauritzen (2007). Lecture notes