Stanford University, Spring 2016, STATS 205

## Previous Lecture

• Introduced the bootstrap
• Looked at a couple of example applications:
• confidence intervals
• hypothesis testing
• Two sources of error:
• sampling variability
• bootstrap resampling variability
• Complete enumeration for the exhaustive bootstrap

## Today

• Example comparing the bootstrap using
• Monte Carlo simulation ($$B$$ is large)
• exhaustive enumeration ($$B$$ is effectively infinite)
• A smarter way to enumerate bootstrap samples using Gray codes
• Exploring the tails of a bootstrap distribution

## Law Schools Example

library(bootstrap); data(law); t(law)
##        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
## LSAT 576 635 558 578 666 580 555 661 651 605 653 575 545 572 594
## GPA  339 330 281 303 344 307 300 343 336 313 312 274 276 288 296

## Law Schools Example

• Sample correlation coefficient:
## [1] 0.7763745
• How accurate is this estimate? Let's look at the bootstrap samples:
# Draw one bootstrap sample of the n schools and return its sample correlation
draw.bootstrap.sample = function() {
  n = dim(law)[1]
  ind = sample(n, replace = TRUE)
  return(cor(law[ind, ]$LSAT, law[ind, ]$GPA))
}
B = 40000
thetastar = replicate(B, draw.bootstrap.sample())

## Law Schools Example

Evaluate the correlation coefficient using a Monte Carlo simulation:
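A minimal sketch of such a plot from the `thetastar` replicates above (the plotting details here are an assumption, not the original code):

# Histogram of the B = 40000 Monte Carlo bootstrap replicates
hist(thetastar, breaks = 100, freq = FALSE,
     xlab = "correlation coefficient",
     main = "Bootstrap distribution (Monte Carlo)")
abline(v = cor(law$LSAT, law$GPA), col = "red")  # observed sample correlation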

## Law Schools Example

Create a matrix of all $$\binom{2n-1}{n-1}$$ enumerations:

library(partitions)
n = 15
# Each column is one composition of n into n parts: the multiplicity of each
# observation in one bootstrap sample
allCompositions = compositions(n,n)
allCompositions[,1:10]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   15   14   13   12   11   10    9    8    7     6
##  [2,]    0    1    2    3    4    5    6    7    8     9
##  [3,]    0    0    0    0    0    0    0    0    0     0
##  [4,]    0    0    0    0    0    0    0    0    0     0
##  [5,]    0    0    0    0    0    0    0    0    0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0
## [10,]    0    0    0    0    0    0    0    0    0     0
## [11,]    0    0    0    0    0    0    0    0    0     0
## [12,]    0    0    0    0    0    0    0    0    0     0
## [13,]    0    0    0    0    0    0    0    0    0     0
## [14,]    0    0    0    0    0    0    0    0    0     0
## [15,]    0    0    0    0    0    0    0    0    0     0

## Law Schools Example

Check the number of compositions:

library(assertthat)
nCompositions = dim(allCompositions)[2]
nCompositions
## [1] 77558760
# Check that we get the same value as in theory
nCompositionsTheory = choose(2*n - 1, n - 1)
# see_if() returns FALSE on failure, whereas assert_that() would throw an error
if (!see_if(nCompositions == nCompositionsTheory)) {
  print("there is a problem with the number of compositions")
} else {
  print("the number of compositions is correct")
}
## [1] "the number of compositions is correct"

## Law Schools Example

Evaluate the correlation coefficient for all $$\binom{2n-1}{n-1}$$ bootstrap samples.

library(parallel)
if (!file.exists("enumData.Rdata")) {
  ptm = proc.time()
  enumData = mclapply(1:nCompositions, function(i) {
    ind = allCompositions[, i]
    # Build the bootstrap sample by repeating each school ind[j] times
    law.list = lapply(1:n, function(j) matrix(rep(law[j, ], ind[j]), ncol = 2, byrow = TRUE))
    newLaw = do.call(rbind, law.list)
    # Return the sample correlation and its multinomial probability weight
    c(cor(unlist(newLaw[, 1]), unlist(newLaw[, 2])), dmultinom(ind, prob = rep(1, n)))
  }, mc.cores = 4)
  proc.time() - ptm
  enumData = t(simplify2array(enumData))
  colnames(enumData) = c("cor", "weight")
  save(enumData, file = "enumData.Rdata")
} else {
  load("enumData.Rdata")  # reuse the cached results
}

## Law Schools Example

Same plot as before but now with all bootstrap samples:
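A sketch of how such a plot could be produced from `enumData`, weighting each enumerated sample by its multinomial probability (the plotting code is an assumption, not the original):

# Weighted histogram: total multinomial probability per correlation bin
width = 0.01
bins = cut(enumData[, "cor"], seq(-1, 1, by = width), include.lowest = TRUE)
mass = tapply(enumData[, "weight"], bins, sum)
mass[is.na(mass)] = 0  # empty bins carry zero probability
barplot(unname(mass) / width, space = 0, xlab = "correlation coefficient",
        ylab = "density", main = "Exhaustive bootstrap distribution")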

## Law Schools Example

Use the parallel library to speed up the computations:

library(parallel)
enumData = mclapply(1:nCompositions, function(i) { ... }, mc.cores = 4 )
• On a laptop with 4 cores this takes about 5 hours to compute
• We can speed up the enumeration by changing only one coordinate at a time using Gray codes
• For example, if our estimate is the sample mean for $$n = 3$$
• $$s(\boldsymbol{x}^{*1}) = ( x_1 + x_2 + x_3)/3$$
• $$s(\boldsymbol{x}^{*2}) = ( x_1 + x_2 + x_2)/3$$
• To calculate $$s(\boldsymbol{x}^{*2})$$ from $$s(\boldsymbol{x}^{*1})$$, we only need the update
• $$s(\boldsymbol{x}^{*2}) = s(\boldsymbol{x}^{*1}) - x_3/3 + x_2/3$$ (see the sketch below)
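A tiny numeric check of this update (the three values are arbitrary; the code is an illustration, not from the lecture):

x = c(576, 635, 558)       # x1, x2, x3
s1 = mean(x)               # s(x*1) = (x1 + x2 + x3)/3
s2 = s1 - x[3]/3 + x[2]/3  # incremental update instead of recomputing
stopifnot(all.equal(s2, mean(c(x[1], x[2], x[2]))))  # equals s(x*2)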

## Speedup for Enumerations using Gray Codes

• Gray codes are ordered lists of binary $$n$$-tuples
• They are ordered so that successive values differ in only a single position
• For instance, for $$n = 3$$, the list is:
000, 001, 011, 010, 110, 111, 101, 100
• Notice, a computer scientist might intuitively want to write this:
000, 001, 010, 011, 100, 101, 110, 111
• However, consecutive values in this sequence can differ in several positions (e.g. 001 to 010 changes two bits)
• Rather than trying to reorder the wrong list, we can define a recursive algorithm that generates a valid list

## Speedup for Enumerations using Gray Codes

• Start with the list $$L_1$$ with two entries 0, 1
• From $$L_n$$, form two lists:
• put a zero before each entry in $$L_n$$
• put a one before each entry in $$L_n$$
• To get $$L_{n+1}$$, concatenate the first list with the second list in reversed order
• So to move from $$L_1$$ to $$L_2$$, form
00, 01
10, 11
• and concatenate (reversing the second list)
00, 01, 11, 10
• repeat… (see the sketch below)
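A short sketch of this recursion in R (the function name `gray.codes` is an assumption for illustration):

# Reflected Gray code: prepend 0 to L_n, prepend 1 to L_n reversed, concatenate
gray.codes = function(n) {
  if (n == 1) return(c("0", "1"))
  L = gray.codes(n - 1)
  c(paste0("0", L), paste0("1", rev(L)))
}
gray.codes(3)
## [1] "000" "001" "011" "010" "110" "111" "101" "100"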

## Speedup for Enumerations using Gray Codes

• We can also just add one successor at a time
• As we saw before, a computer scientist would encode an integer as $$m = \sum \epsilon_i 2^i$$ using the binary sequence $$\dots \, \epsilon_3 \, \epsilon_2 \, \epsilon_1 \, \epsilon_0$$
• This can be mapped to Gray codes $$\dots \, e_3 \, e_2 \, e_1 \, e_0$$ using $$e_i = \epsilon_i + \epsilon_{i+1} \, (\bmod 2) \text{ for } i = 0,1,2,\dots$$

## Speedup for Enumerations using Gray Codes

• When $$n = 4$$, the integer 7 in binary code is 0111 and using the mapping
$$e_0 = 1 + 1 = 0$$
$$e_1 = 1 + 1 = 0$$
$$e_2 = 1 + 0 = 1$$
$$e_3 = 0 + 0 = 0$$
• we get the corresponding Gray code 0100

• Next, integer 8 in binary code is 1000 and using the mapping
$$e_0 = 0 + 0 = 0$$
$$e_1 = 0 + 0 = 0$$
$$e_2 = 0 + 1 = 1$$
$$e_3 = 1 + 0 = 1$$
• we get the corresponding Gray code 1100 (see the sketch below)
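In base R this mapping is the bitwise XOR of $$m$$ with $$m$$ shifted right by one bit; a small sketch (the helper names are assumptions for illustration):

to.gray = function(m) bitwXor(m, bitwShiftR(m, 1))  # e_i = eps_i XOR eps_{i+1}
bits = function(m, n) paste(rev(as.integer(intToBits(m))[1:n]), collapse = "")
bits(to.gray(7), 4)
## [1] "0100"
bits(to.gray(8), 4)
## [1] "1100"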

## Exploring the Tails of a Bootstrap Distribution

• Use Markov chain Monte Carlo (MCMC) to inject a small amount of randomness (something between Monte Carlo and complete enumeration)
• We will use MCMC to derive large deviation estimates such as $$P( T \ge t)$$
• To see how it works, let's have a look at the Metropolis algorithm
• Consider a finite sample space
• Think of it as a state space, where each outcome corresponds to a state
• Metropolis can sample from an unnormalized probability $$\pi(x)$$ on a finite state space $$\mathcal{X}$$
• Define a Markov transition matrix $$J(x,y)$$ with nonzero probabilities of moving from $$x$$ to $$y$$ and from $$y$$ to $$x$$
• Metropolis changes $$J(x,y)$$ into a new matrix $$K(x,y)$$ whose stationary distribution is $$\pi(x)$$, even when $$\pi$$ is known only up to its normalizing constant

## Exploring the Tails of a Bootstrap Distribution

• Recipe:
• Pick an initial point $$x_0$$ in the sample space
• Pick a potential next move $$y$$ from $$J(x,y)$$, with $$J(x,y) > 0$$ and $$J(y,x) > 0$$
• Evaluate $A(x,y) = \frac{\pi(y)J(y,x)}{\pi(x)J(x,y)}$
• If $$A(x,y) \ge 1$$ move to $$y$$
• If $$A(x,y) < 1$$ flip a coin with success probability $$A(x,y)$$
• and move to $$y$$ on success
• otherwise stay at $$x$$ (see the sketch below)
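A minimal sketch of this recipe in R with a symmetric random-walk proposal $$J$$ on the states $$1, \dots, N$$, so the $$J$$ terms cancel in $$A(x,y)$$ (the whole example, including the target, is an assumption for illustration):

metropolis = function(pi.unnorm, N, steps, x0 = 1) {
  x = x0
  samples = integer(steps)
  for (s in 1:steps) {
    y = x + sample(c(-1, 1), 1)          # propose a neighboring state
    if (y >= 1 && y <= N) {
      A = pi.unnorm(y) / pi.unnorm(x)    # acceptance ratio (J symmetric)
      if (A >= 1 || runif(1) < A) x = y  # move with probability min(1, A)
    }                                    # proposals outside 1..N are rejected
    samples[s] = x
  }
  samples
}
# Example: sample from pi(x) proportional to x^2 on {1, ..., 10}
out = metropolis(function(x) x^2, N = 10, steps = 10000)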

## Exploring the Tails of a Bootstrap Distribution

• We can write this in matrix form $K(x,y) = \begin{cases} J(x, y) & \text{if } x \ne y,\ A(x, y) \ge 1 \\ J(x, y)A(x, y) & \text{if } x \ne y,\ A(x, y) < 1 \\ J(x, y) + \sum_{z:\,A(x,z)<1} J(x, z)(1 - A(x, z)) & \text{if } x = y \end{cases}$
• Then we can use the Fundamental Theorem of Markov Chains to prove that $K^n(x, y) \to \pi(y) \text{ for each } x,y \in \mathcal{X}$
• In other words, the $$n$$-step transition matrix $$K^n = K \cdot K \cdots K$$ converges to a matrix whose rows all equal $$\pi$$; equivalently, $$\pi$$ is a left eigenvector of $$K$$ with eigenvalue $$\lambda = 1$$, that is, $$\pi K = \lambda \pi = \pi$$

## Exploring the Tails of a Bootstrap Distribution

• To sample from $$\pi$$, draw the next state from the row $$K(x_t, \cdot)$$ of the current sample position $$x_t$$, i.e., propose from $$J$$ and accept or reject as above
• This gives us the next sample $$x_{t+1}$$
• A nice introduction to MCMC is here

## Exploring the Tails of a Bootstrap Distribution

We construct a Markov chain to sample from $$P( T \ge t)$$:

• Pick $$I$$ uniformly from $$1 \le I \le n$$, and
• replace $$x^*_I$$ with a new value drawn from the original data $$\{ x_1, x_2, \dots, x_n \}$$
• If the new sample vector $$\tilde{\boldsymbol{x}}^*$$ satisfies $$T(\tilde{\boldsymbol{x}}^*) \ge t$$, the change is made
• otherwise the chain stays at the previous sample vector (see the sketch below)
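A sketch of one step of this chain for the law school correlation, representing a bootstrap sample by its vector of indices (the helper `chain.step` is an assumption for illustration):

# One chain step restricted to the event {T >= t}
chain.step = function(ind, t) {
  n = length(ind)
  proposal = ind
  proposal[sample(n, 1)] = sample(n, 1)  # replace one coordinate by a fresh draw
  Tnew = cor(law$LSAT[proposal], law$GPA[proposal])
  if (Tnew >= t) proposal else ind       # keep the move only if T >= t still holds
}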

## Exploring the Tails of a Bootstrap Distribution

Then to estimate $$P( T \ge t)$$:

• Choose a grid $$t_0 < t_1 < \dots < t_l < t$$ with $$t_0$$ chosen in the middle of the distribution of $$T$$ and the $$t_i$$ chosen so that $$P(T \ge t_{i+1} | T \ge t_i)$$ is not too small
• Estimate $$P( T \ge t_0 )$$ by ordinary Monte Carlo
• Estimate $$P( T \ge t_1 | T \ge t_0)$$ by running the Markov chain on $$\{ \tilde{\boldsymbol{x}}^* : T(\tilde{\boldsymbol{x}}^*) \ge t_0 \}$$ and counting what proportion of values satisfies the constraint $$T \ge t_1$$; the later levels are handled the same way (see the sketch below)
• Multiplying these estimates gives the tail probability $\widehat{P}(T \ge t) = \widehat{P}(T \ge t_0) \widehat{P}(T \ge t_1 | T \ge t_0) \cdots \widehat{P}(T \ge t | T \ge t_l)$ which equals the joint probability $$\widehat{P}(T \ge t_0,T \ge t_1, \dots, T \ge t_l, T \ge t)$$ since the events $$\{T \ge t_i\}$$ are nested
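A sketch of estimating one factor of the product with the hypothetical `chain.step` helper above (the function name and the number of steps are assumptions for illustration):

# Estimate P(T >= t.hi | T >= t.lo) with the chain run inside {T >= t.lo};
# ind0 must be a bootstrap index vector already satisfying T >= t.lo
estimate.conditional = function(ind0, t.lo, t.hi, steps = 10000) {
  ind = ind0
  hits = 0
  for (s in 1:steps) {
    ind = chain.step(ind, t.lo)
    if (cor(law$LSAT[ind], law$GPA[ind]) >= t.hi) hits = hits + 1
  }
  hits / steps
}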

## Theoretical Underpinnings of the Bootstrap

For background on statistical functionals and asymptotics of the bootstrap: