Stanford University, Spring 2016, STATS 205

Previous Lecture

  • Introduced the bootstrap
  • Looked at a couple of example applications:
    • confidence intervals
    • hypothesis testing
  • Two sources of error:
    • sampling variability
    • bootstrap resampling variability
  • Complete enumeration for the exhaustive bootstrap

Today

  • Example comparing the bootstrap using
    • Monte Carlo simulation (\(B\) is large)
    • exhaustive bootstrap (complete enumeration, the \(B \to \infty\) limit)
  • A smarter way to enumerate bootstrap samples using Gray codes
  • Exploring the tails of a bootstrap distribution

Law Schools Example

library(bootstrap); data(law); t(law)
##        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
## LSAT 576 635 558 578 666 580 555 661 651 605 653 575 545 572 594
## GPA  339 330 281 303 344 307 300 343 336 313 312 274 276 288 296

Law Schools Example

  • Sample correlation coefficient:
cor(law$LSAT, law$GPA)
## [1] 0.7763745
  • How accurate is this estimate? Let's look at the bootstrap samples:
draw.bootstrap.sample = function() {
  n = dim(law)[1]
  ind = sample(n, replace = TRUE)               # resample row indices with replacement
  return(cor(law[ind, ]$LSAT, law[ind, ]$GPA))  # correlation of this bootstrap sample
}
B = 40000
thetastar = replicate(B, draw.bootstrap.sample())

Law Schools Example

Evaluate the correlation coefficient using a Monte Carlo simulation:
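The figure itself is not reproduced here; a one-line sketch that recreates it from the thetastar values computed above (the bin count is our choice, not from the slides):

hist(thetastar, breaks = 100, xlab = "bootstrap correlation", main = "")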

Law Schools Example

Create matrix of all \(\binom{2n-1}{n-1}\) enumerations:

library(partitions)
n = 15
allCompositions = compositions(n,n)

allCompositions[,1:10]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   15   14   13   12   11   10    9    8    7     6
##  [2,]    0    1    2    3    4    5    6    7    8     9
##  [3,]    0    0    0    0    0    0    0    0    0     0
##  [4,]    0    0    0    0    0    0    0    0    0     0
##  [5,]    0    0    0    0    0    0    0    0    0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0
## [10,]    0    0    0    0    0    0    0    0    0     0
## [11,]    0    0    0    0    0    0    0    0    0     0
## [12,]    0    0    0    0    0    0    0    0    0     0
## [13,]    0    0    0    0    0    0    0    0    0     0
## [14,]    0    0    0    0    0    0    0    0    0     0
## [15,]    0    0    0    0    0    0    0    0    0     0

Law Schools Example

Check number of compositions:

library(assertthat)
nCompositions = dim(allCompositions)[2]
nCompositions
## [1] 77558760
# Check if we get the same value as in theory
nCompositionsTheory = choose(2*n-1,n-1)
# see_if() returns TRUE/FALSE; assert_that() would abort with an error on failure
if (!see_if(nCompositions == nCompositionsTheory)) {
  print("there is a problem with the number of compositions")
} else {
  print("the number of compositions is correct")
}
## [1] "the number of compositions is correct"

Law Schools Example

Evaluate the correlation coefficient for all \(\binom{2n-1}{n-1}\) bootstrap samples.

library(parallel)
if (!file.exists("enumData.Rdata")) {
  ptm = proc.time()
  enumData = mclapply(1:nCompositions, function(i) {
    ind = allCompositions[, i]     # ind[j]: how often row j enters this resample
    # build the bootstrap sample by repeating row j of law ind[j] times
    law.list = lapply(1:n, function(j) matrix(rep(law[j, ], ind[j]), ncol = 2, byrow = TRUE))
    newLaw = do.call(rbind, law.list)
    # correlation of this sample and its multinomial probability (weight)
    c(cor(unlist(newLaw[, 1]), unlist(newLaw[, 2])), dmultinom(ind, prob = rep(1, n)))
  }, mc.cores = 4)
  proc.time() - ptm
  enumData = t(simplify2array(enumData))
  colnames(enumData) = c("cor", "weight")
  save(enumData, file = "enumData.Rdata")
} else {
  load("enumData.Rdata")
}
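
Not from the original slides: an equivalent and somewhat simpler way to expand a composition is to repeat row indices with rep(), avoiding the matrix construction. A sketch, reusing law, n, and allCompositions from above:

eval.composition = function(ind) {
  rows = rep(seq_len(n), times = ind)        # row j enters the sample ind[j] times
  c(cor(law$LSAT[rows], law$GPA[rows]),      # bootstrap correlation
    dmultinom(ind, prob = rep(1, n)))        # multinomial weight of this resample
}
eval.composition(allCompositions[, 2])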

Law Schools Example

Same plot as before but now with all bootstrap samples:

Law Schools Example

Use the parallel library to speed up the computations:

library(parallel)
enumData = mclapply(1:nCompositions, function(i) { ... }, mc.cores = 4)
  • On a laptop with 4 cores this takes about 5 hours to compute
  • We can speed up the enumeration by changing only one coordinate at a time using Gray codes
  • For example, if our estimate is the sample mean for \(n = 3\)
    • \(s(\boldsymbol{x}^{*1}) = ( x_1 + x_2 + x_3)/3\)
    • \(s(\boldsymbol{x}^{*2}) = ( x_1 + x_2 + x_2)/3\)
  • To calculate \(s(\boldsymbol{x}^{*2})\) from \(s(\boldsymbol{x}^{*1})\), compute (see the sketch below)
    • \(s(\boldsymbol{x}^{*1}) - x_3/3 + x_2/3\)
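
A minimal numeric illustration of this one-coordinate update (the data values are hypothetical):

x = c(576, 635, 558)                      # x_1, x_2, x_3 (hypothetical data)
s1 = mean(x)                              # s(x*1) = (x_1 + x_2 + x_3)/3
s2 = s1 - x[3]/3 + x[2]/3                 # update: x_3 replaced by x_2, no re-summing
all.equal(s2, mean(c(x[1], x[2], x[2])))  # TRUE: matches recomputing from scratch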

Speedup for Enumerations using Gray Codes

  • Gray codes are ordered lists of binary \(n\)-tuples
  • They are ordered so that successive values differ in only a single position
  • For instance, for \(n = 3\), the list is:
    000, 001, 011, 010, 110, 111, 101, 100
  • Notice, a computer scientist might intuitively want to write this:
    000, 001, 010, 011, 100, 101, 110, 111
  • However, successive values in this sequence can differ in several positions (e.g. 011 to 100)
  • Rather than reordering the standard sequence, we can define a recursive algorithm that generates a valid list

Speedup for Enumerations using Gray Codes

  • Start with the list \(L_1\) containing the two entries 0, 1
  • From \(L_n\), form two lists:
    • put a zero before each entry in \(L_n\)
    • put a one before each entry in \(L_n\)
  • To get \(L_{n+1}\), concatenate the first list with the second list in reversed order (see the sketch below)
  • So to move from \(L_1\) to \(L_2\), form
    00, 01
    10, 11
  • and concatenate
    00, 01, 11, 10
  • repeat…
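
A short recursive implementation of this construction (our sketch, not lecture code):

# Reflected Gray code: gray.codes(n) returns all 2^n codes in order
gray.codes = function(n) {
  if (n == 1) return(c("0", "1"))
  L = gray.codes(n - 1)
  c(paste0("0", L), paste0("1", rev(L)))  # first list, then the second reversed
}
gray.codes(3)
## [1] "000" "001" "011" "010" "110" "111" "101" "100"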

Speedup for Enumerations using Gray Codes

  • We can also generate one successor at a time
  • As we saw before, a computer scientist would encode an integer as \(m = \sum \epsilon_i 2^i\) using the binary sequence \[ \dots \, \epsilon_3 \, \epsilon_2 \, \epsilon_1 \, \epsilon_0 \]
  • This can be mapped to Gray codes \[ \dots \, e_3 \, e_2 \, e_1 \, e_0 \] using \(e_i = \epsilon_i + \epsilon_{i+1} \pmod 2\) for \(i = 0,1,2,\dots\)

Speedup for Enumerations using Gray Codes

  • When \(n = 4\), the integer 7 in binary code is 0111 and using the mapping
    \(e_0 = 1 + 1 = 0\)
    \(e_1 = 1 + 1 = 0\)
    \(e_2 = 1 + 0 = 1\)
    \(e_3 = 0 + 0 = 0\)
  • we get the corresponding Gray code 0100

  • Next, integer 8 in binary code is 1000 and using the mapping
    \(e_0 = 0 + 0 = 0\)
    \(e_1 = 0 + 0 = 0\)
    \(e_2 = 0 + 1 = 1\)
    \(e_3 = 1 + 0 = 1\)
  • we get the corresponding Gray code 1100 (both examples are verified in the sketch below)
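
The same mapping in code: the Gray code of \(m\) is \(m\) XOR \((m \gg 1)\), which is exactly \(e_i = \epsilon_i + \epsilon_{i+1} \pmod 2\). The helper names here are ours:

to.gray = function(m) bitwXor(m, bitwShiftR(m, 1))  # e = epsilon XOR (epsilon >> 1)
to.bits = function(m, n) paste(rev((m %/% 2^(0:(n - 1))) %% 2), collapse = "")
to.bits(to.gray(7), 4)   # "0100"
to.bits(to.gray(8), 4)   # "1100"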

Exploring the Tails of a Bootstrap Distribution

  • Use Markov chain Monte Carlo (MCMC) to inject a small amount of randomness (something between Monte Carlo and complete enumeration)
  • We will use MCMC to derive large-deviation estimates such as \(P( T \ge t)\)
  • To see how it works, let's have a look at the Metropolis algorithm
  • Consider a finite sample space
  • Think of it as a state space, where each outcome corresponds to a state
  • Metropolis can sample from an unnormalized probability \(\pi(x)\) on a finite state space \(\mathcal{X}\)
  • Define a Markov transition matrix \(J(x,y)\) with nonzero probabilities of moving from \(x\) to \(y\) and from \(y\) to \(x\)
  • Metropolis changes \(J(x,y)\) into a new matrix \(K(x,y)\) whose stationary distribution is \(\pi(x)\), even when \(\pi\) is known only up to a normalizing constant

Exploring the Tails of a Bootstrap Distribution

  • Recipe (a sketch in R follows this list):
    • Pick an initial point \(x_0\) in the sample space
    • Pick a potential next move \(y\) from \(J(x,\cdot)\) with \(J(x,y) > 0\) and \(J(y,x) > 0\)
    • Evaluate \[A(x,y) = \frac{\pi(y)J(y,x)}{\pi(x)J(x,y)}\]
    • If \(A(x,y) \ge 1\) move to \(y\)
    • If \(A(x,y) < 1\) flip a coin with this success probability
      • and move to \(y\) on success
      • otherwise stay at \(x\)
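
A minimal Metropolis sketch in R (our own illustration, not lecture code): the proposal \(J\) is a symmetric random walk on the states \(1,\dots,m\), so the \(J\) terms in \(A(x,y)\) cancel:

metropolis = function(pi.un, steps = 10000) {      # pi.un: unnormalized probabilities
  m = length(pi.un)
  x = sample.int(m, 1)                             # initial point x_0
  out = integer(steps)
  for (s in seq_len(steps)) {
    y = ((x - 1 + sample(c(-1, 1), 1)) %% m) + 1   # propose a neighbor (wraps around)
    A = pi.un[y] / pi.un[x]                        # acceptance ratio; J is symmetric
    if (A >= 1 || runif(1) < A) x = y              # move to y, otherwise stay at x
    out[s] = x
  }
  out
}
chain = metropolis(c(1, 2, 3, 4))                  # pi proportional to 1:4
table(chain) / length(chain)                       # approaches 0.1 0.2 0.3 0.4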

Exploring the Tails of a Bootstrap Distribution

  • We can write this in matrix form \[ K(x,y) = \begin{cases} J(x, y) & \text{if } x \ne y, A(x, y) \ge 1 \\ J(x, y)A(x, y) & \text{if } x \ne y, A(x, y) < 1 \\ J(x, y) + \sum_{z:A(x,z)<1} J(x, z)(1 - A(x, z)) & \text{if } x = y \end{cases} \]
  • Then we can use the Fundamental Theorem of Markov Chains to prove that \[ K^n(x, y) \to \pi(y) \text{ for each } x,y \in \mathcal{X} \]
  • or in other words, the powers \(K^n = K \cdot K \cdots K\) (\(n\) factors) converge to a matrix whose rows are all \(\pi\); equivalently, \(\pi\) is a left eigenvector of \(K\) with eigenvalue \(\lambda = 1\): \[\pi K = \lambda \pi \text{ with } \lambda = 1, \text{ i.e. } \pi K = \pi\]

Exploring the Tails of a Bootstrap Distribution

  • To sample from \(\pi\), take one step of \(K\) from the current sample position \(x_t\): propose a move from \(J\), then accept or reject it
  • This gives us the next sample \(x_{t+1}\)
  • A nice introduction to MCMC is here

Exploring the Tails of a Bootstrap Distribution

We construct a Markov chain whose states are bootstrap sample vectors conditioned on \(T \ge t\) (a sketch in R follows this list):

  • Pick \(I\) with \(1 \le I \le n\) uniformly, and
  • Replace \(x^*_I\) with a new value drawn uniformly from the original data \(\{ x_1, x_2, \dots, x_n \}\)
  • If the new sample vector \(\tilde{\boldsymbol{x}}^*\) satisfies \(T(\tilde{\boldsymbol{x}}^*) \ge t\), the change is made
  • otherwise the chain stays at the previous sample vector
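
A sketch of this chain for the law-school correlation; the helpers T.stat and run.chain are ours, reusing law and n from earlier:

T.stat = function(ind) cor(law$LSAT[ind], law$GPA[ind])  # T for an index vector
run.chain = function(ind, t.low, steps = 5000) {         # walk inside {T >= t.low}
  Tvals = numeric(steps)
  for (s in seq_len(steps)) {
    prop = ind
    prop[sample.int(n, 1)] = sample.int(n, 1)  # replace coordinate I with a new draw
    if (T.stat(prop) >= t.low) ind = prop      # accept only if the constraint holds
    Tvals[s] = T.stat(ind)
  }
  Tvals
}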

Exploring the Tails of a Bootstrap Distribution

Then to estimate \(P( T \ge t)\):

  • Choose a grid \(t_0 < t_1 < \dots < t_l < t\) with \(t_0\) in the middle of the distribution of \(T\) and the \(t_i\) chosen so that \(P(T \ge t_{i+1} \mid T \ge t_i)\) is not too small
  • Estimate \(P( T \ge t_0 )\) by ordinary Monte Carlo
  • Estimate \(P( T \ge t_1 \mid T \ge t_0)\) by running the Markov chain on \(\{ \tilde{\boldsymbol{x}}^*: T(\tilde{\boldsymbol{x}}^*) \ge t_0 \}\) and counting what proportion of its values satisfy the constraint \(T \ge t_1\); continue in the same way along the grid
  • Because the events \(\{T \ge t_0\} \supset \{T \ge t_1\} \supset \dots \supset \{T \ge t\}\) are nested, the product of the conditional estimates is an estimate of the joint, and hence of the tail, probability (a sketch follows): \[ \widehat{P}(T \ge t) = \widehat{P}(T \ge t_0)\, \widehat{P}(T \ge t_1 \mid T \ge t_0) \cdots \widehat{P}(T \ge t \mid T \ge t_l) \]
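
Putting the recipe together with the helpers sketched above (the grid values here are hypothetical):

t.grid = c(0.85, 0.92); t.target = 0.96     # t_0 < t_1 < t (hypothetical grid)
phat = mean(thetastar >= t.grid[1])         # P(T >= t_0) by ordinary Monte Carlo
for (i in seq_along(t.grid)) {
  # find a starting vector inside {T >= t_i} by plain resampling
  repeat { ind = sample(n, replace = TRUE); if (T.stat(ind) >= t.grid[i]) break }
  t.next = if (i < length(t.grid)) t.grid[i + 1] else t.target
  phat = phat * mean(run.chain(ind, t.grid[i]) >= t.next)   # conditional factor
}
phat                                        # estimate of P(T >= t.target)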

Theoretical Underpinnings of the Bootstrap

For background on statistical functionals and asymptotics of the bootstrap: