The Bootstrap (Part 1)

Stanford University, Spring 2016, STATS 205

What Happened So Far

We want to compute functionals \(\theta = T(F)\), e.g.
- the mean: \(T(F) = \int x \, dF(x) = \int x \, f(x) \, dx\)
- the median: \(T(F) = F^{-1}(1/2)\)
Parameter of interest \(\theta\) is functional of unkown \(F\)
Assumptions on \(f(x)\) relatively weak, e.g. symmetric around zero
We tested \(\theta = \theta_0\) from a sample \(\boldsymbol{x} = \{ x_1,x_2,\dots,x_n \}\) drawn from distribution \(F\) by comparing the observed statistic \(s(\boldsymbol{x})\) to the null distribution of test statistics
We introduced estimators \(\widehat{\theta}\) and confidence intervals from a sample
We quantified the robustness of estimators \(\widehat{\theta}\) to outliers

Computer more powerful, resampling procedure more widespread
The bootstrap is another tool to measure error in an estimate or significance of a hypothesis
A bootstrap sample is a sample from the original sample taken (with replacement)
This works when the histogram of the sample is representative of the population
In other words, the histogram of the sample resembles the pdf of the random variable
Drawing sample from the histogram is then the same as drawing samples from the population
Thus yields an estimate of the sampling distribution of the statistic \(s(\boldsymbol{x})\) or estimate \(\widehat{\theta}\)

The data are drawn independently and identically from a unknown distribution \(F\)
The bootstrap supposes that the empirical distribution \(F_n\) is a good description of the unknown distribution \(F\)
The plugin-in estimate of a parameter \(\theta = T(F)\) is define as \[\widehat{\theta} = T(F_n)\]
This way one can draw as many samples from \(F_n\) to compute sample variability of all kinds of \(s(\boldsymbol{x}^*)\), e.g. the sample mean, sample median
This is done via choosing \(n\) samples with replacement (you can pick the same sample multiple times)

We observe the empirical distribution \(F_n\) \[ F_n(t) = \frac{1}{n} \sum_{i=1}^n I(x_i \le t) \]
In other words, the distribution that puts mass \(1/n\) at each \(x_i\)
Denote bootstrap sample from \(F_n\) (sample with replacement) as \[\boldsymbol{x}^* = \left[ x_1^*,x_2^*,\dots,x_n^* \right]^T\]
Estimate based on original observations \(\widehat{\theta}^* = s(\boldsymbol{x}^*)\)
Denote a collection of \(B\) bootstrap samples as \(\widehat{\theta}^*_1,\widehat{\theta}^*_2,\dots,\widehat{\theta}^*_B\)

Order bootstrap samples: \(\widehat{\theta}^*_{(1)} \le \widehat{\theta}^*_{(2)} \le \dots \le \widehat{\theta}^*_{(B)}\)
Let \(m = \alpha/2 \times B\) then
we get the approximate \((1-\alpha) \times 100\%\) confidence interval \[\left( \widehat{\theta}^*_{(m)},\widehat{\theta}^*_{(B-m)} \right)\]

Consider the paired problem with \(d_1,\dots,d_n\) are the differences
In the bootstrap sample \(d_i\) and \(-d_i\) each have probability \(1/2n\) of being selected
This forms an estimate of the null distribution of the test statistic \(T\)
Then \[\text{p-value } = \frac{\#\{T_i^*\ge T_0\}}{B}\]

Test the hypothesis: \[ H_0: \theta = \theta_0 \text{ versus } \theta > \theta_0 \]
We have to make sure that the null hypothesis is true, so we take our bootstrap samples from \[x_1-\widehat{\theta}+\theta_0,\dots,x_n-\widehat{\theta}+\theta_0\]
Then \[\text{p-value } = \frac{\#\{\widehat{\theta}_i^*\ge \widehat{\theta}\}}{B}\]

Sampling variability:
We only have a sample of size \(n\) and not the entire population
Bootstrap resampling variability:
We only use \(B\) bootstrap samples rather than an infinite number

The bootstrap estimate of a standard error \(\widehat{\operatorname{se}}_B\) of a statistic \(s\) \[\widehat{\operatorname{se}}_B = \left( \frac{1}{B} \sum_{b=1}^B (s(\boldsymbol{x}^{*b})-\bar{s})^2 \right)^{1/2}\]
when \(s\) is the sample mean of iid normals, then we get a coeffiecent of variation \[\operatorname{cv}(\widehat{\operatorname{se}}_B) = \frac{\operatorname{Var}(\widehat{\operatorname{se}}_B)}{\operatorname{E}(\widehat{\operatorname{se}}_B)}\]
as a function of \(n\) and \(B\) \[\operatorname{cv}(\widehat{\operatorname{se}}_B) = \left( \frac{1}{2n} + \frac{1}{2B} \right)^{1/2}\]

There are \(n^n\) different bootstrap samples \[\boldsymbol{x}^{*1},\boldsymbol{x}^{*2},\dots,\boldsymbol{x}^{*n^n}\] but some of them have the same subset, for example, for \(n =3\)
- \(x_1,x_1,x_2\) same as
- \(x_2,x_1,x_1\)
We group the same bootstrap sample and assign a weight \(k_i\) describing the number of times it occurs, so \(k_1 + \dots + k_n = n\)
Denote the space of compositions of \(n\) into at most \(n\) parts as \[\mathcal{C}_n = \{ \boldsymbol{k} = (k_1,\dots,k_n), k_1+\dots+k_n=n, k_i \ge 0, k_i \text{ integer} \}\]

The size of this space is \(|\mathcal{C}_n| = \binom{2n-1 }{n-1}\)
Place \(n-1\) bars inbetween \(n\) balls, for example, for \(n=3\)
- 0|0|0 corresponds to \(x_1,x_2,x_3\)
- 000|| corresponds to \(x_1,x_1,x_1\)
There will be \(2n-1\) positions from which to choose the \(n-1\) bars positions

Corresponds to a multinomial distribution \(m_n(k,p)\) of the vector \((k_1,k_2,...k_n)\) with each of the \(n\) categories equally likely \(p_i=\frac{1}{n}\)
- rolling a dice with \(k_1,k_2,\dots,k_n\) sides \(n\) times
- all sides have fixed and equal probability of sucess
- counts how many times each side comes up successfully
- multinomial distribution gives the probability of any particular combination of numbers of successes

To form an exhaustive bootstrap distribution of statistic \(T(\mathcal{C}_n)\), we need to compute
- \(|\mathcal{C}_n| = \binom{2n-1}{n-1}\) statistics and
- associated weights by evaluating \(m_n(k,p)\)
The shift from the space of resampled observations to \(\mathcal{C}_n\) gives substantial savings
For an example with \(n = 15\), the number of enumerations reduce from
\(15^{15} \approx 4.38 \times 10^{17}\) to \(\binom{29}{14} \approx 7.7 \times 10^7\)