Stanford University, Spring 2016, STATS 205

What Happened So Far

  • We want to compute functionals \(\theta = T(F)\), e.g.
    • the mean: \(T(F) = \int x \, dF(x) = \int x \, f(x) \, dx\)
    • the median: \(T(F) = F^{-1}(1/2)\)
  • Parameter of interest \(\theta\) is a functional of the unknown \(F\)
  • Assumptions on \(f(x)\) relatively weak, e.g. symmetric around zero
  • We tested \(\theta = \theta_0\) from a sample \(\boldsymbol{x} = \{ x_1,x_2,\dots,x_n \}\) drawn from distribution \(F\) by comparing the observed statistic \(s(\boldsymbol{x})\) to the null distribution of test statistics
  • We introduced estimators \(\widehat{\theta}\) and confidence intervals from a sample
  • We quantified the robustness of estimators \(\widehat{\theta}\) to outliers

Background

  • As computers have become more powerful, resampling procedures have become more widespread
  • The bootstrap is another tool to measure the error in an estimate or the significance of a hypothesis test
  • A bootstrap sample is a sample drawn from the original sample with replacement
  • This works when the histogram of the sample is representative of the population
  • In other words, the histogram of the sample resembles the pdf of the random variable
  • Drawing samples from the histogram is then like drawing samples from the population
  • This yields an estimate of the sampling distribution of the statistic \(s(\boldsymbol{x})\) or estimate \(\widehat{\theta}\), as sketched below
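
A minimal sketch of this resampling idea in Python (the simulated data and the choice of the median as the statistic are illustrative assumptions, not part of the course material):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100)   # stand-in for observed data

# One bootstrap sample: n draws from the original sample, with replacement
x_star = rng.choice(x, size=x.size, replace=True)

# Repeating this and recomputing the statistic each time traces out an
# estimate of the sampling distribution, here for the sample median
medians = [np.median(rng.choice(x, size=x.size, replace=True))
           for _ in range(1000)]
print(np.mean(medians), np.std(medians))
```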

Plug-in Principle

  • The data are drawn independently and identically from an unknown distribution \(F\)
  • The bootstrap supposes that the empirical distribution \(F_n\) is a good description of the unknown distribution \(F\)
  • The plug-in estimate of a parameter \(\theta = T(F)\) is defined as \[\widehat{\theta} = T(F_n)\] (worked example below)
  • This way one can draw as many samples from \(F_n\) as desired to compute the sampling variability of all kinds of statistics \(s(\boldsymbol{x}^*)\), e.g. the sample mean or sample median
  • This is done by choosing \(n\) observations with replacement (the same observation can be picked multiple times)
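
Plugging \(F_n\) into the mean functional, for example, recovers the sample mean: \[ T(F_n) = \int x \, dF_n(x) = \sum_{i=1}^n \frac{1}{n}\, x_i = \bar{x} \]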

Plug-in Principle

  • We observe the empirical distribution \(F_n\) \[ F_n(t) = \frac{1}{n} \sum_{i=1}^n I(x_i \le t) \]
  • In other words, the distribution that puts mass \(1/n\) at each \(x_i\)
  • Denote bootstrap sample from \(F_n\) (sample with replacement) as \[\boldsymbol{x}^* = \left[ x_1^*,x_2^*,\dots,x_n^* \right]^T\]
  • Estimate based on the bootstrap sample: \(\widehat{\theta}^* = s(\boldsymbol{x}^*)\)
  • Denote the estimates from a collection of \(B\) bootstrap samples as \(\widehat{\theta}^*_1,\widehat{\theta}^*_2,\dots,\widehat{\theta}^*_B\)
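
A short sketch of collecting these replicates; the helper name `bootstrap_replicates` is my own, not from the course:

```python
import numpy as np

def bootstrap_replicates(x, s, B=2000, rng=None):
    """Collect B bootstrap replicates s(x*) of a statistic s under F_n."""
    rng = rng or np.random.default_rng()
    n = len(x)
    return np.array([s(rng.choice(x, size=n, replace=True)) for _ in range(B)])
```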

Percentile Confidence Intervals

  • Order the bootstrap estimates: \(\widehat{\theta}^*_{(1)} \le \widehat{\theta}^*_{(2)} \le \dots \le \widehat{\theta}^*_{(B)}\)
  • Let \(m = (\alpha/2) \times B\) (rounded to an integer); then
  • we get the approximate \((1-\alpha) \times 100\%\) confidence interval \[\left( \widehat{\theta}^*_{(m)},\widehat{\theta}^*_{(B-m)} \right)\]
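
Continuing the sketch above, the percentile interval just reads off two order statistics of the sorted replicates (assuming \(B\) is large enough that \(m \ge 1\)):

```python
import numpy as np

def percentile_ci(theta_star, alpha=0.05):
    """Approximate (1 - alpha) confidence interval from bootstrap replicates."""
    theta_star = np.sort(theta_star)      # theta*_(1) <= ... <= theta*_(B)
    B = len(theta_star)
    m = int(alpha / 2 * B)                # m = (alpha / 2) * B, truncated
    # order statistics theta*_(m) and theta*_(B-m); slides use 1-based indexing
    return theta_star[m - 1], theta_star[B - m - 1]
```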

Paired Problem

  • Consider the paired problem where \(d_1,\dots,d_n\) are the differences
  • In the bootstrap sample, \(d_i\) and \(-d_i\) each have probability \(1/(2n)\) of being selected, reflecting the null hypothesis of symmetry around zero
  • This forms an estimate of the null distribution of the test statistic \(T\)
  • Then \[\text{p-value } = \frac{\#\{T_i^*\ge T_0\}}{B}\]
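
A sketch of this null resampling scheme, taking \(T\) to be the mean of the differences (an illustrative choice of statistic):

```python
import numpy as np

def paired_bootstrap_pvalue(d, B=5000, rng=None):
    """p-value for the paired problem; T is taken to be the mean difference."""
    rng = rng or np.random.default_rng()
    d = np.asarray(d)
    t0 = np.mean(d)                   # observed statistic T_0
    pool = np.concatenate([d, -d])    # d_i and -d_i each get probability 1/(2n)
    t_star = np.array([np.mean(rng.choice(pool, size=len(d), replace=True))
                       for _ in range(B)])
    return np.mean(t_star >= t0)      # #{T_i* >= T_0} / B
```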

One Sample Location Problem

  • Test the hypothesis: \[ H_0: \theta = \theta_0 \text{ versus } H_1: \theta > \theta_0 \]
  • We have to sample under the null hypothesis, so we take our bootstrap samples from the shifted data \[x_1-\widehat{\theta}+\theta_0,\dots,x_n-\widehat{\theta}+\theta_0\]
  • Then \[\text{p-value } = \frac{\#\{\widehat{\theta}_i^*\ge \widehat{\theta}\}}{B}\]
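
A sketch with \(\widehat{\theta}\) taken to be the sample mean (again an illustrative choice):

```python
import numpy as np

def location_bootstrap_pvalue(x, theta0, B=5000, rng=None):
    """One-sided p-value for H0: theta = theta0, with theta_hat the sample mean."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x)
    theta_hat = np.mean(x)
    x0 = x - theta_hat + theta0       # shifted data: the null holds exactly
    theta_star = np.array([np.mean(rng.choice(x0, size=len(x), replace=True))
                           for _ in range(B)])
    return np.mean(theta_star >= theta_hat)   # #{theta_i* >= theta_hat} / B
```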

Two Sources of Errors

  • Sampling variability:
    We only have a sample of size \(n\) and not the entire population
  • Bootstrap resampling variability:
    We only use \(B\) bootstrap samples rather than an infinite number

Two Sources of Errors

  • The bootstrap estimate of the standard error \(\widehat{\operatorname{se}}_B\) of a statistic \(s\) is \[\widehat{\operatorname{se}}_B = \left( \frac{1}{B} \sum_{b=1}^B (s(\boldsymbol{x}^{*b})-\bar{s})^2 \right)^{1/2}, \quad \bar{s} = \frac{1}{B} \sum_{b=1}^B s(\boldsymbol{x}^{*b})\]
  • When \(s\) is the sample mean of iid normals, we get a coefficient of variation \[\operatorname{cv}(\widehat{\operatorname{se}}_B) = \frac{\sqrt{\operatorname{Var}(\widehat{\operatorname{se}}_B)}}{\operatorname{E}(\widehat{\operatorname{se}}_B)}\]
  • as a function of \(n\) and \(B\): \[\operatorname{cv}(\widehat{\operatorname{se}}_B) = \left( \frac{1}{2n} + \frac{1}{2B} \right)^{1/2}\]
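
A quick numeric check of this formula (the values of \(n\) and \(B\) are arbitrary): with \(n = 100\) and \(B = 200\) the cv is about \(0.087\), and even \(B \to \infty\) only lowers it to about \(0.071\), since the sampling variability term \(1/(2n)\) remains.

```python
import math

def cv_se(n, B):
    """cv of the bootstrap standard error for the mean of n iid normals."""
    return math.sqrt(1 / (2 * n) + 1 / (2 * B))

print(cv_se(100, 200))     # ~0.087
print(cv_se(100, 10**6))   # ~0.071: only sampling variability remains
```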

Complete Enumerations

  • There are \(n^n\) different bootstrap samples \[\boldsymbol{x}^{*1},\boldsymbol{x}^{*2},\dots,\boldsymbol{x}^{*n^n}\] but some of them contain the same observations in a different order, for example, for \(n =3\)
    • \(x_1,x_1,x_2\) is the same as
    • \(x_2,x_1,x_1\)
  • We group identical bootstrap samples and record counts \(k_i\) giving the number of times observation \(x_i\) occurs, so \(k_1 + \dots + k_n = n\)
  • Denote the space of compositions of \(n\) into at most \(n\) parts as \[\mathcal{C}_n = \{ \boldsymbol{k} = (k_1,\dots,k_n), k_1+\dots+k_n=n, k_i \ge 0, k_i \text{ integer} \}\]
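
A small sketch that enumerates \(\mathcal{C}_n\) by stars and bars (see the next slide) and confirms its size; the helper name is mine:

```python
import itertools, math

def compositions(n):
    """Enumerate C_n via stars and bars: choose n - 1 bar positions out of 2n - 1."""
    for bars in itertools.combinations(range(2 * n - 1), n - 1):
        cuts = (-1,) + bars + (2 * n - 1,)
        # gaps between consecutive bars give the counts k_1, ..., k_n
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(n))

n = 3
assert len(list(compositions(n))) == math.comb(2 * n - 1, n - 1)  # 10 for n = 3
```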

Complete Enumerations

  • The size of this space is \(|\mathcal{C}_n| = \binom{2n-1 }{n-1}\)
  • Place \(n-1\) bars between \(n\) balls, for example, for \(n=3\)
    • 0|0|0 corresponds to \(x_1,x_2,x_3\)
    • 000|| corresponds to \(x_1,x_1,x_1\)
  • There will be \(2n-1\) positions from which to choose the \(n-1\) bar positions

Complete Enumerations

  • Corresponds to a multinomial distribution \(m_n(k,p)\) of the vector \((k_1,k_2,\dots,k_n)\) with each of the \(n\) categories equally likely, \(p_i=\frac{1}{n}\)
    • rolling an \(n\)-sided die \(n\) times
    • all sides have a fixed and equal probability of success
    • \(k_i\) counts how many times side \(i\) comes up
    • the multinomial distribution gives the probability of any particular combination of counts
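
The weight of a composition \(\boldsymbol{k}\) is its multinomial probability, which can be computed from first principles; a sketch (the helper name is mine):

```python
import math

def multinomial_weight(k):
    """Probability of counts (k_1, ..., k_n) under m_n(k, p) with p_i = 1/n."""
    n = len(k)                       # for C_n, sum(k) == len(k) == n
    coeff = math.factorial(n)
    for k_i in k:
        coeff //= math.factorial(k_i)
    return coeff / n ** n

print(multinomial_weight((3, 0, 0)))   # x1,x1,x1: 1/27
print(multinomial_weight((2, 1, 0)))   # {x1,x1,x2} in any order: 3/27
```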

Complete Enumerations

  • To form the exhaustive bootstrap distribution of a statistic \(T\) over \(\mathcal{C}_n\), we need to compute
    • \(|\mathcal{C}_n| = \binom{2n-1}{n-1}\) statistics and
    • associated weights by evaluating \(m_n(k,p)\)
  • The shift from the space of resampled observations to \(\mathcal{C}_n\) gives substantial savings
  • For an example with \(n = 15\), the number of enumerations reduces from
    \(15^{15} \approx 4.38 \times 10^{17}\) to \(\binom{29}{14} \approx 7.7 \times 10^7\)
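
Putting the pieces together for a toy sample of size \(n = 3\), reusing the hypothetical `compositions` and `multinomial_weight` helpers sketched above:

```python
import numpy as np

x = np.array([1.0, 4.0, 10.0])        # toy sample, n = 3
n = len(x)

# One statistic value and one multinomial weight per composition,
# instead of n^n individually resampled vectors
dist = [(float(np.dot(k, x)) / n, multinomial_weight(k))
        for k in compositions(n)]

assert abs(sum(w for _, w in dist) - 1.0) < 1e-12   # weights sum to one
```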