Stanford University, Spring 2016, STATS 205

What Happened So Far

  • We want to compute functionals \(\theta = T(F)\), e.g.
    • the mean: \(T(F) = \int x \, dF(x) = \int x \, f(x) \, dx\)
    • the median: \(T(F) = F^{-1}(1/2)\)
  • Parameter of interest \(\theta\) is a functional of the unknown \(F\)
  • Assumptions on \(f(x)\) relatively weak, e.g. symmetric around zero
  • We tested \(\theta = \theta_0\) from a sample \(\boldsymbol{x} = \{ x_1,x_2,\dots,x_n \}\) drawn from distribution \(F\) by comparing the observed statistic \(s(\boldsymbol{x})\) to the null distribution of test statistics (a small sketch follows this list)
  • We introduced estimators \(\widehat{\theta}\) and confidence intervals from a sample
  • We quantified the robustness of estimators \(\widehat{\theta}\) to outliers
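
A minimal sketch of the testing step above, using the sign test for the median as a concrete example; the data, the statistic, and the seed are illustrative assumptions, not taken from the lectures:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(205)          # arbitrary seed for reproducibility
x = rng.standard_normal(30) + 0.3         # hypothetical sample from some F
theta0 = 0.0                              # null value of the median

# observed statistic: number of observations above theta0
s_obs = int(np.sum(x > theta0))
n = len(x)

# under H0: theta = theta0 the statistic is Binomial(n, 1/2);
# compare s_obs to this null distribution via a two-sided p-value
p_value = 2 * min(binom.cdf(s_obs, n, 0.5), binom.sf(s_obs - 1, n, 0.5))
print(s_obs, min(p_value, 1.0))
```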

Background

  • As computers became more powerful, resampling procedures became more widespread
  • The bootstrap is another tool to measure error in an estimate or significance of a hypothesis
  • A bootstrap sample is a sample drawn with replacement from the original sample
  • This works when the histogram of the sample is representative of the population
  • In other words, the histogram of the sample resembles the pdf of the random variable
  • Drawing samples from the histogram is then essentially the same as drawing samples from the population
  • This yields an estimate of the sampling distribution of the statistic \(s(\boldsymbol{x})\) or of the estimate \(\widehat{\theta}\) (see the sketch after this list)
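
A minimal sketch of this idea, with the data, statistic, and seed as illustrative assumptions: drawing \(n\) observations with replacement from the sample and recomputing the statistic gives one draw from its estimated sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(205)              # arbitrary seed for reproducibility
x = rng.standard_normal(50)                   # hypothetical sample from an unknown F

def bootstrap_sample(x, rng):
    """Draw n observations from x with replacement (a bootstrap sample)."""
    return rng.choice(x, size=len(x), replace=True)

x_star = bootstrap_sample(x, rng)             # one bootstrap sample
print(np.median(x), np.median(x_star))        # statistic on original vs. bootstrap sample
```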

Plug-in Principle

  • The data are drawn independently and identically from an unknown distribution \(F\)
  • The bootstrap supposes that the empirical distribution \(F_n\) is a good description of the unknown distribution \(F\)
  • The plug-in estimate of a parameter \(\theta = T(F)\) is defined as \[\widehat{\theta} = T(F_n)\]
  • This way one can draw as many samples from \(F_n\) as needed to assess the sampling variability of all kinds of statistics \(s(\boldsymbol{x}^*)\), e.g. the sample mean or the sample median
  • This is done by drawing \(n\) observations with replacement (the same observation can be picked multiple times), as sketched after this list
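
A small sketch of the plug-in principle for the two functionals above (the data are made up for illustration): since \(F_n\) puts mass \(1/n\) at each \(x_i\), the plug-in estimates reduce to the sample mean and the sample median.

```python
import numpy as np

# hypothetical observations x_1, ..., x_n
x = np.array([2.1, -0.4, 1.3, 0.7, 3.2, -1.1, 0.9])
n = len(x)

# T(F) = integral of x dF(x): plugging in F_n gives the sample mean
theta_hat_mean = np.sum(x) / n

# T(F) = F^{-1}(1/2): plugging in F_n gives the sample median
theta_hat_median = np.median(x)

print(theta_hat_mean, theta_hat_median)
```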

Plug-in Principle

  • We observe the empirical distribution \(F_n\) \[ F_n(t) = \frac{1}{n} \sum_{i=1}^n I(x_i \le t) \]
  • In other words, the distribution that puts mass \(1/n\) at each \(x_i\)
  • Denote bootstrap sample from \(F_n\) (sample with replacement) as \[\boldsymbol{x}^* = \left[ x_1^*,x_2^*,\dots,x_n^* \right]^T\]
  • Estimate based on the original observations: \(\widehat{\theta} = s(\boldsymbol{x})\); estimate based on a bootstrap sample: \(\widehat{\theta}^* = s(\boldsymbol{x}^*)\)
  • Denote the estimates from a collection of \(B\) bootstrap samples as \(\widehat{\theta}^*_1,\widehat{\theta}^*_2,\dots,\widehat{\theta}^*_B\) (see the sketch below)
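
A minimal sketch putting the notation together, with the sample, the statistic, and \(B\) as illustrative choices: draw \(B\) bootstrap samples from \(F_n\), compute a replicate \(\widehat{\theta}^*_b = s(\boldsymbol{x}^*_b)\) for each, and summarize their spread.

```python
import numpy as np

rng = np.random.default_rng(205)              # arbitrary seed for reproducibility
x = rng.standard_normal(50)                   # hypothetical sample from an unknown F

def s(x):
    """Statistic of interest; the sample median is used here as an example."""
    return np.median(x)

B = 2000                                      # illustrative number of bootstrap samples
theta_star = np.empty(B)
for b in range(B):
    x_star = rng.choice(x, size=len(x), replace=True)   # bootstrap sample from F_n
    theta_star[b] = s(x_star)                            # bootstrap replicate

# the spread of the B replicates estimates the sampling variability of s(x)
print(theta_star.std(ddof=1))
```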