Stanford University, Spring 2016, STATS 205

What Happened So Far

• We want to compute functionals $$\theta = T(F)$$, e.g.
• the mean: $$T(F) = \int x \, dF(x) = \int x \, f(x) \, dx$$
• the median: $$T(F) = F^{-1}(1/2)$$
• Parameter of interest $$\theta$$ is a functional of the unknown $$F$$
• Assumptions on $$f(x)$$ relatively weak, e.g. symmetric around zero
• We tested $$\theta = \theta_0$$ from a sample $$\boldsymbol{x} = \{ x_1,x_2,\dots,x_n \}$$ drawn from distribution $$F$$ by comparing the observed statistic $$s(\boldsymbol{x})$$ to the null distribution of the test statistic
• We introduced estimators $$\widehat{\theta}$$ and confidence intervals from a sample
• We quantified the robustness of estimators $$\widehat{\theta}$$ to outliers

Background

• Computers have become more powerful, so resampling procedures are now more widespread
• The bootstrap is another tool to measure the error in an estimate or the significance of a hypothesis test
• A bootstrap sample is a sample drawn with replacement from the original sample
• This works when the histogram of the sample is representative of the population
• In other words, the histogram of the sample resembles the pdf of the random variable
• Drawing samples from the histogram is then approximately the same as drawing samples from the population
• This yields an estimate of the sampling distribution of the statistic $$s(\boldsymbol{x})$$ or estimate $$\widehat{\theta}$$
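The claim in the last three bullets can be checked on simulated data. The following is a minimal sketch, not part of the original slides: it assumes a standard normal population (chosen only because we can then draw fresh samples from it), and the seed, `n`, `B`, and all variable names are arbitrary illustrative choices. It compares the spread of the sample mean across fresh population samples with its spread across resamples of one observed sample.

```python
import numpy as np

rng = np.random.default_rng(205)  # arbitrary seed, for reproducibility only

n, B = 200, 2000

# Assumption for illustration: the population is standard normal.
x = rng.normal(loc=0.0, scale=1.0, size=n)  # the one sample we actually observe

# Sampling distribution of the mean from B fresh population samples
# (only possible here because the population is simulated).
means_population = rng.normal(0.0, 1.0, size=(B, n)).mean(axis=1)

# Bootstrap approximation: B resamples of size n, drawn with replacement from x.
means_bootstrap = rng.choice(x, size=(B, n), replace=True).mean(axis=1)

se_population = means_population.std(ddof=1)
se_bootstrap = means_bootstrap.std(ddof=1)
print(f"SE of the mean, population draws: {se_population:.4f}")
print(f"SE of the mean, bootstrap draws:  {se_bootstrap:.4f}")
```

The two standard errors should be close when the sample is representative. Note that the bootstrap distribution is centered at the observed sample mean rather than at the population mean, so it is the spread, not the location, that is being approximated.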

Plug-in Principle

• The data are independent and identically distributed draws from an unknown distribution $$F$$
• The bootstrap supposes that the empirical distribution $$F_n$$ is a good description of the unknown distribution $$F$$
• The plug-in estimate of a parameter $$\theta = T(F)$$ is defined as $\widehat{\theta} = T(F_n)$
• This way one can draw as many samples from $$F_n$$ as needed to estimate the sampling variability of all kinds of statistics $$s(\boldsymbol{x}^*)$$, e.g. the sample mean or the sample median
• Each sample is formed by drawing $$n$$ observations with replacement from the original sample (the same observation can be picked multiple times)
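To make the plug-in estimate $$T(F_n)$$ concrete, here is a small sketch that applies the mean and median functionals to the empirical distribution, treated as a discrete distribution with mass $$1/n$$ on each observation. The function names `plugin_mean` and `plugin_quantile` and the data values are hypothetical, introduced only for this illustration.

```python
import numpy as np

def plugin_mean(x):
    """Mean functional applied to F_n: sum of x_i times its mass 1/n."""
    x = np.asarray(x, dtype=float)
    weights = np.full(x.size, 1.0 / x.size)  # F_n puts mass 1/n on each x_i
    return float(np.sum(weights * x))

def plugin_quantile(x, p):
    """Quantile functional applied to F_n: smallest t with F_n(t) >= p."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    # F_n at the k-th order statistic (0-based) is at least (k + 1) / n,
    # so we need the smallest k with (k + 1) / n >= p.
    k = int(np.ceil(p * n)) - 1
    return float(xs[max(k, 0)])

# Made-up observations, for illustration only.
x = [2.0, 4.0, 4.0, 10.0, 1.0]
print(plugin_mean(x))           # plug-in estimate of the mean
print(plugin_quantile(x, 0.5))  # plug-in estimate of the median
```

For the mean, the plug-in estimate coincides with the ordinary sample mean. For the median with even $$n$$, this generalized-inverse definition returns the lower of the two middle order statistics, whereas `np.median` averages them; both conventions are in common use.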

Plug-in Principle

• We observe the empirical distribution $$F_n$$ $F_n(t) = \frac{1}{n} \sum_{i=1}^n I(x_i \le t)$
• In other words, the distribution that puts mass $$1/n$$ at each $$x_i$$
• Denote a bootstrap sample from $$F_n$$ (sample with replacement) as $\boldsymbol{x}^* = \left[ x_1^*,x_2^*,\dots,x_n^* \right]^T$
• The bootstrap replicate is the estimate recomputed on the bootstrap sample rather than on the original observations: $$\widehat{\theta}^* = s(\boldsymbol{x}^*)$$
• Denote the collection of $$B$$ bootstrap replicates, one per bootstrap sample, as $$\widehat{\theta}^*_1,\widehat{\theta}^*_2,\dots,\widehat{\theta}^*_B$$
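The notation above maps directly onto a short loop. The sketch below is one possible implementation under stated assumptions: the helper names `ecdf` and `bootstrap_replicates`, the data values, the seed, and the choice of the median as the statistic $$s$$ are all illustrative and do not come from the slides.

```python
import numpy as np

def ecdf(x, t):
    """Empirical distribution function F_n(t): fraction of x_i with x_i <= t."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x <= t))

def bootstrap_replicates(x, statistic, B, rng):
    """Return the B replicates theta*_1, ..., theta*_B of `statistic`."""
    x = np.asarray(x, dtype=float)
    n = x.size
    replicates = np.empty(B)
    for b in range(B):
        # x* : n draws from F_n, i.e. from x with replacement
        x_star = rng.choice(x, size=n, replace=True)
        replicates[b] = statistic(x_star)
    return replicates

# Made-up observations, for illustration only.
x = np.array([3.1, 0.4, 2.2, 5.9, 1.7])
rng = np.random.default_rng(0)  # arbitrary seed

theta_star = bootstrap_replicates(x, np.median, B=1000, rng=rng)
print("F_n(2.2) =", ecdf(x, 2.2))
print("bootstrap standard error of the median:", theta_star.std(ddof=1))
```

The standard deviation of the replicates is the usual bootstrap estimate of the standard error of the statistic. A larger `B` reduces the Monte Carlo noise in that estimate but does not remove the error that comes from using $$F_n$$ in place of $$F$$.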