Stanford University, Spring 2016, STATS 205

Robustness of Estimators

  • Robustness properties of three estimators: the mean, the median, and the Hodges–Lehmann estimator
  • Main measures of robustness:
    • efficiency,
    • influence,
    • and breakdown
  • Today, we focus on influence and breakdown

Sensitivity Curve

  • The sensitivity curve is a function of the observations
  • The parameter \(\theta\) is a function of the cdf \(F\)
  • The observations \(x_1,x_2,\dots,x_n\) are drawn from \(F\)
  • Denote by \(\widehat{\theta}_n\) the estimate of \(\theta\) computed from the sample
  • Measure the change in the estimate when an outlier \(x\) is added to the sample \(\boldsymbol{x}_n\) \[ \boldsymbol{x}_n = ( x_1,x_2,\dots,x_n )^T \] \[ \boldsymbol{x}_{n+1} = ( x_1,x_2,\dots,x_n,x )^T \]
  • The sensitivity curve of an estimator is \[ S(x,\widehat{\theta}) = \frac{\widehat{\theta}_{n+1} - \widehat{\theta}_n}{1/(n+1)} \] (see the sketch below)
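A minimal R sketch of this definition (the helper name sens_curve is ours, not course code):

sens_curve <- function(x, x_n, estimator) {
  # rescaled change in the estimate when the outlier x is added to the sample
  (estimator(c(x_n, x)) - estimator(x_n)) / (1 / (length(x_n) + 1))
}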

Sensitivity Curve Example

x_n <- c(1.85, 2.35, -3.85, -5.25, -0.15, 2.15, 0.15, -0.25, -0.55, 2.65)
mean(x_n)
## [1] -0.09
median(x_n)
## [1] 0
hl.loc(x_n)
## [1] 0
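Here hl.loc() computes the Hodges–Lehmann estimate, the median of the Walsh averages \((x_i + x_j)/2\), \(i \le j\) (it ships e.g. with the npsm package accompanying Kloke & McKean). If it is not available, a minimal stand-in is:

hl.loc <- function(x) {
  # all pairwise (Walsh) averages (x_i + x_j)/2 with i <= j
  walsh <- outer(x, x, "+") / 2
  median(walsh[upper.tri(walsh, diag = TRUE)])
}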

Sensitivity Curve Example

  • The sensitivity curve of the mean is unbounded
  • The sensitivity curves of the median and the Hodges–Lehmann estimator are bounded (the estimates change only slightly, and the change soon becomes constant); see the sketch below
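These sensitivity curves can be traced numerically (a sketch using the sens_curve helper and the hl.loc stand-in from above):

xs <- seq(-20, 20, by = 0.5)
plot(xs, sapply(xs, sens_curve, x_n = x_n, estimator = mean), type = "l",
     xlab = "outlier x", ylab = "sensitivity")  # mean: a straight line, unbounded
lines(xs, sapply(xs, sens_curve, x_n = x_n, estimator = median), lty = 2)  # bounded, flat in the tails
lines(xs, sapply(xs, sens_curve, x_n = x_n, estimator = hl.loc), lty = 3)  # bounded, flat in the tails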

Influence Function

  • Sensitivity curve depends on sample \(x_1,x_2,\dots,x_n\)
  • Influence function depends on the population distribution \(F\)
  • The estimator is expressed as a functional: a function that takes another function as its argument
  • Denote the functional by \(T(F)\), e.g. for the median \(T(F) = F^{-1}(1/2)\)
  • Then, using Gâteaux derivatives (directional derivatives for functionals), the influence function is \[ L_F(x) = \lim_{\epsilon\to 0} \frac{T((1-\epsilon)F + \epsilon \delta_x)-T(F)}{\epsilon} \] with the point mass \(\delta_x\) corresponding to adding an outlier at \(x\) (a numeric check is sketched below)
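As a sanity check (our numeric illustration, not from the slides), take \(T\) = median and \(F = N(0,1)\): for an outlier at \(x > 0\), the contaminated median solves \((1-\epsilon)\Phi(t) = 1/2\), and the difference quotient approaches \(1/(2\varphi(0))\), the proportionality constant in front of \(\operatorname{sign}(x)\):

eps <- 1e-6
# median of the contaminated distribution (1 - eps) * N(0,1) + eps * point mass at x > 0
t_eps <- qnorm(0.5 / (1 - eps))
(t_eps - 0) / eps   # difference quotient, approx 1.2533
1 / (2 * dnorm(0))  # theoretical limit 1/(2*phi(0)), approx 1.2533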

Influence Function

  • An estimator is robust if its influence function is bounded
  • The influence functions of our estimators are (up to a constant of proportionality and centering)
    • Mean: \(x\),
    • Median: \(\operatorname{sign}(x)\), and
    • HL: \(F(x) - 0.5\)
  • Hence, the mean is not robust, but the median and HL are robust

Influence Function Example
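These influence functions can be plotted in a few lines of R (our sketch: \(F\) taken to be the standard normal, each curve up to proportionality and centering):

xs <- seq(-4, 4, by = 0.01)
plot(xs, xs, type = "l", xlab = "outlier x", ylab = "influence")  # mean: unbounded
lines(xs, sign(xs), lty = 2)         # median: bounded, jumps at 0
lines(xs, pnorm(xs) - 0.5, lty = 3)  # HL: bounded and smooth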

Breakdown Point of an Estimator

  • Suppose we contaminate \(n-m\) points in our sample \[ \boldsymbol{x}_n^* = (x_1,\dots,x_m,x_{m+1}^*,\dots,x_n^*) \]
  • Consider the contaminated values to be very large (close to \(\infty\))
  • Breakdown point: the smallest value of \(n-m\) for which \(\widehat{\theta}(\boldsymbol{x}_n^*)\) becomes meaningless (can be made arbitrarily large)
  • Finite sample breakdown point: the ratio \(\frac{n-m}{n}\) for this smallest value
  • Asymptotic breakdown point: the limit of the finite sample breakdown point as \(n \to \infty\) (when it converges)

Breakdown Point of an Estimator

  • Sample mean:
    • Finite sample BP: \(\frac{1}{n}\)
    • Asymptotic BP: \(0\)
  • Sample median:
    • Finite sample BP: \(\left\lceil \frac{n}{2} \right\rceil / n\)
    • Asymptotic BP: \(0.5\)
  • Hodges–Lehmann estimator:
    • Asymptotic BP: \(0.29\) (illustrated numerically below)
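A quick numeric illustration of these values (our sketch; uses x_n and the hl.loc stand-in from above): replace \(k\) of the \(n = 10\) sample points with a huge value and re-estimate.

for (k in 1:5) {
  x_bad <- c(x_n[1:(10 - k)], rep(1e6, k))
  cat(k, ": mean =", mean(x_bad), " median =", median(x_bad),
      " HL =", hl.loc(x_bad), "\n")
}
# the mean explodes already at k = 1; HL breaks at k = 4
# (for n = 10; the fraction tends to 1 - 1/sqrt(2) ~ 0.29 as n grows);
# the median holds out until k = 5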

Breakdown Point of an Estimator

The median of the Walsh averages stays with the majority. With \(m\) good points, \(m(m+1)/2\) of the \(n(n+1)/2\) Walsh averages involve only good points, so the median remains bounded as long as

\[ \begin{align} \# \text{good Walsh averages} & > \# \text{bad Walsh averages} \\ \frac{m(m+1)}{2} & > \frac{n(n+1) - m(m+1)}{2} \\ m(m+1) & > \frac{n(n+1)}{2} \end{align} \]

Breakdown Point of an Estimator

  • Obtaining the finite sample BP is messy; the asymptotic one is easier: \[m(m+1) > \frac{n(n+1)}{2} \Leftrightarrow \frac{m(m+1)}{n(n+1)} > \frac{1}{2}\]
  • Let \(x = m/n\); for large \(n\), also \((m+1)/(n+1) \approx x\)
  • Then \(x^2 > \frac{1}{2} \Leftrightarrow x > \frac{1}{\sqrt{2}}\)
  • Convert to the BP: the fraction of bad points is \(1 - \frac{m}{n} = 1 - x\), so the asymptotic BP is \[1-\frac{1}{\sqrt{2}} \approx 0.29\] (a numeric check follows below)
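The count can also be verified directly for a small \(n\) (our own check):

n <- 10
m_vals <- 1:n
m <- m_vals[m_vals * (m_vals + 1) > n * (n + 1) / 2][1]  # smallest m with m(m+1) > n(n+1)/2
c(m = m, bad_tolerated = n - m, fraction = (n - m) / n)  # 7, 3, 0.3 vs 0.29 asymptotically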

Breakdown Point of an Estimator

  • Instead of testing for normality to decide between a \(t\)-test and the Wilcoxon test,
  • it is easier to use the notion of robustness of the estimators and their associated procedures
  • There are too many ways that a distribution can deviate from normality
  • How much robustness do we need? Is \(0.29\) enough or \(0.5\) really necessary?
  • Is the sample median preferred over the Hodges–Lehmann estimator?
    • Based on the breakdown point: yes
    • This, however, ignores estimator efficiency (we will talk about this later)