- Robustness properties of the three estimators: the mean, the median, and the Hodges–Lehmann estimate.
- Main measures of robustness:
- efficiency,
- influence,
- and breakdown

- Today, we focus on influence and breakdown

Stanford University, Spring 2016, STATS 205

- Function of the observations
- The parameter \(\theta\) is a function of the cdf \(F\)
- The observations \(x_1,x_2,\dots,x_n\) are drawn from \(F\)
- Denote \(\widehat{\theta}\) as the estimator of \(\theta\) in the sample
- Measure the change in the estimator \(\widehat{\theta}_n\) when an outlier \(x\) is added to the sample \[ \boldsymbol{x}_n = ( x_1,x_2,\dots,x_n )^T \] \[ \boldsymbol{x}_{n+1} = ( x_1,x_2,\dots,x_n,x )^T \]
- The sensitivity curve of an estimator is \[ S(x,\widehat{\theta}) = \frac{\widehat{\theta}_{n+1} - \widehat{\theta}_n}{1/(n+1)} \]
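As a quick check of the definition, a minimal Python sketch (my illustration; the lecture's code is in R) evaluates \(S\) for the mean, where a little algebra gives the closed form \(S(x,\bar{x}) = x - \bar{x}_n\):

```python
import numpy as np

def sensitivity(est, x_n, x):
    # S(x, theta_hat) = (theta_hat_{n+1} - theta_hat_n) / (1 / (n + 1))
    n = len(x_n)
    return (est(np.append(x_n, x)) - est(x_n)) / (1 / (n + 1))

rng = np.random.default_rng(205)   # arbitrary seed, illustrative sample
x_n = rng.normal(size=10)

# For the mean: S = ((n*xbar + x)/(n+1) - xbar) * (n+1) = x - xbar,
# a straight line in x.
for x in (1.0, 10.0, 100.0):
    print(sensitivity(np.mean, x_n, x), x - x_n.mean())
```

The two printed columns agree, confirming that the mean's sensitivity curve is linear in the outlier.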

x_n = c(1.85,2.35,-3.85,-5.25,-0.15,2.15,0.15,-0.25,-0.55,2.65)
mean(x_n)

## [1] -0.09

median(x_n)

## [1] 0

hl.loc(x_n)

## [1] 0

- The sensitivity curve of the mean is unbounded
- Those of the median and Hodges–Lehmann estimate are bounded (the estimates change only slightly, and the change soon becomes constant as the outlier grows)
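This behavior can be checked numerically on the same sample. A Python sketch (mirroring the R output above; `hl_loc` is a hand-rolled Hodges–Lehmann estimator standing in for `hl.loc`):

```python
import numpy as np

def hl_loc(v):
    # Hodges-Lehmann location estimate: median of the Walsh averages
    # (v[i] + v[j]) / 2 over all pairs i <= j
    v = np.asarray(v)
    i, j = np.triu_indices(len(v))
    return np.median((v[i] + v[j]) / 2)

def sensitivity(est, x_n, x):
    # sensitivity curve: scaled change when the outlier x is appended
    return (est(np.append(x_n, x)) - est(x_n)) * (len(x_n) + 1)

x_n = np.array([1.85, 2.35, -3.85, -5.25, -0.15, 2.15, 0.15, -0.25, -0.55, 2.65])

for x in (10.0, 100.0, 1000.0):
    print(x, sensitivity(np.mean, x_n, x),
          sensitivity(np.median, x_n, x),
          sensitivity(hl_loc, x_n, x))
# The mean's column keeps growing with x, while the median and HL columns
# stop changing once x is far enough out.
```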

- Sensitivity curve depends on sample \(x_1,x_2,\dots,x_n\)
- Influence function depends on the population distribution \(F\)
- It's a *functional*: a function that takes another function as its argument
- Denote the functional as \(T(F)\), e.g. the median \(T(F) = F^{-1}(1/2)\)
- Then, using the Gâteaux derivative (a directional derivative for functionals), the influence function is \[ L_F(x) = \lim_{\epsilon\to 0} \frac{T((1-\epsilon)F + \epsilon \delta_x)-T(F)}{\epsilon} \] with the point mass \(\delta_x\) corresponding to adding an outlier at \(x\)
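To make the definition concrete: for the median functional \(T(F) = F^{-1}(1/2)\) at the standard normal, the Gâteaux derivative can be approximated numerically. In a minimal Python sketch (my illustration, not from the lecture), contaminating with a point mass at some \(x > 0\) shifts the median to the solution of \((1-\epsilon)\Phi(m) = 1/2\), and the difference quotient approaches \(1/(2\phi(0)) = \sqrt{2\pi}/2\), i.e. \(\operatorname{sign}(x)\) up to the constant of proportionality:

```python
import math

def Phi(t):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def contaminated_median(eps):
    # Median of F_eps = (1 - eps) * N(0, 1) + eps * delta_x for an outlier
    # x > 0: below x the contaminated cdf is (1 - eps) * Phi(t), so the
    # median m solves (1 - eps) * Phi(m) = 1/2; solve by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if (1 - eps) * Phi(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

eps = 1e-6
gateaux = (contaminated_median(eps) - 0.0) / eps   # T(F) = 0 for N(0, 1)
print(gateaux)   # approx sqrt(2*pi)/2 = 1/(2*phi(0)) ~ 1.2533
```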

- An estimator is *robust* if its influence function is bounded
- The influence functions for our estimators are (up to a constant of proportionality and centering)
- Mean: \(x\),
- Median: \(\operatorname{sign}(x)\), and
- HL: \(F(x) - 0.5\)

- Hence, mean is not robust, but median and HL are robust

- Suppose we contaminate \(n-m\) points in our sample \[ \boldsymbol{x}_n^* = (x_1,\dots,x_m,x_{m+1}^*,\dots,x_n^*) \]
- Consider the contaminated values to be very large (close to \(\infty\))
- *Breakdown point:* the smallest number of contaminated points \(n-m\) such that \(\widehat{\theta}(\boldsymbol{x}_n^*)\) becomes meaningless
- *Finite sample breakdown point:* the ratio \(\frac{n-m}{n}\)
- *Asymptotic breakdown point:* the limit of the finite sample breakdown point as \(n \to \infty\), if it exists

- Sample mean:
- Finite sample BP: \(\frac{1}{n}\)
- Asymptotic BP: \(0\)

- Sample median:
- Finite sample BP: \(\left\lfloor \frac{n-1}{2} \right\rfloor \big/ n\)
- Asymptotic BP: \(0.5\)

- Hodges–Lehmann estimator:
- Asymptotic BP: \(0.29\)
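These values can be checked empirically. A Python sketch (illustrative; `hl_loc` is a hand-rolled median of Walsh averages standing in for `hl.loc`) replaces more and more points of a sample of size \(n = 100\) with a huge value:

```python
import numpy as np

def hl_loc(v):
    # Hodges-Lehmann location estimate: median of the Walsh averages
    v = np.asarray(v)
    i, j = np.triu_indices(len(v))
    return np.median((v[i] + v[j]) / 2)

rng = np.random.default_rng(205)   # arbitrary seed, illustrative sample
good = rng.normal(size=100)

for n_bad in (1, 20, 40, 60):
    x = good.copy()
    x[:n_bad] = 1e9          # replace n_bad observations with a huge outlier
    print(n_bad, np.mean(x), np.median(x), hl_loc(x))
# One bad point already ruins the mean; the median survives 40 bad points
# (it needs more than half to break); HL survives 20 but not 40 (threshold ~29%).
```

In line with the list above, a single contaminated point moves the mean arbitrarily far, while the median and HL estimates only break past roughly 50% and 29% contamination.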

The Hodges–Lehmann estimate, the median of the Walsh averages, stays with the majority: it remains meaningful as long as the Walsh averages of good points outnumber the rest.

\[ \begin{align} \# \text{good points} & > \# \text{bad points} \\ \frac{m(m+1)}{2} & > \frac{n(n+1) - m(m+1)}{2} \\ m(m+1) & > \frac{n(n+1)}{2} \end{align} \]

- Obtaining finite sample BP is messy, asymptotic is easier: \[m(m+1) > \frac{n(n+1)}{2} \Leftrightarrow \frac{m(m+1)}{n(n+1)} > \frac{1}{2}\]
- let \(x = m/n\); for large \(n\), also \(x \approx (m+1)/(n+1)\), so \(\frac{m(m+1)}{n(n+1)} \approx x^2\)
- then \(x^2 > \frac{1}{2} \Leftrightarrow x > \frac{1}{\sqrt{2}}\)
- the BP is the fraction of bad points, \(1 - x\), evaluated at the threshold: \[1-\frac{1}{\sqrt{2}} \approx 0.29\]
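The threshold can also be found numerically. A short Python sketch (my illustration) searches for the smallest number of good points \(m\) with \(m(m+1) > n(n+1)/2\) at a large \(n\):

```python
import math

n = 10**6
total = n * (n + 1) // 2     # total number of Walsh averages
# m = smallest number of good points for which the good Walsh averages
# still outnumber the bad ones; 1 - m/n then approximates the breakdown point.
m = math.isqrt(total)        # start near sqrt(n(n+1)/2)
while m * (m + 1) <= total:  # find smallest m with m(m+1) > n(n+1)/2
    m += 1
print(1 - m / n)             # fraction of bad points, approx 1 - 1/sqrt(2) ~ 0.29
```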

- Instead of testing for normality to decide between a \(t\)-test and the Wilcoxon test
- It is easier to rely on the robustness of the estimators and their associated procedures
- There are too many ways a distribution can deviate from normality
- How much robustness do we need? Is \(0.29\) enough or \(0.5\) really necessary?
- Should the sample median be preferred over the Hodges–Lehmann estimator?
- Based on the breakdown point: yes
- This, however, ignores estimator efficiency (we will discuss this later)