Teaching - Stats 205

Introduction to Nonparametric Statistics

Description

Nonparametric analogs of the one- and two-sample t-tests and analysis of variance; the sign test, median test, Wilcoxon’s tests, the Kruskal-Wallis and Friedman tests, and tests of independence. Nonparametric regression and nonparametric density estimation, modern nonparametric techniques, and nonparametric confidence interval estimates.

Textbooks

Required

  • KM: Kloke and McKean (2015). Nonparametric Statistical Methods Using R

Optional

  • ET: Efron and Tibshirani (1994). An Introduction to the Bootstrap
  • FM: Friendly and Meyer (2015). Discrete Data Analysis with R
  • G: Good (2005). Permutation, Parametric, and Bootstrap Tests of Hypotheses
  • Ha: Hall (1992). The Bootstrap and Edgeworth Expansion
  • HP: Henderson and Parmeter (2015). Applied Nonparametric Econometrics
  • HMT: Hoaglin, Mosteller, and Tukey (1983). Understanding Robust and Exploratory Data Analysis
  • HWC: Hollander, Wolfe, and Chicken (2013). Nonparametric Statistical Methods
  • Lo: Lovasz (2012). Large Networks and Graph Limits
  • L: Lehmann (2006). Nonparametrics: Statistical Methods Based on Ranks
  • MQJH: Müller, Quintana, Jara, and Hanson (2015). Bayesian Nonparametric Data Analysis
  • PE: Patrangenaru and Ellingson (2015). Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis
  • RW: Rasmussen and Williams (2006). Gaussian Processes for Machine Learning
  • S: Silverman (1986). Density Estimation for Statistics and Data Analysis
  • T: Tsybakov (2009). Introduction to Nonparametric Estimation
  • W: Wasserman (2006). All of Nonparametric Statistics

Suggested Journal Articles

  • B: Basu (1980). Randomization Analysis of Experimental Data: The Fisher Randomization Test
  • Bh: Bhattacharya (2015). Power of Graph-Based Two-Sample Tests
  • BHLLSW: Buja, Cook, Hofmann, Lawrence, Lee, Swayne, and Wickham (2009). Statistical Inference for Exploratory Data Analysis and Model Diagnostics
  • CB: Carpenter and Bithell (2000). Bootstrap Confidence Intervals: When, Which, What? A Practical Guide for Medical Statisticians
  • Ch: Chatterjee (2012). Matrix Estimation by Universal Singular Value Thresholding
  • C: Critchlow (1986). A Unified Approach to Constructing Nonparametric Rank Tests
  • D83: Diaconis (1983). Theories of Data Analysis: From Magical Thinking Through Classical Statistics
  • DE: Diaconis and Efron (1983). Computer Intensive Methods in Statistics
  • DHa: Diaconis and Holmes (1994). Gray Codes for Randomization Procedures
  • DHb: Diaconis and Holmes (1994). Three Examples of the Markov Chain Monte Carlo Method
  • Ef: Efron (1987). Better Bootstrap Confidence Intervals
  • Ef14: Efron (2014). Estimation and Accuracy after Model Selection
  • ENK: Eklund, Nichols, and Knutsson (2015). Can Parametric Statistical Methods Be Trusted for fMRI Based Group Studies?
  • FR: Friedman and Rafsky (1979). Multivariate Generalizations of the Wolfowitz and Smirnov Two-Sample Tests
  • F: Friendly (1994). Mosaic Displays for Multi-Way Contingency Tables
  • GH: Greenacre and Hastie (1987). The Geometric Interpretation of Correspondence Analysis
  • H: Holmes (2008). Multivariate Data Analysis: The French Way
  • JWH: Josse, Wager, and Husson (2014). Confidence Areas for Fixed-Effects PCA
  • NWC: Nahhas, Wolfe, and Chen (2002). Ranked Set Sampling: Cost and Optimal Set Size
  • MRR: Marin, Pudlo, Robert, and Ryder (2012). Approximate Bayesian Computational Methods
  • MW: Milan and Whittaker (1995). Application of the Parametric Bootstrap to Models that Incorporate a Singular Value Decomposition
  • NH: Nichols and Holmes (2001). Nonparametric Permutation Tests For Functional Neuroimaging: A Primer with Examples
  • PT: Pagano and Tritchler (1983). On Obtaining Permutation Distributions in Polynomial Time
  • RH: Rousseeuw and Hubert (2015). Statistical Depth Meets Computational Geometry: A Short Survey
  • SMN: Silver, Montana, and Nichols (2011). False Positives in Neuroimaging Genetics Using Voxel-Based Morphometry Data
  • Tu: Tukey (1974). Mathematics and the Picturing of Data
  • WHE: Wager, Hastie, and Efron (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife
  • WRR: Witztum, Rips, and Rosenberg (1994). Equidistant Letter Sequences in the Book of Genesis

Instructor

Christof Seiler, Sequoia Hall 116 (christof.seiler [at] stanford [dot] edu)
Office hours: Wednesdays from 10:00 to 11:30 am in 105 at Sequoia

TAs

  • Nan Bi (nbi [at] stanford [dot] edu)
    Office hours:
    • Wednesdays from 2:30 to 3:30 pm in 420-147
    • Thursdays from 10:30 to 11:30 am in Fishbowl at Sequoia
  • Lexi Guan (lguan [at] stanford [dot] edu)
    Office hours:
    • Mondays from 10:00 to 11:00 am in Bowker at Sequoia
    • Fridays from 4:00 to 5:00 pm in Fishbowl at Sequoia

Grading

  • Midterm Project Proposal (3 pages with references, 10%, due by April 29th)
  • Final Project (12 pages plus references, 40%, due by June 3rd)
  • Projects can be done alone or in pairs
  • Weekly homework (40%)
  • Class participation (10%)

Midterm Project Content

Some optional guidelines:

  1. State the problem
  2. Describe the data
  3. Review what statistical methods are available to analyze your data
  4. List their advantages and disadvantages; in particular, compare nonparametric to parametric methods
  5. Propose a solution using nonparametric methods
  6. List all the tasks that you plan to do: collecting data, programming, simulating data, estimating, testing, etc.

Final Project

Example of an excellent final project on using kernel density estimation to predict conflicts in the Congo:

Slides

Lecture | Topic(s) | Background Material
1 | Logistics and Introduction | KM Chapter 1
2 | Sign Test and Signed-Rank Wilcoxon | KM Chapters 2.1, 2.2, and 2.3
3 | Robustness | KM Chapter 2.5
4 | Bootstrap (Part 1) and Bootstrap (Example) | KM Chapter 2.4
5 | Bootstrap (Part 2) | KM Chapter 2.4
6 | Proportion Problems and \( \chi^2 \) Tests (Part 1) | KM Chapters 2.6 and 2.7
7 | \( \chi^2 \) Tests (Part 2) | KM Chapter 2.7
8 | Two-Sample Problems (Part 1) | KM Chapters 3.1 and 3.2
9 | Two-Sample Problems (Part 2) | KM Chapters 3.2 and 3.4
10 | Permutation Tests (Part 1) | G Chapter 1
11 | Permutation Tests (Part 2) and Neuroimaging (Example) | WRR and NH
12 | Rank-Based Linear Regression | KM Chapters 4.1, 4.2, 4.3, 4.4, and 4.8
13 | Nonlinear Regression (Part 1) | W Chapter 4
14 | Nonlinear Regression (Part 2) | W Chapter 5
15 | Nonlinear Regression (Part 3) | W Chapter 5
16 | Bayesian Nonparametrics (Part 1) | W16
17 | Bayesian Nonparametrics (Part 2) and BNP in Practice | W16
18 | ANOVA | KM Chapters 5.1, 5.2, 8.1, 8.2, HMT, and G16
19 | Survival Analysis (Part 1) | KM Chapters 6.1 and 6.2
20 | Survival Analysis (Part 2) and Midterm Proposal Discussion | KM Chapters 6.1 and 6.2
21 | Ranked Set Sampling | HWC Chapter 15 and NWC
22 | Wavelets | HWC Chapter 13 and W Chapter 9
23 | Graph Limits or Graphons | Lo Part 1 and Ch
24 | Inference for Data Visualization | D83, BHLLSW, and JWH
25 | Multivariate Nonparametric Tests | Tu, Ho, RH, and FR
26 | Bootstrap (Part 3) | ET Chapter 12, Ha, and Lo
27 | Bootstrap (Part 4) | ET Chapter 14 and Ha
28 | Wrapup |

Homework

Assignment | Deadline | Solution
Homework 1 | April 7th at 1:30 pm | Solution 1
Homework 2 | April 15th at 1:30 pm | Solution 2
Homework 3 | April 25th at 1:30 pm | Solution 3
Homework 4 | May 10th at 1:30 pm | Solution 4
Homework 5 | May 19th at 1:30 pm | Solution 5
Homework 6 | May 27th at 1:30 pm | Solution 6

Late Homework Policy

  • We will deduct 20% from maximum scores for each late day
  • Each student can hand in one homework late (within two days after the deadline)
  • Please contact me in case of emergencies