
Chi-Squared Tests (Part 2)

Christof Seiler

Stanford University, Spring 2016, STATS 205

Test of Independence

  • Observations are two measurements on the same subject (e.g. eye color and hair color)
  • Random samples (X_1,Y_1),\dots,(X_n,Y_n)
  • X and Y with different ranges \{1,2,\dots,I\} and \{1,2,\dots,J\}
  • Consider the hypothesis test: H_0: P(X=i,Y=j) = P(X=i)P(Y=j) for all i and j versus H_A: P(X=i,Y=j) \ne P(X=i)P(Y=j) for some i and j
  • Construct contingency table of n observations with cell counts O_{ij} = \#\{1 \le l \le n : (X_l,Y_l) = (i,j)\}
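
A minimal sketch of this construction in R, with hypothetical vectors x and y holding the two measurements per subject:

x = c(1, 2, 2, 1, 3, 2)  # hypothetical first measurement, categories 1..I
y = c(1, 1, 2, 2, 2, 1)  # hypothetical second measurement, categories 1..J
O = table(x, y)          # O[i,j] counts subjects with (X_l,Y_l) = (i,j)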

Test of Independence

Some definitions:

Row sum: O_{i+}
Column sum: O_{+j}
Total sum: O_{++}
Row vector i: \boldsymbol{O_i}
Column vector j: \boldsymbol{O_j}

Test of Independence

Expected frequencies under the independence assumption are the product of the marginals: \hat{E}_{ij} = O_{++} \frac{O_{i+}}{O_{++}} \frac{O_{+j}}{O_{++}} = \frac{O_{i+} O_{+j}}{O_{++}}

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    68   20    15     5
##   Brown   119   84    54    29
##   Red      26   17    14    14
##   Blond     7   94    10    16

Test statistic: \chi^2 = \sum_i \sum_j \frac{(O_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}

Asymptotically \chi^2 with (I-1)(J-1) degrees of freedom
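
A minimal check in R, assuming the table above is stored as HairEye (as later in these slides): compute \hat{E}_{ij} from the margins and the statistic by hand, which matches the chisq.test output below.

Ehat = outer(rowSums(HairEye), colSums(HairEye))/sum(HairEye)  # O_{i+} O_{+j} / O_{++}
chisq = sum((HairEye - Ehat)^2/Ehat)          # Pearson chi-squared statistic
df = (nrow(HairEye) - 1)*(ncol(HairEye) - 1)  # (I-1)(J-1) degrees of freedom
pchisq(chisq, df, lower.tail = FALSE)         # asymptotic p-value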

Test of Independence

  • This is a goodness-of-fit test
  • Quantifies how well the independence model fits observations
  • Measures discrepancy between observed values and expected values under the model
  • Rather than reject this global null hypothesis, can we find out what is driving the statistic?
  • Can we visualize this?
  • Two ways:
    • Visualization of contingency table with association and mosaic plots
    • Visualization of deviation from model with correspondence analysis

Hair and Eye Color of Statistics Students

Survey of students at the University of Delaware reported by Snee (1974)

HairEye
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    68   20    15     5
##   Brown   119   84    54    29
##   Red      26   17    14    14
##   Blond     7   94    10    16

Rows: hair color of 592 statistics students
Columns: eye color of the same 592 students

Hair and Eye Color of Statistics Students

chisq.test(HairEye)
## 
##  Pearson's Chi-squared test
## 
## data:  HairEye
## X-squared = 138.29, df = 9, p-value < 2.2e-16

Association Plots

  • This is a follow-up on \chi^2 tests when the null hypothesis is rejected
  • Pearson residuals: r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}
  • The \chi^2 statistic squares and sums them over all cells: \chi^2 = \sum_j \sum_i r_{ij}^2
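
In R, chisq.test returns these residuals directly:

res = chisq.test(HairEye)$residuals  # r_{ij} = (O_{ij} - E_{ij})/sqrt(E_{ij})
sum(res^2)                           # recovers the chi-squared statistic 138.29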

Association Plots

  • Each cell of the contingency table is represented by a rectangle encoding information in the width, height, location, and color of the rectangle
  • Height: Proportional to the Pearson residual \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}
  • Width: Proportional to \sqrt{E_{ij}}
  • Area: Proportional to O_{ij} - E_{ij}
  • Baseline:
    • If O_{ij} > E_{ij}, then the rectangle is above the baseline
    • otherwise the rectangle is below
  • Color: Standardized Pearson residuals, which are asymptotically standard normal
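
Base R draws this plot with assocplot; shading by standardized residuals as described above is available in the vcd package (assuming vcd is installed):

assocplot(HairEye)  # Cohen-Friendly association plot
# with residual-based shading, assuming the vcd package is available:
# library(vcd); assoc(HairEye, shade = TRUE)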


Mosaic Plots

  • Cell frequency O_{ij}, cell probability p_{ij} = O_{ij}/O_{++}
  • Take the unit square
  • Divide the unit square vertically into rectangles
    • Height proportional to the observed marginal frequencies O_{i+}
    • which is proportional to the marginal probabilities p_i = O_{i+}/O_{++}
  • Subdivide the resulting rectangles horizontally
    • Width proportional to p_{j|i} = O_{ij}/O_{i+}
    • which is the conditional probability of the second variable given the first
  • Area is proportional to the observed cell frequency and probability: p_{ij} = p_i \times p_{j|i} = (O_{i+}/O_{++}) \times (O_{ij}/O_{i+})

Mosaic Plots

  • The order of conditioning matters
  • Color: Standardized Pearson residuals that are asymptotically standard normal
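
In base R, mosaicplot with shade = TRUE colors the tiles by standardized Pearson residuals; transposing the table changes the order of conditioning:

mosaicplot(HairEye, shade = TRUE)     # condition on hair color first
mosaicplot(t(HairEye), shade = TRUE)  # condition on eye color first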

Mosaic Plots

Marginal probabilities of first variable

frequency = rowSums(HairEye)
proportions = frequency/sum(frequency)
data.frame(frequency=round(frequency,0),proportions=round(proportions,2))
##       frequency proportions
## Black       108        0.18
## Brown       286        0.48
## Red          71        0.12
## Blond       127        0.21

Mosaic Plots

Conditional probabilities of the second variable given the first

HairEye/rowSums(HairEye)  # each cell divided by its own row sum: p_{j|i}
##        Eye
## Hair         Brown       Blue      Hazel      Green
##   Black 0.62962963 0.18518519 0.13888889 0.04629630
##   Brown 0.41608392 0.29370629 0.18881119 0.10139860
##   Red   0.36619718 0.23943662 0.19718310 0.19718310
##   Blond 0.05511811 0.74015748 0.07874016 0.12598425


Mosaic Plots

If hair color and eye color were independent, p_{ij} = p_i \times p_j, then the tiles in each row would all align
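
One way to see this: a mosaic of the expected counts \hat{E}_{ij}, which satisfy independence exactly, has column splits that line up across all rows.

Ehat = outer(rowSums(HairEye), colSums(HairEye))/sum(HairEye)  # expected counts under independence
mosaicplot(Ehat, main = "Expected under independence")         # tiles align across rows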

Correspondence Analysis

  • Exploratory data analysis for non-negative data matrices
  • Converts matrix into a plot where rows and columns are depicted as points
  • Its algebraic form first appeared in 1935
  • Actively developed in 1965 in France
  • We focus on the application of correspondence analysis for contingency tables
  • Analogous to principal components analysis, but appropriate to categorical rather than to continuous variables

Correspondence Analysis

  • Define a distance between rows
  • Use the distance d_{\chi^2}(\boldsymbol{x},\boldsymbol{y}) between two vectors \boldsymbol{x} and \boldsymbol{y} as d_{\chi^2}(\boldsymbol{x},\boldsymbol{y}) = (\boldsymbol{x}-\boldsymbol{y})^T D_c^{-1} (\boldsymbol{x}-\boldsymbol{y})
  • where D_c is a diagonal matrix with \operatorname{diag}(D_c) = \boldsymbol{c} = \sum_i p_i \boldsymbol{a_i}
  • with \boldsymbol{a_i} = (p_{j|i}) being the probability vector conditioned on row i
  • and the centroid \boldsymbol{c} being the weighted average of conditional row probabilities

Greenacre and Hastie (1987)
The Geometric Interpretation of Correspondence Analysis
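
A sketch of this distance for the HairEye rows (the name cc avoids masking R's built-in c):

P = HairEye/sum(HairEye)      # cell probabilities p_{ij}
a = HairEye/rowSums(HairEye)  # row profiles a_i = (p_{j|i})
p = rowSums(P)                # row masses p_i
cc = colSums(P)               # centroid c = sum_i p_i a_i
d_chi2 = function(x, y) sum((x - y)^2/cc)  # (x-y)^T D_c^{-1} (x-y)
d_chi2(a["Black", ], a["Blond", ])         # distance between two hair-color profiles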

Correspondence Analysis

  • Compare the metric d_{\chi^2}(\boldsymbol{x},\boldsymbol{y}) = (\boldsymbol{x}-\boldsymbol{y})^T D_c^{-1} (\boldsymbol{x}-\boldsymbol{y})
  • to the \chi^2 statistic \chi^2 = O_{++} \sum_i p_i (\boldsymbol{a_i}-\boldsymbol{c})^T D_c^{-1} (\boldsymbol{a_i}-\boldsymbol{c})
  • \chi^2/O_{++} is the weighted average of the distances d_{\chi^2}(\boldsymbol{a_i},\boldsymbol{c}) of the conditional row probabilities \boldsymbol{a_i} to their row centroid \boldsymbol{c}
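
A quick numerical check of this identity, reusing p, a, cc, and d_chi2 from the sketch above:

inertia = sum(p * apply(a, 1, function(ai) d_chi2(ai, cc)))  # weighted average of distances
inertia * sum(HairEye)  # recovers the chi-squared statistic, about 138.29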

Correspondence Analysis

  • First "principle component" is best line fit trough cloud of conditional row probabilities
  • Modify standard Principal Component Analysis (PCA) to incorporate point weights and weighted metric
  • Use p_i as weights, which provides decomposition of \chi^2 statistics into components
  • After translating points to the origin \boldsymbol{c}, the best fit line is the principle eigenvector of matrix \sum_i p_i (\boldsymbol{a_i} - \boldsymbol{c}) (\boldsymbol{a_i} - \boldsymbol{c})^T D_c^{-1}
  • The trace of this matrix is equal to \chi^2/O_{++}

Greenacre and Hastie (1987)
The Geometric Interpretation of Correspondence Analysis

Correspondence Analysis

  • Repeat the same for the column marginal proportions
  • Can be done in one step by a Generalized Singular Value Decomposition (GSVD) D_r^{-1/2} \left( \left( p_{ij} \right) - \boldsymbol{r}\boldsymbol{c}^T \right) D_c^{-1/2} = X D_{\alpha} Y^T with constraints X^TX = Y^TY = \operatorname{Id}
  • Singular values are the square roots of the principal inertias: D_{\alpha} = D_{\lambda}^{1/2}
  • Principal axes of the row cloud: D_c^{1/2}Y
  • Principal axes of the column cloud: D_r^{1/2}X
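
A sketch of this decomposition via the ordinary SVD of the standardized residual matrix, reusing P, p, and cc from above:

S = diag(1/sqrt(p)) %*% (as.matrix(P) - p %o% cc) %*% diag(1/sqrt(cc))
sv = svd(S)  # GSVD via ordinary SVD of the standardized residuals
sv$d^2       # principal inertias, summing to chi^2/O_{++}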

Correspondence Analysis

  • Row and column projections are usually displayed in the same plot
  • Row-to-column distances are meaningless
  • In standard PCA, principal components explain variance
  • In CA, principal components explain deviation from independence
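
One way to produce such a plot in R is MASS::corresp (MASS ships with R); this is one option, not necessarily the implementation used for the original figures:

library(MASS)
fit = corresp(HairEye, nf = 2)  # two-dimensional correspondence analysis
biplot(fit)                     # rows (hair) and columns (eye) in one display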

Correspondence Analysis

Color code: Rows: Hair, Columns: Eye


Test of Independence and Test of Homogeneity

  • The null hypothesis of row-column independence: p_{ij} = p_i \times p_j
  • is equivalent to the hypothesis of homogeneity of the rows: p_{j|i=1} = p_{j|i=2} = \dots = p_{j|i=I}
  • Each row of O follows a multinomial distribution
  • Under homogeneity assumption, this distribution has common probability vector
  • The maximum likelihood estimate of this probability vector is equal to the centroid \boldsymbol{c}
  • Thus a significant \chi^2 is a significant deviation of rows from their centroid

Test of Independence and Test of Homogeneity

  • Several discrete random variables with the same categories 1,\dots,I
    • Test of homogeneity
    • Are the observations generated from one multinomial distribution with common probability vector?
    • Possibly different sample size among random variables
    • E.g. the crime example: types of crimes (theft, fraud, etc.) committed by alcoholic and non-alcoholic criminals
  • Two discrete random variables with different categories 1,\dots,I and 1,\dots,J
    • Test of independence
    • Are the pairwise observations independent?
    • Paired random variables, so same sample size
    • E.g. hair color, eye color example
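
The same chisq.test call covers both settings; a sketch with hypothetical crime counts, one row per group:

# hypothetical counts: crime types committed by two groups of criminals
crimes = rbind(alcoholic     = c(theft = 50, fraud = 60, violence = 20),
               non_alcoholic = c(theft = 90, fraud = 30, violence = 25))
chisq.test(crimes)  # test of homogeneity of the two rows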