From Good 2005 (Exercise 4.8, #20, page 78, Stanford library link):
Suppose the observations \((X_1,\dots,X_K)\) are distributed in accordance with the multivariate normal probability density \[ \frac{\sqrt{|D|}}{(2\pi)^{K/2}} \exp \left( -\frac{1}{2} \sum_{i=1}^K \sum_{j=1}^K d_{ij} (x_i-\mu_i)(x_j-\mu_j) \right), \] where the matrix \(D = (d_{ij})\) is positive definite; \(|D|\) denotes its determinant; \(\operatorname{E}(X_j) = \mu_j\); \(\operatorname{E}((X_i-\mu_i)(X_j-\mu_j)) = \sigma_{ij}\); and \((\sigma_{ij}) = D^{-1}\). If \(\sigma_{ij} = \sigma^2\) when \(i = j\) and \(\sigma_{ij} = \sigma_{12}\) when \(i \ne j\), are the observations independent? Exchangeable?
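As a purely numerical way to explore this covariance structure (an illustration, not a proof or a solution), one can simulate from the equicorrelated normal; the values \(\sigma^2 = 1\) and \(\sigma_{12} = 0.5\) below are arbitrary choices:

```r
# Simulate from the equicorrelated multivariate normal of the exercise.
# sigma2 (common variance) and sigma12 (common covariance) are arbitrary
# values chosen so that Sigma is positive definite.
library(MASS)  # for mvrnorm

K <- 4
sigma2  <- 1.0
sigma12 <- 0.5
Sigma <- matrix(sigma12, K, K)
diag(Sigma) <- sigma2

set.seed(1)
X <- mvrnorm(5000, mu = rep(0, K), Sigma = Sigma)
round(cov(X), 2)  # off-diagonal entries sit near sigma12: the coordinates
                  # are correlated, and every pair behaves the same way
```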
From our textbook (Ex. 4.9.19) (Stanford library link):
Recall that, in general, the three measures of association estimate different parameters. Consider bivariate data \((X_i,Y_i)\) generated as follows:
\[Y_i = X_i + e_i, \hspace{0.5cm} i=1,2,\dots,n\]
where \(X_i\) has a standard Laplace (double exponential) distribution, \(e_i\) has a standard \(N(0,1)\) distribution, and \(X_i\) and \(e_i\) are independent.
rlaplace(n)
generates \(n\) iid Laplace variates. For \(n = 30\), generate such a bivariate sample. Then obtain the scatterplot and the association analyses based on Pearson's, Spearman's, and Kendall's procedures.

From our textbook (Ex. 4.9.20) (Stanford library link):
The electronic memory game Simon was first introduced in the late 1970s. In the game there are four colored buttons which light up and produce a musical note. The device plays a sequence of light/note combinations and the goal is to play the sequence back by pressing the buttons. The game starts with one light/note and progressively adds one each time the player correctly recalls the sequence.
Suppose the game were played by a set of statistics students in two classes (time slots). Each student played the game twice and recorded his or her longest sequence. The results are in the dataset simon in package npsm.
Regression toward the mean is the phenomenon that if an observation is extreme on the first trial, it will tend to be closer to the average on the second trial. In other words, students who scored higher than average on the first trial would tend to score lower on the second trial, and students who scored low on the first trial would tend to score higher on the second.
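A sketch of the rank-based fit this exercise asks for, assuming the simon data frame has columns named trial1 and trial2 (hypothetical names; check names(simon) first):

```r
# Sketch, not a definitive solution.  The column names trial1 / trial2
# are assumptions -- inspect names(simon) before running.
library(npsm)   # provides the simon data
library(Rfit)   # rank-based regression; Wilcoxon scores are the default

fit <- rfit(trial2 ~ trial1, data = simon)
summary(fit)

plot(trial2 ~ trial1, data = simon,
     xlab = "Trial 1", ylab = "Trial 2")
cf <- coef(fit)
abline(cf[1], cf[2], lty = 1)   # rank-based fitted line
abline(0, 1, lty = 2)           # the line y = x
```

Points below the dashed line \(y = x\) for high first-trial scores (and above it for low ones) would be consistent with regression toward the mean.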
Fit the rank-based regression of the second-trial scores on the first, and plot the data with the fitted line overlaid (using the package Rfit). Use Wilcoxon scores. Also overlay the line \(y = x\).

Show that the leave-one-out cross-validation score \[\widehat{R}(h) = \frac{1}{n} \sum_{i=1}^n (Y_i - \widehat{r}_{(-i)}(x_i))^2\] is equal to \[\widehat{R}(h) = \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \widehat{r}_n(x_i)}{1-L_{ii}} \right)^2,\] where \(L_{ij} = \frac{K(x_i,x_j)}{\sum_{k=1}^n K(x_i,x_k)}\), \(\widehat{r}_n(x)\) is the kernel regression estimator, and \(\widehat{r}_{(-i)}(x)\) is the kernel regression estimator obtained by leaving out \((x_i,Y_i)\).
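Before proving the identity, it can be checked numerically for the Nadaraya-Watson smoother; the Gaussian kernel, the bandwidth \(h = 0.5\), and the simulated data below are all arbitrary choices:

```r
# Numerical check (not a proof) of the leave-one-out identity.
set.seed(1)
n <- 30
x <- runif(n)
Y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
h <- 0.5
K <- function(u, v) exp(-(u - v)^2 / (2 * h^2))  # Gaussian kernel

Kmat <- outer(x, x, K)
L    <- Kmat / rowSums(Kmat)   # L[i, j] = K(x_i, x_j) / sum_k K(x_i, x_k)
rhat <- as.vector(L %*% Y)     # r_n(x_i), the full-sample kernel fit

# Direct leave-one-out fits, refitting without (x_i, Y_i) each time
rhat_loo <- sapply(seq_len(n), function(i) {
  w <- Kmat[i, -i]
  sum(w * Y[-i]) / sum(w)
})

R_direct   <- mean((Y - rhat_loo)^2)
R_shortcut <- mean(((Y - rhat) / (1 - diag(L)))^2)
all.equal(R_direct, R_shortcut)  # the two forms agree
```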
Please email your solutions in the form of an Rmd file to Nan Bi and Lexi Guan.
This homework is due on Tuesday, May 10th at 1:30 pm.
You are encouraged to work through this homework with your peers. But write your own code and explain the results in your own words.