Exercise 1

  1. Write R code to draw 100 sample curves from the Dirichlet Process (DP) prior for a range of concentration parameters \(\alpha\) and one fixed base distribution \(F_0\) (for example the normal distribution).
  2. What happens when you change \(\alpha\)?
  3. What does this corresponds to in a more standard dstribution, say the normal distribution?

Exercise 2

  1. Download occupational employment statistics from here.
  2. Construct a two-way table of median hourly wages with states as rows and job titles as columns.
  3. Perform the Friedman’s test on the constructed table. What is our null and alternative hypothesis? What are your conclusions?
  4. Perform median polish on this table. What are you conclusions? Do model fit evaluations using Tukey additivity plot. What are your conclusions?

Exercise 3

From our textbook (Ex. 6.5.3.) (Stanford library link):

For the dataset hodgkins in package npsm, plot Kaplan-Meier estimated survival curves for both treatments. Note treatment code 1 denotes radiation of affected node and treatment code 2 denotes total nodal radiation.

Exercise 4

From our textbook (Ex. 6.5.4.) (Stanford library link):

To simulate survival data, often it is useful to simulate multiple time points. For example the time to event and the time to end of study. Then, events occurring after the time to end of study are censored. Suppose the time to event of interest follows an exponential distribution with mean 5 years and the time to end of study follows an exponential distribution with a mean of 1.8 years. For a sample size \(n = 100\) simulate survival times from this model. Plot the Kaplan-Meier estimate.


Please email your solutions in the form of a Rmd file to Nan Bi and Lexi Guan.

This homework is due on Thursday, May 19th at 1:30 pm.

You are encouraged to work through this homework with your peers. But write your own code and explain the results in your own words.