Homework 2

Exercise 1

Which is the most likely bootstrap sample and its associated probability in \(n = 3\), \(n = 5\), \(n = 10\), \(n = 12\), \(n = 15\), and \(n = 20\)?

Exercise 2

library(ggplot2)
library(bootstrap)
data(law)
law2 = data.frame(Observation = 1:dim(law)[1],law)
ggplot(law2, aes(x = LSAT, y = GPA)) + 
  geom_text(aes(label = Observation),hjust = 0,vjust = 0)

Which observation(s) do you need to remove from the sample to make the Monte Carlo simulation and complete enumerations look more similar? Recompute the complete enumerations bootstrap after removing your observation(s). Did anything change in the histogram compared to the one we had in class? If yes, explain what happened and why. Do all the computations and plotting in a R markdown file. To speedup computations, you may want to implement Gray codes for compositions (for details of the implementation have a look at Wilf and Klingsberg). How much speedup can you get by using Gray codes? Show either experimentally or theoretically.

Exercise 3

From our textbook (Stanford library link):
Suppose in a poll of \(500\) registered voters, \(269\) responded that they would vote for candidate \(P\). Obtain a \(90\%\) percentile bootstrap confidence interval for the true proportion of registered voters who plan to vote for \(P\).

Deadline

Please email your solutions in the form of a Rmd file to Nan Bi and Lexi Guan.

This homework is due on Friday, April 15th at 1:30 pm.

You are encouraged to work through this homework with your peers. But write your own code and explain the results in your own words.