library(tidyverse)
library(tidymodels)
Lab 7
By the end of this lab you will construct and interpret confidence intervals via bootstrapping.
For all visualizations you create, be sure to include informative titles for the plot, axes, and legend!
Getting started
Go to Duke Container Manager and create a new folder titled lab-7 - Hypothesis testing.
Download the files from Canvas and upload them into the new folder you created in step 1.
Complete the exercises in this document.
Part 1: Lemurs
This week we’ll be working with data from the Duke Lemur Center, which houses over 200 lemurs across 14 species – the most diverse population of lemurs on Earth, outside their native Madagascar.
Lemurs are the most threatened group of mammals on the planet, and 95% of lemur species are at risk of extinction. Our mission is to learn everything we can about lemurs – because the more we learn, the better we can work to save them from extinction. They are endemic only to Madagascar, so it’s essentially a one-shot deal: once lemurs are gone from Madagascar, they are gone from the wild.
By studying the variables that most affect their health, reproduction, and social dynamics, the Duke Lemur Center learns how to most effectively focus their conservation efforts. And the more we learn about lemurs, the better we can educate the public around the world about just how amazing these animals are, why they need to be protected, and how each and every one of us can make a difference in their survival.
Source: TidyTuesday
The codebook for the dataset can be found here. While the TidyTuesday project used the full dataset, we’ll be working with a subset.
First let’s load the packages.
Exercise 1
Load the lemurs data and save it as lemurs. Then, report which “types” of lemurs are represented in the sample and how many of each. Note that this information is in the taxon variable. You should refer back to the linked data dictionary to understand what the different values of taxon mean.
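As a rough illustration of the counting step, the sketch below tallies a made-up taxon column with count(). The toy data frame and the placeholder codes "XXXX" and "YYYY" are assumptions for illustration only; the real codes come from the data dictionary.

```r
library(tidyverse)

# Toy stand-in for the lab data (hypothetical values, not the real lemurs file)
toy_lemurs <- tibble(
  taxon = c("LCAT", "LCAT", "XXXX", "LCAT", "YYYY", "XXXX")
)

# One row per taxon code, with the number of observations in column n
taxon_counts <- toy_lemurs |>
  count(taxon, sort = TRUE)

taxon_counts
```

The same count() call should work on the real lemurs data once it is loaded, assuming the column is named taxon as described above.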
Now that you’ve completed your first exercise, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.
Exercise 2
What is the mean weight of ring-tailed lemurs? Calculate, visualize, and interpret a 95% bootstrap confidence interval together with your point estimate. Use set.seed(123) and reps = 1000 to create your bootstrap confidence interval.
Hints:
Below is a step-by-step recipe for constructing and visualizing a confidence interval. The code snippets shown are not “complete”; they’re intended to guide you in the right direction.
- Step 1: Create a dataset of ring-tailed lemurs, and call it lemurs_rt. Note, these are taxon == "LCAT".
- Step 2: Calculate the mean weight of ring-tailed lemurs.
obs_stat_mean_rt <- lemurs_rt |>
  specify(response = weight_g) |>
  calculate(stat = "mean")
- Step 3: Construct a bootstrap distribution. Note: There is a generate() step, so you also need to set a seed before running the following.
boot_dist_mean_rt <- lemurs_rt |>
  specify(response = weight_g) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")
- Step 4: Calculate the bounds of the confidence interval. Don’t forget to interpret it as well!
ci_95_mean_rt <- boot_dist_mean_rt |>
  get_confidence_interval(
    point_estimate = obs_stat_mean_rt,
    level = 0.95
  )
- Step 5: Visualize the confidence interval, overlaying it on the bootstrap distribution.
visualize(boot_dist_mean_rt) +
  shade_confidence_interval(endpoints = ci_95_mean_rt)
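On the note in Step 3 about setting a seed: generate() resamples at random, so the bootstrap distribution changes from run to run unless the random number generator is seeded first. The sketch below illustrates this on a small made-up vector of weights. The data, the helper name boot_means(), and the seed 456 are assumptions for illustration, not part of the lab.

```r
library(tidyverse)
library(tidymodels)

# Made-up weights in grams (not the real lemur data)
toy_rt <- tibble(weight_g = c(2100, 2250, 2400, 1980, 2310, 2205, 2150, 2390))

# Hypothetical helper: seed the generator, then build a bootstrap distribution
boot_means <- function(seed) {
  set.seed(seed)
  toy_rt |>
    specify(response = weight_g) |>
    generate(reps = 200, type = "bootstrap") |>
    calculate(stat = "mean")
}

run_a <- boot_means(123)
run_b <- boot_means(123)
run_c <- boot_means(456)

# Same seed: the resampled means match exactly
same_seed_matches <- identical(run_a$stat, run_b$stat)

# Different seed: the resampled means differ
diff_seed_matches <- identical(run_a$stat, run_c$stat)
```

This is why the exercises fix set.seed(123): it should make your interval reproducible every time the document is rendered.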
Exercise 3
What is the median weight of ring-tailed lemurs? What is the median weight of mongoose lemurs? Report a 95% bootstrap confidence interval together with your point estimates. Use set.seed(123) and reps = 10000 to create your bootstrap confidence intervals.
Exercise 4
Your friend has never taken a statistics course. In your own words, describe to your friend the process of “bootstrap sampling” used in the previous exercise to create a bootstrap confidence interval. Walk your friend through the process (collecting a bootstrap sample, computing a statistic, and then repeating). You may find it helpful to describe it as drawing pieces of paper from a bag like we did in class.
Hint: What would you write on the slips of paper? How many pieces of paper would be in the bag(s)? How many bag(s) would you need for the above exercise?
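For intuition only, the paper-slip picture can be mimicked in base R with sample(). This is a minimal sketch on five made-up weights, using the mean as the statistic; it is not the infer pipeline the lab asks you to use, and your written answer should still be in your own words.

```r
set.seed(123)

# Made-up weights in grams: one "slip of paper" per observed lemur
slips <- c(2100, 2250, 2400, 1980, 2310)

# One bootstrap sample: draw as many slips as are in the bag,
# putting each slip back before the next draw (replace = TRUE)
one_draw <- sample(slips, size = length(slips), replace = TRUE)
one_stat <- mean(one_draw)

# Repeat many times, keeping only the statistic from each bootstrap sample
boot_stats <- replicate(
  1000,
  mean(sample(slips, size = length(slips), replace = TRUE))
)

# Percentile interval: the middle 95% of the bootstrap statistics
ci_95 <- quantile(boot_stats, probs = c(0.025, 0.975))
```

Drawing with replacement is the key design choice here: without it, every bootstrap sample would just be the original sample in a different order.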
Exercise 5
Do female lemurs differ in weight from male lemurs? Report the difference in mean weights between groups. Next, construct a 99% bootstrap confidence interval for the difference in means between groups and interpret it. If the interval covers 0, you might think there is no difference between groups. Does the interval cover 0? Use set.seed(123) and reps = 10000.
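A difference in means follows the same recipe as Exercise 2, with an explanatory variable added to specify() and the stat changed. The sketch below runs on simulated data; the column name sex, its levels "F" and "M", and the simulated weights are assumptions for illustration, so check the codebook for the actual coding.

```r
library(tidyverse)
library(tidymodels)

set.seed(123)

# Simulated stand-in data (not the real lemurs file)
toy_sex <- tibble(
  sex = factor(rep(c("F", "M"), each = 30)),
  weight_g = c(rnorm(30, mean = 2200, sd = 300), rnorm(30, mean = 2300, sd = 300))
)

# Observed difference in means, computed as F minus M
obs_diff <- toy_sex |>
  specify(weight_g ~ sex) |>
  calculate(stat = "diff in means", order = c("F", "M"))

# Bootstrap distribution of the difference in means
boot_diff <- toy_sex |>
  specify(weight_g ~ sex) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("F", "M"))

# 99% percentile interval, and whether it covers 0
ci_99 <- boot_diff |>
  get_confidence_interval(level = 0.99, type = "percentile")

covers_zero <- ci_99$lower_ci <= 0 & ci_99$upper_ci >= 0
```

The order argument only fixes the direction of the subtraction; swapping it flips the sign of the estimate and of both interval endpoints.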
Exercise 6
Create a new column that tells you whether or not a lemur was born in the first or second half of the year (January through June vs July through December). Create a meaningful plot to illustrate the weights of each group of lemurs. Use theme_bw() to replace the default gray plot background. Based on your figure, which group of lemurs weighs more? Is the distribution of one or both groups skewed or symmetric? See here for more information about skew vs symmetric distributions.
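One possible shape for this exercise is sketched below on simulated data. It assumes a numeric birth_month column (1 through 12); that column name, the new birth_half column, and the choice of a boxplot are assumptions for illustration, and a different plot type may suit the real data better.

```r
library(tidyverse)

set.seed(123)

# Simulated stand-in data (not the real lemurs file)
toy_births <- tibble(
  birth_month = sample(1:12, size = 40, replace = TRUE),
  weight_g = rnorm(40, mean = 2200, sd = 300)
) |>
  # New column: which half of the year the lemur was born in
  mutate(birth_half = if_else(birth_month <= 6, "January-June", "July-December"))

# Side-by-side boxplots of weight for the two groups
p <- ggplot(toy_births, aes(x = birth_half, y = weight_g)) +
  geom_boxplot() +
  labs(
    title = "Weight by half of the year of birth (simulated data)",
    x = "Birth half",
    y = "Weight (g)"
  ) +
  theme_bw()

p
```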
Exercise 7
Is there more variability in weight for lemurs born in the first half or second half of the year? Report the estimated standard deviation of each group. Report a 90% bootstrap confidence interval for the standard deviation of each group. Interpret the confidence intervals in context. Use set.seed(123) and remember to set reps = 10000.
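Because each group gets its own interval, one approach is to filter to a group and then bootstrap the standard deviation with stat = "sd". The sketch below uses simulated data; the birth_half column, its labels, and the helper boot_sd_ci() are hypothetical names introduced for illustration.

```r
library(tidyverse)
library(tidymodels)

set.seed(123)

# Simulated stand-in data (not the real lemurs file)
toy_births <- tibble(
  birth_half = rep(c("first half", "second half"), each = 40),
  weight_g = c(rnorm(40, mean = 2200, sd = 250), rnorm(40, mean = 2200, sd = 400))
)

# Hypothetical helper: percentile bootstrap interval for the sd of weight_g
boot_sd_ci <- function(data, level = 0.90, reps = 1000) {
  data |>
    specify(response = weight_g) |>
    generate(reps = reps, type = "bootstrap") |>
    calculate(stat = "sd") |>
    get_confidence_interval(level = level, type = "percentile")
}

ci_first <- toy_births |>
  filter(birth_half == "first half") |>
  boot_sd_ci()

ci_second <- toy_births |>
  filter(birth_half == "second half") |>
  boot_sd_ci()
```

In this sketch the seed is set once at the top, so the two intervals use different stretches of the same random stream; in your own work, follow the seeding instructions given in the exercise.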
Now that you’ve completed a few more exercises, pause and render your document. If it renders without any issues, great! Move on to the next exercise. If it does not, debug the issue before moving on. Ask your TA for help if you need it. Do not proceed without rendering your document.
Part 2: IMS Exercises
The exercises in this section do not require code. Make sure to answer the questions in full sentences.
Exercise 8
IMS - Chapter 12 exercises, #2: Chronic illness.
Exercise 9
IMS - Chapter 12 exercises, #6: Bootstrap distributions of \(\hat{p}\), III.
Exercise 10
IMS - Chapter 12 exercises, #8: Waiting at an ER.
Wrap up
Submitting
Before you proceed, first, make sure that you have updated the document YAML with your name! Then, render your document one last time, for good measure.
To submit your assignment to Gradescope:
Go to your Files pane and check the box next to the PDF output of your document (lab-7.pdf).
Then, in the Files pane, go to More > Export. This will download the PDF file to your computer. Save it somewhere you can easily locate, e.g., your Downloads folder or your Desktop.
Go to the course Canvas page and click on Gradescope and then click on the assignment. You’ll be prompted to submit it.
Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
If you fail to mark the pages associated with an exercise, that exercise won’t be graded. This means, if you fail to mark the pages for all exercises, you will receive a 0 on the assignment. The TAs can’t mark your pages for you, and for them to be able to grade, you must mark them.
Grading
Exercise | Points |
---|---|
Exercise 1 | 3 |
Exercise 2 | 6 |
Exercise 3 | 7 |
Exercise 4 | 5 |
Exercise 5 | 5 |
Exercise 6 | 7 |
Exercise 7 | 8 |
Exercise 8 | 3 |
Exercise 9 | 3 |
Exercise 10 | 3 |
Total | 50 |