Tutorial description

One of the fundamental tasks in science and other disciplines is using a sample of data to understand aspects of a larger population. Drawing conclusions about that population from a sample is called statistical inference. Although it may seem backwards, the logic of statistical inference is to reject the claim that is not of interest (the null hypothesis), which lends support to the claim that is of interest. For example, to show that one diet is better than another, you first reject the idea that the two diets are equally effective at causing weight loss.

Statistical inference plays a role in many different data analyses (including the linear models you’ve seen in previous tutorials). In this tutorial, we introduce the p-value, which can be thought of as a measure of how strongly the data disagree with the claim you are trying to reject (the smaller the p-value, the stronger the disagreement). We also introduce confidence intervals, which provide a range of plausible values for the measure of interest (e.g., the number of additional pounds lost, on average, on the first diet as compared to the second diet).

Learning objectives

Lessons

1 - Sampling variability

This lesson investigates the fundamental principle that results vary from one sample to the next. You will learn to characterize that variability across samples and compare it to the underlying population. Remember, the goal of the research is to understand the population, while the information you have comes only from a single sample of data.
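
As a rough illustration of sampling variability, the sketch below draws many samples from a made-up population and records the sample proportion each time. It is written in plain Python rather than with the infer package, and the population (60% successes), the sample sizes, and the function name `sample_proportions` are all assumptions chosen for the example.

```python
import random
import statistics

def sample_proportions(population, sample_size, n_samples, seed=0):
    """Draw repeated samples without replacement; return each sample's proportion of 1s."""
    rng = random.Random(seed)
    props = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        props.append(sum(sample) / sample_size)
    return props

# Hypothetical population: 10,000 units, 60% of which are "successes" (coded 1).
population = [1] * 6000 + [0] * 4000
true_prop = sum(population) / len(population)

# In practice you observe only one sample; here we simulate many to see how they vary.
props_small = sample_proportions(population, sample_size=50, n_samples=1000)
props_large = sample_proportions(population, sample_size=400, n_samples=1000)

print("true proportion:", true_prop)
print("mean of sample proportions (n=50):", statistics.mean(props_small))
print("spread of sample proportions (n=50):", statistics.stdev(props_small))
print("spread of sample proportions (n=400):", statistics.stdev(props_large))
```

The sample proportions scatter around the population value, and the scatter shrinks as the sample size grows.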

2 - Randomization test

Using the infer package, you will be able to work through an entire randomization test. For a given dataset and research question, you will identify hypotheses and decide whether or not it is appropriate to reject the null hypothesis in favor of the research claim of interest.
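
The logic of a randomization test can be sketched in a few lines. The example below is plain Python, not the infer workflow used in the lesson, and the weight-loss numbers, group names, and the helper `permutation_test` are hypothetical. It shuffles the group labels many times to see how often a difference at least as large as the observed one arises when the two diets are assumed to be equally effective.

```python
import random

def diff_in_means(a, b):
    """Mean of group a minus mean of group b."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_test(group_a, group_b, reps=2000, seed=0):
    """One-sided randomization test for 'group_a has a larger mean than group_b'."""
    rng = random.Random(seed)
    observed = diff_in_means(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        # Under the null hypothesis the labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        if diff_in_means(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator avoids reporting a p-value of exactly 0.
    p_value = (extreme + 1) / (reps + 1)
    return observed, p_value

# Hypothetical pounds lost under two diets (invented data for illustration).
diet_a = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.8]
diet_b = [3.2, 4.1, 2.9, 4.5, 3.8, 3.5, 4.0, 3.1]

observed, p_value = permutation_test(diet_a, diet_b)
print("observed difference in means:", observed)
print("approximate p-value:", p_value)
```

A small p-value means the observed difference would be unusual if the diets were equally effective, which is the basis for rejecting the null hypothesis.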

3 - Errors in hypothesis testing

Working through another complete hypothesis test (using the infer package), you will focus on how and when false conclusions are drawn. You will learn to distinguish between a Type I error, where you reject the null hypothesis even though it is true (you are confident in a claim that isn’t valid), and a Type II error, where you fail to reject the null hypothesis even though the research claim is correct. You will also learn the important role that the number of observations plays in reducing error rates.

4 - Parameters and confidence intervals

Again using the infer package, you will learn to create confidence intervals for a single population proportion. A confidence interval provides a range of plausible values for the unknown population value. The parameter is the true, unknown proportion of successes in the population, and it is estimated by the sample proportion. Bootstrapping estimates the variability of the sample proportion, which leads to an interval estimate.
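
One common way to turn bootstrap resamples into an interval is the percentile method, sketched below in plain Python. This is an illustrative stand-in, not the infer code from the lesson; the sample (62 successes out of 100), the function name `bootstrap_ci_proportion`, and the simple index-based percentiles are assumptions made for the example.

```python
import random

def bootstrap_ci_proportion(sample, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a proportion from a 0/1 sample."""
    rng = random.Random(seed)
    n = len(sample)
    boot_props = []
    for _ in range(reps):
        # Resample the observed data with replacement, keeping the same sample size.
        resample = rng.choices(sample, k=n)
        boot_props.append(sum(resample) / n)
    boot_props.sort()
    alpha = 1 - level
    # Simple index-based percentiles of the sorted bootstrap proportions.
    lower = boot_props[round(alpha / 2 * reps)]
    upper = boot_props[round((1 - alpha / 2) * reps) - 1]
    return lower, upper

# Hypothetical sample: 100 observations, 62 of them successes (coded 1).
sample = [1] * 62 + [0] * 38
sample_prop = sum(sample) / len(sample)

lower, upper = bootstrap_ci_proportion(sample)
print("sample proportion:", sample_prop)
print("approximate 95% interval:", (lower, upper))
```

The spread of the bootstrap proportions stands in for the sampling variability that cannot be observed directly from a single sample.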

Instructor

Jo Hardin

Pomona College

Jo Hardin is Professor of Mathematics and Statistics at Pomona College. She collaborates with molecular biologists to create novel statistical methods for analyzing high-throughput data. She has also worked extensively in statistics and data science education, facilitating modern curricula for higher education instructors. She was a co-author on the 2014 ASA Curriculum Guidelines for Undergraduate Programs in Statistical Science, and she writes on the blog teachdatascience.com. The best part of her job is collaborating with undergraduate students. In her spare time, she loves reading, running, and breeding tortoises.