Datasets

These are real datasets collected as part of surveys, polls, other observational studies, or experiments.

absenteeism

Absenteeism from school in New South Wales

acs12

American Community Survey, 2012

age_at_mar

Age at first marriage of 5,534 US women.

ames

Housing prices in Ames, Iowa

ami_occurrences

Acute Myocardial Infarction (Heart Attack) Events

antibiotics

Pre-existing conditions in 92 children

arbuthnot

Male and female births in London

ask

How important is it to ask pointed questions?

assortative_mating

Eye color of couples

avandia

Cardiovascular problems for two types of Diabetes medicines

babies_crawl

Crawling age

babies

The Child Health and Development Studies

bac

Beer and blood alcohol content

ball_bearing

Lifespan of ball bearings

bdims

Body measurements of 507 physically active individuals.

biontech_adolescents

Efficacy of Pfizer-BioNTech COVID-19 vaccine on adolescents

birds

Aircraft-Wildlife Collisions

births

North Carolina births, 100 cases

births14

US births

blizzard_salary

Blizzard Employee Voluntary Salary Info.

books

Sample of books on a shelf

burger

Burger preferences

cancer_in_dogs

Cancer in dogs

cards

Deck of cards

cars93

cars93

census

Random sample of 2000 U.S. Census Data

cherry

Summary information for 31 cherry trees

children_gender_stereo

Gender Stereotypes in 5-7 year old Children

china

Child care hours

cia_factbook

CIA Factbook Details on Countries

cle_sac

Cleveland and Sacramento

climate70

Temperature Summary Data, Geography Limited

climber_drugs

Climber Drugs Data.

coast_starlight

Coast Starlight Amtrak train

corr_match

Sample data sets for correlation problems

country_iso

Country ISO information

cpr

CPR data set

cpu

CPUs released between 2010 and 2020.

credits

College credits.

daycare_fines

Daycare fines

diabetes2

Type 2 Diabetes Clinical Trial for Patients 10-17 Years Old

dream

Survey on views of the DREAM Act

drone_blades

Quadcopter Drone Blades

drug_use

Drug use of students and parents

duke_forest

Sale prices of houses in Duke Forest, Durham, NC

earthquakes

Earthquakes

ebola_survey

Survey on Ebola quarantine

elmhurst

Elmhurst College gift aid

email

Data frame representing information about a collection of emails

email50

Sample of 50 emails

env_regulation

American Adults on Regulation and Renewable Energy

epa2012

Vehicle info from the EPA for 2012

epa2021

Vehicle info from the EPA for 2021

esi

Environmental Sustainability Index 2005

ethanol

Ethanol Treatment for Tumors Experiment

evals

Professor evaluations and beauty

exam_grades

Exam and course grades for statistics students

exams

Exam scores

exclusive_relationship

Number of Exclusive Relationships

fact_opinion

Can Americans categorize facts and opinions?

family_college

Simulated sample of parent / teen college attendance

fastfood

Nutrition in fast food

fcid

Summary of male heights from USDA Food Commodity Intake Database

fheights

Female college student heights, in inches

fish_oil_18

Findings on n-3 Fatty Acid Supplement Health Benefits

flow_rates

River flow data

friday

Friday the 13th

full_body_scan

Poll about use of full-body airport scanners

gear_company

Fake data for a gear company example

gender_discrimination

Bank manager recommendations based on gender

get_it_dunn_run

Get it Dunn Run, Race Times

gifted

Analytical skills of young gifted children

global_warming_pew

Pew survey on global warming

goog

Google stock data

gov_poll

Pew Research poll on government approval ratings

gpa_iq

Sample of students and their GPA and IQ

gpa_study_hours

gpa_study_hours

gpa

Survey of Duke students on GPA, studying, and more

gss2010

2010 General Social Survey

health_coverage

Health Coverage and Health Status

healthcare_law_survey

Pew Research Center poll on health care, including question variants

heart_transplant

Heart Transplant Data

helium

Helium football

helmet

Socioeconomic status and reduced-fee school lunches

hfi

Human Freedom Index

house

United States House of Representatives historical make-up

hsb2

High School and Beyond survey

husbands_wives

Great Britain: husband and wife pairs

immigration

Poll on illegal workers in the US

infmortrate

Infant Mortality Rates, 2012

ipo

Facebook, Google, and LinkedIn IPO filings

ipod

Length of songs on an iPod

kobe_basket

Kobe Bryant basketball performance

law_resume

Gender, Socioeconomic Class, and Interview Invites

leg_mari

Legalization of Marijuana Support in 2010 California Survey

lizard_habitat

Field data on lizards observed in their natural habitat

lizard_run

Lizard speeds

loans_full_schema

Loan data from Lending Club

london_boroughs

London Borough Boundaries

london_murders

London Murders, 2006-2011

mail_me

Influence of a Good Mood on Helpfulness

major_survey

Survey of Duke students and the area of their major

malaria

Malaria Vaccine Trial

male_heights_fcid

Random sample of adult male heights

male_heights

Sample of 100 male heights

mammals

Sleep in Mammals

mammogram

Experiment with Mammogram Randomized

marathon

New York City Marathon Times (outdated)

mariokart

Wii Mario Kart auctions from eBay

mcu_films

Marvel Cinematic Universe films

midterms_house

President's party performance and unemployment rate

migraine

Migraines and acupuncture

military

US Military Demographics

mlb_players_18

Batter Statistics for 2018 Major League Baseball (MLB) Season

mlb_teams

Major League Baseball Teams Data.

mlb

Salary data for Major League Baseball (2010)

mlbbat10

Major League Baseball Player Hitting Statistics for 2010

mtl

Medial temporal lobe (MTL) and other data for 26 participants

murders

Data for 20 metropolitan areas

nba_heights

NBA Player heights from 2008-9

nba_players_19

NBA Players for the 2018-2019 season

ncbirths

North Carolina births, 1000 cases

nuclear_survey

Nuclear Arms Reduction Survey

nyc_marathon

New York City Marathon Times

nycflights

Flights data

offshore_drilling

California poll on drilling off the California coast

opportunity_cost

Opportunity cost of purchases

orings

1986 Challenger disaster and O-rings

oscars

Oscar winners, 1929 to 2018

penelope

Guesses at the weight of Penelope (a cow)

penetrating_oil

What's the best way to loosen a rusty bolt?

penny_ages

Penny Ages

pew_energy_2018

Pew Survey on Energy Sources in 2018

photo_classify

Photo classifications: fashion or not

piracy

Piracy and PIPA/SOPA

playing_cards

Table of Playing Cards in 52-Card Deck

pm25_2011_durham

Air quality for Durham, NC

poker

Poker winnings during 50 sessions

possum

Possums in Australia and New Guinea

ppp_201503

US Poll on who it is better to raise taxes on

present

Birth counts

president

United States Presidential History

prison

Prison isolation experiment

prius_mpg

User reported fuel efficiency for 2017 Toyota Prius Prime

race_justice

Yahoo! News Race and Justice poll results

reddit_finance

Reddit Survey on Financial Independence.

resume

Which resume attributes drive job callbacks?

rosling_responses

Sample Responses to Two Public Health Questions

russian_influence_on_us_election_2016

Russians' Opinions on US Election Influence in 2016

salinity

Salinity in Bimini Lagoon, Bahamas

satgpa

SAT and GPA data

scotus_healthcare

Public Opinion with SCOTUS ruling on American Healthcare Act

seattlepets

Names of pets in Seattle

sex_discrimination

Bank manager recommendations based on sex

sinusitis

Sinusitis and antibiotic experiment

sleep_deprivation

Survey on sleep deprivation and transportation workers

smallpox

Smallpox vaccine results

smoking

UK Smoking Data

snowfall

Snowfall at Paradise, Mt. Rainier National Park

socialexp

Social experiment

solar

Energy Output From Two Solar Arrays in San Francisco

sowc_child_mortality

SOWC Child Mortality Data.

sowc_demographics

SOWC Demographics Data.

sowc_maternal_newborn

SOWC Maternal and Newborn Health Data.

sp500_1950_2018

Daily observations for the S&P 500

sp500_seq

S&P 500 stock data

sp500

Financial information for 50 S&P 500 companies

speed_gender_height

Speed, gender, and height of 1325 students

ssd_speed

SSD read and write speeds

starbucks

Starbucks nutrition

stats_scores

Final exam scores for twenty students

stem_cell

Embryonic stem cells to treat heart attack (in sheep)

stent30

Stents for the treatment of stroke

stocks_18

Monthly Returns for a few stocks

sulphinpyrazone

Treating heart attacks

supreme_court

Supreme Court approval rating

teacher

Teacher Salaries in St. Louis, Michigan

textbooks

Textbook data for UCLA Bookstore and Amazon

tips

Tip data

tourism

Turkey tourism

transplant

Transplant consultant success rate (fake data)

ucla_f18

UCLA courses in Fall 2018

ucla_textbooks_f18

Sample of UCLA course textbooks for Fall 2018

ukdemo

United Kingdom Demographic Data

unempl

Annual unemployment since 1890

unemploy_pres

President's party performance and unemployment rate

winery_cars

Time Between Gondola Cars at Sterling Winery

world_pop

World Population Data.

xom

Exxon Mobil stock data

yawn

Contagiousness of yawning

yrbss_samp

Sample of Youth Risk Behavior Surveillance System (YRBSS)

yrbss

Youth Risk Behavior Surveillance System (YRBSS)

Simulated datasets

These are simulated datasets, primarily used in the textbooks to illustrate a particular concept or showcase features of a particular methodology.

association

Simulated data for association plots

cchousing

Community college housing (simulated data)

classdata

Simulated class data

family_college

Simulated sample of parent / teen college attendance

gradestv

Simulated data for analyzing the relationship between watching TV and grades

gsearch

Simulated Google search experiment

housing

Simulated data set on student housing

jury

Simulated juror data set

outliers

Simulated data sets for different types of outliers

res_demo_1

Simulated data for regression

res_demo_2

Simulated data for regression

sat_improve

Simulated data for SAT score improvement

simulated_dist

Simulated data sets, not necessarily drawn from a normal distribution.

simulated_normal

Simulated data sets, drawn from a normal distribution.

simulated_scatter

Simulated data for sample scatterplots

student_housing

Community college housing (simulated data, 2015)

student_sleep

Sleep for 110 students (simulated)

thanksgiving_spend

Thanksgiving spending, simulated based on Gallup poll.

toohey

Simulated polling data set

toy_anova

Simulated data set for ANOVA

Color palettes

These are the color palettes used in OpenIntro books.

COL

OpenIntro Statistics colors

IMSCOL

Introduction to Modern Statistics (IMS) Colors

openintro_colors

OpenIntro colors

openintro_palettes

OpenIntro palettes

Functions

These functions are used for creating visualisations and summary tables in the books as well as for keeping the OpenIntro datasets website up to date with the datasets in this package.

ArrowLines()

Create a Line That May Have Arrows on the Ends

AxisInDollars()

Build Better Looking Axis Labels for US Dollars

AxisInPercent()

Build Better Looking Axis Labels for Percentages

BG()

Add background color to a plot

boxPlot()

Box plot

Braces()

Plot a Braces Symbol

buildAxis()

Axis function substitute

calc_streak()

Calculate hit streaks

CCP()

Plot a Cartesian Coordinate Plane

ChiSquareTail()

Plot upper tail in chi-square distribution

contTable()

Generate Contingency Tables for LaTeX

CT2DF()

Contingency Table to Data Frame

densityPlot()

Density plot

dlsegments()

Create a Double Line Segment Plot

dotPlot()

Dot plot

dotPlotStack()

Add a Stacked Dot Plot to an Existing Plot

edaPlot()

Exploratory data analysis plot

fadeColor()

Fade colors

histPlot()

Histogram or hollow histogram

lab_report()

lab_report

linResPlot()

Create simple regression plot with residual plot

lmPlot()

Linear regression plot with residual plot

loop()

Output a message while inside a loop

lsegments()

Create a Line Segment Plot

makeTube()

Regression tube

MosaicPlot()

Custom Mosaic Plot

myPDF()

Custom PDF function

normTail()

Normal distribution tails

openintro_cols()

Function to extract OpenIntro IMS colors as hex codes

openintro_pal()

Return function to interpolate an OpenIntro IMS color palette

PlotWLine()

Plot data and add a regression line

qqnormsim()

Generate simulated QQ plots

scale_color_openintro()

Color scale constructor for OpenIntro IMS colors

scale_fill_openintro()

Fill scale constructor for OpenIntro IMS colors

treeDiag()

Construct tree diagrams

write_pkg_data()

Create a CSV variant of .rda files