
Datasets

These are real datasets collected as part of surveys, polls, other observational studies, or experiments.

absenteeism
Absenteeism from school in New South Wales
acs12
American Community Survey, 2012
age_at_mar
Age at first marriage of 5,534 US women.
ames
Housing prices in Ames, Iowa
antibiotics
Pre-existing conditions in 92 children
arbuthnot
Male and female births in London
ask
How important is it to ask pointed questions?
assortative_mating
Eye color of couples
avandia
Cardiovascular problems for two types of Diabetes medicines
babies_crawl
Crawling age
babies
The Child Health and Development Studies
bac
Beer and blood alcohol content
bdims
Body measurements of 507 physically active individuals.
biontech_adolescents
Efficacy of Pfizer-BioNTech COVID-19 vaccine on adolescents
birds
Aircraft-Wildlife Collisions
births
North Carolina births, 100 cases
births14
US births
blizzard_salary
Blizzard Employee Voluntary Salary Info.
burger
Burger preferences
cancer_in_dogs
Cancer in dogs
cards
Deck of cards
cars04
cars04
cars93
cars93
census
Random sample of 2000 U.S. Census Data
cherry
Summary information for 31 cherry trees
children_gender_stereo
Gender Stereotypes in 5-7 year old Children
china
Child care hours
cia_factbook
CIA Factbook Details on Countries
cle_sac
Cleveland and Sacramento
climate70
Temperature Summary Data, Geography Limited
climber_drugs
Climber Drugs Data.
coast_starlight
Coast Starlight Amtrak train
comics
comics
country_iso
Country ISO information
cpr
CPR dataset
cpu
CPUs Released between 2010 and 2020.
daycare_fines
Daycare fines
diabetes2
Type 2 Diabetes Clinical Trial for Patients 10-17 Years Old
dream
Survey on views of the DREAM Act
drone_blades
Quadcopter Drone Blades
drug_use
Drug use of students and parents
duke_forest
Sale prices of houses in Duke Forest, Durham, NC
earthquakes
Earthquakes
ebola_survey
Survey on Ebola quarantine
elmhurst
Elmhurst College gift aid
email
Data frame representing information about a collection of emails
email50
Sample of 50 emails
env_regulation
American Adults on Regulation and Renewable Energy
epa2012
Vehicle info from the EPA for 2012
epa2021
Vehicle info from the EPA for 2021
esi
Environmental Sustainability Index 2005
ethanol
Ethanol Treatment for Tumors Experiment
evals
Professor evaluations and beauty
exam_grades
Exam and course grades for statistics students
exams
Exam scores
exclusive_relationship
Number of Exclusive Relationships
fact_opinion
Can Americans categorize facts and opinions?
fastfood
Nutrition in fast food
fcid
Summary of male heights from USDA Food Commodity Intake Database
fheights
Female college student heights, in inches
fish_age
Young fish in the North Sea.
fish_oil_18
Findings on n-3 Fatty Acid Supplement Health Benefits
flow_rates
River flow data
friday
Friday the 13th
full_body_scan
Poll about use of full-body airport scanners
gdp_countries
GDP Countries Data.
gear_company
Fake data for a gear company example
gender_discrimination
Bank manager recommendations based on gender
get_it_dunn_run
Get it Dunn Run, Race Times
gifted
Analytical skills of young gifted children
global_warming_pew
Pew survey on global warming
goog
Google stock data
gov_poll
Pew Research poll on government approval ratings
gpa_iq
Sample of students and their GPA and IQ
gpa_study_hours
gpa_study_hours
gpa
Survey of Duke students on GPA, studying, and more
gss_wordsum_class
gss_wordsum_class
gss2010
2010 General Social Survey
health_coverage
Health Coverage and Health Status
healthcare_law_survey
Pew Research Center poll on health care, including question variants
heart_transplant
Heart Transplant Data
helium
Helium football
helmet
Socioeconomic status and reduced-fee school lunches
hfi
Human Freedom Index
house
United States House of Representatives historical make-up
hsb2
High School and Beyond survey
husbands_wives
Great Britain: husband and wife pairs
immigration
Poll on illegal workers in the US
infmortrate
Infant Mortality Rates, 2012
iowa
iowa
ipo
Facebook, Google, and LinkedIn IPO filings
iran
iran
kobe_basket
Kobe Bryant basketball performance
labor_market_discrimination
Are Emily and Greg More Employable Than Lakisha and Jamal?
LAhomes
LAhomes
law_resume
Gender, Socioeconomic Class, and Interview Invites
lecture_learning
Lecture Delivery Method and Learning Outcomes
leg_mari
Legalization of Marijuana Support in 2010 California Survey
lego_population
Population of Lego Sets for Sale between Jan. 1, 2018 and Sept. 11, 2020.
lego_sample
Sample of Lego Sets
life_exp
life_exp
lizard_habitat
Field data on lizards observed in their natural habitat
lizard_run
Lizard speeds
loans_full_schema
Loan data from Lending Club
london_boroughs
London Borough Boundaries
london_murders
London Murders, 2006-2011
mail_me
Influence of a Good Mood on Helpfulness
major_survey
Survey of Duke students and the area of their major
malaria
Malaria Vaccine Trial
male_heights
Sample of 100 male heights
mammals
Sleep in Mammals
mammogram
Experiment with Mammogram Randomized
manhattan
manhattan
marathon
New York City Marathon Times (outdated)
mariokart
Wii Mario Kart auctions from Ebay
mcu_films
Marvel Cinematic Universe films
midterms_house
President's party performance and unemployment rate
migraine
Migraines and acupuncture
military
US Military Demographics
mlb_players_18
Batter Statistics for 2018 Major League Baseball (MLB) Season
mlb_teams
Major League Baseball Teams Data.
mlb
Salary data for Major League Baseball (2010)
mlbbat10
Major League Baseball Player Hitting Statistics for 2010
mn_police_use_of_force
Minneapolis police use of force data.
movies
movies
mtl
Medial temporal lobe (MTL) and other data for 26 participants
murders
Data for 20 metropolitan areas
nba_finals_teams
NBA Finals Team Summary
nba_finals
NBA Finals History
nba_heights
NBA Player heights from 2008-9
nba_players_19
NBA Players for the 2018-2019 season
ncbirths
North Carolina births, 1000 cases
nuclear_survey
Nuclear Arms Reduction Survey
nyc_marathon
New York City Marathon Times
nyc
nyc
nycflights
Flights data
offshore_drilling
California poll on drilling off the California coast
opportunity_cost
Opportunity cost of purchases
orings
1986 Challenger disaster and O-rings
oscars
Oscar winners, 1929 to 2018
paralympic_1500
Race time for Olympic and Paralympic 1500m.
penelope
Guesses at the weight of Penelope (a cow)
penetrating_oil
What's the best way to loosen a rusty bolt?
penny_ages
Penny Ages
pew_energy_2018
Pew Survey on Energy Sources in 2018
photo_classify
Photo classifications: fashion or not
piracy
Piracy and PIPA/SOPA
playing_cards
Table of Playing Cards in 52-Card Deck
pm25_2011_durham
Air quality for Durham, NC
pm25_2022_durham
Air quality for Durham, NC
poker
Poker winnings during 50 sessions
possum
Possums in Australia and New Guinea
ppp_201503
US Poll on who it is better to raise taxes on
present
Birth counts
president
United States Presidential History
prison
Prison isolation experiment
prius_mpg
User reported fuel efficiency for 2017 Toyota Prius Prime
race_justice
Yahoo! News Race and Justice poll results
reddit_finance
Reddit Survey on Financial Independence.
resume
Which resume attributes drive job callbacks?
rosling_responses
Sample Responses to Two Public Health Questions
russian_influence_on_us_election_2016
Russians' Opinions on US Election Influence in 2016
sa_gdp_elec
Sustainability and Economic Indicators for South Africa.
salinity
Salinity in Bimini Lagoon, Bahamas
satgpa
SAT and GPA data
scotus_healthcare
Public Opinion with SCOTUS ruling on American Healthcare Act
seattlepets
Names of pets in Seattle
sex_discrimination
Bank manager recommendations based on sex
simpsons_paradox_covid
Simpson's Paradox: Covid
sinusitis
Sinusitis and antibiotic experiment
sleep_deprivation
Survey on sleep deprivation and transportation workers
smallpox
Smallpox vaccine results
smoking
UK Smoking Data
snowfall
Snowfall at Paradise, Mt. Rainier National Park
socialexp
Social experiment
soda
soda
solar
Energy Output From Two Solar Arrays in San Francisco
sowc_child_mortality
SOWC Child Mortality Data.
sowc_demographics
SOWC Demographics Data.
sowc_maternal_newborn
SOWC Maternal and Newborn Health Data.
sp500_1950_2018
Daily observations for the S&P 500
sp500_seq
S&P 500 stock data
sp500
Financial information for 50 S&P 500 companies
speed_gender_height
Speed, gender, and height of 1325 students
ssd_speed
SSD read and write speeds
starbucks
Starbucks nutrition
stats_scores
Final exam scores for twenty students
stem_cell
Embryonic stem cells to treat heart attack (in sheep)
stent30
Stents for the treatment of stroke
stocks_18
Monthly Returns for a few stocks
sulphinpyrazone
Treating heart attacks
supreme_court
Supreme Court approval rating
teacher
Teacher Salaries in St. Louis, Michigan
textbooks
Textbook data for UCLA Bookstore and Amazon
tourism
Turkey tourism
transplant
Transplant consultant success rate (fake data)
twins
twins
ucb_admit
ucb_admit
ucla_f18
UCLA courses in Fall 2018
ucla_textbooks_f18
Sample of UCLA course textbooks for Fall 2018
ukdemo
United Kingdom Demographic Data
unempl
Annual unemployment since 1890
unemploy_pres
President's party performance and unemployment rate
us_temperature
US temperatures in 1950 and 2022
winery_cars
Time Between Gondola Cars at Sterling Winery
world_pop
World Population Data.
xom
Exxon Mobil stock data
yawn
Contagiousness of yawning
yrbss_samp
Sample of Youth Risk Behavior Surveillance System (YRBSS)
yrbss
Youth Risk Behavior Surveillance System (YRBSS)
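
As a rough illustration of how the datasets listed above can be reached from R, the sketch below lists the dataset names and inspects one of them. It assumes the openintro package is installed and uses only base R's `data()` alongside it; `ncbirths` is picked arbitrarily from the list.

```r
# Minimal sketch, assuming the openintro package is installed.
library(openintro)

# Names of all datasets shipped with the package, as reported by base R.
dataset_names <- data(package = "openintro")$results[, "Item"]
head(dataset_names)

# Load one dataset by name and look at its shape and column types.
data(ncbirths, package = "openintro")
dim(ncbirths)
str(ncbirths)
```

The same pattern should apply to any other name in the list; `?ncbirths` opens the corresponding help page with the variable descriptions.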

Simulated datasets

These are simulated datasets, primarily used in the textbooks for illustrating a particular concept or showcasing features of a particular methodology.

ami_occurrences
Acute Myocardial Infarction (Heart Attack) Events
association
Simulated data for association plots
ball_bearing
Lifespan of ball bearings
books
Sample of books on a shelf
cchousing
Community college housing (simulated data)
classdata
Simulated class data
corr_match
Sample datasets for correlation problems
credits
College credits.
family_college
Simulated sample of parent / teen college attendance
gradestv
Simulated data for analyzing the relationship between watching TV and grades
gsearch
Simulated Google search experiment
housing
Simulated dataset on student housing
ipod
Length of songs on an iPod
jury
Simulated juror dataset
male_heights_fcid
Random sample of adult male heights
outliers
Simulated datasets for different types of outliers
res_demo_1
Simulated data for regression
res_demo_2
Simulated data for regression
sat_improve
Simulated data for SAT score improvement
simulated_dist
Simulated datasets, not necessarily drawn from a normal distribution.
simulated_normal
Simulated datasets, drawn from a normal distribution.
simulated_scatter
Simulated data for sample scatterplots
student_housing
Community college housing (simulated data, 2015)
student_sleep
Sleep for 110 students (simulated)
thanksgiving_spend
Thanksgiving spending, simulated based on Gallup poll.
tips
Tip data
toohey
Simulated polling dataset
toy_anova
Simulated dataset for ANOVA

Color palettes

These are the color palettes used in OpenIntro books.

COL
OpenIntro Statistics colors
IMSCOL
Introduction to Modern Statistics (IMS) Colors
openintro_colors
OpenIntro colors
openintro_palettes
OpenIntro palettes
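
A small sketch of how the palette objects might be inspected follows. It assumes that `openintro_colors` is a named character vector of hex codes and that `openintro_cols()` (listed under Functions) accepts those names; both points are assumptions worth checking against the help pages.

```r
# Minimal sketch, assuming the openintro package is installed.
library(openintro)

# Assumption: openintro_colors is a named character vector of hex codes.
head(openintro_colors)

# Look up a single color by name, using whichever name comes first
# rather than hard-coding one.
first_name <- names(openintro_colors)[1]
one_color <- openintro_cols(first_name)
one_color

# Use the color in a base R plot.
plot(1:10, pch = 19, col = one_color)
```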

Functions

These functions are used for creating visualisations and summary tables in the books as well as for keeping the OpenIntro datasets website up to date with the datasets in this package.

ArrowLines()
Create a Line That may have Arrows on the Ends
AxisInDollars()
Build Better Looking Axis Labels for US Dollars
AxisInPercent()
Build Better Looking Axis Labels for Percentages
BG()
Add background color to a plot
boxPlot()
Box plot
Braces()
Plot a Braces Symbol
buildAxis()
Axis function substitute
calc_streak()
Calculate hit streaks
CCP()
Plot a Cartesian Coordinate Plane
ChiSquareTail()
Plot upper tail in chi-square distribution
contTable()
Generate Contingency Tables for LaTeX
CT2DF()
Contingency Table to Data Frame
densityPlot()
Density plot
dlsegments()
Create a Double Line Segment Plot
dotPlot()
Dot plot
dotPlotStack()
Add a Stacked Dot Plot to an Existing Plot
edaPlot()
Exploratory data analysis plot
fadeColor()
Fade colors
histPlot()
Histogram or hollow histogram
lab_report()
lab_report
linResPlot()
Create simple regression plot with residual plot
lmPlot()
Linear regression plot with residual plot
loop()
Output a message while inside a loop
lsegments()
Create a Line Segment Plot
makeTube()
Regression tube
MosaicPlot()
Custom Mosaic Plot
myPDF()
Custom PDF function
normTail()
Normal distribution tails
openintro_cols()
Function to extract OpenIntro IMS colors as hex codes
openintro_pal()
Return function to interpolate an OpenIntro IMS color palette
PlotWLine()
Plot data and add a regression line
qqnormsim()
Generate simulated QQ plots
scale_color_openintro()
Color scale constructor for OpenIntro IMS colors
scale_fill_openintro()
Fill scale constructor for OpenIntro IMS colors
treeDiag()
Construct tree diagrams
write_pkg_data()
Create a CSV variant of .rda files
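
To give a feel for how one of the functions above is called, here is a hedged sketch using `calc_streak()`. The `shots` vector is made up for illustration and is not package data; the call assumes the function takes a character vector of hits ("H") and misses ("M") and returns a data frame with a `length` column.

```r
# Minimal sketch, assuming the openintro package is installed.
library(openintro)

# Hypothetical shot record: "H" is a hit, "M" is a miss.
shots <- c("H", "M", "M", "H", "H", "M", "H")

# Compute the streak lengths for the record.
streaks <- calc_streak(shots)
streaks

# Tabulate how often each streak length occurs.
table(streaks$length)
```

The same call can be applied to a column of the `kobe_basket` dataset listed under Datasets.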