Function reference
Datasets
These are real datasets collected as part of surveys, polls, other observational studies, or experiments.
-
absenteeism
- Absenteeism from school in New South Wales
-
acs12
- American Community Survey, 2012
-
age_at_mar
- Age at first marriage of 5,534 US women.
-
ames
- Housing prices in Ames, Iowa
-
antibiotics
- Pre-existing conditions in 92 children
-
arbuthnot
- Male and female births in London
-
ask
- How important is it to ask pointed questions?
-
assortative_mating
- Eye color of couples
-
avandia
- Cardiovascular problems for two types of Diabetes medicines
-
babies_crawl
- Crawling age
-
babies
- The Child Health and Development Studies
-
bac
- Beer and blood alcohol content
-
bdims
- Body measurements of 507 physically active individuals.
-
biontech_adolescents
- Efficacy of Pfizer-BioNTech COVID-19 vaccine on adolescents
-
birds
- Aircraft-Wildlife Collisions
-
births
- North Carolina births, 100 cases
-
births14
- US births
-
blizzard_salary
- Blizzard Employee Voluntary Salary Info.
-
burger
- Burger preferences
-
cancer_in_dogs
- Cancer in dogs
-
cards
- Deck of cards
-
cars04
- cars04
-
cars93
- cars93
-
census
- Random sample of 2000 U.S. Census Data
-
cherry
- Summary information for 31 cherry trees
-
children_gender_stereo
- Gender Stereotypes in 5-7 year old Children
-
china
- Child care hours
-
cia_factbook
- CIA Factbook Details on Countries
-
cle_sac
- Cleveland and Sacramento
-
climate70
- Temperature Summary Data, Geography Limited
-
climber_drugs
- Climber Drugs Data.
-
coast_starlight
- Coast Starlight Amtrak train
-
comics
- comics
-
country_iso
- Country ISO information
-
cpr
- CPR dataset
-
cpu
- CPUs Released between 2010 and 2020.
-
daycare_fines
- Daycare fines
-
diabetes2
- Type 2 Diabetes Clinical Trial for Patients 10-17 Years Old
-
dream
- Survey on views of the DREAM Act
-
drone_blades
- Quadcopter Drone Blades
-
drug_use
- Drug use of students and parents
-
duke_forest
- Sale prices of houses in Duke Forest, Durham, NC
-
earthquakes
- Earthquakes
-
ebola_survey
- Survey on Ebola quarantine
-
elmhurst
- Elmhurst College gift aid
-
email
- Data frame representing information about a collection of emails
-
email50
- Sample of 50 emails
-
env_regulation
- American Adults on Regulation and Renewable Energy
-
epa2012
- Vehicle info from the EPA for 2012
-
epa2021
- Vehicle info from the EPA for 2021
-
esi
- Environmental Sustainability Index 2005
-
ethanol
- Ethanol Treatment for Tumors Experiment
-
evals
- Professor evaluations and beauty
-
exam_grades
- Exam and course grades for statistics students
-
exams
- Exam scores
-
exclusive_relationship
- Number of Exclusive Relationships
-
fact_opinion
- Can Americans categorize facts and opinions?
-
fastfood
- Nutrition in fast food
-
fcid
- Summary of male heights from USDA Food Commodity Intake Database
-
fheights
- Female college student heights, in inches
-
fish_age
- Young fish in the North Sea.
-
fish_oil_18
- Findings on n-3 Fatty Acid Supplement Health Benefits
-
flow_rates
- River flow data
-
friday
- Friday the 13th
-
full_body_scan
- Poll about use of full-body airport scanners
-
gdp_countries
- GDP Countries Data.
-
gear_company
- Fake data for a gear company example
-
gender_discrimination
- Bank manager recommendations based on gender
-
get_it_dunn_run
- Get it Dunn Run, Race Times
-
gifted
- Analytical skills of young gifted children
-
global_warming_pew
- Pew survey on global warming
-
goog
- Google stock data
-
gov_poll
- Pew Research poll on government approval ratings
-
gpa_iq
- Sample of students and their GPA and IQ
-
gpa_study_hours
- gpa_study_hours
-
gpa
- Survey of Duke students on GPA, studying, and more
-
gss_wordsum_class
- gss_wordsum_class
-
gss2010
- 2010 General Social Survey
-
health_coverage
- Health Coverage and Health Status
-
healthcare_law_survey
- Pew Research Center poll on health care, including question variants
-
heart_transplant
- Heart Transplant Data
-
helium
- Helium football
-
helmet
- Socioeconomic status and reduced-fee school lunches
-
hfi
- Human Freedom Index
-
house
- United States House of Representatives historical make-up
-
hsb2
- High School and Beyond survey
-
husbands_wives
- Great Britain: husband and wife pairs
-
immigration
- Poll on illegal workers in the US
-
infmortrate
- Infant Mortality Rates, 2012
-
iowa
- iowa
-
ipo
- Facebook, Google, and LinkedIn IPO filings
-
iran
- iran
-
kobe_basket
- Kobe Bryant basketball performance
-
labor_market_discrimination
- Are Emily and Greg More Employable Than Lakisha and Jamal?
-
LAhomes
- LAhomes
-
law_resume
- Gender, Socioeconomic Class, and Interview Invites
-
lecture_learning
- Lecture Delivery Method and Learning Outcomes
-
leg_mari
- Legalization of Marijuana Support in 2010 California Survey
-
lego_population
- Population of Lego Sets for Sale between Jan. 1, 2018 and Sept. 11, 2020.
-
lego_sample
- Sample of Lego Sets
-
life_exp
- life_exp
-
lizard_habitat
- Field data on lizards observed in their natural habitat
-
lizard_run
- Lizard speeds
-
loans_full_schema
- Loan data from Lending Club
-
london_boroughs
- London Borough Boundaries
-
london_murders
- London Murders, 2006-2011
-
mail_me
- Influence of a Good Mood on Helpfulness
-
major_survey
- Survey of Duke students and the area of their major
-
malaria
- Malaria Vaccine Trial
-
male_heights
- Sample of 100 male heights
-
mammals
- Sleep in Mammals
-
mammogram
- Experiment with Mammogram Randomized
-
manhattan
- manhattan
-
marathon
- New York City Marathon Times (outdated)
-
mariokart
- Wii Mario Kart auctions from eBay
-
mcu_films
- Marvel Cinematic Universe films
-
midterms_house
- President's party performance and unemployment rate
-
migraine
- Migraines and acupuncture
-
military
- US Military Demographics
-
mlb_players_18
- Batter Statistics for 2018 Major League Baseball (MLB) Season
-
mlb_teams
- Major League Baseball Teams Data.
-
mlb
- Salary data for Major League Baseball (2010)
-
mlbbat10
- Major League Baseball Player Hitting Statistics for 2010
-
mn_police_use_of_force
- Minneapolis police use of force data.
-
movies
- movies
-
mtl
- Medial temporal lobe (MTL) and other data for 26 participants
-
murders
- Data for 20 metropolitan areas
-
nba_finals_teams
- NBA Finals Team Summary
-
nba_finals
- NBA Finals History
-
nba_heights
- NBA Player heights from 2008-9
-
nba_players_19
- NBA Players for the 2018-2019 season
-
ncbirths
- North Carolina births, 1000 cases
-
nuclear_survey
- Nuclear Arms Reduction Survey
-
nyc_marathon
- New York City Marathon Times
-
nyc
- nyc
-
nycflights
- Flights data
-
offshore_drilling
- California poll on drilling off the California coast
-
opportunity_cost
- Opportunity cost of purchases
-
orings
- 1986 Challenger disaster and O-rings
-
oscars
- Oscar winners, 1929 to 2018
-
paralympic_1500
- Race time for Olympic and Paralympic 1500m.
-
penelope
- Guesses at the weight of Penelope (a cow)
-
penetrating_oil
- What's the best way to loosen a rusty bolt?
-
penny_ages
- Penny Ages
-
pew_energy_2018
- Pew Survey on Energy Sources in 2018
-
photo_classify
- Photo classifications: fashion or not
-
piracy
- Piracy and PIPA/SOPA
-
playing_cards
- Table of Playing Cards in 52-Card Deck
-
pm25_2011_durham
- Air quality for Durham, NC
-
pm25_2022_durham
- Air quality for Durham, NC
-
poker
- Poker winnings during 50 sessions
-
possum
- Possums in Australia and New Guinea
-
ppp_201503
- US Poll on who it is better to raise taxes on
-
present
- Birth counts
-
president
- United States Presidential History
-
prison
- Prison isolation experiment
-
prius_mpg
- User reported fuel efficiency for 2017 Toyota Prius Prime
-
race_justice
- Yahoo! News Race and Justice poll results
-
reddit_finance
- Reddit Survey on Financial Independence.
-
resume
- Which resume attributes drive job callbacks?
-
rosling_responses
- Sample Responses to Two Public Health Questions
-
russian_influence_on_us_election_2016
- Russians' Opinions on US Election Influence in 2016
-
sa_gdp_elec
- Sustainability and Economic Indicators for South Africa.
-
salinity
- Salinity in Bimini Lagoon, Bahamas
-
satgpa
- SAT and GPA data
-
scotus_healthcare
- Public Opinion with SCOTUS ruling on American Healthcare Act
-
seattlepets
- Names of pets in Seattle
-
sex_discrimination
- Bank manager recommendations based on sex
-
simpsons_paradox_covid
- Simpson's Paradox: Covid
-
sinusitis
- Sinusitis and antibiotic experiment
-
sleep_deprivation
- Survey on sleep deprivation and transportation workers
-
smallpox
- Smallpox vaccine results
-
smoking
- UK Smoking Data
-
snowfall
- Snowfall at Paradise, Mt. Rainier National Park
-
socialexp
- Social experiment
-
soda
- soda
-
solar
- Energy Output From Two Solar Arrays in San Francisco
-
sowc_child_mortality
- SOWC Child Mortality Data.
-
sowc_demographics
- SOWC Demographics Data.
-
sowc_maternal_newborn
- SOWC Maternal and Newborn Health Data.
-
sp500_1950_2018
- Daily observations for the S&P 500
-
sp500_seq
- S&P 500 stock data
-
sp500
- Financial information for 50 S&P 500 companies
-
speed_gender_height
- Speed, gender, and height of 1325 students
-
ssd_speed
- SSD read and write speeds
-
starbucks
- Starbucks nutrition
-
stats_scores
- Final exam scores for twenty students
-
stem_cell
- Embryonic stem cells to treat heart attack (in sheep)
-
stent30
- Stents for the treatment of stroke
-
stocks_18
- Monthly Returns for a few stocks
-
sulphinpyrazone
- Treating heart attacks
-
supreme_court
- Supreme Court approval rating
-
teacher
- Teacher Salaries in St. Louis, Michigan
-
textbooks
- Textbook data for UCLA Bookstore and Amazon
-
tourism
- Turkey tourism
-
transplant
- Transplant consultant success rate (fake data)
-
twins
- twins
-
ucb_admit
- ucb_admit
-
ucla_f18
- UCLA courses in Fall 2018
-
ucla_textbooks_f18
- Sample of UCLA course textbooks for Fall 2018
-
ukdemo
- United Kingdom Demographic Data
-
unempl
- Annual unemployment since 1890
-
unemploy_pres
- President's party performance and unemployment rate
-
us_temperature
- US temperatures in 1950 and 2022
-
winery_cars
- Time Between Gondola Cars at Sterling Winery
-
world_pop
- World Population Data.
-
xom
- Exxon Mobil stock data
-
yawn
- Contagiousness of yawning
-
yrbss_samp
- Sample of Youth Risk Behavior Surveillance System (YRBSS)
-
yrbss
- Youth Risk Behavior Surveillance System (YRBSS)
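As a minimal sketch of how the entries above are typically used, the following R snippet loads one of the listed datasets by name. It assumes the openintro package is already installed; `bdims` is taken from the list above, and the calls used (`data()`, `str()`, `head()`) are base R rather than anything specific to this package.

```r
# Assumes the openintro package is installed (e.g. from CRAN).
# Load a single dataset by name without attaching the whole package.
data("bdims", package = "openintro")

# Inspect the structure and the first few rows.
str(bdims)
head(bdims)
```

Attaching the package with `library(openintro)` should make the datasets available by name as well; the `data()` form is shown here only because it makes the source package explicit.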
Simulated datasets
These are simulated datasets, primarily used in the textbooks for illustrating a particular concept or showcasing features of a particular methodology.
-
ami_occurrences
- Acute Myocardial Infarction (Heart Attack) Events
-
association
- Simulated data for association plots
-
ball_bearing
- Lifespan of ball bearings
-
books
- Sample of books on a shelf
-
cchousing
- Community college housing (simulated data)
-
classdata
- Simulated class data
-
corr_match
- Sample datasets for correlation problems
-
credits
- College credits.
-
family_college
- Simulated sample of parent / teen college attendance
-
gradestv
- Simulated data for analyzing the relationship between watching TV and grades
-
gsearch
- Simulated Google search experiment
-
housing
- Simulated dataset on student housing
-
ipod
- Length of songs on an iPod
-
jury
- Simulated juror dataset
-
male_heights_fcid
- Random sample of adult male heights
-
outliers
- Simulated datasets for different types of outliers
-
res_demo_1
- Simulated data for regression
-
res_demo_2
- Simulated data for regression
-
sat_improve
- Simulated data for SAT score improvement
-
simulated_dist
- Simulated datasets, not necessarily drawn from a normal distribution.
-
simulated_normal
- Simulated datasets, drawn from a normal distribution.
-
simulated_scatter
- Simulated data for sample scatterplots
-
student_housing
- Community college housing (simulated data, 2015)
-
student_sleep
- Sleep for 110 students (simulated)
-
thanksgiving_spend
- Thanksgiving spending, simulated based on Gallup poll.
-
tips
- Tip data
-
toohey
- Simulated polling dataset
-
toy_anova
- Simulated dataset for ANOVA
-
COL
- OpenIntro Statistics colors
-
IMSCOL
- Introduction to Modern Statistics (IMS) Colors
-
openintro_colors
- OpenIntro colors
-
openintro_palettes
- OpenIntro palettes
Functions
These functions are used for creating visualisations and summary tables in the books as well as for keeping the OpenIntro datasets website up to date with the datasets in this package.
-
ArrowLines()
- Create a Line That May Have Arrows on the Ends
-
AxisInDollars()
- Build Better Looking Axis Labels for US Dollars
-
AxisInPercent()
- Build Better Looking Axis Labels for Percentages
-
BG()
- Add background color to a plot
-
boxPlot()
- Box plot
-
Braces()
- Plot a Braces Symbol
-
buildAxis()
- Axis function substitute
-
calc_streak()
- Calculate hit streaks
-
CCP()
- Plot a Cartesian Coordinate Plane
-
ChiSquareTail()
- Plot upper tail in chi-square distribution
-
contTable()
- Generate Contingency Tables for LaTeX
-
CT2DF()
- Contingency Table to Data Frame
-
densityPlot()
- Density plot
-
dlsegments()
- Create a Double Line Segment Plot
-
dotPlot()
- Dot plot
-
dotPlotStack()
- Add a Stacked Dot Plot to an Existing Plot
-
edaPlot()
- Exploratory data analysis plot
-
fadeColor()
- Fade colors
-
histPlot()
- Histogram or hollow histogram
-
lab_report()
- lab_report
-
linResPlot()
- Create simple regression plot with residual plot
-
lmPlot()
- Linear regression plot with residual plot
-
loop()
- Output a message while inside a loop
-
lsegments()
- Create a Line Segment Plot
-
makeTube()
- Regression tube
-
MosaicPlot()
- Custom Mosaic Plot
-
myPDF()
- Custom PDF function
-
normTail()
- Normal distribution tails
-
openintro_cols()
- Function to extract OpenIntro IMS colors as hex codes
-
openintro_pal()
- Return function to interpolate an OpenIntro IMS color palette
-
PlotWLine()
- Plot data and add a regression line
-
qqnormsim()
- Generate simulated QQ plots
-
scale_color_openintro()
- Color scale constructor for OpenIntro IMS colors
-
scale_fill_openintro()
- Fill scale constructor for OpenIntro IMS colors
-
treeDiag()
- Construct tree diagrams
-
write_pkg_data()
- Create a CSV variant of .rda files
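A short sketch of one of the helper functions in use: `calc_streak()` applied to the `kobe_basket` dataset listed earlier. It assumes that `kobe_basket` has a `shot` column coded as "H" (hit) and "M" (miss), and that `calc_streak()` returns a data frame with a `length` column; treat both as assumptions to check against the individual help pages.

```r
# Assumes the openintro package is installed.
library(openintro)

# Assumption: kobe_basket$shot is a vector of "H" (hit) / "M" (miss) values.
kobe_streak <- calc_streak(kobe_basket$shot)

# Assumption: the result has a `length` column, one row per streak.
# Tabulate how often each streak length occurs.
table(kobe_streak$length)
```

The other plotting helpers in this list follow the usual base R graphics pattern of being called for their side effect, so their arguments are best looked up per function with `?function_name`.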