Data about Lego Sets for sale. Based on JSDSE article by Anna Peterson and Laura Ziegler Data from their article was scrapped from multiple sources including brickset.com
Format
A data frame with 75 rows and 15 variables.
- item_number
Set Item number
- set_name
Name of the set.
- theme
Set theme: Duplo, City or Friends.
- pieces
Number of pieces in the set.
- price
Recommended retail price from LEGO.
- amazon_price
Price of the set at Amazon.
- year
Year that it was produced.
- ages
LEGO's recommended ages of children for the set
- pages
Pages in the instruction booklet.
- minifigures
Number of LEGO people in the data, if unknown "NA" was recorded.
- packaging
Type of packaging: bag, box, etc.
- weight
Weight of the set of LEGOS in pounds and kilograms.
- unique_pieces
Number of pieces classified as unique in the instruction manual.
- size
Size of the lego pieces: Large if safe for small children and Small for older children.
Source
Peterson, A. D., & Ziegler, L. (2021). Building a multiple linear regression model with LEGO brick data. Journal of Statistics and Data Science Education, 29(3),1-7. doi:10.1080/26939169.2021.1946450
BrickInstructions.com. (n.d.). Retrieved February 2, 2021 from
Brickset. (n.d.). BRICKSET: Your LEGO® set guide. Retrieved February 2, 2021 from
Examples
library(ggplot2)
library(dplyr)
lego_sample |>
filter(theme == "Friends" | theme == "City") |>
ggplot(aes(x = pieces, y = amazon_price)) +
geom_point(alpha = 0.3) +
labs(
x = "Pieces in the Set",
y = "Amazon Price",
title = "Amazon Price vs Number of Pieces in Lego Sets",
subtitle = "Friends and City Themes"
)