Every two years, the Centers for Disease Control and Prevention conducts the Youth Risk Behavior Surveillance System (YRBSS) survey, which collects data from high schoolers (9th through 12th grade) to analyze health patterns. You will work with a selected group of variables from a random sample of observations during one of the years the YRBSS was conducted.
Load the yrbss data set, which can be found here, into your workspace.
There are observations on 13 different variables, some categorical and some numerical. The meaning of each variable can be found by visiting this page.
You will start by analyzing the weight of the participants in kilograms: weight.
Using the Descriptives analysis, summarize the weight variable. How many observations are missing a weight?

Next, consider the possible relationship between a high schooler’s weight and their physical activity. Plotting the data is a useful first step because it helps us quickly visualize trends, identify strong associations, and develop research questions.
First, let’s create a new variable physical_3plus, which will be coded as either “yes” if the student is physically active for at least 3 days a week (physically_active_7d>2), and “no” if not. (Make sure the variable is set to be nominal.)
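If you want to double-check the recoding outside jamovi, the rule can be sketched in a few lines of Python. This is a minimal sketch that assumes each physically_active_7d value arrives as an integer, or as None when the response is missing. The function name `code_physical_3plus` is hypothetical, and keeping missing responses missing (rather than coding them “no”) is our assumption.

```python
from typing import Optional


def code_physical_3plus(days_active: Optional[int]) -> Optional[str]:
    """Hypothetical recode: "yes" if active on at least 3 days a week, else "no"."""
    if days_active is None:
        # Assumption: a missing activity response stays missing instead of becoming "no".
        return None
    # physically_active_7d > 2 is the same as "at least 3 days a week".
    return "yes" if days_active > 2 else "no"


# Usage: recode a handful of responses.
coded = [code_physical_3plus(d) for d in [0, 2, 3, 7, None]]
```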
Make side-by-side box plots of weight for each level of physical_3plus. This is done using Descriptives: place weight in the “Variables” box and physical_3plus in the “Split by” box, and make sure to check the box next to “Violin”. Is there a relationship between these two variables? What did you expect and why?

Box plots show how the medians of the two distributions compare, but we can also compare the means of the distributions. In a descriptive statistics table, determine the mean value of weight for each group of the physical_3plus variable.
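The group sizes and means in the descriptive statistics table amount to splitting the rows by group and averaging within each group. The Python sketch below shows that computation on made-up toy values (not YRBSS data); `group_summary` is a hypothetical helper, and dropping rows that are missing either the weight or the group label is our assumption about how such rows are handled.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def group_summary(
    values: Sequence[Optional[float]], groups: Sequence[Optional[str]]
) -> Dict[str, Tuple[int, float]]:
    """Return {group: (n, mean)} using only rows where both fields are present."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for value, group in zip(values, groups):
        if value is None or group is None:
            continue  # assumption: skip rows missing the measurement or the group label
        buckets[group].append(value)
    return {g: (len(vals), mean(vals)) for g, vals in buckets.items()}


# Toy data (hypothetical): weights in kg and physical_3plus labels.
summary = group_summary([60.0, 70.0, None, 80.0, 50.0], ["yes", "yes", "no", "no", None])
```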
There is an observed difference, but is this difference large enough to deem it “statistically significant”? In order to answer this question we will conduct a hypothesis test.
Are all conditions necessary for inference satisfied? Comment on each. You can determine the group sizes in the descriptive statistics table.
Write the hypotheses for testing whether the average weights differ between those who exercise at least three days a week and those who don’t.
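In symbols, writing \(\mu_{\text{yes}}\) and \(\mu_{\text{no}}\) (notation introduced here) for the population mean weights of students who are and are not physically active at least 3 days a week, a two-sided comparison of two means takes this general form:

\[
H_0:\ \mu_{\text{yes}} = \mu_{\text{no}} \qquad H_A:\ \mu_{\text{yes}} \neq \mu_{\text{no}}
\]

Equivalently, the null hypothesis says the difference \(\mu_{\text{yes}} - \mu_{\text{no}}\) is zero.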
Now you’ll use jamovi to conduct a 2-sample \(t\)-test. Under the Analyses menu, click on the icon for “T-Tests”, then click on the option for “Independent Samples T-Test”. Select weight as the Dependent Variable and physical_3plus as the Grouping Variable. jamovi will display a table with the results of the 2-sample \(t\)-test in the output panel on the right.
What is the value of the \(t\) test statistic?
What is the p-value?
Next, we can add a confidence interval to our table. Under Additional Statistics, check the “Mean difference” box, and then check the box next to “Confidence interval”. Leave the confidence level at 95%. The confidence interval will now appear in your table.
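The interval for the mean difference is the observed difference plus or minus a critical value times its standard error. The Python sketch below assumes the pooled-variance (Student's) standard error and, to stay within the standard library, takes the critical value `t_star` as an argument that you supply (for example from a \(t\) table with \(n_1 + n_2 - 2\) degrees of freedom). The function name is hypothetical. With thousands of students, the 95% critical value is close to 1.96.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def mean_diff_ci(a: Sequence[float], b: Sequence[float], t_star: float) -> Tuple[float, float]:
    """Pooled-variance confidence interval for mean(a) - mean(b).

    t_star is the t critical value for df = len(a) + len(b) - 2 at the chosen
    confidence level, supplied by the caller.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))  # standard error of the difference
    diff = mean(a) - mean(b)
    return diff - t_star * se, diff + t_star * se


# Toy example: df = 8, and the 95% critical value for 8 df is about 2.306.
low, high = mean_diff_ci([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0], t_star=2.306)
```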
Calculate a 95% confidence interval for the average height in meters (height) and interpret it in context. For a single variable, this can be done using the Descriptives analysis.
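jamovi reports this interval directly, but the arithmetic behind a one-sample interval for a mean is short. This minimal Python sketch assumes a \(t\)-based interval, that missing heights have already been dropped, and that you supply the critical value `t_star` for \(n - 1\) degrees of freedom yourself. The function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_ci(x: Sequence[float], t_star: float) -> Tuple[float, float]:
    """One-sample CI for the mean: x-bar +/- t* * s / sqrt(n), with df = n - 1."""
    n = len(x)
    se = stdev(x) / math.sqrt(n)  # standard error of the sample mean
    center = mean(x)
    return center - t_star * se, center + t_star * se


# Toy example: n = 5, so df = 4; the 95% critical value for 4 df is about 2.776.
lo, hi = mean_ci([1.0, 2.0, 3.0, 4.0, 5.0], t_star=2.776)
```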
Calculate a new confidence interval for the same parameter at the 90% confidence level. Comment on the width of this interval versus the one obtained in the previous exercise.
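Assuming a \(t\)-based interval, both intervals have the form below, where \(t^{*}_{n-1}\) is the critical value from a \(t\) distribution with \(n - 1\) degrees of freedom (its 0.975 quantile for 95% confidence and its 0.95 quantile for 90% confidence):

\[
\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}, \qquad \text{width} = 2\,t^{*}_{n-1}\,\frac{s}{\sqrt{n}}
\]

Because \(\bar{x}\), \(s\), and \(n\) come from the same sample, only \(t^{*}_{n-1}\) changes with the confidence level.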
Conduct a hypothesis test evaluating whether the average height is different for those who exercise at least three times a week and those who don’t.
Now, a non-inference task: determine how many different options there are in the dataset for hours_tv_per_school_day.
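Counting distinct responses amounts to collecting the values into a set. In this minimal Python sketch, the response labels are made up (they are not the actual categories of hours_tv_per_school_day), the function name is hypothetical, and not counting a missing response as an option is our assumption.

```python
from typing import Iterable, Optional


def count_levels(values: Iterable[Optional[str]]) -> int:
    """Number of distinct non-missing responses (assumption: None marks missing)."""
    return len({v for v in values if v is not None})


# Hypothetical responses, including a repeat and a missing value.
n_levels = count_levels(["2", "5+", "2", None, "<1"])
```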
Come up with a research question evaluating the relationship between height or weight and sleep. Formulate the question in a way that it can be answered using a hypothesis test and/or a confidence interval. Report the statistical results, and also provide an explanation in plain language. Be sure to check all assumptions, state your \(\alpha\) level, and conclude in context.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.