The goal of this lab is to introduce you to Rguroo, which you’ll use throughout the course to learn the statistical concepts discussed in the course and analyze real data and come to informed conclusions. To use Rguroo, you need to register for an account, as we explain below. There is no need to download or install any software. All you need is a browser and the Internet.
Today we begin with the fundamental building blocks of Rguroo: the interface, reading in data, and basic functions. Functions in Rguroo consist of dialog boxes where you input information, usually by selecting checkboxes or typing in values in text boxes, to perform a specific task.
To start, you will need to create a Rguroo account. Go to Rguroo (https://rguroo.com/) and select Register. The link Registration Instructions provides details for registering your account. To begin, click on the Student button and fill out the needed information. There is an email verification step where you enter your email address. Make sure to type your email address correctly since an email will be sent to the address that you type. If you don’t see the email in your Inbox within a few minutes, check your Spam folder.
IMPORTANT! It is very easy to accidentally include a leading or trailing space when copying/pasting the verification code. This is the most common reason why Rguroo declines verification codes!
Once Rguroo has verified your code, proceed to the registration screen. You will need to type your first and last name and your institution’s ZIP (postal) code. Select your Institution and its State. If you have an Access Key enter it in the space provided. Otherwise, leave the Access Key field blank. Read and accept the terms of service, then click Submit to get to the Billing Details page. If you filled out the Access Key portion, you would not see a billing page.
Some colleges and universities have site license deals with Rguroo that make student accounts available at reduced or no cost. If your institution has such a license, and there is a discounted amount to pay, the discount will be automatically reflected in the billing process that you would go through. If there is a non-zero balance that you have to pay, go through the billing process and click Place Order.
Upon completing the registration, you will receive an email that contains a temporary password. Follow the instructions in the email to log in to your Rguroo account.
Once you log in to your Rguroo account, Rguroo will open with its dataset editor, which you can use to input data. If you don’t need to use the dataset editor, you can simply close the tab. To learn more about the dataset editor, refer to Rguroo User’s guide.
The sidebar on the left contains five toolboxes: Data, Plots, Analytics, Probability-Simulation, and Applets. You can click on any of these toolboxes to view the Rguroo objects you have saved to that toolbox. You can also access a specific toolbox via the keyboard by pressing Alt + the second letter of the toolbox name (Alt + a for Data, Alt + l for Plots, Alt + n for Analytics, Alt + r for Probability-Simulation, and Alt + p for Applets). There is also a Quick Access Tools button on the top right that you can use to access a few frequently used functions, such as Import Dataset, Dataset Repository, a Scientific Calculator, and a list of shortcut keys.
To get started, let’s take a peek at the data. To access the data, go to the Data toolbox, select Data Import, and click on Dataset Repository, as shown in the figure below. In the dialog box that opens, type open in the upper search box and you should see the option to select the OpenIntro
repository. Once selected, the lower panel of the dialog box will show all the datasets in the repository. Click on the Dataset
column in the lower panel to sort the datasets alphabetically, then find the arbuthnot dataset, select it, and click on the Import
button. You should now see an Rguroo data file called arbuthnot in the Data toolbox on the sidebar. This dataset is now available for use in Rguroo functions.
As you work with Rguroo, you will create a series of objects. Objects include datasets, plots, output reports, etc., which can be saved under the Data, Plots, Analytics, and Probability-Simulation toolboxes in Rguroo. Sometimes, you import objects as we have done here, but you often create them as the byproduct of computation or some analysis you have performed.
The more you add Rguroo objects to your account, the more difficult it will become to immediately find the object you want. Thus, you can type text in the “Search” box to filter the objects in each section. You can also take advantage of the folder system to organize your objects.
The Arbuthnot dataset refers to the work of Dr. John Arbuthnot, an 18th-century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the baptism records for children born in London every year from 1629 to 1710.
To view the arbuthnot dataset that you imported, select the dataset’s name in the Data toolbox (in the sidebar) and press Enter to view it. Alternatively, double-click the dataset name or right-click and select the Edit
option. The dataset opens in Rguroo’s Dataset Editor, as shown in the figure below. The names of variables year, boys, and girls are shown on the header. The leftmost, unlabeled column with a gray background shows row numbers. Each row in the dataset represents a different year: the first element of each row is the year, and the second and third elements are the numbers of boys and girls baptized that year, respectively. Use the scroll bar on the right side of the window to navigate up and down in the dataset.
Note that the row numbers in the first column are not part of Arbuthnot’s dataset. You can think of them as the index that you see on the left side of a spreadsheet. In fact, the comparison to a spreadsheet will generally be helpful. Rguroo has stored Arbuthnot’s data in a kind of spreadsheet or table called a data frame. At the bottom of the dataset editor, you can see that there are 82 rows in the data frame, so there are 82 observations in the dataset. Also, to the right of the scroll bar, in the Columns
tab, Rguroo lists the variables in the dataset. You can see that there are 3 variables in this dataset, and the variable names are year, boys, and girls.
To include or remove specific variables (columns), you can simply check or uncheck the corresponding checkbox for each variable. These checkboxes are located in the Columns
tab to the right of the scroll bar in the dataset editor. By default, all variables are shown. When you uncheck a checkbox, you remove the associated variable from the display. Conversely, checking the checkbox will include the variable again. In the provided figure below, we have unchecked the checkbox for the “girls” variable. As a result, only the variables “year” and “boys” remain visible.
You might have observed an up-arrow symbol on the “boys” column in the figure below. This arrow indicates that the data within the “boys” column is sorted in ascending order. To sort the data within a column, simply click on the column header. The first time you click the header, the data will be sorted in ascending (smallest to largest, or A to Z) order. If you click the header again, the data will be sorted in descending (largest to smallest, or Z to A) order. Lastly, clicking the header for the third time will revert the data back to its original order.
You can use the Quick Search
textbox on the top to search for words in your dataset. Simply type a word or phrase in the textbox, and the occurrences of that word within the dataset will be highlighted. In addition, the Quick Filter
feature allows you to see a subset of the data containing a specific word. Just type the word you’re looking for in the Quick Filter
textbox, immediately to the right of the Quick Search
textbox, and the dataset will be filtered to show only the relevant rows that include that word.
Columns
tab on the right-hand side of the interface and uncheck the checkboxes for “year” and “boys” variables. Make sure to keep the checkbox for the “girls” variable checked. This will result in displaying only the column representing the counts of girls baptized. Capture a screenshot of the Dataset Editor showing at least the first ten cases for the “girls.” Copy-paste your screenshot below.Rguroo has some powerful modules for making graphics. Let’s start by creating a simple plot of the number of girls baptized per year. First, we need to open the Plots toolbox and click Create Plot. The plot we want to make is a scatterplot, so let’s select that, as shown in the figure below. This opens up the Scatterplot dialog box. When we open up an Rguroo dialog, usually the first thing we need to do is select the Dataset
we will be using. Let’s select arbuthnot as our Dataset
. Then, we need to select the Predictor (x)
and Response (y)
variables on our plot using the dropdown menus.
To apply your selections and see results, click the Preview
button , which can be found at the top right of every dialog box or on the top bar on the Rguroo panel. The graph below is the scatterplot of girls versus year created using these steps.
Let’s say we want to visualize the above plot using a line graph instead. In the Scatterplot menu, you can select Line
under the Superimpose
part of the dialog box. This will connect the points consecutively by lines. If you would like to hide the points, click the button and select the Attributes of Scatterplot Points, LS Line, LOESS, and Identified Points
menu. Once open, check the Show Line
box. To remove the points, uncheck the Show Points
box, as shown in the figure below.
The default line is a solid orange line. You can select the line type and line color using the Line Type
and Color
options. For example, by clicking the colored box in the option Color
, as shown below, you can bring up a color palette and select a different color.
When you’re done customizing your line, you can again click the Preview
button to view your output.
Now, suppose we want to plot the total number of baptisms. To do basic calculations in Rguroo, click on Applets, then open the Calculators (Desmos) folder and select one of the calculators. Alternatively, you can find a link to a scientific calculator in the Quick Access Tools.
This brings up the selected calculator from https://www.desmos.com/ directly within Rguroo. Use the calculator to compute the total number of baptisms in 1629 by adding 5218 + 4683
. We could repeat this once for each year, but there is a faster way. You can add a new variable to your dataset (data frame) that includes the sums for each row.
We’ll want to compute the total number of baptisms for every year and use it to generate some plots, so we’ll want to save it as a permanent column in our dataset.
To modify our dataset, go back to the Data toolbox, select the Functions menu, and select Transform. As usual, the first thing to select in the dialog is the appropriate Dataset
. Select the arbuthnot dataset.
In the Variable
section to the left, click the green plus sign to add a new variable to the dataset. The default name for the variable will be something uninformative like Transform_1. Change the name to total. In the Returned Variable
section on the right, you should see total added to the list of variables.
In the middle section, type boys + girls
. This will ask Rguroo to add up the boys and girls counts for each row of our dataset (i.e., for each year) and record that sum as the value of the new variable total.
As usual, click the Preview
button to view the output. You’ll see that there is now a new column called total that has been tacked onto the first column of the data frame. You can rearrange the variables by dragging and dropping their names in the Returned Variable
column to your desired position. For example, you can move the newly calculated variable total to the last column.
Tip: To avoid typos in typing variable names in the Transform dialog box, especially if you have long variable names, you can double-click a variable name in the Returned Variable
column to add it to the center box.
A note on Transform function: If you become familiar with the programming language R, you can write multiple lines of R code in the Transform dialog box.
Now, to be able to create plots with this new dataset, we need to save it to Rguroo. First, type a name in the in the Save As… box, then click the button to save it. In the screenshot below, we have named the new dataset arbuthnot_total.
Note that since the Save Parameters box is checked, two things will happen. First, a new dataset will be added to the Data toolbox with a different icon. This icon indicates that when you open the dataset, you can also open the Transform dialog that created it. Second, the original arbuthnot dataset name in the Data toolbox is now shown with a bold-face green font. A dataset name that has the bold-face green format indicates that one or more Rguroo objects have been saved using this dataset. To see the datasets, plots, and reports that have been created from this dataset, right-click the dataset and select the option Show Dependencies.
Tip: If you want to save a dataset created in the Transform function as an independent dataset, uncheck the Save Parameters box. An independent dataset is one whose origin is not recorded in Rguroo. To remember information about a dataset, you can right-click on its name and add a Comment.
You can make a line plot of the total number of baptisms per year by opening a new Scatterplot dialog (in the Plots toolbox). Remember to open the menu to remove the points and add the line! The following figures should help.
In a similar way that you computed the total number of births, you can use the calculator to compute the ratio of the number of boys to the number of girls baptized in 1629:
5218 / 4683
Or the proportion of newborns that are boys in 1629:
5218 / (5218 + 4683)
However, to do this for all years simultaneously and append it to the dataset, let’s open the arbuthnot_total dataset again. Remember that since the Save Parameters box was checked, we can open up the dialog to see the original work (transformation) that we did and modify or add new variables to the dataset.
Let’s add two more variables, boy_to_girl_ratio and boy_proportion. Then, we’ll highlight each new variable on the left side and use the center box to type the formula used to compute the new variable.
Note that we are using the total variable we created from the original transformation.
Tip: If you are creating a variable that would be simpler to write in a few lines, you can write multiple lines in the formula box. For example, to obtain the proportion of boys, you can click on the plus sign , name your variable, and write the following two lines in the formula box:
= boys + girls
total = boys/total boy_proportion
Note that the values of the last variable in the formula box, in this example “boy_proportion,” will be assigned to your named variable.
Finally, in addition to simple mathematical operators like subtraction and division, we can use comparisons like greater than, >
, less than, <
, and equality, ==
in the center box. For example, we can ask if the number of births of boys outnumber that of girls in each year by adding another variable to our list, more_boys, using the code boys > girls
in the center box:
The variable more_boys will have the value TRUE
if that year had more boys than girls and FALSE
if that year did not (the answer may surprise you). This variable contains a different kind of data than we have encountered so far. All other columns in the arbuthnot
data frame have values that are numerical (the year, the number of boys and girls). Here, we’ve asked Rguroo to create logical data, data where the values are either TRUE
or FALSE
. In general, data analysis will involve many different kinds of data types, and one reason for using Rguroo is that it is able to represent and compute with many of them.
To get summary information about a dataset, you can right-click on the dataset name, and in the menu that appears, select Dataset Summary. Then a table containing various information about the dataset is shown. This table includes information about the minimum and maximum values, number of observations, etc. In the upcoming labs, we will talk in more detail about summary statistics.
In the earlier part of this lab, you recreated some of the displays and preliminary analysis of Arbuthnot’s baptism data. Your assignment involves repeating these steps, but for present day birth records in the United States. The data are stored in a data frame called present
and are available in the Rguroo’s Dataset Repository under Open Intro
. These data come from reports by the Centers for Disease Control. You can learn more about them by clicking the the information icon to the right of the dataset name. Once you’ve finished reading the information, you can click Import
to import the data into your account.
What years are included in this data set? What are the dimensions of the data frame? What are the variable (column) names?
How do these counts compare to Arbuthnot’s? Are they of a similar magnitude?
Make a plot that displays the proportion of boys born over time. What do you see? Does Arbuthnot’s observation about boys being born in greater proportion than girls hold up in the U.S.? Include the plot in your response. Hint: You should be able to Transform the present dataset using the steps learned in the previous section.
In what year did we see the most total number of births in the U.S.? Hint: First, use the Transform dialog to add a total
variable to the dataset. Then, sort your dataset in descending order based on the total variable. You can do this using the Sort dialog (in the Data toolbox, click Functions, then Sort) as shown in the screenshot below, or you can open the dataset in the Dataset Editor and click on the header of the variable that you want to sort, as shown earlier.
Do you know another way to sort the column total in Rguroo’s Data Viewer?
That was a short introduction to Rguroo, but we will provide you with more functions and a better sense of how to use Rguroo as the course progresses.
You can get help in how to use Rguroo in several ways.
?
that are associated with a section of the dialog. You can click on the question mark to bring up a help dialog explaining what each component of that section is and how to fill it in. For example, in the screenshot below, you can click the question-mark ?
for the Variable
section to learn what should be selected in the Predictor (x)
, Response (y)
, and/or Factor
dropdown menus. Moreover, at the top left of many dialogs, there is a video camera icon. You can click this icon to bring up a video tutorial on how to use the specific function and fill out the dialog.You can click on the icon that appears on top of the Rguroo application to get quick instructions on how to perform various tasks in Rguroo. You can also access the Quick User’s Guide through the Rguroo website under the Resources
tab.
The top right of the workspace (by Settings) contains two options for getting general help, as shown in the figure below. You can click the video camera icon to bring up a searchable list of tutorial videos or the book icon to open the Rguroo User’s Guide in a separate tab. The Rguroo User’s Guide contains many examples and even some formulas on how the numbers in Rguroo are calculated.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Rguroo.com, the Rguroo.com logo, and all other trademarks, service marks, graphics and logos used in connection with Rguroo.com or the Website are trademarks or registered trademarks of Soflytics Corp. in the USA and other countries and are not included under the CC-BY-SA license.