Today’s lab will consist of two parts. First, we will make sure that you are properly set up with all of the software and accounts that you need for this class. Next, you will work with some fuel economy data from the Environmental Protection Agency (EPA) to learn more about fuel efficiency in automobiles, which is important from an environmental/climate perspective given that vehicles are responsible for 51% of CO2 emissions for a typical U.S. household [1].
As a reminder I want you to talk to your neighbors and discuss the material and your thoughts. You’ll learn from each other and have much more fun this way too.
I will go over the guidelines for lab report expectations including the content, structure, and formatting of your reports. Remember that you can find additional details on lab guidelines and goals on our class website.
Automobiles are a significant contributor to both CO2 emissions and air pollution. Understanding fuel efficiency in vehicles, and the key factors that affect it, is important from both an environmental and an economic perspective. Today you will use EPA vehicle fuel economy data to learn more about this issue and explore some interesting questions.
The following resources will provide some helpful background related to today’s lab. These resources will also be helpful as you are writing up your lab report.
We are going to load in the tidyverse package. If you haven’t installed it yet, you will first need to do that (it only needs to be installed once and, once installed, can be loaded in anytime you want). To install tidyverse, go to the Packages tab and click the install button. In the window that pops up, type tidyverse and click Install.
library(tidyverse)
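If you prefer typing to clicking, the installation can also be done from the Console. The lines below are a minimal sketch, assuming you have an internet connection and that the default CRAN mirror is acceptable; they install tidyverse only when it is missing and then load it.

```r
# Install tidyverse only if it is not already available on this computer.
# (Assumption: an internet connection and a working CRAN mirror.)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Loading has to be done in every new R session; installing does not.
library(tidyverse)
```

Run the installation step in the Console rather than in your Notebook, so that it is not repeated every time the Notebook runs.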
The data is conveniently part of ggplot2, one of the packages included with tidyverse, so we can easily load it in. The dataset is called mpg.
We’ll assign the mpg data to a new object that we’ll call fuel_data. However, before you run the code block below, you should first learn a bit more about the mpg dataset.
To do that, type ?mpg in your Console (we’ll use the Console because we only want to get this info once, and not every single time we run our Notebook).
fuel_data <- mpg # assigning mpg to a new object
Use the View() function to see the dataset in a spreadsheet-style viewer. This can be really helpful when you are trying to get familiar with a dataset that you’ve loaded in.
Let’s type View(fuel_data) in the Console (not in our Notebook) since we just want to do this once (and not every single time we run our Notebook).
Use the str() function to examine the structure of the fuel_data dataset.
# Your code here
What do chr, int, and num mean in the output above?
fuel_data dataset. Apply this function to your fuel_data object and it will print these results to your notebook.
# Your code here
What do the displ, cyl, and cty columns represent? (Hint: type ?mpg into your console to get more info on this dataset.)
What are the dimensions of the fuel_data dataset? There is a function that will give you the dimensions of an object (i.e., number of rows and columns). Look at your Base R Cheatsheet to find this function.
# Your code here
Assign the number of rows to n_rows and the number of columns to n_cols. (Hint: you can find functions for just the number of rows and just the number of columns, respectively, on your Base R Cheatsheet.)
Use the summary() function to get summary statistics on the fuel_data dataset.
# Your code here
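To see what str() and summary() report before you apply them to fuel_data, it can help to try them on something tiny. The sketch below uses a hypothetical three-row data frame (toy_cars, with made-up values that are not taken from mpg), so the exercises above are still yours to work out.

```r
# Hypothetical toy data (made-up values, not from the mpg dataset).
toy_cars <- data.frame(
  model = c("alpha", "beta", "gamma"),  # text values
  cyl   = c(4L, 6L, 8L),                # whole numbers (the L suffix stores them as integers)
  displ = c(1.8, 3.0, 5.4),             # decimal numbers
  stringsAsFactors = FALSE              # keep text as text in older versions of R
)

# str() prints one line per column: its name, its type, and the first few values.
str(toy_cars)

# summary() prints per-column summaries (min, quartiles, mean, max for numeric columns).
summary(toy_cars)
```

Compare the type abbreviations that str() prints here with how each column was created, then copy/paste/tweak the calls for fuel_data.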
Create a new variable that is the ratio of hwy to cty fuel economy
To create a new variable you can use the mutate() function (which is included in the dplyr package that loads in with tidyverse). You can get more info on mutate() by typing ?mutate into your console or by Googling it. FYI, we are going to learn a lot more about dplyr later in the term.
You can also create a new variable by using the $ notation to assign it. The $ allows us to access an existing variable, or create a new one, in a data frame. For instance: fuel_data$new_variable_name <- expression that defines the variable
I’ve shown you how to do it both ways in the code below. You can pick one (they do the same thing) and uncomment it so that it will run.
## Use mutate() to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data <- mutate(fuel_data, hwy2cty = hwy/cty)
## Use the $ notation to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data$hwy2cty <- fuel_data$hwy/fuel_data$cty
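If you want to convince yourself that the two approaches really are equivalent, here is a small sketch on a hypothetical two-row data frame (toy, with made-up numbers and a made-up column name, ratio). It assumes dplyr is installed, which it is if you installed tidyverse.

```r
library(dplyr)

# Hypothetical toy data: made-up highway and city fuel economy values.
toy <- data.frame(hwy = c(30, 24), cty = c(20, 16))

# Approach 1: mutate() returns a copy of the data frame with the new column added.
via_mutate <- mutate(toy, ratio = hwy / cty)

# Approach 2: the $ notation adds the column to a copy of the data frame directly.
via_dollar <- toy
via_dollar$ratio <- via_dollar$hwy / via_dollar$cty

# Both approaches produce exactly the same values in the new column.
identical(via_mutate$ratio, via_dollar$ratio)
```

Note that inside mutate() you refer to columns by name alone (hwy, cty), while the $ notation needs the data frame name each time.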
Take a look at hwy2cty and spend a few minutes discussing with your neighbor(s) what the code is doing.
Let’s see how the size of a vehicle’s engine (displacement) influences the fuel economy
Use ggplot to create a scatter plot of hwy vs displ. Remove the # and replace the ... with the appropriate values.
# ggplot(data = ...) + geom_point(aes(x = ... , y = ...))
Let’s add another variable to our analysis to try and further explain what influences fuel economy. We’ll now color the points by their vehicle class.
# ggplot(data = ...) + geom_point(aes(x = ... , y = ... , color = ... ))
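If the templates above feel abstract, it may help to see the same pattern filled in for a different dataset. The sketch below uses mtcars, a dataset that ships with R (it is not the lab data), and plots fuel economy against vehicle weight, colored by number of cylinders. The column choices are only for illustration.

```r
library(ggplot2)

# Same pattern as the templates above, filled in for the built-in mtcars dataset:
#   x     = wt  (vehicle weight)
#   y     = mpg (miles per gallon)
#   color = cyl (number of cylinders), wrapped in factor() so that each
#           cylinder count gets its own distinct color
p <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = mpg, color = factor(cyl)))

p  # printing the plot object draws the plot
```

For the lab, swap in fuel_data and the variables named in the instructions. Because vehicle class in fuel_data is already stored as text, factor() should not be needed there.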
fuel_data
Below I’ve added a new variable region to fuel_data using the mutate() function. This step is relatively advanced, so don’t worry if it looks well beyond what we’ve covered so far. However, I want you to take a look at the code and try to decipher what I did here. I tried to use variable names that are logical, and if you see a function that you don’t understand, try looking at the help file and/or Googling it.
us_makes <- c("chevrolet","dodge","ford","jeep",
"lincoln","mercury","pontiac") # list of U.S. manufacturers
fuel_data <- mutate(fuel_data, region = if_else(manufacturer %in% us_makes,"US","Foreign") )
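If you get stuck deciphering the code above, one approach is to run its two building blocks, %in% and if_else(), on a tiny example. The sketch below uses hypothetical vectors (makes_demo and us_makes_demo, made up for illustration) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical example vectors (made up for illustration).
makes_demo    <- c("ford", "toyota", "jeep", "audi")
us_makes_demo <- c("ford", "jeep")

# %in% checks each element on the left against the set on the right
# and returns one TRUE or FALSE per element.
is_us <- makes_demo %in% us_makes_demo
is_us

# if_else() then picks the first value where the test is TRUE
# and the second value where it is FALSE.
region_demo <- if_else(is_us, "US", "Foreign")
region_demo
```

Printing the intermediate result (is_us) before the final one is a useful habit whenever a line of code does several things at once.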
What is region telling us?
region?
You may find region to be another interesting variable to examine in the next section.
With any remaining time you should perform further exploratory analysis of the fuel_data dataset. You should discuss ideas with your classmates and you can also ask me for guidance (but try to brainstorm some ideas before asking). I will be moving about the classroom and checking in with everyone, providing guidance and suggestions, and hearing what you’ve learned about the dataset being analyzed.
Note that this section of the lab is important and should not be given only a cursory work-through. An important learning goal for this term is for you to develop your own independent research skills. Thus, all of our labs this term will have a large, independent component where you are expected to apply what you’ve learned to ask novel and interesting questions, and to try to go beyond what you’ve learned in class.
When trying to get started with something new, remember the copy/paste/tweak approach. This approach can help give you a good starting point for your work.
If you are interested in further exploring EPA fuel efficiency data you can download the full EPA dataset here. The description of each of the variables in the dataset is available here.
This dataset has information on thousands of vehicles and their fuel/energy efficiency (along with dozens of other related variables) for models from 1984-2022. Notably this dataset contains information on many electric and hybrid vehicles.
Unlike the mpg dataset you used in Part 1, the full EPA dataset is quite a bit “messier” (e.g., variables have missing data in some rows) and quite a bit more complex (upwards of 80 variables are reported for a given vehicle). However, the larger number of observations (i.e., vehicles) and variables (i.e., vehicle attributes) allows for a much richer dataset.
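To give a feel for what “messier” means in practice, here is a minimal sketch of reading a CSV file and counting missing values per column. It does not use the real EPA file: it writes a hypothetical three-row stand-in (with made-up column names and values) to a temporary file, and it assumes the readr package, which loads with tidyverse, is at version 2.0 or later.

```r
library(readr)

# Hypothetical miniature stand-in for a messy CSV (made-up columns and values,
# NOT the real EPA data). Empty fields represent missing data.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "make,year,comb_mpg,fuel_type",
  "Alpha,1984,21,Regular",
  "Beta,2012,,Electricity",
  "Gamma,2022,48,"
), demo_path)

# read_csv() treats empty fields as NA (missing) by default.
vehicles_demo <- read_csv(demo_path, show_col_types = FALSE)

# is.na() marks each missing cell; colSums() counts them column by column.
missing_per_column <- colSums(is.na(vehicles_demo))
missing_per_column
```

With the real dataset you would point read_csv() at the file you downloaded instead of demo_path. Checking the missing-value counts first tells you which variables can be analyzed as-is and which need extra care.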
If you decide to work on this analysis let me know and I can help guide you on how to load in and get started with the data.
Your lab is due prior to the start of next week’s lab. Once you are finished and satisfied with your work you should knit it to an html file and submit both your html and Rmd file to Nexus.
To knit a file you can go to the menu bar at the top of your notebook, click the dropdown that currently says preview, and select the knit to html option. This will knit your document, which runs all of your code and generates a nice report in html format (the file is saved in your current working directory).
An even easier way to knit your file is to go to the header at the top of your document and change html_notebook to html_document and then save your file. You will then see that the Preview option in the menu bar will have changed to Knit. Click Knit and your report will be knit.
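Knitting can also be triggered from the Console with the render() function from the rmarkdown package. The sketch below is self-contained: it writes a hypothetical, tiny R Markdown file to a temporary folder and knits it to html. It assumes rmarkdown and pandoc are available, which is normally the case when you work in RStudio.

```r
library(rmarkdown)

# Hypothetical stand-in file: a tiny R Markdown document in a temporary folder.
rmd_path <- file.path(tempdir(), "LabName_YourLastName.Rmd")
fence <- strrep("`", 3)  # three backticks, the delimiter for R Markdown code chunks
writeLines(c(
  "---",
  "title: \"Knit demo\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# render() runs every code chunk and writes the html report next to the Rmd file.
# It returns the path of the file it created.
html_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)
html_path
```

For your own lab you would pass the path of your Rmd file instead of rmd_path. The Knit button does the same job, so use whichever you find more convenient.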
Before you leave lab today, make sure you know how to knit your document.
Make sure your file is properly named BEFORE you submit it. The correct naming structure is LabName_YourLastName.
Make sure to replace the author and date in the header with your info.