Today’s lab will consist of two parts. First, we will make sure that you are properly set up with all of the software and accounts that you need for this class. Next, you will work with some fuel economy data from the Environmental Protection Agency (EPA) to learn more about fuel efficiency in automobiles. This is important from an environmental/climate perspective, given that vehicles are responsible for 51% of CO2 emissions for a typical U.S. household.¹
As a reminder, I want you to talk to your neighbors and discuss the material and your thoughts. You’ll learn from each other and have much more fun this way, too.
I will go over the guidelines for lab report expectations including the content, structure, and formatting of your reports. Remember that you can find additional details on lab guidelines and goals on our class website.
Automobiles are a significant contributor to both CO2 emissions and air pollution. Understanding fuel efficiency in vehicles and the key factors that affect it is important from both an environmental and an economic perspective. Today you will use EPA vehicle fuel economy data to learn more about this issue and explore some interesting questions.
The following resources will provide some helpful background related to today’s lab. These resources will also be helpful as you are writing up your lab report.
We are going to load in the `tidyverse` package. If you haven’t installed it yet, you will first need to do that (it only needs to be installed once, and once installed it can be loaded in anytime you want). To install `tidyverse`, go to the Packages tab and click the Install button. In the window that pops up, type `tidyverse` and click Install.
library(tidyverse)
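If you prefer typing to clicking, the same one-time installation can be done from the Console. This is a minimal sketch: it calls `install.packages()` only when `tidyverse` is not already available, so running it twice does no harm. You should still run it in the Console, not in your Notebook.

```r
# Install tidyverse only if it is not already installed (run once, in the Console)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load (attach) tidyverse for the current R session
library(tidyverse)
```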
The data is conveniently part of one of the packages included with `tidyverse` (specifically `ggplot2`), so we can easily load it in. The dataset is called `mpg`.
We’ll assign the `mpg` data to a new object that we’ll call `fuel_data`. However, before you run the code block below, you should first learn a bit more about the `mpg` dataset. To do that, type `?mpg` in your Console (we’ll use the Console because we only want to get this info once, and not every single time we run our Notebook).
fuel_data <- mpg # assigning mpg to a new object
You can use the `View()` function to see the dataset in a spreadsheet-style viewer. This can be really helpful when you are trying to get familiar with a dataset that you’ve loaded in.
Let’s type `View(fuel_data)` in the Console (not in our Notebook), since we just want to do this once (and not every single time we run our Notebook).
Use the `str()` function to examine the structure of the `fuel_data` dataset.
# Your code here
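If it helps to see what `str()` output looks like before you run it on `fuel_data`, here is a sketch on a tiny made-up data frame. The object `toy` and its values are hypothetical and are not part of the lab data. Compare the type labels it prints with the ones you get for `fuel_data`.

```r
# A tiny, made-up data frame (not the lab data) with three kinds of columns
toy <- data.frame(
  make  = c("ford", "honda", "toyota"),  # text values
  year  = c(2008L, 1999L, 2008L),        # whole numbers (the L makes them integers)
  displ = c(4.6, 1.8, 2.4),              # decimal numbers
  stringsAsFactors = FALSE               # keep text as text in older versions of R
)

# str() prints one line per column: its name, its type, and its first few values
str(toy)
```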
What do `chr`, `int`, and `num` mean in the output above?
`fuel_data` dataset. Apply this function to your `fuel_data` object and it will print these results to your notebook.
# Your code here
What do the `displ`, `cyl`, and `cty` columns represent? (Hint: type `?mpg` into your Console to get more info on this dataset.)
How many rows and columns are in the `fuel_data` dataset? There is a function that will give you the dimensions of an object (i.e., the number of rows and columns). Look at your Base R Cheatsheet to find this function.
# Your code here
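To show what a dimensions function returns, here is a sketch using base R’s `dim()` on `mtcars`, a small dataset that ships with base R. It is used here instead of `fuel_data`, so you still get to work out the answer for the lab data yourself.

```r
# mtcars is built into base R; it stands in for fuel_data here
dims <- dim(mtcars)

# dim() returns a vector: number of rows first, then number of columns
dims
```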
Assign the number of rows to `n_rows` and the number of columns to `n_cols`. (Hint: you can find functions for just the number of rows and just the number of columns on your Base R Cheatsheet.)
Use the `summary()` function to get summary statistics on the `fuel_data` dataset.
# Your code here
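Here is a sketch of the row/column and summary steps, again on base R’s `mtcars` rather than `fuel_data`. `nrow()` and `ncol()` are the base R functions for each dimension on its own. For character columns (which `fuel_data` has but `mtcars` does not), `summary()` reports the length and type instead of quartiles.

```r
# Store the number of rows and the number of columns in separate objects
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)

# Summary statistics (min, quartiles, median, mean, max) for every numeric column
summary(mtcars)
```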
Create a new variable, `hwy2cty`, that is the ratio of `hwy` to `cty` fuel economy.
To create a new variable, you can use the `mutate()` function (which is included in the `dplyr` package that loads in with `tidyverse`). You can get more info on `mutate()` by typing `?mutate` in your Console or by Googling it. FYI, we are going to learn a lot more about `dplyr` later in the term.
You can also create a new variable by using the `$` notation to assign it. The `$` allows us to access (or create) a variable in a data frame. For instance:
fuel_data$new_variable_name <- expression that defines the variable
I’ve shown you how to do it both ways in the code below. You can pick one (they do the same thing) and uncomment it so that it will run.
## Use mutate() to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data <- mutate(fuel_data, hwy2cty = hwy/cty)
## Use the $ notation to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data$hwy2cty <- fuel_data$hwy/fuel_data$cty
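To convince yourself that the two options really do the same thing, you could run both on separate copies of `mpg` and compare the new columns. This is a minimal sketch. The object names `via_mutate` and `via_dollar` are made up for this illustration.

```r
library(tidyverse)

# Approach 1: mutate() returns a copy of mpg with the new column added
via_mutate <- mutate(mpg, hwy2cty = hwy / cty)

# Approach 2: copy mpg, then add the column with $ assignment
via_dollar <- mpg
via_dollar$hwy2cty <- via_dollar$hwy / via_dollar$cty

# Both approaches should produce exactly the same new column
identical(via_mutate$hwy2cty, via_dollar$hwy2cty)
```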
`hwy2cty` and spend a few minutes discussing with your neighbor(s) what the code is doing.
Let’s see how the size of a vehicle’s engine (displacement) influences the fuel economy.
Use `ggplot` to create a scatter plot of `hwy` vs `displ`. Remove the `#` and replace the `...` with the appropriate values.
# ggplot(data = ...) + geom_point(aes(x = ... , y = ...))
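If the template is hard to read at first, here is the same pattern filled in for a different dataset, base R’s `mtcars`, plotting car weight (`wt`, in 1000 lbs) against horsepower (`hp`). It is only a sketch of the syntax. Your plot should use `fuel_data` and its own columns.

```r
library(tidyverse)

# data = which data frame; aes() maps columns to the x and y axes
p <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = hp))

p  # printing the object draws the plot
```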
Let’s add another variable to our analysis to try and further explain what influences fuel economy. We’ll now color the points by their vehicle class.
# ggplot(data = ...) + geom_point(aes(x = ... , y = ... , color = ... ))
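Adding `color` inside `aes()` maps a third variable to point color. Here is a sketch, again on `mtcars`, coloring by number of cylinders. In `mtcars`, `cyl` is stored as a number, so it is wrapped in `factor()` to get distinct colors rather than a continuous gradient. The `class` column in `fuel_data` is already text, so ggplot treats it as a set of categories without that step.

```r
library(tidyverse)

# factor(cyl) turns the numeric cylinder count into categories (4, 6, 8)
p_color <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = hp, color = factor(cyl)))

p_color
```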
fuel_data
Below I’ve added a new variable, `region`, to `fuel_data` using the `mutate()` function. This step is relatively advanced, so don’t worry if it looks well beyond what we’ve covered so far. However, I want you to take a look at the code and try to decipher what I did here. I tried to use variable names that are logical, and if you see a function that you don’t understand, try looking at the help file and/or Googling it.
us_makes <- c("chevrolet", "dodge", "ford", "jeep",
              "lincoln", "mercury", "pontiac") # vector of U.S. manufacturers
fuel_data <- mutate(fuel_data, region = if_else(manufacturer %in% us_makes, "US", "Foreign"))
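To help decipher that line, here are its two building blocks on a short, made-up vector of manufacturer names (`makes` is hypothetical, chosen only for illustration). `%in%` checks each element for membership in `us_makes`, and `if_else()` turns the resulting TRUE/FALSE values into labels.

```r
library(tidyverse)

us_makes <- c("chevrolet", "dodge", "ford", "jeep",
              "lincoln", "mercury", "pontiac")

# A few example manufacturer names (made up for this illustration)
makes <- c("ford", "toyota", "jeep", "honda")

# %in% asks, element by element: is this make in us_makes?
is_us <- makes %in% us_makes
is_us   # TRUE FALSE TRUE FALSE

# if_else() returns "US" where the condition is TRUE and "Foreign" where it is FALSE
region <- if_else(is_us, "US", "Foreign")
region  # "US" "Foreign" "US" "Foreign"
```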
`region` telling us?
`region`?
You may find `region` to be another interesting variable to examine in the next section.
With any remaining time you should perform further exploratory
analysis of the fuel_data
dataset. You should discuss ideas
with your classmates and you can also ask me for guidance (but try to
brainstorm some ideas before asking). I will be moving about the
classroom and checking in with everyone, providing guidance and
suggestions, and hearing what you’ve learned about the dataset being
analyzed.
Note that this section of the lab is important and should not be treated as a cursory walk-through. An important learning goal for this term is for you to develop your own independent research skills. Thus, all of our labs this term will have a large independent component where you are expected to apply what you’ve learned to ask novel and interesting questions, and to try to go beyond what you’ve learned in class.
When trying to get started with something new, remember the copy/paste/tweak approach. This approach can help give you a good starting point for your work.
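As one possible starting point to copy, paste, and tweak, here is a sketch that compares average fuel economy between U.S. and foreign manufacturers. The question and the summary statistics are just one illustrative choice. The object names `explore_data`, `by_region`, and `region_summary` are hypothetical, and `region` is rebuilt here so the block runs on its own without overwriting your `fuel_data`.

```r
library(tidyverse)

# Rebuild the region variable from earlier in the lab on a fresh copy of mpg
us_makes <- c("chevrolet", "dodge", "ford", "jeep",
              "lincoln", "mercury", "pontiac")
explore_data <- mutate(mpg, region = if_else(manufacturer %in% us_makes, "US", "Foreign"))

# Group the rows by region, then compute one summary row per group
by_region <- group_by(explore_data, region)
region_summary <- summarise(by_region,
                            n_vehicles = n(),          # number of vehicles in the group
                            mean_hwy   = mean(hwy),    # average highway mpg
                            mean_cty   = mean(cty))    # average city mpg
region_summary
```

To tweak it, try swapping `region` for another grouping column such as `class` or `drv`, or summarizing a different variable.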
If you are interested in further exploring EPA fuel efficiency data you can download the full EPA dataset here. The description of each of the variables in the dataset is available here.
This dataset has information on thousands of vehicles and their fuel/energy efficiency (along with dozens of other related variables) for models from 1984 to 2022. Notably, this dataset contains information on many electric and hybrid vehicles.
Unlike the `mpg` dataset you used in Part 1, the full EPA dataset is quite a bit “messier” (e.g., some variables have missing data in some rows) and quite a bit more complex (upwards of 80 variables are reported for a given vehicle). However, the larger number of observations (i.e., vehicles) and variables (i.e., vehicle attributes) makes for a much richer dataset.
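Missing values show up in R as `NA`, and they are the first thing you will run into with messier data. Here is a sketch of a few common ways to find and handle them, shown on a tiny made-up table. The table `messy` and its column names are hypothetical stand-ins, not the actual EPA variable names.

```r
library(tidyverse)

# A tiny made-up table with missing values (NA); column names are hypothetical
messy <- tibble(
  model    = c("A", "B", "C", "D"),
  city_mpg = c(20, NA, 31, 18),
  range_mi = c(NA, 250, NA, NA)
)

# Count the missing values in each column
colSums(is.na(messy))

# Keep only the rows where city_mpg is present
complete_city <- filter(messy, !is.na(city_mpg))

# Drop rows with a missing value in ANY column (here that removes every row!)
complete_all <- drop_na(messy)

# Many summary functions need na.rm = TRUE to skip missing values
mean(messy$city_mpg, na.rm = TRUE)
```

The `drop_na()` result is a reminder to check how much data a cleaning step throws away before relying on it.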
If you decide to work on this analysis let me know and I can help guide you on how to load in and get started with the data.
Your lab is due prior to the start of next week’s lab. Once you are finished and satisfied with your work, you should knit it to an HTML file and submit both your HTML and Rmd files to Nexus.
To knit a file, go to the menu bar at the top of your notebook, click the dropdown that currently says Preview, and select the Knit to HTML option. This will knit your document, which runs all of your code and generates a nice report in HTML format (by default, the HTML file is saved in the same folder as your Rmd file).
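If you ever want to knit from the Console instead of the menu, the `rmarkdown` package’s `render()` function does the same job (it is what knitting uses behind the scenes). This is a minimal sketch. The file name is a placeholder following the lab’s naming structure, and the `if` guard simply skips rendering when that file doesn’t exist.

```r
# Placeholder file name; replace it with the name of your own .Rmd file
rmd_file <- "LabName_YourLastName.Rmd"

# render() runs all code chunks and writes the HTML file next to the .Rmd file
if (file.exists(rmd_file)) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}
```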
An even easier way to knit your file is to go to the header at the top of your document, change `html_notebook` to `html_document`, and then save your file. You will then see that the Preview option in the menu bar has changed to Knit. Click Knit and your report will be knit.
Before you leave lab today, make sure you know how to knit your document.
Make sure your file is properly named BEFORE you submit it. The correct naming structure is LabName_YourLastName.
Make sure to replace the `author` and `date` in the header with your info.