Today’s lab will consist of two parts. First, we will make sure that you are properly set up with all of the software and accounts that you need for this class. Next, you will work with some fuel economy data from the Environmental Protection Agency (EPA) to learn more about fuel efficiency in automobiles, which is important from an environmental/climate perspective given that vehicles are responsible for 51% of CO2 emissions for a typical U.S. household [1].
As a reminder, I want you to talk to your neighbors and discuss the material and your thoughts. You’ll learn from each other and have much more fun this way too.
I will go over the guidelines for lab report expectations including the content, structure, and formatting of your reports. Remember that you can find additional details on lab guidelines and goals on our class website.
Automobiles are a significant contributor to both CO2 emissions and air pollution. Understanding fuel efficiency in vehicles and the key factors that affect it is important from both an environmental and an economic perspective. Today you will use EPA vehicle fuel economy data to learn more about this issue and explore some interesting questions.
The following resources will provide some helpful background related to today’s lab. These resources will also be helpful as you are writing up your lab report.
We are going to load in the tidyverse package. If you
haven’t installed it yet, you will first need to do that (it only needs
to be installed once and once installed can be loaded in anytime you
want). To install tidyverse go to the Packages
tab and click the install button. In the window that pops
up, type tidyverse and click Install.
library(tidyverse)
The data is conveniently part of one of the packages included with
tidyverse, so we can easily load in the data, which is called
mpg.
We’ll assign the mpg data to a new object that we’ll
call fuel_data. However, before you run the code block
below, you should first learn a bit more about the mpg
dataset.
To do that type ?mpg in your Console
(we’ll use the console, because we only want to get this info once, and
not every single time we run our Notebook).
fuel_data <- mpg  # assigning mpg to a new object
Use the View() function to see the dataset in a
spreadsheet-style viewer. This can be really helpful when you are trying to get familiar with a dataset that you’ve loaded in.
Let’s type View(fuel_data) in the
Console (not in our Notebook) since we just want to do
this once (and not every single time we run our Notebook).
Use the str() function to examine the structure of the
fuel_data dataset
# Your code here
What do chr, int, and num
mean in the output above?
fuel_data dataset. Apply this function to your
fuel_data object and it will print these results to your
notebook.
# Your code here
What do the displ, cyl and cty
columns represent? (Hint: type ?mpg into your console to
get more info on this dataset)
How many rows and columns are in the fuel_data
dataset? There is a function that will give you the dimensions of an
object (i.e. number of rows and columns). Look at your
Base R Cheatsheet to find this function.
# Your code here
Assign the number of rows to n_rows and
number of columns to n_cols (Hint: you can find functions
for just the number of rows and just number of columns respectively on
your Base R Cheatsheet)
Use the summary() function to get summary statistics on
the fuel_data dataset
# Your code here
Create a new variable that is the ratio of hwy to
cty fuel economy
To create a new variable you can use the mutate()
function (which is included in the dplyr package that loads
in with tidyverse). You can get more info on
mutate() by typing ?mutate() in your console
or by Googling it. FYI, we are going to learn a lot more about
dplyr later in the term.
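To see the general pattern of mutate() before applying it to the lab data, here is a minimal sketch on a made-up data frame. The names trips, distance_km, fuel_l, and km_per_l are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Made-up data for illustration only (not part of the lab data)
trips <- tibble(
  distance_km = c(100, 250, 400),
  fuel_l      = c(8, 20, 25)
)

# mutate() returns a copy of the data frame with the new column added,
# so the result is assigned back to trips to keep it
trips <- mutate(trips, km_per_l = distance_km / fuel_l)

trips
```

Note that mutate() does not change the original object on its own; without the assignment the new column would only be printed, not saved.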
You can also create a new variable by using the $
notation to assign it. The $ allows us to access an existing
variable (or create a new one) in a data frame. For instance:
fuel_data$new_variable_name <- expression that defines the variable
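As a quick illustration of the $ notation on something other than the lab data, the sketch below uses a made-up data frame. The names rooms, width_m, length_m, and area_m2 are hypothetical and chosen only for this example.

```r
# Made-up data for illustration only (not part of the lab data)
rooms <- data.frame(
  width_m  = c(3, 4, 5),
  length_m = c(4, 5, 6)
)

# Access an existing column with $
rooms$width_m

# Assigning to a column name that does not exist yet creates that column
rooms$area_m2 <- rooms$width_m * rooms$length_m

rooms
```

Unlike mutate(), the $ assignment modifies the data frame in place, so no separate assignment back to rooms is needed.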
I’ve shown you how to do it both ways in the code below. You can pick one (they do the same thing) and uncomment it so that it will run.
## Use mutate() to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data <- mutate(fuel_data, hwy2cty = hwy/cty)
## Use the $ notation to create the new variable (hwy2cty) and add it to the fuel_data object
# fuel_data$hwy2cty <- fuel_data$hwy/fuel_data$cty
hwy2cty and spend a few minutes discussing with your
neighbor(s) what the code is doing.
Let’s see how the size of a vehicle’s engine (displacement) influences the fuel economy.
Use ggplot to create a scatter plot of hwy
vs displ. Remove the # and replace the
... with the appropriate values.
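If the template is hard to picture, here is the same pattern filled in for a different dataset, in the spirit of the copy/paste/tweak approach. This sketch assumes the built-in iris dataset (which ships with R) rather than the lab data, and the object names p_basic and p_color are hypothetical.

```r
library(ggplot2)

# iris ships with R: flower measurements for three species
# Basic scatter plot: one variable on each axis
p_basic <- ggplot(data = iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length))

# Same plot, with a third variable mapped to the color of the points
p_color <- ggplot(data = iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length, color = Species))

# Printing a plot object draws it
p_basic
p_color
```

The data argument names the data frame, and everything inside aes() refers to columns of that data frame, so column names are written without quotes.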
# ggplot(data = ...) + geom_point(aes(x = ... , y = ...))
Let’s add another variable to our analysis to try and further explain what influences fuel economy. We’ll now color the points by their vehicle class.
# ggplot(data = ...) + geom_point(aes(x = ... , y = ... , color = ... ))
fuel_data
Below I’ve added a new variable region to
fuel_data using the mutate() function. This
step is relatively advanced, so don’t worry if it looks well beyond what
we’ve covered so far. However, I want you to take a look at the code and
try to decipher what I did here. I tried to use variable names that are
logical and if you see a function that you don’t understand, try looking
at the help file and/or Googling it.
us_makes <- c("chevrolet", "dodge", "ford", "jeep",
              "lincoln", "mercury", "pontiac")  # vector of U.S. manufacturers
fuel_data <- mutate(fuel_data, region = if_else(manufacturer %in% us_makes, "US", "Foreign"))
What is region telling us?
region?
You may find region to be another interesting variable
to examine in the next section.
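If the two less familiar pieces in the region code (%in% and if_else()) are hard to decipher, it can help to try them on a tiny example first. The sketch below uses made-up vectors; the names foods, known_fruits, is_fruit, and label are hypothetical.

```r
library(dplyr)

# Made-up values for illustration only
foods        <- c("apple", "carrot", "banana", "leek")
known_fruits <- c("apple", "banana")

# %in% checks each element on the left against the set on the right
# and returns one TRUE/FALSE per element
is_fruit <- foods %in% known_fruits

# if_else() picks the second argument where the condition is TRUE
# and the third argument where it is FALSE
label <- if_else(is_fruit, "fruit", "other")

data.frame(foods, is_fruit, label)
```

Running the two steps separately like this makes it easier to see what each one contributes before they are combined inside a single mutate() call.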
With any remaining time you should perform further exploratory
analysis of the fuel_data dataset. You should discuss ideas
with your classmates and you can also ask me for guidance (but try to
brainstorm some ideas before asking). I will be moving about the
classroom and checking in with everyone, providing guidance and
suggestions, and hearing what you’ve learned about the dataset being
analyzed.
Note that this section of the lab is important and should not be given only a cursory pass. An important learning goal for this term is for you to develop your own independent research skills. Thus, all of our labs this term will have a large, independent component where you are expected to apply what you’ve learned to ask novel and interesting questions and, furthermore, to try to go beyond what you’ve learned in class.
When trying to get started with something new, remember the copy/paste/tweak approach. This approach can help give you a good starting point for your work.
If you are interested in further exploring EPA fuel efficiency data you can download the full EPA dataset here. The description of each of the variables in the dataset is available here.
This dataset has information on thousands of vehicles and their fuel/energy efficiency (along with dozens of other related variables) for models from 1984-2022. Notably this dataset contains information on many electric and hybrid vehicles.
Unlike the mpg dataset you used in Part
1, the full EPA dataset is quite a bit “messier”
(e.g. variables have missing data in some rows) and quite a bit more
complex (upwards of 80 variables are reported for a given vehicle).
However, the larger number of observations (i.e., vehicles) and
variables (i.e., vehicle attributes) allows for a much richer
dataset.
If you decide to work on this analysis, let me know and I can help guide you on how to load in and get started with the data.
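Since missing data is one of the main ways the full dataset is messier, here is a minimal sketch of how you might check for it. It uses a made-up data frame, not the EPA file; the names cars_toy, model, city_mpg, range_mi, and missing_per_col are hypothetical.

```r
# Made-up data with gaps, for illustration only (not the EPA dataset)
cars_toy <- data.frame(
  model    = c("A", "B", "C", "D"),
  city_mpg = c(22, NA, 30, 18),
  range_mi = c(NA, NA, 250, NA)
)

# is.na() marks each missing cell; colSums() counts them per column
missing_per_col <- colSums(is.na(cars_toy))
missing_per_col

# Many summary functions return NA when any value is missing...
mean(cars_toy$city_mpg)

# ...unless told to drop the missing values first
mean(cars_toy$city_mpg, na.rm = TRUE)
```

Checking the missing-value counts first is a reasonable habit with messy data, since a column that is mostly empty may not be worth analyzing.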
Your lab is due prior to the start of next week’s lab. Once you are
finished and satisfied with your work you should knit it to
an html file and submit both your html and
Rmd file to Nexus.
To knit a file you can go to the menu bar at the top of
your notebook and click the dropdown that currently says
Preview and select the Knit to HTML option.
This will knit your document, which runs all of your code
and generates a nice report in html format (the file is saved in your
current working directory).
An even easier way to knit your file is to go to the
header at the top of your document and change html_notebook
to html_document and then save your file. You will then see
that the Preview option in the menu bar will have changed
to Knit. Click Knit and your report will be
knit.
Before you leave lab today, make sure you know how to knit your document.
Make sure your file is properly named BEFORE you submit
it. The correct naming structure is
LabName_YourLastName
Make sure to replace the author and
date in the header with your info.