At this point in the term, you’ve learned quite a bit of material, and it’s good to stop and look back at what we’ve covered. In today’s lecture we will refresh our memory of the concepts and tools that we’ve seen up to now. In the sections below you’ll work through exercises that cover many of these key topics.
Let’s load in the libraries we’ll use in this lesson:
library(tidyverse)
At the beginning of the term we learned some of the basics of programming in R. You are continually using many of these concepts so they should be relatively fresh. However, below are some questions/exercises that should help to reinforce these concepts.
Name the common data types used in R
Name the common data structures used in R
by_fives that goes from 0 to 50 in increments of 5
by_fives by your new vector created in the above step.
head() function and get a quick summary using the summary() function. Also examine the structure using the str() function
[] notation
$ notation.
earthquake_data <- read_csv("https://stahlm.github.io/ENS_215/Data/Rocky_Mtn_Arsenal_Earthquakes.csv", skip = 2)
new_vec. Test the following conditions (element-wise) on your new_vec vector
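As a quick refresher on the vector and indexing ideas above, here is a minimal sketch in base R. The names toy_df, site, and flow_cfs are hypothetical, the values are made up for illustration, and new_vec here is just an example vector, not necessarily the one the exercise intends.

```r
# Build a regular sequence: 0, 5, 10, ..., 50
by_fives <- seq(from = 0, to = 50, by = 5)

# Element-wise arithmetic: each element is multiplied by its partner
doubled <- by_fives * rep(2, length(by_fives))

# Element-wise logical tests return one TRUE/FALSE per element
new_vec <- c(3, 8, 12, 20)        # example values (assumption)
is_big  <- new_vec > 10           # FALSE FALSE TRUE TRUE
in_set  <- new_vec %in% c(8, 20)  # FALSE TRUE FALSE TRUE

# A small, made-up data frame to practice indexing
toy_df <- data.frame(site = c("A", "B", "C"),
                     flow_cfs = c(120, 450, 90))

toy_df[2, "flow_cfs"]  # [] notation: row 2 of the flow_cfs column
toy_df$flow_cfs        # $ notation: the whole column as a vector

head(toy_df)     # first rows
summary(toy_df)  # quick summary of each column
str(toy_df)      # structure: column types and dimensions
```

The same head(), summary(), and str() calls should work on any data frame you load with read_csv().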
We learned to use Markdown to nicely format our R Notebooks. The following exercises will refresh your memory on some of these formatting options. You can refer to the Notebooks posted on our class website and/or your R Markdown Cheatsheet.
#s
As you’ve learned, conditional programming allows us to execute code when specified conditions are met. We learned how to do this using if, if/else, and if/else-if/else statements.
rand_number <- runif(1, min = 0, max = 100) # generate a random number between 0 and 100
my_guess <- # your guess goes here
if/else-if/else statement that tells you how well you guessed
Create your if/else-if/else statement in a well-thought-out and efficient manner. Think about the styling of your code and the quality of your implementation.
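One possible shape for such a statement is sketched below. The cutoffs (5 and 20), the feedback messages, the seed, and the guess of 50 are all assumptions chosen for illustration; substitute the conditions given in the exercise.

```r
set.seed(215)  # fixed seed so the draw is reproducible (arbitrary choice)
rand_number <- runif(1, min = 0, max = 100)
my_guess <- 50  # hypothetical guess

# Compute the miss distance once, then test from closest to farthest
guess_error <- abs(rand_number - my_guess)

if (guess_error <= 5) {
  feedback <- "Great guess!"
} else if (guess_error <= 20) {
  feedback <- "Not bad."
} else {
  feedback <- "Way off."
}

print(paste0(feedback, " You missed by ", round(guess_error, 1), "."))
```

Ordering the tests from the tightest cutoff to the loosest means each branch only needs a single comparison, which keeps the statement short and easy to read.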
We learned that we can repeat a section of code when specified conditions are met by using loops. This allows us to perform repeated operations without having to copy and paste code (which is a very bad practice and very inefficient).
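As a reminder of the basic pattern, here is a minimal for loop sketch that builds a running total. The vector daily_flow and its values are made up for illustration.

```r
# Made-up daily values for illustration only
daily_flow <- c(1200, 3400, 2900, 800, 5100)

# Pre-allocate the output rather than growing it inside the loop
running_total <- numeric(length(daily_flow))

# seq_along() adapts to the vector length, so nothing is hard-coded
for (i in seq_along(daily_flow)) {
  if (i == 1) {
    running_total[i] <- daily_flow[i]
  } else {
    running_total[i] <- running_total[i - 1] + daily_flow[i]
  }
}

print(running_total)
```

Because the loop bounds come from the data itself, the same code would run unchanged on a longer or shorter vector.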
Let’s load in some daily streamflow data for the Hudson River (measured near Waterford, NY) for the years 2013-2016. Note that the dataset is complete (i.e., there are no missing days and no missing data).
Hudson_flow <- read_csv("https://stahlm.github.io/ENS_215/Data/Hudson_01335754_review_class.csv")
Note: Take some time to think about how to do this. Also write your code in an intelligent manner so that it is flexible (i.e., it would run without modification if you were to load in a different but identically formatted dataset).
# Your code here
while loop that loops through the Hudson_flow data until it reaches the maximum flow recorded in the dataset at which point the loop stops. You should add a print() statement after the loop that reports the date of the maximum flow. FYI, I get the following answer:
## [1] "Max flow occurs on 2014-4-16"
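The general pattern can be sketched on a small stand-in dataset. The data frame toy_flow, its column names (date, flow_cfs), and its values are assumptions for illustration; the actual Hudson_flow columns may be named and organized differently.

```r
# Toy stand-in for the streamflow data; column names are assumptions
toy_flow <- data.frame(
  date = as.Date("2020-01-01") + 0:5,
  flow_cfs = c(900, 1500, 4200, 7800, 3100, 1200)
)

max_flow <- max(toy_flow$flow_cfs)  # target value, computed from the data
i <- 1                              # start at the first row

# Keep stepping forward until the current row holds the maximum flow
while (toy_flow$flow_cfs[i] < max_flow) {
  i <- i + 1
}

print(paste("Max flow occurs on", toy_flow$date[i]))
```

The loop is guaranteed to stop because the maximum is taken from the same column being searched. That guarantee relies on the data having no missing values, which the dataset description above states is the case.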
We learned tons of ways to wrangle data using the dplyr package. Let’s refresh our skills with these tools (you’ll likely be pretty fresh with these concepts since you have been using them heavily).
To practice your skills you should use the Hudson_flow data. Don’t overwrite your Hudson_flow dataset when making modifications. If you happen to do this by accident, you can simply reload the data.
filter() to select only the rows with flows > 7500 cfs
filter() to select only the rows with: 2,500 < flows < 12,000 cfs
filter() to select only the rows with months Nov, Dec, Jan, Feb (you should use %in% in your filter operation)
%>% to allow you to do this in a single line of code
select() function
dplyr
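The filtering patterns above might look like the following on a small stand-in dataset. The tibble toy_flow, its column names (month, flow_cfs), and its values are assumptions for illustration, so adjust the names to match the columns in Hudson_flow.

```r
library(dplyr)

# Toy stand-in for the streamflow data; column names are assumptions
toy_flow <- tibble(
  month = c(1, 4, 7, 11, 12),
  flow_cfs = c(3000, 13000, 2000, 8000, 9500)
)

# Single condition
high_flow <- toy_flow %>% filter(flow_cfs > 7500)

# Two conditions separated by a comma are combined with AND
mid_flow <- toy_flow %>% filter(flow_cfs > 2500, flow_cfs < 12000)

# %in% tests membership in a set of values
winter_flow <- toy_flow %>% filter(month %in% c(11, 12, 1, 2))

# Chain filter() and select() in one pipeline
winter_high <- toy_flow %>%
  filter(month %in% c(11, 12, 1, 2), flow_cfs > 7500) %>%
  select(month, flow_cfs)
```

Each result is stored under a new name, so the original toy_flow object is never overwritten.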
Let’s practice some additional (and more advanced) data wrangling skills
Hudson_flow with a categorical variable that categorizes flow into “Low flow” and “High flow” based on the following conditions
You will want to use mutate() and if_else() to accomplish the above. Make sure to reassign the Hudson_flow object so that you carry this variable with you in the later analysis
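A minimal sketch of the mutate() and if_else() pattern is shown below. The tibble toy_flow, the column names flow_cfs and flow_category, and the 5000 cfs cutoff are hypothetical; use the conditions given in the exercise.

```r
library(dplyr)

# Toy stand-in for the streamflow data; names and values are assumptions
toy_flow <- tibble(flow_cfs = c(1500, 6000, 9000, 4200))

flow_cutoff <- 5000  # hypothetical threshold separating low from high flow

# Reassign the object so the new column is kept for later steps
toy_flow <- toy_flow %>%
  mutate(flow_category = if_else(flow_cfs >= flow_cutoff,
                                 "High flow", "Low flow"))

print(toy_flow)
```

Storing the cutoff in a variable keeps the condition easy to change without editing the pipeline itself.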
group_by() and summarize() to accomplish the following tasks
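The group_by() and summarize() combination can be sketched as follows. The tibble toy_flow, its column names (year, flow_cfs), and the particular summary statistics are assumptions for illustration rather than the tasks in the exercise.

```r
library(dplyr)

# Toy stand-in for the streamflow data; names and values are assumptions
toy_flow <- tibble(
  year = c(2013, 2013, 2014, 2014, 2014),
  flow_cfs = c(2000, 4000, 3000, 6000, 9000)
)

# One output row per group, one column per summary statistic
yearly_summary <- toy_flow %>%
  group_by(year) %>%
  summarize(mean_flow = mean(flow_cfs),
            max_flow = max(flow_cfs),
            n_days = n())

print(yearly_summary)
```

Grouping by more than one variable, for example group_by(year, month), follows the same pattern and yields one row per combination.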
This topic is very recent, so there is not much need to refresh your memory, but you should still do some exercises to reinforce the concepts.
Let’s generate some graphics using the tools we’ve learned in the ggplot2 package. We’ll use the Hudson_flow data in the exercises below.
geom_jitter() (Hint: you may want to convert your x variable to a factor).
geom_point layer with the mean flow for each month (i.e. twelve points). Make these points blue squares.
theme_classic()
iii) Add axis labels, a title, and a caption
iv) Set the alpha of the points to 0.5
Note: You will need to load in the scales package for the comma formatting
library(scales)
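Putting these pieces together, a plot along these lines might be sketched as follows. The tibble toy_flow, its column names (month, flow_cfs), and its randomly generated values are assumptions for illustration, not the Hudson River data.

```r
library(ggplot2)
library(dplyr)
library(scales)

# Randomly generated stand-in data; names and values are assumptions
set.seed(1)
toy_flow <- tibble(
  month = rep(1:12, each = 10),
  flow_cfs = runif(120, min = 1000, max = 20000)
)

# One mean value per month (twelve rows)
monthly_mean <- toy_flow %>%
  group_by(month) %>%
  summarize(mean_flow = mean(flow_cfs))

flow_plot <- ggplot(toy_flow, aes(x = factor(month), y = flow_cfs)) +
  geom_jitter(width = 0.2, alpha = 0.5) +           # semi-transparent raw points
  geom_point(data = monthly_mean,
             aes(x = factor(month), y = mean_flow),
             color = "blue", shape = 15, size = 3) + # shape 15 is a filled square
  scale_y_continuous(labels = comma) +               # comma() comes from scales
  labs(x = "Month", y = "Flow (cfs)",
       title = "Daily flow by month (toy data)",
       caption = "Values are randomly generated for illustration") +
  theme_classic()

flow_plot
```

Converting month to a factor makes ggplot2 treat it as a discrete axis, so the jittered points are drawn in one column per month.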
We just did this last class so we won’t review this topic today, though you should look back at the past few lectures if you need a refresher.