Exploring Environmental Data

18 January 2019

Loops (quick refresher)

for (i in 1:10){
  # square the current value of i, then print the result
  j <- i^2
  print(j)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100

In the loop above, i takes each value from 1 to 10 in turn, increasing by 1 on each iteration. On each pass, j is set to i squared and printed.
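The loop above prints each result but does not keep it. Below is a minimal sketch of storing each result in a preallocated vector instead, followed by the vectorized form that R also supports. The names squares and squares_vec are illustrative only.

```r
# Preallocate a numeric vector with one slot per iteration
squares <- numeric(10)

for (i in 1:10){
  # store i^2 in position i instead of only printing it
  squares[i] <- i^2
}

# R arithmetic is vectorized, so the same values can be computed without a loop
squares_vec <- (1:10)^2

print(squares)
identical(squares, squares_vec)
```

For small tasks like this the vectorized version is shorter, but the loop pattern generalizes to steps that cannot be written as a single vectorized expression.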

Loops (quick refresher)

city_list <- c("Schenectady", "New York", "Boston", "Chicago", "Miami")

for (i_city in city_list){
  print(i_city)
}
## [1] "Schenectady"
## [1] "New York"
## [1] "Boston"
## [1] "Chicago"
## [1] "Miami"

In the loop above, the object i_city takes on each value in city_list in turn.
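Because i_city holds an actual city name on each pass, the loop body can work with that value directly. A small sketch using base R's nchar() to count the characters in each name (the message text is just for illustration):

```r
city_list <- c("Schenectady", "New York", "Boston", "Chicago", "Miami")

for (i_city in city_list){
  # i_city is a character string here, not a number
  print(paste(i_city, "has", nchar(i_city), "characters"))
}

# After the loop finishes, i_city still holds the last value it took
print(i_city)
```

Note that nchar() counts the space in "New York" as a character.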


Loops (quick refresher)

Using a different approach, I can write a loop that does exactly the same thing as the loop on the previous slide:

city_list <- c("Schenectady", "New York", "Boston", "Chicago", "Miami")

for (i_city in 1:5){
  # i_city is a position (1 to 5) used to index into city_list
  print(city_list[i_city])
}
## [1] "Schenectady"
## [1] "New York"
## [1] "Boston"
## [1] "Chicago"
## [1] "Miami"

Notice that in this loop i_city goes from 1 to 5, increasing by 1 on each iteration.

Thus I can use i_city as an index, specifying which element of city_list to access on each iteration.
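Hard-coding 1:5 only works while city_list has exactly five elements. A sketch of a more robust variant uses base R's seq_along(), which generates 1, 2, ..., length(city_list) automatically. The index also makes it easy to fill a second vector in parallel; city_labels is an illustrative name.

```r
city_list <- c("Schenectady", "New York", "Boston", "Chicago", "Miami")

# One empty slot per city, filled in by position
city_labels <- character(length(city_list))

# seq_along() adapts if cities are added to or removed from city_list
for (i_city in seq_along(city_list)){
  # i_city is the position; city_list[i_city] is the value at that position
  city_labels[i_city] <- paste0(i_city, ". ", city_list[i_city])
}

print(city_labels)
```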

Introduction to Data Manipulation

Recall the data science workflow I showed on the first day of class.
Image source: R4DS

Now that we’ve established foundational skills in R programming, we are going to move into the data manipulation (transform) stage of the workflow.

Introduction to Data Manipulation

Why is this so important?

Data sets are often large, containing tens or hundreds of variables and tens of thousands of observations. When conducting our analysis we often need to:

In many cases these operations are central to our analysis of interest.

Introduction to Data Manipulation

At this point we’ve learned some basics of data manipulation using the functionality available in base R.

Today we are going to begin taking these skills to the next level.

We’ll do this by using an amazing package called dplyr.

Introduction to Data Manipulation

The dplyr package is included in the tidyverse collection of packages, so you should already have it installed on your computer.

With dplyr we’ll be able to manipulate data using the functions included in the package. This will make the types of data manipulation we’ve done thus far much, much easier, both to code and to read.
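As a rough illustration of the difference in readability, here is the same task written in base R and with dplyr. It uses airquality, a data set that ships with R (daily New York air quality measurements, May to September 1973); the task itself and the object names are chosen only for illustration.

```r
library(dplyr)

# Task: keep May days warmer than 80 degrees F, with only the Day and Temp columns

# Base R: logical indexing, repeating airquality$ for every column
hot_may_base <- airquality[airquality$Month == 5 & airquality$Temp > 80,
                           c("Day", "Temp")]

# dplyr: columns are referred to by name, one verb per step
hot_may_dplyr <- airquality %>%
  filter(Month == 5, Temp > 80) %>%  # keep matching rows
  select(Day, Temp)                  # keep matching columns

print(hot_may_dplyr)
```

Both return the same rows and values; base R keeps the original row names, while dplyr renumbers them.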

We will also gain a whole new set of data manipulation operations.

With dplyr we will be able to seamlessly deal with large and complex data.
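One example of the kind of operation that is tedious in base R but short in dplyr is a grouped summary. The sketch below computes monthly averages from the built-in airquality data; the column names mean_temp, mean_ozone, and n_days are illustrative.

```r
library(dplyr)

monthly_means <- airquality %>%
  group_by(Month) %>%                         # one group per month (5 to 9)
  summarise(
    mean_temp  = mean(Temp),                  # average daily temperature (F)
    mean_ozone = mean(Ozone, na.rm = TRUE),   # Ozone has missing values, so drop NAs
    n_days     = n()                          # number of rows in each group
  )

print(monthly_means)
```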

Introduction to Data Manipulation

The dplyr package gives us a grammar of data manipulation. The package provides verbs (functions) for many common data manipulation tasks, which act on our subjects (data sets).
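To make the grammar idea concrete: each dplyr verb takes a data frame as its first argument and returns a new data frame, so verbs can be chained with the pipe (%>%) like steps in a sentence. A minimal sketch on the built-in airquality data, where the Celsius column Temp_C and the object july_hot are illustrative:

```r
library(dplyr)

july_hot <- airquality %>%
  filter(Month == 7) %>%                      # rows: July only
  mutate(Temp_C = (Temp - 32) * 5 / 9) %>%    # new column: temperature in Celsius
  arrange(desc(Temp_C)) %>%                   # reorder rows: hottest day first
  select(Day, Temp, Temp_C)                   # columns: day and both temperatures

head(july_hot)
```

Reading the pipeline top to bottom gives a plain-language description of the analysis: take airquality, keep July, add a Celsius column, sort by it, and keep three columns.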

We’ll learn a number of key dplyr verbs, including: