Exploring Environmental Data

16 January 2019

Indexing and accessing data

To access data in a data frame, we can use $ or [] notation.

The $ notation lets you specify the column (variable) to access in the data frame by using the column's name.

The [] notation lets you specify the indices (row and column positions) that you would like to access.


As an example, let’s take the mpg data frame, which comes with the ggplot2 package (loaded as part of the tidyverse):
manufacturer model displ year cyl trans      drv cty hwy fl class
audi         a4    1.8   1999 4   auto(l5)   f   18  29  p  compact
audi         a4    1.8   1999 4   manual(m5) f   21  29  p  compact
audi         a4    2.0   2008 4   manual(m6) f   20  31  p  compact
audi         a4    2.0   2008 4   auto(av)   f   21  30  p  compact
audi         a4    2.8   1999 6   auto(l5)   f   16  26  p  compact
audi         a4    2.8   1999 6   manual(m5) f   18  26  p  compact

The table has 11 columns (variables) and 234 rows; I’m only showing the first 6.

Using $ notation, I can access all of the rows in the year column by typing mpg$year

So the syntax is dataframe$variable.
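To try $ notation without loading the tidyverse, here is a minimal sketch that rebuilds a few columns of the six rows shown above as a plain base-R data frame. The name mpg_head is hypothetical (it is not part of any package), and it keeps only five of the eleven columns, so treat it as a small stand-in for mpg rather than the real data.

```r
# mpg_head is a hypothetical stand-in for the first six rows of mpg,
# keeping only five columns (year is still column 4)
mpg_head <- data.frame(
  manufacturer = "audi",  # a single value is recycled to all six rows
  model = "a4",
  displ = c(1.8, 1.8, 2.0, 2.0, 2.8, 2.8),
  year  = c(1999, 1999, 2008, 2008, 1999, 1999),
  hwy   = c(29, 29, 31, 30, 26, 26)
)

# dataframe$variable: every row of the year column, accessed by name
mpg_head$year
# [1] 1999 1999 2008 2008 1999 1999

# The same pattern works for any column name
mpg_head$hwy
# [1] 29 29 31 30 26 26
```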


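Using [] notation, I can access all of the rows in the year column by typing mpg[ ,4]. Leaving the row position blank selects every row, and year is column 4.

Similarly, I could get rows 5 through 10 of column 4 by typing mpg[5:10, 4]. So the syntax is dataframe[rows, columns].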

You can also combine $ notation with [] notation. For example, mpg$year[2:20] accesses rows 2 through 20 of the year column (variable) in the mpg data frame. Because mpg$year is a single column, the brackets take only one position (no comma).
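Here is a sketch of the same [] patterns on a small stand-in data frame. It assumes a hypothetical mpg_head built from the six rows shown above (five columns, with year still in column 4), so the row ranges are shortened to fit within six rows.

```r
# Hypothetical stand-in for the first six rows of mpg (five columns only)
mpg_head <- data.frame(
  manufacturer = "audi",
  model = "a4",
  displ = c(1.8, 1.8, 2.0, 2.0, 2.8, 2.8),
  year  = c(1999, 1999, 2008, 2008, 1999, 1999),
  hwy   = c(29, 29, 31, 30, 26, 26)
)

# dataframe[rows, columns]: blank rows means "all rows"; column 4 is year
mpg_head[ , 4]
# [1] 1999 1999 2008 2008 1999 1999

# Rows 5 through 6 of column 4 (this stand-in only has six rows)
mpg_head[5:6, 4]
# [1] 1999 1999

# $ and [] together: pull out year, then take positions 2 through 4
mpg_head$year[2:4]
# [1] 1999 2008 2008
```

With a plain data.frame like this one, mpg_head[ , 4] comes back as a vector. The real mpg is a tibble, and tibbles do not simplify this way, so mpg[ ,4] gives a one-column tibble instead; mpg$year gives the plain vector either way.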


Recall that you can use logical operations to access data in an object.

For instance, to access the hwy values in the mpg data frame for only those entries whose year equals 1999, I can type:

mpg$hwy[mpg$year == 1999]

Pay attention to the syntax above.

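The mpg$hwy part says: I want to access data in the hwy variable of the mpg data frame.

Then, within the brackets [], I specify the condition used to select the correct rows of the hwy variable. Since I want only the rows where year equals 1999, I write mpg$year == 1999. This comparison produces one TRUE or FALSE for each row, and [] keeps only the hwy values where it is TRUE. Note the double equals sign ==, which tests whether two values are equal.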
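Breaking that one line into its two steps makes the logic easier to see. This is a minimal sketch using a hypothetical two-column stand-in (just the year and hwy values from the six rows shown earlier) rather than the full mpg data; is_1999 is an illustrative variable name.

```r
# Hypothetical two-column stand-in built from the six rows of mpg shown earlier
mpg_head <- data.frame(
  year = c(1999, 1999, 2008, 2008, 1999, 1999),
  hwy  = c(29, 29, 31, 30, 26, 26)
)

# Step 1: the comparison gives one TRUE or FALSE for each row
is_1999 <- mpg_head$year == 1999
is_1999
# [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE

# Step 2: [] keeps only the hwy values where the condition is TRUE
mpg_head$hwy[is_1999]
# [1] 29 29 26 26

# Both steps in one line, the same shape as mpg$hwy[mpg$year == 1999]
mpg_head$hwy[mpg_head$year == 1999]
# [1] 29 29 26 26
```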

Functions

A function is a piece of code to which you pass one or more inputs; it performs one or more operations on those inputs and then passes output back to you.

INPUT -> FUNCTION -> OUTPUT

You are all familiar with the mean() function, which computes the mean (average) of a set of numbers. This is handy because we often need this operation, and it would be time-consuming and error-prone to type out the code for computing a mean every time we wanted one.
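To see what mean() saves us from, here is a small sketch that computes the mean of the six hwy values from the table earlier, once with the formula spelled out and once with mean(). The variable names are just for illustration.

```r
# hwy values from the six rows of mpg shown earlier
hwy <- c(29, 29, 31, 30, 26, 26)

# Spelled out: add up the values, then divide by how many values there are
by_hand <- sum(hwy) / length(hwy)

# With a function: pass hwy in as the input, get the mean back as output
with_mean <- mean(hwy)

by_hand
# [1] 28.5
with_mean
# [1] 28.5
```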

It is VERY IMPORTANT to note that we send inputs to a function using (). We do NOT use [], which are used to access values within a data object.
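A short sketch of the difference, using the same six hwy values: brackets pull values out of a data object, parentheses pass inputs to a function, and putting brackets on a function fails. The tryCatch() call is only there so the sketch keeps running after the error; msg is an illustrative name.

```r
hwy <- c(29, 29, 31, 30, 26, 26)

# [] accesses values inside a data object: the second hwy value
hwy[2]
# [1] 29

# () sends an input to a function: the mean of all six values
mean(hwy)
# [1] 28.5

# The two can be combined: index first, then pass the result to mean()
mean(hwy[1:4])
# [1] 29.75

# Using [] on a function is an error, because a function is not a data object;
# tryCatch() catches the error so we can look at its message
msg <- tryCatch(mean[hwy], error = function(e) conditionMessage(e))
msg
```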

Correction to 2019_01_14 Lecture

A more correct version of the if/else if/else code block is below. The code in the printout works, but it had unneeded statements in the else if portions.

city_temp <- ... # type the temperature here

# Only the first branch whose condition is TRUE runs
if (city_temp >= 85) {
  print("Wow it's pretty hot out!")                  # 85 and above
} else if (city_temp >= 50) {
  print("The temperature is nice and comfortable")   # 50 up to 85
} else if (city_temp >= 32) {
  print("It's pretty cold outside")                  # 32 up to 50
} else {
  print("It's freezing out!")                        # below 32
}
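To check that the shorter conditions still sort every temperature correctly, here is a sketch that wraps the same chain in a function and tries a few values. describe_temp() is a hypothetical name, and it returns the message instead of printing it so the results can be collected.

```r
# describe_temp() is a hypothetical helper wrapping the if / else if / else
# chain; it returns the message rather than printing it
describe_temp <- function(city_temp) {
  if (city_temp >= 85) {
    "Wow it's pretty hot out!"
  } else if (city_temp >= 50) {
    "The temperature is nice and comfortable"
  } else if (city_temp >= 32) {
    "It's pretty cold outside"
  } else {
    "It's freezing out!"
  }
}

# Try temperatures in each range, including the boundary values
for (temp in c(90, 85, 84, 50, 40, 32, 20)) {
  print(paste(temp, "->", describe_temp(temp)))
}
```

Notice that 85 lands in the first branch while 84 falls through to the second: once a condition is TRUE, R skips the remaining checks, which is why the else if parts don't need upper limits.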

Troubleshooting in R/RStudio

If you run into issues while coding in RStudio, a few common pitfalls are: