The R Notebooks that we work in allow us to incorporate text, code, and output all in one place. This is a huge benefit when you want to create a report from your work in R. R Notebooks are excellent for producing computationally reproducible research.
An R Notebook is technically an R Markdown file (`.Rmd`). This means that your R Notebook is a blend of R (the code portion of your Notebook) and Markdown (the text portion). Markdown is simply a system for formatting document features (e.g. text styles, headers, bullets, tables of contents, …).
There are tons of formatting options you can specify when working in an R Notebook, and these allow us to create attractive, easy-to-read documents. We’ll learn a few basics today that will greatly improve how your R Notebooks look when you output your reports.
The R Markdown Cheatsheet that I handed out has examples similar to those below, as well as more advanced topics that we won’t cover today.
All of your R Notebooks have a file header (also called a YAML Header). This is required for specifying how your file will look and what format it will be output to when you generate your reports.
The header is at the top of the Notebook and has three dashes (`---`) at its top and bottom.
Here’s the header that I used for this Notebook. These settings specify how I want my file to look when it is rendered to a report.
---
title: "R Programming and Markdown Basics"
author: "ENS-215"
date: "07-Jan-2022"
output:
  html_notebook:
    theme: spacelab
    toc: TRUE
---
The `html_notebook` setting lets you instantly view any changes by clicking Preview or simply saving the Notebook. Note that changes to your code will appear when you Preview a Notebook; however, you have to run the code before previewing in order to see any changes to the output.
When you are done and ready to generate your report, you can Knit your document to an HTML file. There are two ways to do this:
To knit a file, go to the menu bar at the top of your Notebook, click the dropdown that currently says Preview, and select the Knit to HTML option. This knits your document, which runs all of your code and generates a nice report in HTML format (by default, the file is saved in the same folder as your Notebook).
An even easier way to knit your file is to go to the header at the top of your document, change `html_notebook` to `html_document`, and then save your file. You will then see that the Preview option in the menu bar has changed to Knit. Click Knit and your report will be knit.
As you already know, we can include R code in our Notebooks. We can add code blocks by hitting Ctrl + Alt + I (PC) or Cmd + Option + I (Mac).
You can also create a code block by typing ```{r} on one line, then hitting Enter and typing ``` on the line below.
Give both of these approaches a try.
Example code block
x <- 10 # comments in a code block are created by putting the hashtag symbol before the comment
x + 5
## [1] 15
Remember that you can run a code block by hitting Ctrl + Shift + Enter (PC) or Cmd + Shift + Enter (Mac). To see the other Run options, click the Run dropdown button in the top right of your Editor window.
Since your R Notebook (R Markdown) file is essentially a plain text file (i.e. you can’t change how the text looks in your editor the way you can in Microsoft Word), you need to use special characters to specify how your text should be formatted in your output report.
Section headers/titles like the ones separating the sections of this document are created by putting a `#` followed by a space at the start of a line of text. To create smaller section headers, add more hashtags to the start of the line. For instance, `##` will create a smaller section header and `###` an even smaller one.
Bold font is created by putting `**` at the start and end of the section of text you want in bold. For instance, you would type `**text I want in bold**` (with no spaces between the asterisks and the text).
Italics are created with either `_text inside is in italics_` or `*text inside is in italics*`.
To make code show up verbatim, put the backtick symbol (`) around the text you want to appear verbatim. Note that the backtick is NOT the single quote; it is the symbol to the left of the 1 on your keyboard.
Superscripts such as X² are created by surrounding text with carets: `^superscripted text here^`. So X² is created by typing `X^2^`.
Subscripts such as Xᵢ are created by surrounding text with tildes: `~subscripted text here~`. So Xᵢ is created by typing `X~i~`.
I can also create bulleted lists by putting a `+` at the start of a line.
I can create numbered lists by typing something like this:
1. First item
2. Second item
    i) sub-item
    ii) another sub-item
And the list would look like this in my report.
2 + 1 #Add
## [1] 3
15 - 4 #Subtract
## [1] 11
9 * 2 #Multiply
## [1] 18
3 ^ 4 #Exponents
## [1] 81
120 / 8 #Divide
## [1] 15
5 %% 2 #Modulus
## [1] 1
4 > 2 #Greater than
## [1] TRUE
2 < 5 #Less than
## [1] TRUE
5 <= 5 #Less than or equal
## [1] TRUE
8 >= 2 #Greater than or equal
## [1] TRUE
2 == 2 #Equality: notice that it is TWO equal signs!
## [1] TRUE
5 != 7 #Not Equals
## [1] TRUE
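The modulus operator `%%` returns the remainder after division. A minimal sketch pairing it with R's integer-division operator `%/%` (not covered above) shows how the two fit together; the names `dividend` and `divisor` are just illustrative.

```r
dividend <- 17
divisor <- 5

quotient <- dividend %/% divisor   # integer division: 3
remainder <- dividend %% divisor   # modulus (the remainder): 2

# Rebuild the original number from the two pieces: 3 * 5 + 2 = 17
quotient * divisor + remainder

# A common use of %%: testing which numbers are even
1:6 %% 2 == 0
```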
Note that when you run a code block, it sends the code to the console. You can also type code directly into the console and it will be evaluated. This can be handy for a quick one-off calculation; however, for running many operations we’ll stick to using an R Notebook.
Typically we’ll be re-using the results from some calculation, so we’ll want to assign them to a variable. In R we use `<-` to assign values to objects. So `x <- 10` means that the object `x` is assigned a value of 10.
x <- 10 # assign a variable
# to print out the value of x to the console I can simply type out the variable on its own line of code
x
## [1] 10
y <- (2*x) + 5 # you can use mathematical operations and previously declared variables when assigning a new variable
y
## [1] 25
z <- x + y + 0.1234
z
## [1] 35.1234
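One point worth checking: assignment stores the value at the moment the line runs, not a formula. In this sketch (reusing the names `x` and `y` from above), changing `x` afterwards does not update `y`; you have to re-run the line that defines `y`.

```r
x <- 10
y <- (2 * x) + 5   # y is computed once from the current value of x: 25
x <- 100           # reassigning x overwrites its old value...
y                  # ...but y still holds 25
y <- (2 * x) + 5   # re-run the calculation to update y
y                  # now 205
```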
Variables can take non-numeric values. The objects below take strings (i.e. text) as their values.
studentName_1 <- "Bob"
studentName_1
## [1] "Bob"
studentName_2 <- "Jess"
studentName_2
## [1] "Jess"
Notice how I gave the objects descriptive names. Also notice that I used a consistent naming format. You should put thought into how you name objects; this will make your code much easier to read and much faster to write.
Object names cannot begin with a number or an underscore, and cannot contain spaces or (most) special characters. You may use underscores and periods in object names. Also note that object names are case sensitive.
So if you have an object named `a`, then typing `A` would NOT refer to that object.
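A quick way to check these rules, sketched below: base R's `make.names()` converts a string into a syntactically valid name, and `exists()` confirms that names are case sensitive. The object name `my_value` is just an example.

```r
# make.names() shows how R would repair an invalid name
make.names("1st value")   # "X1st.value": an "X" is prepended and the space becomes "."
make.names("site_id")     # "site_id": already valid, so it is left unchanged

# Object names are case sensitive
my_value <- 42
exists("my_value")        # TRUE
exists("My_value")        # FALSE: a different name as far as R is concerned
```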
Now take a look at your Environment tab. You’ll see all of the objects that we’ve assigned thus far. If you want to see all of the objects in your environment, you can use the `ls()` function.
ls() # this prints out the names of all of the objects currently in my environment
## [1] "studentName_1" "studentName_2" "x" "y"
## [5] "z"
To remove an object from your workspace, you can use the `rm()` function.
rm(x)
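A short sketch of `rm()` in action, using hypothetical throwaway objects (`temp_value`, `obj_a`, `obj_b`). `exists()` confirms an object is gone; `rm(list = ls())`, shown commented out, would remove everything in your environment, so use it with care.

```r
temp_value <- 3.14
exists("temp_value")   # TRUE

rm(temp_value)         # remove the object from the environment
exists("temp_value")   # FALSE

# rm() can remove several objects at once
obj_a <- 1
obj_b <- 2
rm(obj_a, obj_b)

# Removes every object in the current environment; use with caution!
# rm(list = ls())
```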
1. Create two objects named `number_1` and `number_2` and give them the values of 2.5 and 10, respectively.
2. Create two objects named `string_1` and `string_2`, and give them any character string that you would like.
3. Using `number_1`, `number_2`, and the power of math, create an object called `number_3` that equals 25.
4. … `string_2`
5. … `string_1` and `number_1`. What happens?

Everything in R is an object. The data assigned to a given object can be categorized by its data type. Data can be organized into different structures, and these structures can often accommodate a mix of different data types.
Any value stored in a data object can be characterized by its data type.
The basic data types in R are:
| Example | Type |
|---|---|
| "a", "swc" | character |
| 2, 15.5 | numeric |
| 2L | integer |
| TRUE, FALSE | logical |
| 1+4i | complex |
| 62 6f 62 | raw |
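You can check these types yourself with `class()`, as sketched below. The raw example in the table matches what `charToRaw()` returns for the string "bob", which is one way to see where those hexadecimal bytes could come from. `typeof()` is included because it reports the underlying storage type, which is "double" for ordinary numbers.

```r
class(c("a", "swc"))   # "character"
class(c(2, 15.5))      # "numeric"
class(2L)              # "integer": the L suffix makes an integer
class(TRUE)            # "logical"
class(1 + 4i)          # "complex"

bob_raw <- charToRaw("bob")
bob_raw                # 62 6f 62
class(bob_raw)         # "raw"

typeof(2)              # "double": how R actually stores numeric values
```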
We will almost always be dealing with character, numeric, and logical data types in this class.
In many cases the data you deal with may have missing values or other issues. Values such as missing data (`NA`), not a number (`NaN`), and infinity (`Inf`) will come up from time to time. We’ll learn techniques for dealing with these throughout the term.
Infinity can arise like this:
1/0
## [1] Inf
Not a number can arise as follows
0/0
## [1] NaN
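These special values behave differently from one another, which matters once you start cleaning data. A minimal sketch using base R's `is.na()`, `is.nan()`, and `is.finite()` (the vector name `special` is illustrative):

```r
-1 / 0         # -Inf
Inf - Inf      # NaN: the result is undefined
NA + 1         # NA: arithmetic with a missing value stays missing

special <- c(1, NA, NaN, Inf)
is.na(special)      # FALSE  TRUE  TRUE FALSE: NaN also counts as NA
is.nan(special)     # FALSE FALSE  TRUE FALSE: only NaN
is.finite(special)  #  TRUE FALSE FALSE FALSE
```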
If you get stuck, remember that along with me and your classmates, Google can almost always point you in the right direction. Your textbook is also a great resource.
In addition to these resources, R has built-in help files. Let’s practice with these.
To get help, type `?term_of_interest` in your console or in your Notebook (and then run the code block), and the help file will appear in the Help window to the right. For example, in your console, get help for the `na.omit()` function by typing `?na.omit`. Take a minute to look at the help file and understand what it is showing. All help files are similarly formatted.
Try getting help for another function that you are interested in.
Data can be stored in R as a number of different data structures. The structure that you choose to assign data to will depend on the features/characteristics of your data.
The data structures available in base R include:
To create a vector, we use the `c()` function.
majors_vec <- c("Environmental Science","Geology","Chemistry")
num_vec <- c(1, 10, 5034.253, -1.045)
log_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
You can simply type the variable name or use the `print()` function to print out the vector’s contents.
majors_vec
## [1] "Environmental Science" "Geology" "Chemistry"
print(log_vec)
## [1] TRUE FALSE FALSE TRUE TRUE
We can determine the properties of a vector using some helpful functions
length(log_vec) # vector length
## [1] 5
class(num_vec) # class
## [1] "numeric"
str(log_vec) # structure of the vector
## logi [1:5] TRUE FALSE FALSE TRUE TRUE
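A vector can hold only one data type. A small sketch of what happens if you mix types with `c()`: R silently converts (coerces) everything to one common type, roughly logical → numeric → character, so it's worth checking `class()` when results look odd. The names `mixed_1` and `mixed_2` are illustrative.

```r
mixed_1 <- c(1, "a", TRUE)
mixed_1          # "1" "a" "TRUE": everything became character
class(mixed_1)   # "character"

mixed_2 <- c(TRUE, FALSE, 2)
mixed_2          # 1 0 2: TRUE/FALSE became 1/0
class(mixed_2)   # "numeric"
```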
Missing data (represented by NA) are often encountered. Below are a few methods for dealing with them.
a_missing <- c(1,2,3,4,NA,5,6,NA,7,8,9) # create a vector that has some missing data
na.omit(a_missing) #na.omit - removes them
## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "omit"
Try to add up the values in `a_missing` by using the `sum()` function. Do you see any issues?
In some cases we will want to remove missing data entries so that we can examine just the entries where we have values. Let’s remove the NAs from `a_missing` and assign the new data to a new object called `a_cleaned`.
a_cleaned <- na.omit(a_missing)
Print out `a_cleaned`. Does it look like everything worked?
… `a_cleaned`.
You can also use `na.exclude()` to remove missing values.
na.exclude(a_missing) #similar to omit, but has different behavior with some functions.
## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "exclude"
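One place the different behavior of `na.exclude()` shows up is model fitting. In the sketch below (the small data frame `toy_df` is made up for illustration), `lm()` drops the row with the missing `y` either way, but with `na.action = na.exclude` the residuals are padded back out with `NA` so they line up with the original rows.

```r
toy_df <- data.frame(x = 1:5, y = c(2.1, 3.9, NA, 8.2, 9.8))

fit_omit    <- lm(y ~ x, data = toy_df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy_df, na.action = na.exclude)

length(residuals(fit_omit))     # 4: the NA row is simply dropped
length(residuals(fit_exclude))  # 5: an NA is put back in row 3
residuals(fit_exclude)
```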
The `is.na()` function will tell you which values in the object are NAs.
is.na(a_missing) #Will tell you if a value is NA
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
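Many summary functions return `NA` as soon as they meet a missing value. A minimal sketch of two common fixes: the `na.rm = TRUE` argument, and using `is.na()` to count or drop the missing entries (`a_missing` is recreated so the block runs on its own).

```r
a_missing <- c(1, 2, 3, 4, NA, 5, 6, NA, 7, 8, 9)

sum(a_missing)                  # NA: one missing value makes the total unknown
sum(a_missing, na.rm = TRUE)    # 45: NAs are ignored
mean(a_missing, na.rm = TRUE)   # 5

sum(is.na(a_missing))           # 2: TRUE counts as 1, so this counts the NAs
a_missing[!is.na(a_missing)]    # keep only the non-missing values
```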
We commonly need to create vectors of a sequence of numbers or repeated numbers. There are functions to speed this up.
Create a series
series_1 <- 1:10
series_2 <- seq(10)
series_3 <- seq(0, 10, by = 0.05)
Repeat values
n_reps <- 5
rep_val <- 10
many_tens <- rep(rep_val,n_reps)
print(many_tens)
## [1] 10 10 10 10 10
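Both functions take a few more arguments that are handy. A brief sketch: `length.out` asks `seq()` for a fixed number of evenly spaced values, and `rep()` can repeat a whole vector (`times`) or each element in turn (`each`).

```r
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
rep(c(1, 2, 3), times = 2)  # 1 2 3 1 2 3
rep(c(1, 2, 3), each = 2)   # 1 1 2 2 3 3

# How many values does series_3 from above contain?
length(seq(0, 10, by = 0.05))   # 201
```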
Look at the above code and understand what’s going on.
Can you make a vector that repeats the letter “a” 50 times?
Can you make a vector that repeats the series of integers 1-10, 8 times?
#Your code here
You can also perform math operations on vectors. Try to predict the results before you run the code.
a <- 1
b <- 1:10
c <- a + b
c
x <- 1:10
y <- 10:1
z <- x + y
z
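These operations work element by element, and when two vectors differ in length R repeats (recycles) the shorter one to match the longer. A small sketch, with illustrative vector names:

```r
short_vec <- c(1, 2)
long_vec <- c(1, 2, 3, 4, 5, 6)

long_vec + short_vec   # 2 4 4 6 6 8: short_vec is reused as 1 2 1 2 1 2
long_vec * 10          # a single value is recycled across every element

# If the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning, because that is usually a mistake
c(1, 2, 3) + c(1, 2)
```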
To access elements in a vector, use square brackets `[]`. Note that R starts counting at 1, so `x_vec[1]` is the first element.
x_vec <- seq(0, 100, by = 2)
x_vec[1]
## [1] 0
x_vec[2]
## [1] 2
x_vec[10]
## [1] 18
x_vec[10:20]
## [1] 18 20 22 24 26 28 30 32 34 36 38
x_vec[seq(2,10,by = 2)]
## [1] 2 6 10 14 18
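Two other indexing styles come up constantly, sketched below with the same `x_vec`: negative indices drop elements, and a logical condition keeps only the elements where it is `TRUE`.

```r
x_vec <- seq(0, 100, by = 2)

length(x_vec)          # 51 elements: 0, 2, 4, ..., 100
x_vec[-1]              # everything except the first element
x_vec[-(1:48)]         # drop elements 1 to 48, leaving 96 98 100
x_vec[x_vec > 90]      # 92 94 96 98 100
x_vec[length(x_vec)]   # 100: the last element
```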
You can also multiply and divide vectors by a single value or by a vector of the same length. Test these things out.
# Your code here
When you want to combine character vectors, you can use `c()`:
fruits <- c("apple","grapes","bananas")
vegs <- c("lettuce","broccoli","spinach")
fruits_and_veg <- c(fruits, vegs)
fruits_and_veg
## [1] "apple" "grapes" "bananas" "lettuce" "broccoli" "spinach"
course_num <- c("210", "215" , "100")
course_dept <- c("GEO", "ENS", "ENS")
course_code <- paste(course_dept, course_num)
course_code
## [1] "GEO 210" "ENS 215" "ENS 100"
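`paste()` puts a space between pieces by default. A short sketch of its `sep` and `collapse` arguments, plus `paste0()`, the no-separator shortcut (recreating `course_dept` and `course_num` from above):

```r
course_num <- c("210", "215", "100")
course_dept <- c("GEO", "ENS", "ENS")

paste(course_dept, course_num, sep = "-")   # "GEO-210" "ENS-215" "ENS-100"
paste0(course_dept, course_num)             # "GEO210" "ENS215" "ENS100"

# collapse joins all of the results into a single string
paste(course_dept, course_num, collapse = "; ")   # "GEO 210; ENS 215; ENS 100"
```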
… `paste` function.
Factors are special vectors that represent categorical data.
Unordered factor
responses <- factor(c("yes","no","no","yes","maybe","yes"))
responses
## [1] yes no no yes maybe yes
## Levels: maybe no yes
Ordered factor
grades <- factor(c("A","C","B","A","B","B","D","A"), levels = c("F","D","C","B","A"), ordered = TRUE)
grades
## [1] A C B A B B D A
## Levels: F < D < C < B < A
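Under the hood, a factor stores integer codes plus a set of level labels. A minimal sketch of inspecting and using them (recreating `responses` and `grades` from above); note that comparisons such as `>=` only make sense for the ordered factor.

```r
responses <- factor(c("yes", "no", "no", "yes", "maybe", "yes"))
grades <- factor(c("A", "C", "B", "A", "B", "B", "D", "A"),
                 levels = c("F", "D", "C", "B", "A"), ordered = TRUE)

levels(responses)       # "maybe" "no" "yes": sorted alphabetically by default
as.integer(responses)   # 3 2 2 3 1 3: the codes behind the labels
table(responses)        # counts per level

grades >= "B"           # element-wise comparison using the level order
sum(grades >= "B")      # 6 grades of B or better
```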
We are going to be using data frames all the time in this class and in data analysis in general. They are similar in structure to a spreadsheet that you might open in Excel.
Data frames are made up of rows and columns. Each column is a vector, and all columns must be of the same length. Basically anything that you save as a delimited text file or Excel file (`.csv`, `.xls`, or `.xlsx`) can be read into R as a data frame (reading Excel files requires an add-on package such as readxl).
Data frames have a number of important attributes that you’ll interact with, in particular column names, row names, and dimensions.
We can load data into a data frame or create one from scratch. We’ll create one below using the `data.frame()` function.
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters) #letters is a special object available from base R
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)
To look at the first few rows and last few rows, use `head()` and `tail()`:
head(examp_df) # first rows
## lettersNew numbers logical
## 1 <NA> 1 TRUE
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
tail(examp_df) # last rows
## lettersNew numbers logical
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
## 27 z NA FALSE
To access a variable (column) from a data frame, you use the `$` operator.
examp_df$lettersNew # access the lettersNew variable
## [1] NA "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
## [20] "s" "t" "u" "v" "w" "x" "y" "z"
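The `$` operator also works for creating columns: assigning to a name that doesn't exist yet adds it. A sketch that rebuilds `examp_df` so it runs on its own; the new column name `numbers_x2` is hypothetical.

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

# Assigning to a new name with $ adds a column (numbers_x2 is a made-up name)
examp_df$numbers_x2 <- examp_df$numbers * 2

head(examp_df, 3)
ncol(examp_df)   # 4 columns now
```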
You can also access a data frame by specifying the rows and columns of interest. We use bracket notation `[]` to do this: you specify the row(s) and then the column(s) of interest within the brackets, separated by a comma.
examp_df[2,3] # access the data in row 2 and column 3
## [1] TRUE
To access all of the indices in a row or column, leave the index blank
examp_df[2,] # access the data across all of the columns of row 2
## lettersNew numbers logical
## 2 a 2 TRUE
… `examp_df_subset` … `examp_df_subset`?
To access a range of rows and/or columns, you can use the `:` operator in your indexing statement.
examp_df[1:4,2:3] # access the data found in rows 1 through 4 and columns 2 through 3
## numbers logical
## 1 1 TRUE
## 2 2 TRUE
## 3 3 TRUE
## 4 4 TRUE
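Rows can also be picked with a condition instead of row numbers, and columns by name. One catch, sketched below: `numbers` contains an `NA`, and a plain logical condition would return an extra all-`NA` row for it, so wrapping the condition in `which()` keeps only the rows where it is truly `TRUE`. The name `big_rows` is illustrative.

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

# Rows where numbers > 24 (rows 25 and 26), columns chosen by name
big_rows <- examp_df[which(examp_df$numbers > 24), c("lettersNew", "numbers")]
big_rows

# subset() gives the same rows; it treats NA conditions as FALSE
subset(examp_df, numbers > 24, select = c(lettersNew, numbers))
```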
… `examp_df` … `examp_df`
Below are some other useful functions for examining data frames
names(examp_df) # see column names
## [1] "lettersNew" "numbers" "logical"
rownames(examp_df) # see row names
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"
str(examp_df) # show the data frame's structure
## 'data.frame': 27 obs. of 3 variables:
## $ lettersNew: chr NA "a" "b" "c" ...
## $ numbers : int 1 2 3 4 5 6 7 8 9 10 ...
## $ logical : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
dim(examp_df) # get the dimensions
## [1] 27 3
nrow(examp_df) # get the number of rows
## [1] 27
ncol(examp_df) # number of columns
## [1] 3
summary(examp_df) # summary info
## lettersNew numbers logical
## Length:27 Min. : 1.00 Mode :logical
## Class :character 1st Qu.: 7.25 FALSE:13
## Mode :character Median :13.50 TRUE :13
## Mean :13.50 NA's :1
## 3rd Qu.:19.75
## Max. :26.00
## NA's :1
na.omit(examp_df) # omit rows with NAs
## lettersNew numbers logical
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
## 7 f 7 TRUE
## 8 g 8 TRUE
## 9 h 9 TRUE
## 10 i 10 TRUE
## 11 j 11 TRUE
## 12 k 12 TRUE
## 13 l 13 TRUE
## 15 n 15 FALSE
## 16 o 16 FALSE
## 17 p 17 FALSE
## 18 q 18 FALSE
## 19 r 19 FALSE
## 20 s 20 FALSE
## 21 t 21 FALSE
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
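Before dropping rows, it often helps to see where the NAs are. A sketch using `colSums(is.na())` for per-column counts and `complete.cases()`, which flags rows with no missing values and selects the same rows `na.omit()` keeps (`examp_clean` is an illustrative name).

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

colSums(is.na(examp_df))          # one NA in each column
sum(complete.cases(examp_df))     # 24 rows have no missing values
which(!complete.cases(examp_df))  # rows 1, 14 and 27 contain an NA

examp_clean <- examp_df[complete.cases(examp_df), ]
nrow(examp_clean)                 # 24, the same rows na.omit() keeps
```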
Lists are actually a special type of vector whose elements can be objects of any type and length (even other lists or data frames).
Lists are made with the `list()` function.
examp_list <- list(
letters=c("x","y","z"),
animals=c("cat","dog","bird","fish"),
numbers=1:100,
df=examp_df)
examp_list
## $letters
## [1] "x" "y" "z"
##
## $animals
## [1] "cat" "dog" "bird" "fish"
##
## $numbers
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
##
## $df
## lettersNew numbers logical
## 1 <NA> 1 TRUE
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
## 7 f 7 TRUE
## 8 g 8 TRUE
## 9 h 9 TRUE
## 10 i 10 TRUE
## 11 j 11 TRUE
## 12 k 12 TRUE
## 13 l 13 TRUE
## 14 m 14 NA
## 15 n 15 FALSE
## 16 o 16 FALSE
## 17 p 17 FALSE
## 18 q 18 FALSE
## 19 r 19 FALSE
## 20 s 20 FALSE
## 21 t 21 FALSE
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
## 27 z NA FALSE
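To pull items back out of a list, use `$` or double brackets `[[ ]]`; single brackets `[ ]` return a smaller list instead of the item itself. A sketch on a shortened version of `examp_list` (the data frame element is left out to keep it brief):

```r
examp_list <- list(
  letters = c("x", "y", "z"),
  animals = c("cat", "dog", "bird", "fish"),
  numbers = 1:100)

examp_list$animals              # "cat" "dog" "bird" "fish"
examp_list[["numbers"]][1:5]    # 1 2 3 4 5: extract the vector, then its first 5 elements
examp_list[["animals"]][2]      # "dog"

class(examp_list["letters"])    # "list": single brackets keep the list wrapper
class(examp_list[["letters"]])  # "character": double brackets extract the item
names(examp_list)               # "letters" "animals" "numbers"
```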
Create a vector named `vec_seq` that goes from 0 to 99 by 1. Print the vector to the console using the `print()` function.
Create another vector named `vec_fracs` with the following sequence: 0/1, 1/2, 2/3, 3/4, 4/5, …, 99/100. Print the vector to the console.
Access every other element of `vec_fracs`, starting with the 2nd element, and print this subset to the console. Thus you would access elements 2, 4, 6, 8, …, 100.
Create a character vector that has five first names. Create another vector that has five last names. Then create a third vector that has the first names listed in the first five elements and the last names listed in the last five elements.
Now create a vector that combines the first and last names; however, each entry should be in the format `Lastname, Firstname`. Hint: look at the help for the `paste()` function to see how you might do this.