The R Notebooks that we work in allow us to incorporate text, code, and output all in one place. This is a huge benefit when you want to create a report from your work in R. R Notebooks are excellent for producing computationally reproducible research.
An R Notebook is technically an R Markdown file (.Rmd). This means that your R Notebook is a blend of R (the code portion of your Notebook) and Markdown (the text portion). Markdown is simply a system for formatting document features (e.g. text, margins, bullets, table of contents, …).
There are tons of formatting options you can specify when working in an R Notebook, and this allows us to create attractive and easy-to-read documents. We’ll learn a few basics today that will greatly improve how your R Notebooks look when you output your reports.
The R Markdown Cheatsheet that I handed out has examples similar to below, as well as more advanced topics that we won’t cover today.
All of your R Notebooks have a file header (also called a YAML Header). This is required for specifying how your file will look and what format it will be output to when you generate your reports.
The header is at the top of the Notebook and has three dashes
---
at the top and bottom.
Here’s the header that I used on this current Notebook. These settings specified how I want my file to look when it is rendered to a report.
---
title: "R Programming and Markdown Basics"
author: "ENS-215"
date: "10-Jan-2025"
output:
  html_notebook:
    theme: spacelab
    toc: TRUE
---
The html_notebook setting allows us to instantaneously view any changes made by clicking Preview or simply saving the Notebook. Note that changes to your code will appear when you Preview a Notebook; however, you have to run the code before previewing in order to see any changes to the output.
When you are done and ready to generate your report you can Knit your document to an html file. There are two ways to do this:
To knit a file you can go to the menu bar at the top of your notebook, click the dropdown that currently says Preview, and select the Knit to HTML option. This will knit your document, which runs all of your code and generates a nice report in html format (the file is saved in your current working directory).
An even easier way to knit your file is to go to the header at
the top of your document and change html_notebook
to
html_document
and then save your file. You will then see
that the Preview option in the menu bar will have changed to Knit. Click
Knit and your report will be knit.
As you already know, we can include R Code in our Notebooks. We can
add code blocks by hitting Ctrl + Alt + i
(PC) or
Cmd + Option + i
(Mac).
You can also generate a code block by typing ```{r}
on
one line, then hitting Enter and typing ```
on the line
below.
Give both of these approaches a try.
Example code block
x <- 10 # comments in a code block are created by putting the hashtag symbol before the comment
x + 5
## [1] 15
Remember you can run a code block by hitting
Ctrl + Shift + Enter
(PC) or
Cmd + Shift + Enter
(Mac). To see the other
Run options you can click the Run
dropdown button in the top right of your Editor window.
Since your R Notebook (R Markdown) file is essentially a plain text file (i.e. you can’t modify how the text looks in your editor like you can in Microsoft Word), you need to use special characters to specify how your text should be formatted in your output report.
Section headers/titles like the ones you see separating the sections of this document are created by putting a # at the start of a line of text. To create smaller section headers add more hashtags to the start of the line. For instance ## will create a smaller section header and ### would create an even smaller one.
Bold font is created by putting ** at the start and end of the section of text you want in bold, with no spaces between the ** and the text. For instance you would type **text I want in bold**.
Italics are created with either
_text inside is in italics_
or
*text inside is in italics*
To make code show up verbatim you put the ` symbol around the text you want to appear as verbatim. Note that the symbol is NOT the single quote but is the backtick, the symbol that appears to the left of the 1 on your keyboard.
Superscripts such as X² are done with ^superscripted text here^. So X² is created by X^2^
Subscripts such as Xᵢ are done with ~subscripted text here~. So Xᵢ is created by X~i~
I can also create bulleted lists using the
+
at the start of a line.
I can create numbered lists by typing something like this
1. First item
2. Second item
    i) sub-item
    ii) another sub-item
And the list would look like this in my report.
2 + 1 #Add
## [1] 3
15 - 4 #Subtract
## [1] 11
9 * 2 #Multiply
## [1] 18
3 ^ 4 #Exponents
## [1] 81
120 / 8 #Divide
## [1] 15
5 %% 2 #Modulus
## [1] 1
4 > 2 #Greater than
## [1] TRUE
2 < 5 #Less than
## [1] TRUE
5 <= 5 #Less than or equal
## [1] TRUE
8 >= 2 #Greater than or equal
## [1] TRUE
2 == 2 #Equality: notice that it is TWO equal signs!
## [1] TRUE
5 != 7 #Not Equals
## [1] TRUE
Note that when you run a code block it is sending the code to the console. You can also type code directly into the console and it will be evaluated. This can be handy for a quick one-off calculation; however, for running many operations we’ll stick to using an R Notebook.
Typically we’ll be re-using the results from some calculation so we’ll want to assign it to a variable. In R we use <- to assign values to objects. So x <- 10 would mean that the object x is assigned a value of 10.
x <- 10 # assign a variable
# to print out the value of x to the console I can simply type out the variable on its own line of code
x
## [1] 10
y <- (2*x) + 5 # you can use mathematical operations and previously declared variables when assigning a new variable
y
## [1] 25
z <- x + y + 0.1234
z
## [1] 35.1234
Variables can take non-numeric values. The objects below take strings (i.e. text) as their values.
studentName_1 <- "Bob"
studentName_1
## [1] "Bob"
studentName_2 <- "Jess"
studentName_2
## [1] "Jess"
Notice how I gave the objects descriptive names. Also notice how I used a consistent naming format. You should put thought into how you name objects. This will make your code much easier to read and much faster to write.
Object names cannot begin with a number, contain spaces, or (most) special characters. You may use underscores and periods in object names. Also note that objects are case sensitive.
So if you have an object a then typing out A would NOT be referring to the object that you named a.
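Here is a minimal sketch of that case-sensitivity rule. The object names site_temp and Site_Temp are made up for this illustration, and tryCatch() is used only so the example keeps running after the error.

```r
site_temp <- 12.5            # hypothetical object name, all lower case

exists("site_temp")          # TRUE: this exact name has been assigned
exists("Site_Temp")          # FALSE: different capitalization means a different name

# Asking for the wrong capitalization raises an "object not found" error.
# tryCatch() catches the error so the rest of the block can still run.
result <- tryCatch(Site_Temp, error = function(e) "object not found")
result
```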
Now take a look at your Environment tab. You’ll see
all of the objects that we’ve assigned thus far. If you want to see all
of the objects in your environment you use the ls()
function.
ls() # this prints out the names of all of the objects currently in my environment
## [1] "studentName_1" "studentName_2" "x" "y"
## [5] "z"
To remove an object from your workspace you can use the
rm()
function
rm(x)
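To convince yourself that rm() really removed something, you can check ls() before and after. This is just a sketch; scratch_value is a throwaway name invented for the example.

```r
scratch_value <- 42                      # hypothetical object used only for this demonstration

before <- "scratch_value" %in% ls()      # TRUE: the object is listed in the environment
rm(scratch_value)                        # remove it
after <- "scratch_value" %in% ls()       # FALSE: the object is gone

before
after
```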
Create two objects named number_1 and number_2 and give them the values of 2.5 and 10, respectively.
Create two objects named string_1 and string_2, and give them any character string that you would like.
Using number_1, number_2, and the power of math, create an object called number_3 that equals 25.
… string_2
… string_1 and number_1. What happens?
Everything in R is an object. The data assigned to a given object can be categorized by its data type. Data can be organized into different structures and these structures can often accommodate a mix of different data types.
Any value stored in a data object can be characterized by its data type.
The basic data types in R are:
Example | Type |
---|---|
“a” “swc” | character |
2, 15.5 | numeric |
2L | integer |
TRUE, FALSE | logical |
1+4i | complex |
62 6f 62 | raw |
We will almost always be dealing with character, numeric, and logical data types in this class.
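One way to check the data type of a value is the class() function. The sketch below runs it on the example values from the table above; the object name types is just for illustration.

```r
class("swc")               # "character"
class(15.5)                # "numeric"
class(2L)                  # "integer": the L suffix stores a whole number as an integer
class(TRUE)                # "logical"
class(1+4i)                # "complex"
class(charToRaw("bob"))    # "raw": the bytes 62 6f 62 spell out "bob"

# The same checks collected into a single character vector
types <- sapply(list("swc", 15.5, 2L, TRUE, 1+4i, charToRaw("bob")), class)
types
```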
In many cases the data you deal with may have missing values or other issues. Values such as missing data NA, not a number NaN, and infinity Inf will come up from time to time. Note that R is case sensitive, so infinity is written Inf, not inf. We’ll learn techniques for dealing with these throughout the term.
Infinity can arise as such
1/0
## [1] Inf
Not a number can arise as follows
0/0
## [1] NaN
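R has built-in functions for testing for these special values. Here is a short sketch of how they behave; missing_val is a made-up name.

```r
missing_val <- NA

is.na(missing_val)    # TRUE
is.na(0/0)            # TRUE: NaN also counts as missing
is.nan(missing_val)   # FALSE: NA is missing, but it is not NaN
is.finite(1/0)        # FALSE: Inf is not a finite number
-1/0                  # -Inf: infinity can be negative too
missing_val + 1       # NA: calculations with a missing value stay missing
```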
If you get stuck, remember that along with me and your classmates, Google can almost always point you in the right direction. Your textbook is also a great resource.
In addition to these resources, R has built-in help files. Let’s practice with these.
To get help you type ?term_of_interest
in your console
or in your Notebook (and then run the code block) and help will appear
in the Help window to the right. For example
In your console get help for the na.omit()
function.
Take a minute to look at the help file and understand what it is
showing. All help files are similarly formatted.
Try getting help for another function that you are interested in.
Data can be stored in R as a number of different data structures. The structure that you choose to assign data to will depend on the features/characteristics of your data.
The data structures available in base R include:
To create a vector we use the c()
function
majors_vec <- c("Environmental Science","Geoscience","Chemistry")
num_vec <- c(1, 10, 5034.253, -1.045)
log_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
You can simply type the variable or use the print()
function to print out the vector’s contents
majors_vec
## [1] "Environmental Science" "Geoscience" "Chemistry"
print(log_vec)
## [1] TRUE FALSE FALSE TRUE TRUE
We can determine the properties of a vector using some helpful functions
length(log_vec) # vector length
## [1] 5
class(num_vec) # class
## [1] "numeric"
str(log_vec) # structure of the vector
## logi [1:5] TRUE FALSE FALSE TRUE TRUE
Missing data (represented by NA) are often encountered. Below are a few methods for dealing with them.
a_missing <- c(1,2,3,4,NA,5,6,NA,7,8,9) # create a vector that has some missing data
na.omit(a_missing) #na.omit - removes them
## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "omit"
Take the sum of a_missing by using the sum() function. Do you see any issues?
In some cases we will want to remove missing data entries so that we can just examine the entries where we have values. Let’s remove the NAs from a_missing and assign the new data to a new object called a_cleaned
a_cleaned <- na.omit(a_missing)
… a_cleaned. Does it look like everything worked?
… a_cleaned.
You can also use na.exclude() to remove missing values
na.exclude(a_missing) #similar to omit, but has different behavior with some functions.
## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "exclude"
is.na()
will tell you which values in the object are
NAs
is.na(a_missing) #Will tell you if a value is NA
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
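Because is.na() returns TRUE/FALSE values, it combines nicely with other functions. The sketch below uses a made-up vector called temps. The na.rm = TRUE argument is a standard option of mean() that skips missing values.

```r
temps <- c(12.1, NA, 14.3, 13.8, NA)   # hypothetical temperature readings

n_missing <- sum(is.na(temps))         # count the missing values: 2
where_missing <- which(is.na(temps))   # positions of the missing values: 2 5
temps_present <- temps[!is.na(temps)]  # keep only the entries that are not NA
temps_mean <- mean(temps, na.rm = TRUE) # average of the non-missing values

n_missing
where_missing
temps_present
temps_mean
```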
We commonly need to create vectors of a sequence of numbers or repeated numbers. There are functions to speed this up.
Create a series
series_1 <- 1:10
series_2 <- seq(10)
series_3 <- seq(0, 10, by = 0.05)
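A quick way to check what a series contains is to look at its length. As a sketch, the code below also shows the length.out argument of seq(), which asks for a set number of evenly spaced values instead of a step size. The name five_steps is invented for this example.

```r
length(1:10)                         # 10
identical(1:10, seq(10))             # TRUE: both create the integers 1 through 10
length(seq(0, 10, by = 0.05))        # 201 values from 0 to 10

# length.out sets how many values you want rather than the step between them
five_steps <- seq(0, 1, length.out = 5)
five_steps                           # 0.00 0.25 0.50 0.75 1.00
```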
Repeat values
n_reps <- 5
rep_val <- 10
many_tens <- rep(rep_val,n_reps)
print(many_tens)
## [1] 10 10 10 10 10
Look at the above code and understand what’s going on.
Can you make a vector that repeats the letter “a” 50 times?
Can you make a vector that repeats the series of integers 1-10, 8 times?
#Your code here
You can also perform math operations on vectors. Try to predict the results before you run the code.
a <- 1
b <- 1:10
c <- a + b
c
x <- 1:10
y <- 10:1
z <- x + y
z
To access elements in a vector you use
[]
x_vec <- seq(0, 100, by = 2)
x_vec[1]
## [1] 0
x_vec[2]
## [1] 2
x_vec[10]
## [1] 18
x_vec[10:20]
## [1] 18 20 22 24 26 28 30 32 34 36 38
x_vec[seq(2,10,by = 2)]
## [1] 2 6 10 14 18
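The brackets can hold more than a position or a range. As a sketch, here are two other patterns: a vector of specific positions, and a logical test that keeps only the elements where the test is TRUE. The vector is recreated so the block runs on its own.

```r
x_vec <- seq(0, 100, by = 2)      # same vector as above

length(x_vec)                     # 51
x_vec[c(1, 51)]                   # first and last elements: 0 100
x_vec[length(x_vec)]              # the last element without counting by hand: 100

# A logical test inside the brackets keeps the elements where it is TRUE
big_values <- x_vec[x_vec > 90]
big_values                        # 92 94 96 98 100
```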
You can also multiply and divide vectors by a single value or by a vector of the same length. Test these things out
# Your code here
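If you want something to compare your results against, here is one possible sketch. The vectors depths_m and widths_m are invented for the example. With a single value, R applies that value to every element. With two vectors of the same length, R works element by element.

```r
depths_m <- c(1, 2, 4, 8)        # hypothetical measurements
widths_m <- c(10, 10, 5, 5)      # a second vector of the same length

depths_m * 100                   # 100 200 400 800
depths_m / 2                     # 0.5 1.0 2.0 4.0

areas <- depths_m * widths_m     # element by element: 10 20 20 40
ratios <- depths_m / widths_m    # element by element: 0.1 0.2 0.8 1.6
areas
ratios
```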
When you want to combine character vectors you can do the following
fruits <- c("apple","grapes","bananas")
vegs <- c("lettuce","broccoli","spinach")
fruits_and_veg <- c(fruits, vegs)
fruits_and_veg
## [1] "apple" "grapes" "bananas" "lettuce" "broccoli" "spinach"
course_num <- c("210", "215" , "100")
course_dept <- c("GEO", "ENS", "ENS")
course_code <- paste(course_dept, course_num)
course_code
## [1] "GEO 210" "ENS 215" "ENS 100"
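By default paste() puts a space between the pieces. As a sketch, the sep argument changes that separator, paste0() uses no separator at all, and collapse squeezes a whole vector into a single string. The vectors are recreated here so the block runs on its own.

```r
course_num  <- c("210", "215", "100")
course_dept <- c("GEO", "ENS", "ENS")

dashed <- paste(course_dept, course_num, sep = "-")
dashed                                   # "GEO-210" "ENS-215" "ENS-100"

joined <- paste0(course_dept, course_num)
joined                                   # "GEO210" "ENS215" "ENS100"

one_string <- paste(course_dept, collapse = " / ")
one_string                               # "GEO / ENS / ENS"
```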
… paste function
Factors are special vectors that represent categorical data.
Unordered factor
responses <- factor(c("yes","no","no","yes","maybe","yes"))
responses
## [1] yes no no yes maybe yes
## Levels: maybe no yes
Ordered factor
grades <- factor(c("A","C","B","A","B","B","D","A"), levels = c("F","D","C","B","A"), ordered = TRUE)
grades
## [1] A C B A B B D A
## Levels: F < D < C < B < A
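A few functions are handy for inspecting factors. The sketch below recreates the two factors from above. Because grades is ordered, comparisons such as >= make sense for it, which would not be the case for the unordered responses factor.

```r
responses <- factor(c("yes","no","no","yes","maybe","yes"))
grades <- factor(c("A","C","B","A","B","B","D","A"),
                 levels = c("F","D","C","B","A"), ordered = TRUE)

levels(responses)            # "maybe" "no" "yes" (alphabetical by default)
table(responses)             # how many times each level occurs

# Ordered factors can be compared against one of their levels
b_or_better <- grades >= "B"
b_or_better                  # TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
sum(b_or_better)             # 6 grades are a B or better
```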
We are going to be using data frames all the time in this class and in data analysis in general. They are similar in structure to a spreadsheet that you might open in Excel.
Data frames are made up of rows and columns. Each column is a vector and all columns must be of the same length. Basically anything that you save as a delimited text or Excel file (.csv, .xls, or .xlsx) can be read into R as a data frame.
Data frames have a number of important attributes that you’ll interact with, in particular column names, row names, and dimensions.
We can load in data to a data frame or create one from scratch. We’ll
create one below using the data.frame()
function
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters) #letters is a special object available from base R
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)
To look at the first few rows and last few rows
head(examp_df) # first rows
## lettersNew numbers logical
## 1 <NA> 1 TRUE
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
tail(examp_df) # last rows
## lettersNew numbers logical
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
## 27 z NA FALSE
To access a variable (column) from a data frame you use the
$
operator
examp_df$lettersNew # access the lettersNew variable
## [1] NA "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
## [20] "s" "t" "u" "v" "w" "x" "y" "z"
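Since a column pulled out with $ is just a vector, you can use it in calculations, and you can also use $ to add a new column. Here is a sketch with a small made-up data frame. The names sites_df, depth_m, and depth_cm are invented for the example.

```r
# Small hypothetical data frame
sites_df <- data.frame(site = c("A", "B", "C"),
                       depth_m = c(1.5, NA, 3.5),
                       stringsAsFactors = FALSE)

sites_df$depth_m                              # 1.5  NA 3.5
avg_depth <- mean(sites_df$depth_m, na.rm = TRUE)
avg_depth                                     # 2.5

# Assigning to a name that does not exist yet creates a new column
sites_df$depth_cm <- sites_df$depth_m * 100
names(sites_df)                               # "site" "depth_m" "depth_cm"
```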
You can also access a data frame by specifying the rows and columns
of interest. We use bracket notation []
to do this. You
specify the row(s) and then the column(s) of interest within the
bracket.
examp_df[2,3] # access the data in row 2 and column 3
## [1] TRUE
examp_df[2,] # to access all of the indices in a row or column, leave the index blank
## lettersNew numbers logical
## 2 a 2 TRUE
To access all of the indices in a row or column, leave the index blank
examp_df[2,] # access the data across all of the columns of row 2
## lettersNew numbers logical
## 2 a 2 TRUE
… examp_df_subset
… examp_df_subset?
To access a range of rows and/or columns you can use the : operator in your indexing statement
examp_df[1:4,2:3] # access the data found in rows 1 through 4 and columns 2 through 3
## numbers logical
## 1 1 TRUE
## 2 2 TRUE
## 3 3 TRUE
## 4 4 TRUE
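Column positions can shift as a data frame changes, so it is often safer to pick columns by name. The sketch below uses a made-up data frame called wells_df to show a few variations on bracket indexing.

```r
# Small hypothetical data frame
wells_df <- data.frame(site = c("A", "B", "C", "D"),
                       depth_m = c(1.5, 2.0, 3.5, 0.5),
                       sampled = c(TRUE, FALSE, TRUE, TRUE),
                       stringsAsFactors = FALSE)

first_two <- wells_df[1:2, c("site", "depth_m")]  # rows 1 through 2, two columns chosen by name
first_two

depths <- wells_df[, "depth_m"]                   # a single column comes back as a vector
depths

picked_rows <- wells_df[c(1, 4), ]                # rows 1 and 4, all columns
picked_rows
```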
examp_df
examp_df
Below are some other useful functions for examining data frames
names(examp_df) # see column names
## [1] "lettersNew" "numbers" "logical"
rownames(examp_df) # see row names
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"
str(examp_df) # show the data frame's structure
## 'data.frame': 27 obs. of 3 variables:
## $ lettersNew: chr NA "a" "b" "c" ...
## $ numbers : int 1 2 3 4 5 6 7 8 9 10 ...
## $ logical : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
dim(examp_df) # get the dimensions
## [1] 27 3
nrow(examp_df) # get the number of rows
## [1] 27
ncol(examp_df) # number of columns
## [1] 3
summary(examp_df) # summary info
## lettersNew numbers logical
## Length:27 Min. : 1.00 Mode :logical
## Class :character 1st Qu.: 7.25 FALSE:13
## Mode :character Median :13.50 TRUE :13
## Mean :13.50 NA's :1
## 3rd Qu.:19.75
## Max. :26.00
## NA's :1
na.omit(examp_df) # omit rows with NAs
## lettersNew numbers logical
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
## 7 f 7 TRUE
## 8 g 8 TRUE
## 9 h 9 TRUE
## 10 i 10 TRUE
## 11 j 11 TRUE
## 12 k 12 TRUE
## 13 l 13 TRUE
## 15 n 15 FALSE
## 16 o 16 FALSE
## 17 p 17 FALSE
## 18 q 18 FALSE
## 19 r 19 FALSE
## 20 s 20 FALSE
## 21 t 21 FALSE
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
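Before dropping rows with na.omit() it helps to know how many rows you are about to lose. Here is a sketch using a small made-up data frame called obs_df. complete.cases() flags the rows that have no NAs, and colSums(is.na()) counts the NAs in each column.

```r
# Small hypothetical data frame with NAs in two different columns
obs_df <- data.frame(id = 1:5,
                     value = c(2.1, NA, 3.3, 4.0, NA),
                     label = c("a", "b", NA, "d", "e"),
                     stringsAsFactors = FALSE)

complete.cases(obs_df)            # TRUE FALSE FALSE TRUE FALSE
na_per_column <- colSums(is.na(obs_df))
na_per_column                     # id 0, value 2, label 1

obs_clean <- na.omit(obs_df)      # keeps only the rows with no NAs
nrow(obs_df)                      # 5 rows before
nrow(obs_clean)                   # 2 rows after
```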
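Lists are actually a special type of vector. Unlike the vectors above, a single list can hold elements of different types and structures, including other vectors and data frames.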
Lists are made with the list()
function
examp_list <- list(letters = c("x","y","z"),
animals = c("cat","dog","bird","fish"),
numbers = 1:100,
df = examp_df)
examp_list
## $letters
## [1] "x" "y" "z"
##
## $animals
## [1] "cat" "dog" "bird" "fish"
##
## $numbers
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
##
## $df
## lettersNew numbers logical
## 1 <NA> 1 TRUE
## 2 a 2 TRUE
## 3 b 3 TRUE
## 4 c 4 TRUE
## 5 d 5 TRUE
## 6 e 6 TRUE
## 7 f 7 TRUE
## 8 g 8 TRUE
## 9 h 9 TRUE
## 10 i 10 TRUE
## 11 j 11 TRUE
## 12 k 12 TRUE
## 13 l 13 TRUE
## 14 m 14 NA
## 15 n 15 FALSE
## 16 o 16 FALSE
## 17 p 17 FALSE
## 18 q 18 FALSE
## 19 r 19 FALSE
## 20 s 20 FALSE
## 21 t 21 FALSE
## 22 u 22 FALSE
## 23 v 23 FALSE
## 24 w 24 FALSE
## 25 x 25 FALSE
## 26 y 26 FALSE
## 27 z NA FALSE
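To pull a single element back out of a list you can use $ or double brackets [[ ]]. The sketch below uses a smaller made-up list called small_list so that it runs on its own. Note that single brackets return a smaller list rather than the element itself.

```r
# Hypothetical list, similar to the one above but without the data frame
small_list <- list(letters = c("x", "y", "z"),
                   animals = c("cat", "dog", "bird", "fish"),
                   numbers = 1:100)

names(small_list)                  # "letters" "animals" "numbers"
small_list$animals                 # the whole animals vector
small_list[["animals"]][2]         # "dog": second entry of the animals vector
length(small_list$numbers)         # 100

class(small_list["animals"])       # "list": single brackets keep the list wrapper
class(small_list[["animals"]])     # "character": double brackets return the element itself
```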
Create a vector named vec_seq that goes from 0 to 99 by 1. Print the vector results to the console using the print() function.
Create another vector named vec_fracs
with the
following sequence 0/1, 1/2, 2/3, 3/4, 4/5,…,99/100. Print the vector
results to the console.
Access every other element of vec_fracs starting with the 2nd element and print this subset to the console. Thus you would access elements 2, 4, 6, 8,…,100.
Create a character vector that has five first names. Create another vector that has five last names. Then create a third vector that has the first names listed in the first five elements and the last names listed in the last five elements.
Now create a vector that combines the first and last names, however each entry should be in the format Lastname, Firstname. Hint: look at the help for the paste() function to see how you might do this.