R Programming and Markdown Basics

What is R Markdown?

The R Notebooks that we work in allow us to incorporate text, code, and output all in one place. This is a huge benefit when you want to create a report from your work in R. R Notebooks are excellent for producing computationally reproducible research.

An R Notebook is technically an R Markdown file (.Rmd). This means that your R Notebook is a blend of R (which is the code portion of your Notebook) and Markdown (the text portion). Markdown is simply a system for formatting document features (e.g. text, margins, bullets, table of contents,…).

There are tons of formatting options you can specify when working in an R Notebook and this allows us to create attractive and easy to read documents. We’ll learn a few basics today that will greatly improve how your R Notebooks look when you output your reports.

Basic Elements of R Markdown (Notebook) files

The R Markdown Cheatsheet that I handed out has examples similar to those below, as well as more advanced topics that we won’t cover today.

File Header

All of your R Notebooks have a file header (also called a YAML Header). This is required for specifying how your file will look and what format it will be output to when you generate your reports.

The header is at the top of the Notebook and has three dashes --- at the top and bottom.

Here’s the header that I used on this current Notebook. These settings specified how I want my file to look when it is rendered to a report.

---
title: "R Programming and Markdown Basics"
author: "ENS-215"
date: "10-Jan-2025"
output:
  html_notebook:
    theme: spacelab
    toc: TRUE
---

The html_notebook setting allows us to instantaneously view any changes made by clicking Preview or simply saving the Notebook. Note that changes to your code will appear when you Preview a Notebook; however, you have to run the code before previewing in order to see any changes to the output.

When you are done and ready to generate your report you can Knit your document to an HTML file. There are two ways to do this:

To knit a file, go to the menu bar at the top of your notebook, click the dropdown that currently says Preview, and select the Knit to HTML option. This will knit your document, which runs all of your code and generates a nice report in HTML format (the file is saved in your current working directory).
An even easier way to knit your file is to go to the header at the top of your document, change html_notebook to html_document, and then save your file. You will then see that the Preview option in the menu bar has changed to Knit. Click Knit and your report will be knit.

R Code

As you already know, we can include R Code in our Notebooks. We can add code blocks by hitting Ctrl + Alt + i (PC) or Cmd + Option + i (Mac).

You can also generate a code block by typing ```{r} on one line, then hitting Enter and typing ``` on the line below.

Give both of these approaches a try.

Example code block

x <- 10 # comments in a code block are created by putting the hashtag symbol before the comment
x + 5

## [1] 15

Remember you can run a code block by hitting Ctrl + Shift + Enter (PC) or Cmd + Shift + Enter (Mac). To see the other Run options you can click the Run dropdown button in the top right of your Editor window.

Markdown Syntax

Since your R Notebook (R Markdown) file is essentially a plain text file (i.e. you can’t modify how the text looks in your editor like you can in Microsoft Word), you need to use special characters to specify how your text should be formatted in your output report.

Section headers/titles like the ones you see separating the sections of this document are created by putting a # at the start of a line of text. To create smaller section headers add more hashtags to the start of the line. For instance ## will create a smaller section header and ### would create an even smaller one.
Bold font is created by putting ** at the start and end of the section of text you want in bold, with no space between the asterisks and the text. For instance you would type **text I want in bold**.
Italics are created with either _text inside is in italics_ or *text inside is in italics*
To make code show up verbatim, put the ` symbol around the text you want to appear verbatim. Note that the symbol is NOT the single quote; it is the backtick, which appears to the left of the 1 on your keyboard.
Superscripts such as X² are created with ^superscripted text here^. So X² is created by typing X^2^
Subscripts such as X_i are created with ~subscripted text here~. So X_i is created by typing X~i~
I can also create bulleted lists using the + at the start of a line.
I can create numbered lists by typing something like this

1. First item
2. Second item
    i) sub-item 
    ii) another sub-item

And the list would look like this in my report.

1. First item
2. Second item
    i. sub-item
    ii. another sub-item

Line breaks: to make a line break show up in your formatted document, you need to put TWO SPACES at the end of the line and then hit ENTER. The line break will only show up in your knit/previewed document if you have the two spaces.

Exercise

Spend some time testing out the different Markdown formatting options you learned above

R Programming basics

Basic operations and calculations

As you’ve already seen by now you can use R as a calculator. Below is a list of some basic operations.

2 + 1 #Add

## [1] 3

15 - 4 #Subtract

## [1] 11

9 * 2 #Multiply

## [1] 18

3 ^ 4 #Exponents

## [1] 81

120 / 8 #Divide

## [1] 15

5 %% 2 #Modulus

## [1] 1

4 > 2 #Greater than

## [1] TRUE

2 < 5 #Less than

## [1] TRUE

5 <= 5 #Less than or equal

## [1] TRUE

8 >= 2 #Greater than or equal

## [1] TRUE

2 == 2 #Equality: notice that it is TWO equal signs!

## [1] TRUE

5 != 7 #Not Equals

## [1] TRUE

Note that when you run a code block it is sending the code to the console. You can also type code directly into the console and it will be evaluated. This can be handy for a quick one-off calculation; however, for running many operations we’ll stick to using an R Notebook.

Assigning values to a variable

Typically we’ll be re-using the results from some calculation, so we’ll want to assign them to a variable. In R we use <- to assign values to objects. So x <- 10 means that the object x is assigned a value of 10.

x <- 10 # assign a variable

# to print out the value of x to the console I can simply type out the variable on its own line of code
x

## [1] 10

y <- (2*x) + 5 # you can use mathematical operations and previously declared variables when assigning a new variable

y

## [1] 25

z <- x + y + 0.1234
z

## [1] 35.1234

Variables can take non-numeric values. The objects below take strings (i.e. text) as their values.

studentName_1 <- "Bob"
studentName_1

## [1] "Bob"

studentName_2 <- "Jess"
studentName_2

## [1] "Jess"

Notice how I gave the objects descriptive names. Also notice how I used a consistent naming format. You should put thought into how you name objects. This will make your code much easier to read and much faster to write.

Object names cannot begin with a number or contain spaces or (most) special characters. You may use underscores and periods in object names. Also note that object names are case sensitive.

So if you have an object a then typing out A would NOT be referring to the object that you named a.
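
The naming rules above can be checked from code. The sketch below is only an illustration: it relies on base R's make.names() function, which returns a syntactically valid version of a name, so a name follows the rules when make.names() leaves it unchanged. The names used here (candidate_names, my_value) are made up for the example.

```r
# Hypothetical candidate names, chosen only for illustration
candidate_names <- c("student_1", "student.name", "2nd_student", "student name")

# A name follows the rules if make.names() returns it unchanged
is_valid <- candidate_names == make.names(candidate_names)
is_valid

# Names are case sensitive: my_value and My_Value are different objects
my_value <- 5
lower_exists <- exists("my_value")  # the object we just created
upper_exists <- exists("My_Value")  # a differently capitalized name
lower_exists
upper_exists
```

The first two names pass, while the name that starts with a number and the name that contains a space do not.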

Examining your Environment

Now take a look at your Environment tab. You’ll see all of the objects that we’ve assigned thus far. If you want to see all of the objects in your environment you use the ls() function.

ls() # this prints out the names of all of the objects currently in my environment

## [1] "studentName_1" "studentName_2" "x"             "y"            
## [5] "z"

To remove an object from your workspace you can use the rm() function

rm(x)

Refresher Exercise 1:

Create two objects named number_1 and number_2 and give them the values of 2.5 and 10, respectively
Create two more objects named string_1 and string_2, give them any character string that you would like.
Now using number_1, number_2, and the power of math create an object called number_3 that equals 25
Create two more objects whose value is of your choosing
List the objects in your workspace
Remove string_2
Try to add string_1 and number_1. What happens?

Data types and data structures

Everything in R is an object. The data assigned to a given object can be categorized by its data type. Data can be organized into different structures and these structures can often accommodate a mix of different data types.

Data types

Any value stored in a data object can be characterized by its data type.

The basic data types in R are:

Example	Type
"a", "swc"	character
2, 15.5	numeric
2L	integer
TRUE, FALSE	logical
1+4i	complex
62 6f 62	raw

We will almost always be dealing with character, numeric, and logical data types in this class.
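
One way to confirm the data type of a value is the class() function. The short sketch below runs it on one example value per row of the table above; the raw example uses charToRaw(), a base R function that converts a string to its bytes.

```r
class("swc")     # character
class(15.5)      # numeric
class(2L)        # integer (the L suffix marks an integer)
class(TRUE)      # logical
class(1+4i)      # complex

# The raw bytes of the string "bob" print as 62 6f 62
bob_raw <- charToRaw("bob")
bob_raw
class(bob_raw)   # raw
```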

In many cases the data you deal with may have missing values or other issues. Values such as missing data (NA), not a number (NaN), and infinity (Inf) will come up from time to time. We’ll learn techniques for dealing with these throughout the term.

Infinity can arise as such

1/0

## [1] Inf

Not a number can arise as follows

0/0

## [1] NaN
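
Base R has test functions for each of these special values. The sketch below applies them to a small made-up vector (special_vals is a name chosen for this example) so you can compare how they respond.

```r
# A vector holding each special value plus one ordinary number
special_vals <- c(NA, NaN, Inf, -Inf, 3)

is.na(special_vals)        # TRUE for NA and also for NaN
is.nan(special_vals)       # TRUE only for NaN
is.infinite(special_vals)  # TRUE for Inf and -Inf
is.finite(special_vals)    # TRUE only for the ordinary number
```

Note that is.na() also flags NaN, so use is.nan() when you need to tell the two apart.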

Getting help (reminder)

If you get stuck, remember that along with me and your classmates, Google can almost always point you in the right direction. Your textbook is also a great resource.

In addition to these resources, R has built-in help files. Let’s practice with these.

To get help you type ?term_of_interest in your console or in your Notebook (and then run the code block) and help will appear in the Help window to the right. For example, ?mean opens the help file for the mean() function.

In your console get help for the na.omit() function. Take a minute to look at the help file and understand what it is showing. All help files are similarly formatted.

Try getting help for another function that you are interested in.

Data Structures

Data can be stored in R as a number of different data structures. The structure that you choose to assign data to will depend on the features/characteristics of your data.

The data structures available in base R include:

vector
list
matrix
data frame
factors
tables

Vectors

A common and basic data structure in R.
Can be a vector of character, logical, integer, or numeric data. However, a given vector can only contain one data type.

To create a vector we use the c() function

majors_vec <- c("Environmental Science","Geoscience","Chemistry")
num_vec <- c(1, 10, 5034.253, -1.045)
log_vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
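
Because a vector can hold only one data type, R converts (coerces) the values when you mix types inside c(). The sketch below shows two cases; the object names are made up for the example.

```r
# Mixing numbers, text, and logicals: everything becomes a character string
mixed_vec <- c(1, "a", TRUE)
mixed_vec
class(mixed_vec)

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_log_vec <- c(1, TRUE, FALSE)
num_log_vec
class(num_log_vec)
```

This is worth remembering when a column of numbers unexpectedly shows up as text: a single non-numeric entry is enough to turn the whole vector into characters.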

You can simply type the variable or use the print() function to print out the vector’s contents

majors_vec

## [1] "Environmental Science" "Geoscience"            "Chemistry"

print(log_vec)

## [1]  TRUE FALSE FALSE  TRUE  TRUE

We can determine the properties of a vector using some helpful functions

length(log_vec) # vector length

## [1] 5

class(num_vec) # class

## [1] "numeric"

str(log_vec) # structure of the vector

##  logi [1:5] TRUE FALSE FALSE TRUE TRUE

Missing data (represented by NA) are often encountered. Below are a few methods for dealing with them.

a_missing <- c(1,2,3,4,NA,5,6,NA,7,8,9) # create a vector that has some missing data

na.omit(a_missing) #na.omit - removes them

## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "omit"

Try taking the sum of a_missing by using the sum() function. Do you see any issues?

In some cases we will want to remove missing data entries so that we can just examine the entries where we have values. Let’s remove the NAs from a_missing and assign the new data to a new object called a_cleaned

a_cleaned <- na.omit(a_missing)

Look at a_cleaned. Does it look like everything worked?
Now try taking the sum of a_cleaned.
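
Once you have tried the steps above, here is an alternative worth knowing: many summary functions, including sum() and mean(), accept an na.rm argument that skips missing values without creating a cleaned copy first. A minimal sketch:

```r
a_missing <- c(1, 2, 3, 4, NA, 5, 6, NA, 7, 8, 9)

sum(a_missing)                 # NA, because the missing values propagate
sum(a_missing, na.rm = TRUE)   # 45, the NAs are skipped
mean(a_missing, na.rm = TRUE)  # 5
```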

You can also use na.exclude() to remove missing values

na.exclude(a_missing) #similar to omit, but has different behavior with some functions.

## [1] 1 2 3 4 5 6 7 8 9
## attr(,"na.action")
## [1] 5 8
## attr(,"class")
## [1] "exclude"

is.na() will tell you which values in the object are NAs

is.na(a_missing) #Will tell you if a value is NA

##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE
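
The logical vector returned by is.na() can be put to work in a few ways. The sketch below counts the missing values, finds their positions, and keeps only the non-missing entries.

```r
a_missing <- c(1, 2, 3, 4, NA, 5, 6, NA, 7, 8, 9)

n_missing <- sum(is.na(a_missing))      # TRUE counts as 1, so this counts the NAs
na_positions <- which(is.na(a_missing)) # positions of the NAs in the vector
a_present <- a_missing[!is.na(a_missing)] # ! flips TRUE/FALSE, keeping non-missing values

n_missing
na_positions
a_present
```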

We commonly need to create vectors of a sequence of numbers or repeated numbers. There are functions to speed this up.

Create a series

series_1 <- 1:10 
series_2 <- seq(10)
series_3 <- seq(0, 10, by = 0.05)
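
It helps to inspect a series after creating it. The sketch below checks the length and first few values of series_3, and also shows the length.out argument of seq(), which sets how many values you want instead of the step size (series_4 is a name made up for this example).

```r
series_3 <- seq(0, 10, by = 0.05)
length(series_3)   # how many values were created
head(series_3, 3)  # the first three values

# Ask for 5 evenly spaced values between 0 and 1
series_4 <- seq(0, 1, length.out = 5)
series_4
```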

Repeat values

n_reps <- 5
rep_val <- 10
many_tens <- rep(rep_val,n_reps)

print(many_tens)

## [1] 10 10 10 10 10

Look at the above code and understand what’s going on.

Can you make a vector that repeats the letter “a” 50 times?
Can you make a vector that repeats the series of integers 1-10, 8 times?

#Your code here

You can also perform math operations on vectors. Try to predict the results before you run the code.

a <- 1
b <- 1:10
c <- a + b

c

x <- 1:10
y <- 10:1
z <- x + y

z

Were you able to predict the results?
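
What happened in a + b is called recycling: the shorter vector is reused until it matches the length of the longer one. The sketch below shows the same behavior with a two-element vector; short_vec and long_vec are names made up for the example.

```r
long_vec <- 1:6
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20, 10, 20
recycled_sum <- long_vec + short_vec
recycled_sum   # 11 22 13 24 15 26
```

If the longer length is not a multiple of the shorter one, R still recycles but prints a warning, which is usually a sign that the vectors were not meant to be combined.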

To access elements in a vector you use []

x_vec <- seq(0, 100, by = 2)

x_vec[1]

## [1] 0

x_vec[2]

## [1] 2

x_vec[10]

## [1] 18

x_vec[10:20]

##  [1] 18 20 22 24 26 28 30 32 34 36 38

x_vec[seq(2,10,by = 2)]

## [1]  2  6 10 14 18

Make sure you understand what each line of code above is doing
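
Two other indexing styles are worth trying alongside the ones above: negative indices, which drop elements, and logical conditions, which keep the elements where the condition is TRUE. A short sketch using the same x_vec:

```r
x_vec <- seq(0, 100, by = 2)

without_first <- x_vec[-1]     # everything except the first element
large_vals <- x_vec[x_vec > 90] # only the values greater than 90
last_val <- x_vec[length(x_vec)] # the last element

large_vals   # 92 94 96 98 100
last_val     # 100
```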

You can also multiply and divide vectors by a single value or by a vector of the same length. Test these things out.

# Your code here

Make sure you understand what is going on with the examples you tested.

When we want to combine character vectors we can do the following

fruits <- c("apple","grapes","bananas")
vegs <- c("lettuce","broccoli","spinach")
fruits_and_veg <- c(fruits, vegs)

fruits_and_veg

## [1] "apple"    "grapes"   "bananas"  "lettuce"  "broccoli" "spinach"

course_num <- c("210", "215" , "100")
course_dept <- c("GEO", "ENS", "ENS")
course_code <- paste(course_dept, course_num)

course_code

## [1] "GEO 210" "ENS 215" "ENS 100"

What happened above? How did the results differ and when might you use these two differing methods?
Imagine you want a dash instead of a space between the department and the course number. Figure out how to do this using the paste() function.
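
Two related tools are handy to know once you have worked through the questions above. The sketch below shows paste0(), which joins text with nothing in between, and the collapse argument of paste(), which squashes a whole vector into a single string. The object names code_compact and dept_list are made up for the example.

```r
course_num <- c("210", "215", "100")
course_dept <- c("GEO", "ENS", "ENS")

# paste0() joins element by element with no separator
code_compact <- paste0(course_dept, course_num)
code_compact   # "GEO210" "ENS215" "ENS100"

# collapse turns the whole vector into one string
dept_list <- paste(course_dept, collapse = ", ")
dept_list      # "GEO, ENS, ENS"
```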

Factors

Factors are special vectors that represent categorical data.

Can be ordered (e.g. low, medium, high) or unordered (e.g. male, female)
Useful for assigning groups or categories to data

Unordered factor

responses <- factor(c("yes","no","no","yes","maybe","yes"))
responses

## [1] yes   no    no    yes   maybe yes  
## Levels: maybe no yes

Ordered factor

grades <- factor(c("A","C","B","A","B","B","D","A"), levels = c("F","D","C","B","A"), ordered = TRUE)
grades

## [1] A C B A B B D A
## Levels: F < D < C < B < A
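
A few base R functions make factors easier to work with. The sketch below reuses the two factors defined above: levels() lists the categories, table() counts how often each occurs, and an ordered factor can be compared against one of its levels.

```r
responses <- factor(c("yes","no","no","yes","maybe","yes"))
grades <- factor(c("A","C","B","A","B","B","D","A"),
                 levels = c("F","D","C","B","A"), ordered = TRUE)

levels(responses)   # the categories, in alphabetical order by default
table(responses)    # how many times each category occurs

# Comparisons only make sense for ordered factors
b_or_better <- grades >= "B"
sum(b_or_better)    # 6 grades are a B or higher

# Behind the scenes each level is stored as an integer code
grade_codes <- as.integer(grades)
grade_codes
```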

Think of some more examples where you might use factors. Can you think of both ordered and unordered examples?
Did you encounter any variables in your first lab that could be treated as a factor?

Data frames

We are going to be using these all the time in this class and in data analysis in general. They are similar in structure to a spreadsheet that you might open in Excel.

Data frames are made up of rows and columns. Each column is a vector and all columns must be of the same length. Basically anything that you save as a delimited text or Excel file (.csv, .xls, or .xlsx) can be read into R as a data frame.

Data frames have a number of important attributes that you’ll interact with, in particular column names, row names, and dimensions.

We can load in data to a data frame or create one from scratch. We’ll create one below using the data.frame() function

numbers <- c(1:26, NA)
lettersNew <- c(NA, letters) #letters is a special object available from base R
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

To look at the first few rows and last few rows

head(examp_df) # first rows

##   lettersNew numbers logical
## 1       <NA>       1    TRUE
## 2          a       2    TRUE
## 3          b       3    TRUE
## 4          c       4    TRUE
## 5          d       5    TRUE
## 6          e       6    TRUE

tail(examp_df) # last rows

##    lettersNew numbers logical
## 22          u      22   FALSE
## 23          v      23   FALSE
## 24          w      24   FALSE
## 25          x      25   FALSE
## 26          y      26   FALSE
## 27          z      NA   FALSE

To access a variable (column) from a data frame you use the $ operator

examp_df$lettersNew  # access the lettersNew variable

##  [1] NA  "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
## [20] "s" "t" "u" "v" "w" "x" "y" "z"

Try accessing some other variables from this data frame
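
Because each column is a vector, anything you can do with a vector you can do with a column pulled out by $. The sketch below rebuilds examp_df so it runs on its own, takes the mean of one column, and shows that double brackets with the column name return the same vector as $.

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

# A column is a vector, so vector functions work on it
numbers_mean <- mean(examp_df$numbers, na.rm = TRUE)
numbers_mean   # 13.5

# [["name"]] is another way to pull out the same column
same_column <- identical(examp_df$numbers, examp_df[["numbers"]])
same_column    # TRUE
```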

You can also access a data frame by specifying the rows and columns of interest. We use bracket notation [] to do this. You specify the row(s) and then the column(s) of interest within the bracket.

examp_df[2,3] # access the data in row 2 and column 3

## [1] TRUE

To access all of the indices in a row or column, leave the index blank

examp_df[2,] # access the data across all of the columns of row 2

##   lettersNew numbers logical
## 2          a       2    TRUE

Can you access all of the rows of column 3?
Once you’ve done that, assign this subset of the data to a new object called examp_df_subset
What data type is examp_df_subset?

To access a range of rows and/or columns you can use the : operator in your indexing statement

examp_df[1:4,2:3] # access the data found in rows 1 through 4 and columns 2 through 3

##   numbers logical
## 1       1    TRUE
## 2       2    TRUE
## 3       3    TRUE
## 4       4    TRUE

Access the data in rows 10:20 and all of the columns in examp_df
Access only the even rows in columns 1 and 2 of examp_df
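
Bracket indexing is not limited to row and column numbers. As a sketch of two other options, the code below selects columns by name and selects rows by a condition on one of the columns. examp_df is rebuilt so the block runs on its own; first_rows, big_rows, and big_letters are names made up for the example.

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

# Columns can be selected by name instead of by position
first_rows <- examp_df[1:3, c("lettersNew", "numbers")]
first_rows

# which() returns the row numbers where the condition is TRUE (NAs are skipped)
big_rows <- which(examp_df$numbers > 23)
big_letters <- examp_df[big_rows, "lettersNew"]
big_letters   # "w" "x" "y"
```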

Below are some other useful functions for examining data frames

names(examp_df) # see column names

## [1] "lettersNew" "numbers"    "logical"

rownames(examp_df) # see row names

##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"

str(examp_df) # show the data frame's structure

## 'data.frame':    27 obs. of  3 variables:
##  $ lettersNew: chr  NA "a" "b" "c" ...
##  $ numbers   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ logical   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

dim(examp_df) # get the dimensions

## [1] 27  3

nrow(examp_df) # get the number of rows

## [1] 27

ncol(examp_df) # number of columns

## [1] 3

summary(examp_df) # summary info

##   lettersNew           numbers       logical       
##  Length:27          Min.   : 1.00   Mode :logical  
##  Class :character   1st Qu.: 7.25   FALSE:13       
##  Mode  :character   Median :13.50   TRUE :13       
##                     Mean   :13.50   NA's :1        
##                     3rd Qu.:19.75                  
##                     Max.   :26.00                  
##                     NA's   :1

na.omit(examp_df) # omit rows with NAs

##    lettersNew numbers logical
## 2           a       2    TRUE
## 3           b       3    TRUE
## 4           c       4    TRUE
## 5           d       5    TRUE
## 6           e       6    TRUE
## 7           f       7    TRUE
## 8           g       8    TRUE
## 9           h       9    TRUE
## 10          i      10    TRUE
## 11          j      11    TRUE
## 12          k      12    TRUE
## 13          l      13    TRUE
## 15          n      15   FALSE
## 16          o      16   FALSE
## 17          p      17   FALSE
## 18          q      18   FALSE
## 19          r      19   FALSE
## 20          s      20   FALSE
## 21          t      21   FALSE
## 22          u      22   FALSE
## 23          v      23   FALSE
## 24          w      24   FALSE
## 25          x      25   FALSE
## 26          y      26   FALSE
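
To see which rows na.omit() dropped, base R's complete.cases() function is useful: it returns TRUE for every row with no missing values. The sketch below rebuilds examp_df so it runs on its own and uses made-up names (complete_rows, examp_df_clean) for the results.

```r
numbers <- c(1:26, NA)
lettersNew <- c(NA, letters)
logical <- c(rep(TRUE, 13), NA, rep(FALSE, 13))
examp_df <- data.frame(lettersNew, numbers, logical, stringsAsFactors = FALSE)

# TRUE for rows that have no NA in any column
complete_rows <- complete.cases(examp_df)

# Rows with at least one NA
incomplete_idx <- which(!complete_rows)
incomplete_idx   # 1 14 27

# Keeping only the complete rows leaves the same number of rows as na.omit()
examp_df_clean <- examp_df[complete_rows, ]
nrow(examp_df_clean)   # 24
```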

Lists

Lists are actually a special type of vector.

Lists can contain multiple items, of multiple types, and of multiple structures.
Lists are versatile and often used inside functions or as an output of functions.

Lists are made with the list() function

examp_list <- list(letters = c("x","y","z"),  
                   animals = c("cat","dog","bird","fish"),
                   numbers = 1:100,
                   df = examp_df)

examp_list

## $letters
## [1] "x" "y" "z"
## 
## $animals
## [1] "cat"  "dog"  "bird" "fish"
## 
## $numbers
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
## 
## $df
##    lettersNew numbers logical
## 1        <NA>       1    TRUE
## 2           a       2    TRUE
## 3           b       3    TRUE
## 4           c       4    TRUE
## 5           d       5    TRUE
## 6           e       6    TRUE
## 7           f       7    TRUE
## 8           g       8    TRUE
## 9           h       9    TRUE
## 10          i      10    TRUE
## 11          j      11    TRUE
## 12          k      12    TRUE
## 13          l      13    TRUE
## 14          m      14      NA
## 15          n      15   FALSE
## 16          o      16   FALSE
## 17          p      17   FALSE
## 18          q      18   FALSE
## 19          r      19   FALSE
## 20          s      20   FALSE
## 21          t      21   FALSE
## 22          u      22   FALSE
## 23          v      23   FALSE
## 24          w      24   FALSE
## 25          x      25   FALSE
## 26          y      26   FALSE
## 27          z      NA   FALSE
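
Elements of a list can be pulled out by name, much like columns of a data frame. The sketch below uses a shortened version of examp_list (the df element is left out to keep the example small) and shows the difference between single and double brackets.

```r
examp_list <- list(letters = c("x","y","z"),
                   animals = c("cat","dog","bird","fish"),
                   numbers = 1:100)

names(examp_list)    # names of the list elements
length(examp_list)   # number of elements in the list

# $ and [[ ]] return the element itself
examp_list$animals
second_animal <- examp_list[["animals"]][2]
second_animal        # "dog"

# Single brackets return a smaller list, not the element inside it
class(examp_list["animals"])     # list
class(examp_list[["animals"]])   # character
```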

Exercises

Create a vector named vec_seq that goes from 0 to 99 by 1. Print the vector to the console using the print() function.
Create another vector named vec_fracs with the following sequence: 0/1, 1/2, 2/3, 3/4, 4/5,…,99/100. Print the vector to the console.
Access every other element of vec_fracs starting with the 2nd element and print this subset to the console. Thus you would access elements 2, 4, 6, 8,…,100.
Create a character vector that has five first names. Create another vector that has five last names. Then create a third vector that has the first names listed in the first five elements and the last names listed in the last five elements.
Now create a vector that combines the first and last names; however, each entry should be in the format Lastname, Firstname. Hint: look at the help for the paste() function to see how you might do this.