Precipitation is the source of renewable water resources for Earth’s environments. It ultimately sustains groundwater and surface water supplies and is central to the economic and environmental well-being of humanity. Throughout history, humankind has had a deep interest in, and need for, understanding and monitoring precipitation, and in the modern era precipitation records have been collected across the globe. The distribution of precipitation in both space (i.e., between locations) and time (both seasonal and inter-annual) has shaped the Earth’s surface and influences many aspects of our day-to-day existence. For instance, the water resources available in a given region are in large part controlled by the precipitation that falls in that area, so where humans have decided to settle and establish agricultural and population centers is influenced by precipitation patterns. Understanding spatial and temporal patterns in precipitation is thus of critical societal importance, and governmental and non-governmental agencies around the world continually collect and analyze these data.
The National Oceanic and Atmospheric Administration (NOAA) is one of the key agencies responsible for the collection and analysis of precipitation data (and other climatic data) in the US. In today’s lab you will perform some exploratory data analysis with NOAA time-series precipitation data for US states.
Before starting the analysis, you need to read the following sources. These readings will give you helpful background info and context and will help guide your analysis. In addition to these sources, you may find it helpful to search out other sources to help you make sense of your analysis.
Today we are going to apply the fundamental R programming skills that we have learned thus far to analyze precipitation data from NOAA. The data we will use in today’s lab is described below:
Data Source: NOAA Climate at a Glance: Precipitation Time Series
The dataset contains total precipitation, reported monthly from 1895 to 2017, on a state-by-state basis for the US. The dataset has four columns.
Interesting note: Due to the government shutdown, the NOAA website that hosts the data is offline. Luckily, I had downloaded the data before the shutdown; otherwise I would have had to cancel lab and give everyone a 100%.
library(readr)
library(tidyverse)
precip_data <- read_csv("https://stahlm.github.io/ENS_215/Data/NOAA_State_Precip_LabData.csv")
Take a look at the data and make sure you understand the dataset. You may want to run the summary() function to see if the data looks reasonable (e.g., no unreasonable values such as negative numbers for precipitation).
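As an illustration of this kind of sanity check, here is a minimal sketch on a small made-up data frame. The name toy_precip and its column names are hypothetical and are not the columns of the lab dataset; the -9999 entry is a deliberately bad value planted so the check has something to find.

```r
# Hypothetical toy data; names and values are illustrative only
toy_precip <- data.frame(
  month = c(1, 2, 3, 4),
  precip_in = c(2.1, 0.0, -9999, 3.4)  # -9999 is a planted bad value
)

# summary() reports the min, quartiles, mean, and max of each numeric column;
# a suspicious minimum shows up immediately in the "Min." row
summary(toy_precip)

# A direct check for physically impossible (negative) precipitation values
any_negative <- any(toy_precip$precip_in < 0)
any_negative
```

If any_negative comes back TRUE on real data, that is a cue to look more closely at the offending rows before computing any statistics.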
Choose a state you are interested in studying. Select the data for just that state and save it to a new data frame. You can use the logical operators we learned this week to select only the rows that have the state of interest.
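The row-selection idea can be sketched on toy data as follows. The data frame toy_states and its column names (state, year, precip_in) are assumptions made up for this example, so you will need to substitute the actual column names from precip_data.

```r
# Hypothetical toy data; column names are illustrative only
toy_states <- data.frame(
  state = c("NY", "NY", "CA", "CA", "TX"),
  year = c(2000, 2001, 2000, 2001, 2000),
  precip_in = c(3.2, 2.8, 0.5, 1.1, 2.0)
)

# == returns TRUE or FALSE for every row
is_ny <- toy_states$state == "NY"

# Use the logical vector to keep only the rows where it is TRUE
toy_ny <- toy_states[is_ny, ]

# Conditions can be combined with & (and) or | (or)
toy_ny_2001 <- toy_states[toy_states$state == "NY" & toy_states$year == 2001, ]

toy_ny
toy_ny_2001
```

If you prefer the tidyverse style, filter(toy_states, state == "NY") from dplyr should select the same rows as the first example.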
# Your code here
Take a look and see how precipitation varies between the months. Is there a clear seasonal signal (e.g. is one part of the year relatively wet)?
To get started, you can make a few figures. Try a simple scatter plot of precipitation vs. month. The basic code structure is below.
ggplot(your data frame here) + geom_jitter(aes(x = factor(...) , y = ... ))
Note that in the code above we put our x data inside the factor() function. This makes the data plot more cleanly, since each month is treated as a discrete entity/group. The geom_jitter function generates a scatter plot but adds a little “noise” to the data so that we can see points that would otherwise overlap. If we didn’t jitter the points, they would pile on top of one another along the x-axis (e.g., the January points would all have an x-value = 1, …).
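To see the effect of jittering for yourself, the sketch below draws the same made-up data twice, once with geom_point and once with geom_jitter. The data frame toy_monthly and its columns are hypothetical, and the values are random numbers rather than real precipitation.

```r
library(ggplot2)

set.seed(1)  # make the made-up numbers reproducible

# Hypothetical toy data: 20 random values for each of three "months"
toy_monthly <- data.frame(
  month = rep(1:3, each = 20),
  precip_in = runif(60, min = 0, max = 6)
)

# geom_point(): every point for a month sits at the same x position
p_point <- ggplot(toy_monthly) +
  geom_point(aes(x = factor(month), y = precip_in))

# geom_jitter(): small random horizontal offsets spread the points out;
# height = 0 leaves the y values (the actual data) unchanged
p_jitter <- ggplot(toy_monthly) +
  geom_jitter(aes(x = factor(month), y = precip_in), width = 0.2, height = 0)

p_point
p_jitter
```

Setting height = 0 is a design choice for this sketch: it keeps the noise purely horizontal so the plotted y positions still match the data.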
A box and whisker plot is also helpful and will help to more cleanly summarize the data in each month. We’ll learn more about these types of figures in the upcoming weeks, so if you are not very familiar with them don’t worry (if you have never seen one before ask a classmate or me to explain them). The basic code structure is below.
ggplot(your data frame here) + geom_boxplot(aes(x = factor(...) , y = ... ))
Do you observe any seasonality in your data? Think about the variability, or lack thereof, and what that might imply from an environmental and societal perspective (e.g., flooding, suitability for agriculture, water availability).
Do any months look like they have notably extreme precipitation values?
Do you observe anything else interesting or notable?
Now let’s create a table that summarizes some monthly precipitation statistics. Our table will have twelve rows (one for each month) and four columns. The columns we want to create are described below:
You can use your looping skills to loop over the dataset and compute the statistics for each month. First, let’s create a data frame that we’ll use to store the statistics we compute. Below is the code to initialize your data frame. We’ll fill the Month column and initialize all other columns to -9999, which will act as a placeholder until we’ve calculated the relevant statistics.
precip_state_monthly_summary <- data.frame(Month = 1:12, Precip_avg = -9999, Precip_min = -9999, Precip_max = -9999)
Take a look at your new data frame and make sure it looks ok.
Now you should create your for loop to loop through the data frame that has all of the precip data for your selected state and compute the necessary statistics for each month. On each iteration of the loop you can assign the statistics to the corresponding row and column in your precip_state_monthly_summary data frame.
Note: Make sure that you are computing statistics for your state of interest and NOT for all of the US data.
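The loop-and-fill pattern can be sketched on a tiny made-up dataset with only three months. The names toy_obs, toy_summary, month, and precip_in are hypothetical; this is one possible way to structure the loop under those assumptions, not the only one.

```r
# Hypothetical toy data: three observations for each of three months
toy_obs <- data.frame(
  month = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  precip_in = c(2, 4, 3, 1, 5, 3, 6, 2, 4)
)

# Summary table initialized with -9999 placeholders
toy_summary <- data.frame(Month = 1:3, Precip_avg = -9999,
                          Precip_min = -9999, Precip_max = -9999)

for (i in seq_len(nrow(toy_summary))) {
  # Pull out the values that belong to the month stored in row i
  vals <- toy_obs$precip_in[toy_obs$month == toy_summary$Month[i]]

  # Overwrite the placeholders in row i with the computed statistics
  toy_summary$Precip_avg[i] <- mean(vals)
  toy_summary$Precip_min[i] <- min(vals)
  toy_summary$Precip_max[i] <- max(vals)
}

toy_summary
```

After the loop finishes, no -9999 values should remain in the table; if any do, the loop did not reach that row.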
# Your code here
Take a look at your statistics table.
Now that you’ve examined the seasonal (monthly) patterns let’s take a look at total annual precipitation over time for your state of interest. By examining the annual totals we’ll be able to see if there have been any trends in precipitation over time (potentially due to shifts in climate) as well as identify periods that are relatively wet or relatively dry (meteorological droughts).
We will need to create a data frame called state_annual_precip where we will store the total precipitation for each year. This data frame will have two columns:
Create a new data frame with these two columns (similar to what you did in the monthly exercise above). You may find the unique() function helpful when trying to create a vector of the years. You can get more info by typing ?unique() into your CONSOLE.
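Here is a minimal sketch of how unique() can feed a new data frame. The vector toy_years_raw and the column names Year and Total are hypothetical stand-ins chosen for this example; use the column names specified for the lab in your own data frame.

```r
# Hypothetical vector in which each year repeats (as it would with monthly rows)
toy_years_raw <- c(1895, 1895, 1895, 1896, 1896, 1897)

# unique() keeps one copy of each value, in order of first appearance
yrs <- unique(toy_years_raw)
yrs

# Build a data frame with one row per year and a placeholder for the totals
toy_annual <- data.frame(Year = yrs, Total = -9999)
toy_annual
```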
# Your code here
Take a look at your new data frame and make sure it looks good before proceeding.
Once you’ve created your new state_annual_precip data frame, you will need to loop over the data frame that has all of the data for your state of interest and compute the total annual precipitation for each year. On each iteration of the loop you should assign the total annual value to the corresponding row in your state_annual_precip data frame.
Note: Make sure that you are computing values for your state of interest and NOT for all of the US data.
# Your code here
Take a look at your state_annual_precip data frame once you’ve populated it with your data and make sure it looks good.
Let’s see how precipitation has varied over time. To do this we can create a simple line plot (with data points shown) with precipitation on the y-axis and year on the x-axis.
You can use the code below as a template for making your figure. We’ll show both a line representing our data (geom_line) and points (geom_point) so that we can identify each individual measurement.
ggplot(data frame here, aes(x = ... , y = ... )) + geom_line() + geom_point()
Once you’ve made that figure, let’s make another figure with the same data, but let’s add a fitted smooth line to help identify any trends (note: we’ll go into these plotting techniques in much greater detail in upcoming lectures). To do this, take the exact code you used above and then add + geom_smooth() to the end of the code.
If there are any long-term trends in your data, they may be more visible now with the geom_smooth() added.
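For reference, the sketch below shows the three layers working together on made-up data. The data frame toy_trend is hypothetical: it is random noise around a small upward drift that was built in on purpose, so the smooth line has a trend to pick up.

```r
library(ggplot2)

set.seed(42)  # make the made-up numbers reproducible

# Hypothetical toy data: a slight upward drift plus random year-to-year noise
toy_trend <- data.frame(year = 1950:2000)
toy_trend$total_in <- 40 + 0.05 * (toy_trend$year - 1950) + rnorm(51, sd = 3)

# Line + points show the raw series; geom_smooth() overlays a fitted curve
p_trend <- ggplot(toy_trend, aes(x = year, y = total_in)) +
  geom_line() +
  geom_point() +
  geom_smooth()

p_trend
```

With no arguments, geom_smooth() chooses a smoothing method automatically and reports its choice in a message. If you would rather see a straight-line fit, geom_smooth(method = "lm") is one option.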
Let’s do one more quick thing to examine statistics on the annual precipitation data for your state. Use the summary() function to get a bit more info on your state_annual_precip data.
Ok, now we’ve got a table, figures, and summary statistics for the annual precipitation data. Let’s begin to think about this data.
If you are interested you could create new code blocks where you tweak your above code to compute seasonal totals (e.g. March through May) over time. This will allow you to see if a particular part of the year has become wetter or drier over time. This is optional but is an interesting “above and beyond” avenue to pursue.
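One way to pick out a season is the %in% operator, which tests whether each value belongs to a set. The sketch below uses a made-up single year of data; toy_one_year and its columns are hypothetical, and extending the idea to every year is left to your loop.

```r
# Hypothetical toy data: one year of monthly values
toy_one_year <- data.frame(
  month = 1:12,
  precip_in = c(3, 3, 4, 4, 5, 4, 4, 4, 4, 4, 3, 3)
)

# Months that define the season of interest (March through May)
spring_months <- c(3, 4, 5)

# %in% returns TRUE for rows whose month is one of the spring months
is_spring <- toy_one_year$month %in% spring_months

# Sum only the spring rows to get the seasonal total
spring_total <- sum(toy_one_year$precip_in[is_spring])
spring_total
```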
You should conduct additional analysis with the dataset to investigate any questions that you may be interested in. The above analysis and the readings may be good places to start when thinking about additional avenues of investigation. You may also pursue the challenge below as part of your additional analysis.
If you finish early then proceed to this challenge.
Create a new data frame, all_states_summary, where you compute the average annual precipitation for each US state. Your all_states_summary data frame should have two columns:
Hint: You can compute the Avg_tot for a given state by summing up all of the data for that state and dividing the sum by the number of years contained in the data record.
FYI, you can get the length of a vector using the length() function (or use the dim() function for the dimensions of a data frame).
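A quick illustration of the difference, using made-up objects (toy_vec and toy_df are hypothetical names):

```r
# length() on a vector gives the number of elements
toy_vec <- c(10, 20, 30, 40)
length(toy_vec)

# dim() on a data frame gives c(number of rows, number of columns)
toy_df <- data.frame(a = 1:5, b = letters[1:5])
dim(toy_df)

# nrow() and ncol() return each dimension on its own
nrow(toy_df)
ncol(toy_df)
```

Note that calling length() on a data frame returns the number of columns, not the number of rows, so nrow() is the safer choice when you want a row count.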
Hint: You can do the above challenge with a single for loop, where you loop over each state. Recall the unique() function we used earlier. It will be very helpful in creating a vector that has the two-letter state codes.
Note: You’ll need to loop over the original data frame, precip_data, which has data for all of the states.
# Your code here
Take a look at your complete all_states_summary data frame. Does everything look ok? Examine the data and think about what you are seeing:
Your lab is due prior to the start of next week’s lab. Once you are finished and satisfied with your work, you should knit it to an html file and submit both your html and Rmd files to Nexus.
Please make sure that the html file you submit is .html and not nb.html.
Remember that I’ve posted some guidance on producing lab reports on our class website.