Lab 2: US Precipitation Data

Overview

Precipitation is the source of renewable water resources for Earth’s environments. It ultimately sustains groundwater and surface water supplies and is central to the economic and environmental well-being of humanity. Throughout history, humankind has had a deep interest in, and need for, understanding and monitoring precipitation, and in the modern era precipitation records have been collected across the globe. The distribution of precipitation in space (i.e. between locations) and in time (both seasonal and inter-annual) has shaped the Earth’s surface and influences many aspects of our day-to-day existence. For instance, the water resources available in a given region are in large part controlled by the precipitation that falls in that area, so where humans have decided to settle and establish agricultural and population centers is influenced by precipitation patterns. Understanding spatial and temporal patterns in precipitation is thus of critical societal importance, and governmental and non-governmental agencies around the world continually collect and analyze this data.

The National Oceanic and Atmospheric Administration (NOAA) is one of the key agencies responsible for the collection and analysis of precipitation data (and other climatic data) in the US. In today’s lab you will perform some exploratory data analysis with NOAA time-series precipitation data for US states.

Background reading and resources

Before starting the analysis, you need to read the following sources. These readings will give you helpful background info and context and will help guide your analysis. In addition to these sources, you may find it helpful to search out other sources to help you make sense of your analysis.

The National Climate Assessment

Other helpful resources

Precipitation (Wikipedia) Introduction section
Your cheatsheets and textbook

Analyze precipitation time-series data

Today we are going to apply the fundamental R programming skills that we have learned thus far to analyze precipitation data from NOAA. The data we will use in today’s lab is described below:

Data Source: NOAA Climate at a Glance: Precipitation Time Series
The dataset contains total precipitation, reported monthly from 1895-2017, on a state-by-state basis for the US. It has four columns:

Year: Year in numeric format
Month: Month in numeric format
Precip_inches: Total monthly precipitation in inches
state_cd: State abbreviation (e.g. NY for New York, MA for Massachusetts,…)

Interesting note: Due to the government shutdown the NOAA website that hosts the data is offline. Luckily I had downloaded it before the shutdown, otherwise I would have had to cancel lab and give everyone a 100%.

Load in the data

library(readr)
library(tidyverse)

precip_data <- read_csv("https://stahlm.github.io/ENS_215/Data/NOAA_State_Precip_LabData.csv")

Examine the data

Take a look at the data and make sure you understand the dataset. You may want to run the summary() function to see if the data looks reasonable (e.g. no unreasonable values such as negative numbers for precipitation).
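As a rough illustration of the checks described above, here is a minimal sketch on a small made-up data frame. The name toy_precip and all of its values are invented for illustration (they are not NOAA data), and one negative value is planted on purpose so there is something to catch.

```r
# Made-up data with the same four columns as the lab dataset (NOT real NOAA values)
toy_precip <- data.frame(
  Year = c(2000, 2000, 2001, 2001),
  Month = c(1, 2, 1, 2),
  Precip_inches = c(3.1, 2.4, -1.0, 4.2),  # the -1.0 is a deliberately bad value
  state_cd = c("NY", "NY", "NY", "NY")
)

summary(toy_precip)   # min, max, mean, and quartiles for each numeric column
str(toy_precip)       # column types and a preview of the values

# Count values that cannot be physically correct
n_negative <- sum(toy_precip$Precip_inches < 0)
n_negative
```

The same three calls should work on precip_data once it is loaded; a non-zero count of negative values would be worth investigating before going further.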

Select a state of interest

Choose a state you are interested in studying. Select the data for just that state and save it to a new data frame. You can use the logical operators we learned this week to select only the rows that have the state of interest.
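If the row-selection step feels abstract, the sketch below shows one way it could look on a tiny made-up data frame. The names toy_precip, is_ma, and toy_ma are hypothetical, and the values are invented rather than taken from the NOAA file.

```r
# Made-up data for two states (NOT real NOAA values)
toy_precip <- data.frame(
  Year = c(2000, 2000, 2000, 2000),
  Month = c(1, 2, 1, 2),
  Precip_inches = c(3.1, 2.4, 5.0, 4.2),
  state_cd = c("NY", "NY", "MA", "MA"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for rows where state_cd equals "MA"
is_ma <- toy_precip$state_cd == "MA"

# Keep only the TRUE rows and all of the columns
toy_ma <- toy_precip[is_ma, ]
toy_ma
```

The comparison with == produces one TRUE or FALSE per row, and indexing with that vector keeps only the matching rows.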

# Your code here

Examine seasonal patterns in precipitation: Part 1

Take a look and see how precipitation varies between the months. Is there a clear seasonal signal (e.g. is one part of the year relatively wet)?

To get started you can make a few figures. Try a simple scatter plot of precipitation vs month. The basic code structure is below:

ggplot(your data frame here) + geom_jitter(aes(x = factor(...) , y = ... ))

Note that in the code above we put our x data inside the factor() function. This will make the data plot nicer, since it will treat each month as a discrete entity/group. The geom_jitter function generates a scatter plot, but adds a little “noise” to the data so that we can see points that would otherwise overlap. If we didn’t jitter the points then they would pile on top of one another along the x-axis (e.g. the January points would all have an x-value = 1, …)
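To make the template above concrete, here is a small sketch on made-up data. The data frame toy_state and its values are invented, and the width and height settings are just one reasonable choice, not a requirement of the lab.

```r
library(ggplot2)

# Made-up monthly values for two years (NOT real NOAA data)
toy_state <- data.frame(
  Month = rep(1:12, times = 2),
  Precip_inches = c(3.0, 2.5, 3.2, 3.6, 4.0, 4.1, 4.3, 3.9, 3.7, 3.4, 3.3, 3.1,
                    2.8, 2.9, 3.5, 3.3, 4.4, 3.8, 4.6, 4.2, 3.5, 3.6, 3.0, 3.2)
)

# factor(Month) makes each month its own group on the x-axis.
# width adds horizontal noise only; height = 0 leaves the y-values untouched.
p_jitter <- ggplot(toy_state) +
  geom_jitter(aes(x = factor(Month), y = Precip_inches), width = 0.2, height = 0)
p_jitter
```

Setting height = 0 is a design choice worth considering: by default geom_jitter also adds a little vertical noise, which would slightly shift the plotted precipitation values.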

A box and whisker plot is also helpful and will help to more cleanly summarize the data in each month. We’ll learn more about these types of figures in the upcoming weeks, so if you are not very familiar with them don’t worry (if you have never seen one before ask a classmate or me to explain them). The basic code structure is below.

ggplot(your data frame here) + geom_boxplot(aes(x = factor(...) , y = ... ))
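If box plots are new to you, the sketch below may help connect the picture to the numbers. It uses a made-up data frame (toy_state, invented values) and prints the quartiles that the box for month 1 is built from.

```r
library(ggplot2)

# Made-up data for three months (NOT real NOAA values)
toy_state <- data.frame(
  Month = rep(1:3, each = 5),
  Precip_inches = c(2.0, 2.5, 3.0, 3.5, 9.0,   # month 1, with one unusually wet value
                    1.0, 1.5, 2.0, 2.5, 3.0,   # month 2
                    4.0, 4.5, 5.0, 5.5, 6.0)   # month 3
)

p_box <- ggplot(toy_state) +
  geom_boxplot(aes(x = factor(Month), y = Precip_inches))
p_box

# The numbers behind the month-1 box: min, quartiles, median, max
month1_stats <- quantile(toy_state$Precip_inches[toy_state$Month == 1])
month1_stats
```

The box spans the 25th to 75th percentiles and the line inside it is the median. In this toy example the 9.0 value sits far above the box, so it should be drawn as a separate point rather than as part of the whisker.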

Do you observe any seasonality in your data? Think about the variability or lack thereof and what that might imply from an environmental and societal perspective (e.g. flooding, suitability for agriculture, water availability).
Do any months look like they have notably extreme precipitation values?
Do you observe anything else interesting or notable?

Examine seasonal patterns in precipitation: Part 2

Now let’s create a table that summarizes some monthly precipitation statistics. Our table will have twelve rows (one for each month) and four columns. The columns we want to create are described below:

Month: Numeric value for each month (e.g. 1, 2, 3,…, 12)
Precip_avg: The monthly average precipitation (i.e. for January you would take the average of all the January data, for Feb…)
Precip_min: Minimum observed precipitation for a given month
Precip_max: Maximum observed precipitation for a given month

You can use your looping skills to loop over the dataset and compute the statistics for each month. First let’s create a data frame that we’ll use to store the statistics we compute. Below is the code to initialize your data frame. We’ll fill the Month column and we’ll initialize all other columns to -9999 which will act as a placeholder until we’ve calculated the relevant statistics.

precip_state_monthly_summary <- data.frame(Month = 1:12, Precip_avg = -9999, Precip_min = -9999, Precip_max = -9999)

Take a look at your new data frame and make sure it looks ok.

Now you should create your for loop to loop through your data frame that has all of the precip data for your selected state and compute the necessary statistics for each month. On each iteration of the loop you can assign the statistics to the corresponding row and column in your precip_state_monthly_summary data frame.

Note: Make sure that you are computing statistics for your state of interest and NOT for all of the US data.
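The loop pattern described above can be sketched on a tiny made-up example. Everything here (toy_state, toy_summary, and the values) is invented for illustration, and only the average column is filled in; the minimum and maximum would follow the same pattern.

```r
# Made-up single-state data (NOT real NOAA values): 2 years x 3 months
toy_state <- data.frame(
  Year = rep(c(2000, 2001), each = 3),
  Month = rep(1:3, times = 2),
  Precip_inches = c(2.0, 3.0, 4.0, 4.0, 5.0, 6.0)
)

# Summary table with a placeholder value, mirroring the -9999 convention above
toy_summary <- data.frame(Month = 1:3, Precip_avg = -9999)

for (i in seq_len(nrow(toy_summary))) {
  # All values for the month stored in row i of the summary table
  this_month <- toy_state$Precip_inches[toy_state$Month == toy_summary$Month[i]]
  # Overwrite the placeholder in row i with the computed average
  toy_summary$Precip_avg[i] <- mean(this_month)
}

toy_summary
```

After the loop, a quick check that no -9999 values remain is an easy way to confirm every row was filled.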

# Your code here

Take a look at your statistics table

How do the monthly means (averages) compare to the mins and maxes?
How wet was the wettest month on record? How about the driest month?
Do you observe any other interesting or notable features in the data?
Create a plot to help you visualize and understand this data (a basic plot is fine at this point. We will focus in detail on plotting in upcoming lectures).

Examine total annual precipitation over time

Now that you’ve examined the seasonal (monthly) patterns let’s take a look at total annual precipitation over time for your state of interest. By examining the annual totals we’ll be able to see if there have been any trends in precipitation over time (potentially due to shifts in climate) as well as identify periods that are relatively wet or relatively dry (meteorological droughts).

We will need to create a data frame called state_annual_precip where we will store the total precipitation for each year. This data frame will have two columns:

Year: The year
Tot_precip: The total precipitation for that year (i.e. the sum of the monthly precipitation values in that year)

Create a new data frame with these two columns (similar to what you did in the monthly exercise above). You may find the unique() function helpful when trying to create a vector of the years. You can get more info by typing ?unique() into your CONSOLE.
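As a small illustration of how unique() can help here, the sketch below builds a years vector and a placeholder table from made-up data. The names toy_state, years, and toy_annual and all values are hypothetical.

```r
# Made-up single-state data (NOT real NOAA values)
toy_state <- data.frame(
  Year = c(2000, 2000, 2001, 2001, 2002),
  Month = c(1, 2, 1, 2, 1),
  Precip_inches = c(2.0, 3.0, 4.0, 5.0, 1.5)
)

# unique() returns each year once, in order of first appearance
years <- unique(toy_state$Year)
years

# One row per year, with a placeholder total to be filled in later
toy_annual <- data.frame(Year = years, Tot_precip = -9999)
toy_annual
```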

# Your code here

Take a look at your new data frame and make sure it looks good before proceeding.

Once you’ve created your new state_annual_precip data frame you will need to loop over your data frame that has all of the data for your state of interest and compute the total annual precipitation for each year. On each iteration of the loop you should assign the total annual value to the corresponding row in your state_annual_precip data frame.

Note: Make sure that you are computing values for your state of interest and NOT for all of the US data.
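The annual-total loop described above might look something like the sketch below. It runs on made-up data (toy_state and toy_annual are invented names and values), so treat it as a pattern to adapt rather than a finished answer.

```r
# Made-up single-state data (NOT real NOAA values)
toy_state <- data.frame(
  Year = c(2000, 2000, 2001, 2001, 2002),
  Month = c(1, 2, 1, 2, 1),
  Precip_inches = c(2.0, 3.0, 4.0, 5.0, 1.5)
)

# One row per year, with a placeholder total
toy_annual <- data.frame(Year = unique(toy_state$Year), Tot_precip = -9999)

for (i in seq_len(nrow(toy_annual))) {
  # TRUE for the rows that belong to the year stored in row i
  in_year <- toy_state$Year == toy_annual$Year[i]
  # Sum the monthly values for that year and store the result in row i
  toy_annual$Tot_precip[i] <- sum(toy_state$Precip_inches[in_year])
}

toy_annual
```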

# Your code here

Take a look at your state_annual_precip data frame once you’ve populated it with your data and make sure it looks good.

Create a time-series plot of annual precipitation

Let’s see how precipitation has varied over time. To do this we can create a simple line plot (with data points shown) with precipitation on the y-axis and year on the x-axis.

You can use the code below as a template for making your figure. We’ll show both a line representing our data (geom_line) and points (geom_point) so that we can identify each individual measurement.

ggplot(data frame here, aes(x = ... , y = ... )) + geom_line() + geom_point()

Once you’ve made that figure let’s make another figure with the same data, but let’s add a fitted smooth line to the data to help identify if there are any trends (note, we’ll go into these plotting techniques in much greater detail in upcoming lectures). To do this take the exact code you used above and then add + geom_smooth() to the end of the code.
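For reference, here is a sketch of the full line, point, and smooth combination on made-up annual totals. The data frame toy_annual and its values are invented, so the "trend" it shows means nothing.

```r
library(ggplot2)

# Made-up annual totals (NOT real NOAA values)
toy_annual <- data.frame(
  Year = 2000:2011,
  Tot_precip = c(40, 42, 38, 45, 41, 47, 44, 49, 43, 50, 48, 52)
)

# Putting aes() inside ggplot() lets all three layers share the same x and y
p_trend <- ggplot(toy_annual, aes(x = Year, y = Tot_precip)) +
  geom_line() +     # connects the yearly values
  geom_point() +    # marks each individual year
  geom_smooth()     # adds a fitted smooth curve with an uncertainty band
p_trend
```

With the default settings geom_smooth() picks a smoothing method for you and prints a message saying which one it used; for a small dataset like this that should be a loess fit.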

If there are any long-term trends in your data they may be more visible now with the geom_smooth() added.

Let’s do one more quick thing to examine statistics on the annual precipitation data for your state. Use the summary() function to get a bit more info on your state_annual_precip data.

Ok, now we’ve got a table, figures, and summary statistics for the annual precipitation data. Let’s begin to think about this data.

Do you observe any long-term trends? If so, are they consistent with the findings of the National Climate Assessment for your state (or the region in which your state is located)?
Do you observe any years (or periods) of extreme drought? If so, see if you can find any reporting on this drought (a quick Google search will probably point you towards some useful info).
Do you observe anything else notable in the data (e.g. extremely wet years, cyclical patterns,…)?

If you are interested you could create new code blocks where you tweak your above code to compute seasonal totals (e.g. March through May) over time. This will allow you to see if a particular part of the year has become wetter or drier over time. This is optional but is an interesting “above and beyond” avenue to pursue.
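One possible way to approach the seasonal totals is sketched below on made-up data. The names toy_state, spring_months, and toy_spring are hypothetical, and the %in% operator is used to pick out the months of interest.

```r
# Made-up single-state data (NOT real NOAA values):
# 1 inch every month in 2000 and 2 inches every month in 2001
toy_state <- data.frame(
  Year = rep(c(2000, 2001), each = 12),
  Month = rep(1:12, times = 2),
  Precip_inches = rep(c(1, 2), each = 12)
)

spring_months <- c(3, 4, 5)   # March through May
toy_spring <- data.frame(Year = unique(toy_state$Year), Spring_precip = -9999)

for (i in seq_len(nrow(toy_spring))) {
  # Rows that are in the current year AND in one of the spring months
  keep <- toy_state$Year == toy_spring$Year[i] & toy_state$Month %in% spring_months
  toy_spring$Spring_precip[i] <- sum(toy_state$Precip_inches[keep])
}

toy_spring
```

Changing spring_months to another set of month numbers gives the totals for a different season.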

Additional analysis

You should conduct additional analysis with the dataset to investigate any questions that you may be interested in. The above analysis and the readings may be good places to start when thinking about additional avenues of investigation. You may also pursue the challenge below as part of your additional analysis.

Challenge

If you finish early then proceed to this challenge.

Create a new data frame all_states_summary, where you compute the average annual precipitation for each US state. Your all_states_summary data frame should have two columns:

State: Column with the two-letter state codes
Avg_tot: Column with the average annual precipitation for each state

Hint: You can compute the Avg_tot for a given state by summing up all of the data for that state and dividing the sum by the number of years contained in the data record.

FYI, you can get the length of a vector using the length() function (or use the dim() function for the dimensions of a data frame).
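A quick sketch of the two functions on made-up objects (years and toy_df are invented for illustration):

```r
# A made-up vector of years
years <- c(2000, 2001, 2002)
n_years <- length(years)   # number of elements in the vector
n_years

# A made-up data frame with one row per year
toy_df <- data.frame(Year = years, Tot_precip = c(40, 42, 38))
df_dims <- dim(toy_df)     # number of rows, then number of columns
df_dims
```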

Hint: You can do the above challenge with a single for loop, where you loop over each state. Recall the unique() function we used earlier. It will be very helpful in creating a vector that has the two-letter state codes.

Note: You’ll need to loop over the original data frame precip_data, which has data for all of the states.
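The hints above can be sketched on a tiny made-up dataset. The state codes "AA" and "BB", the names toy_precip and toy_all, and all values are invented, and the sketch assumes every year in the record is complete.

```r
# Made-up data for two fictional states (NOT real NOAA values)
toy_precip <- data.frame(
  Year = c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001),
  Month = c(1, 2, 1, 2, 1, 2, 1, 2),
  Precip_inches = c(1, 2, 3, 4, 10, 10, 20, 20),
  state_cd = c("AA", "AA", "AA", "AA", "BB", "BB", "BB", "BB"),
  stringsAsFactors = FALSE
)

states <- unique(toy_precip$state_cd)
toy_all <- data.frame(State = states, Avg_tot = -9999, stringsAsFactors = FALSE)

for (i in seq_along(states)) {
  # All rows for the current state
  state_rows <- toy_precip[toy_precip$state_cd == states[i], ]
  # Number of distinct years in this state's record
  n_years <- length(unique(state_rows$Year))
  # Total precipitation divided by the number of years
  toy_all$Avg_tot[i] <- sum(state_rows$Precip_inches) / n_years
}

toy_all
```

If a state's record had a partial year, dividing by the count of distinct years would understate the average, so it is worth checking that each year has all twelve months.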

# Your code here

Take a look at your complete all_states_summary data frame. Does everything look ok? Examine the data and think about what you are seeing:

What is the wettest state? The driest?
How wet is New York relative to most states?
Do the average annual precipitation data help to explain features that you may have observed regarding certain states (e.g. frequent flooding, frequent drought, minimal population, patterns in industry/agriculture,…)?

Submit the lab

Your lab is due prior to the start of next week’s lab. Once you are finished and satisfied with your work you should knit it to an html file and submit both your html and Rmd file to Nexus.

Please make sure that the html file you submit is .html and not nb.html.

Remember that I’ve posted some guidance on producing lab reports to our class website.