Introduction
- Motivation/Purposes
- Scientific Question
Loading in the Packages and Data Sets
Plots showing relationships between Enterococcus count and rainfall within four days of sampling
Plots showing relationships between rainfall over time within ten days of sampling in Schenectady
Schenectady Case Study
Conclusions
References

Introduction

Fecal indicator bacteria (FIB) are studied in drinking and recreational waters worldwide as indicators of fecal and sewage contamination (Gullatt 2015). Although commonly associated with foodborne illness, these organisms are also found in freshwater and brackish settings. Outside of foodborne illness, FIB are commonly described as originating from mammals and birds, and human fecal pollution is also well understood to be a critical source of contamination. Fecal indicator bacteria such as E. coli (a fecal coliform) and enterococci are associated with severe, sometimes fatal, gastrointestinal illness (Korajkic et al., 2013). Because these organisms signal the presence of disease-causing bacteria and viruses, they pose a serious health risk to people who fish in, swim in, or consume contaminated water. They enter waterways through a number of pathways, including improperly functioning wastewater treatment plants, leaking septic systems, stormwater runoff, and animal byproducts such as carcasses and manure (Gullatt 2015). This public health issue has led many levels of government to intervene and regulate the quality of public bodies of water.

Motivation/Purposes

I want to study how dramatic increases in precipitation (rainfall) influence the amount of enterococcus bacteria at nine locations within three cities (Utica, Schenectady, and Amsterdam) along the Mohawk River, as an assessment of water quality. This matters because enterococci are associated with potentially fatal gastrointestinal illness when the organisms enter the human body, for example through swallowed water.

Scientific Question

I will investigate whether the relationship between rainfall and enterococcus levels in three cities along the Mohawk River is exponential.
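One way to make "exponential" testable: if count = a * exp(b * rain), then log(count) = log(a) + b * rain, so a straight-line fit on the log scale estimates a and b. The sketch below illustrates that idea on simulated numbers; the log-linear fit is an assumed approach, not the method used in this report, and the variables `rain` and `count` are hypothetical stand-ins for columns such as `Cum_Rain_4Days` and `Enterococcus_Count`.

```r
# Sketch (assumed approach): test for an exponential relationship by fitting
#   log(count) = log(a) + b * rain
# The data below are SIMULATED placeholders, not Riverkeeper measurements.
set.seed(42)
rain  <- runif(200, min = 0, max = 3)                # rainfall in inches
count <- exp(1 + 1.2 * rain + rnorm(200, sd = 0.3))  # exponential trend plus noise

fit <- lm(log(count) ~ rain)                         # least squares on log(count)
a   <- exp(unname(coef(fit)["(Intercept)"]))         # estimated count with no rain
b   <- unname(coef(fit)["rain"])                     # estimated growth rate per inch

r2 <- summary(fit)$r.squared   # share of variance in log(count) explained by rain
```

The data summary in this report shows a minimum enterococcus count of 1, so the logarithm is defined for every sample. A roughly straight-line pattern of log(count) against rainfall, with residuals showing no leftover curvature, would support the exponential hypothesis; exp(b) is then the multiplicative change in count per additional inch of rain.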

Loading in the Packages and Data Sets

These packages are used to analyze the data and draw conclusions about the relationship between fecal indicator bacteria and high precipitation levels in the Mohawk River.

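library(tidyverse)  # dplyr, ggplot2 and related packages for wrangling and plotting
library(readr)      # reading CSV files (also attached by tidyverse)
library(lubridate)  # parsing dates and building time intervals
library(plotly)     # interactive versions of the plots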

Below is a summary of the data set, showing the class of each column (character, numeric, etc.) and the number of values in each column (186). For numeric columns, the minimum, first quartile, median, mean, third quartile, and maximum are reported. Note that Month, Day, and Year are stored as character columns. These values provide useful context for the analyses that follow.

##     Month               Day                Year          
##  Length:186         Length:186         Length:186        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    Waterway         Relative_Direction     City          
##  Length:186         Length:186         Length:186        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  Access_Point       Enterococcus_Count   Quality           Rain_Day_of    
##  Length:186         Min.   :   1.0     Length:186         Min.   :0.0000  
##  Class :character   1st Qu.:  14.0     Class :character   1st Qu.:0.0000  
##  Mode  :character   Median :  72.5     Mode  :character   Median :0.0000  
##                     Mean   : 455.0                        Mean   :0.1409  
##                     3rd Qu.: 517.0                        3rd Qu.:0.1000  
##                     Max.   :4839.0                        Max.   :1.8000  
##  Rain_Day_prior    Rain_Day_2p      Rain_Day_3p      Cum_Rain_4Days  
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.3000  
##  Mean   :0.1204   Mean   :0.1516   Mean   :0.08172   Mean   :0.4946  
##  3rd Qu.:0.1000   3rd Qu.:0.0000   3rd Qu.:0.10000   3rd Qu.:0.8000  
##  Max.   :2.2000   Max.   :2.2000   Max.   :0.80000   Max.   :3.2000
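The column means above are consistent with `Cum_Rain_4Days` being the sum of the four daily rainfall columns (0.1409 + 0.1204 + 0.1516 + 0.0817 ≈ 0.4946), although matching means do not prove the relationship holds for every row. A minimal row-level check is sketched below; the helper `check_cum_rain()` is hypothetical, and the example rows are made up for illustration.

```r
# Hypothetical helper: return the row numbers where Cum_Rain_4Days differs
# from the sum of the day-of rainfall and the three prior days' rainfall.
check_cum_rain <- function(df, tol = 1e-8) {
  daily_sum <- df$Rain_Day_of + df$Rain_Day_prior + df$Rain_Day_2p + df$Rain_Day_3p
  which(abs(df$Cum_Rain_4Days - daily_sum) > tol)  # tolerance absorbs rounding error
}

# Made-up example rows (inches), NOT taken from the Riverkeeper data set
example <- data.frame(
  Rain_Day_of    = c(0.0, 0.3, 1.0),
  Rain_Day_prior = c(0.0, 0.1, 0.2),
  Rain_Day_2p    = c(0.0, 0.0, 0.4),
  Rain_Day_3p    = c(0.0, 0.1, 0.6),
  Cum_Rain_4Days = c(0.0, 0.5, 2.2)
)

check_cum_rain(example)   # integer(0): every row is consistent
```

Run on the real table, an empty result would confirm the interpretation; any returned row numbers point to samples worth inspecting.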

Plots showing relationships between Enterococcus count and rainfall within four days of sampling

Below is a plot of enterococcus count for each sampling event, grouped by year. This plot is useful because it could reveal an overall increase in enterococcus count over time, which might relate to the continual deterioration of combined sewer overflow (CSO) infrastructure.
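Because Month, Day, and Year are stored as text, a per-event plot first needs a single date column. A minimal sketch, assuming the column names from the data summary; the `Sample_Date` column, the helper `add_sample_date()`, and the example rows are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical helper: combine the character Month, Day and Year columns into
# one Date; mdy() accepts month numbers ("6") or names ("June").
add_sample_date <- function(df) {
  df %>% mutate(Sample_Date = mdy(paste(Month, Day, Year)))
}

# Made-up example rows, NOT Riverkeeper measurements
example <- data.frame(
  Month = c("6", "7", "6", "7"),
  Day   = c("17", "18", "16", "20"),
  Year  = c("2016", "2016", "2017", "2017"),
  Enterococcus_Count = c(10, 250, 40, 900)
)

plot_data <- add_sample_date(example)

p <- ggplot(plot_data, aes(x = Sample_Date, y = Enterococcus_Count)) +
  geom_point() +
  facet_wrap(~ Year, scales = "free_x") +   # one panel per sampling year
  labs(x = "Sampling date", y = "Enterococcus count")
```

Faceting by year with `scales = "free_x"` keeps each year's sampling dates readable; colouring points by year on a single axis would instead make a long-term trend easier to see.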

Below is a plot of enterococcus count against cumulative rainfall over the 4 days prior to sample collection, using data from all locations. A loess trendline shows a general relationship between rainfall and enterococcus count, but the relationship is not fully clear. To better understand it, I later investigate rainfall over the 10 days prior to sampling to determine whether long-term or large rain events affect the amount of enterococcus in the Mohawk.

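A minimal ggplot2 sketch of this kind of plot, assuming a table with the `Cum_Rain_4Days` and `Enterococcus_Count` columns from the summary; the example rows are made up. The log10 y-axis is an added design choice, not taken from the original figure: counts in the summary range from 1 to 4839, and on a log axis an exponential trend appears roughly as a straight line. With `scale_y_log10()`, ggplot2 fits the loess curve to the log-transformed counts.

```r
library(ggplot2)

# Made-up example rows, NOT Riverkeeper measurements
example <- data.frame(
  Cum_Rain_4Days     = c(0, 0.2, 0.5, 0.8, 1.1, 1.6, 2.0, 2.4, 3.0),
  Enterococcus_Count = c(5, 12, 30, 20, 150, 400, 900, 1500, 3000)
)

p <- ggplot(example, aes(x = Cum_Rain_4Days, y = Enterococcus_Count)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", formula = y ~ x) +  # loess trendline with confidence band
  scale_y_log10() +                                  # optional log10 y-axis (assumption)
  labs(x = "Cumulative rainfall in the 4 days before sampling (in)",
       y = "Enterococcus count")
```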

Below is the same plot grouped by sampling city (Utica, Amsterdam, and Schenectady), with a loess trendline for each city, to assess how the CSO infrastructure performs in each area. Again, the relationship is not fully clear. Although enterococcus count generally increases with 4-day cumulative rainfall, there is an ambiguous region (roughly 0.5 to 1 inch of total rainfall) where the counts do not follow the overall trend. This ambiguity also motivates the 10-day rainfall analysis that follows.

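The per-city version differs mainly in mapping `City` to colour, which makes ggplot2 fit one loess curve per city. A minimal sketch as a hypothetical helper `plot_count_vs_rain_by_city()`, assuming the column names from the data summary; it would be called on the full Riverkeeper table, since loess needs a reasonable number of points in each city.

```r
library(ggplot2)

# Hypothetical helper: enterococcus count vs. 4-day cumulative rainfall,
# with a separate colour and loess trendline for each city.
plot_count_vs_rain_by_city <- function(df) {
  ggplot(df, aes(x = Cum_Rain_4Days, y = Enterococcus_Count, colour = City)) +
    geom_point(alpha = 0.6) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +  # one curve per city
    labs(x = "Cumulative rainfall in the 4 days before sampling (in)",
         y = "Enterococcus count", colour = "City")
}
```

Usage would be along the lines of `plot_count_vs_rain_by_city(Riverkeeper_Data)`. Replacing the colour mapping with `facet_wrap(~ City)` is an alternative when the three curves overlap too much to read.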

Plots showing relationships between rainfall over time within ten days of sampling in Schenectady

Below, the rainfall data are loaded into RStudio as a table, which allows us to isolate columns, values, and categories for further analysis. The table has 6 columns and 3159 rows and contains rainfall (inches) and rain rate over the 10 days prior to each Riverkeeper sampling event. These data are crucial for determining whether rainfall events are extreme or gradual. In this report, an extreme event is defined as a large amount of rainfall over a short period, whereas a gradual event is one in which small amounts of rain fall over a longer period at a low rain rate. To analyze these events, the plots below show rainfall signals over time.
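The extreme-versus-gradual definition can be made concrete by summarizing each event's total rainfall and peak rain rate. This is a hypothetical sketch: the column names `Rain` (inches per half-hour record) and `Rain_Rate` (inches per hour), the helper `classify_events()`, and the 0.5 in/hr threshold are all assumptions for illustration, not values from this report.

```r
library(dplyr)

# Hypothetical sketch of the extreme-vs-gradual definition.
# ASSUMED columns: Event_Int (event label), Rain (inches per record),
# Rain_Rate (inches per hour). The threshold is a placeholder, not a standard.
classify_events <- function(df, rate_threshold = 0.5) {
  df %>%
    group_by(Event_Int) %>%
    summarise(
      total_rain = sum(Rain, na.rm = TRUE),       # total rainfall in the window
      peak_rate  = max(Rain_Rate, na.rm = TRUE),  # most intense burst
      .groups = "drop"
    ) %>%
    mutate(Event_Type = if_else(peak_rate >= rate_threshold, "extreme", "gradual"))
}

# Made-up example: event A has one intense burst, event B steady light rain
example <- data.frame(
  Event_Int = rep(c("A", "B"), each = 4),
  Rain      = c(0, 0.05, 0.6, 0.02, 0.02, 0.05, 0.04, 0.05),
  Rain_Rate = c(0, 0.1, 1.2, 0.04, 0.04, 0.1, 0.08, 0.1)
)

classify_events(example)
```

Classifying on peak rate captures "a large signal over a short time"; a combined rule (for example, high peak rate and a large share of the total falling in a few records) would be a stricter variant.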

At this point, the rainfall data needed to be categorized by the time period before each monthly sampling event. For example, for the June 2017 sampling event, we extracted rainfall data between 6/07/2017 and 6/17/2017 and called that interval int_6. We then used these intervals in a case_when() statement that names each event, and that column was used to create Figure 1 below.

Schenectady_Rain <- Schenectady_Rain %>% 
  mutate(Date = mdy_hm(Date_Time))
Schenectady_Rain

# Add a Date column by parsing the Date_Time strings (month/day/year hour:minute) with lubridate's mdy_hm(), so the records can be grouped into time intervals.

int_5 <- interval(ymd("2017-05-12"), ymd("2017-05-23"))

int_6 <- interval(ymd("2017-06-07"), ymd("2017-06-18"))

int_7 <- interval(ymd("2017-07-07"), ymd("2017-07-18"))

int_8 <- interval(ymd("2017-08-11"), ymd("2017-08-22"))

int_9 <- interval(ymd("2017-09-07"), ymd("2017-09-18"))

int_10 <- interval(ymd("2017-10-13"), ymd("2017-10-24"))

# Define one interval per monthly Riverkeeper sampling event, covering the rainfall window before that sample (int_5 = May through int_10 = October).

Schenectady_Rain <- Schenectady_Rain %>% mutate(
  Event_Int = case_when(
    Date %within% int_5 ~ "May_Event", 
    Date %within% int_6 ~ "June_Event",
    Date %within% int_7 ~ "July_Event",
    Date %within% int_8 ~ "Aug_Event",
    Date %within% int_9 ~ "Sept_Event", 
    Date %within% int_10 ~ "Oct_Event")
)

# case_when() checks which event interval each timestamp falls in and stores the label in a new Event_Int column, grouping the data into the six monthly events before Riverkeeper sampling. Timestamps outside every interval are labeled NA. The resulting table is printed below.

Schenectady_Rain
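To see what the labelling step does in isolation, here is a self-contained sketch on three made-up timestamps and one of the intervals defined above. case_when() returns the label for a timestamp inside int_6 and NA for the others, because no fallback (`TRUE ~ ...`) branch is given.

```r
library(lubridate)
library(dplyr)

# One of the event intervals defined above
int_6 <- interval(ymd("2017-06-07"), ymd("2017-06-18"))

# Made-up timestamps, chosen only to show the labelling behaviour
stamps <- data.frame(
  Date = ymd_hm(c("2017-06-05 12:00",    # before the June window
                  "2017-06-12 14:00",    # inside the window
                  "2017-06-20 08:30"))   # after the window
)

stamps <- stamps %>%
  mutate(Event_Int = case_when(Date %within% int_6 ~ "June_Event"))

stamps$Event_Int   # NA "June_Event" NA
```

Rows left as NA belong to no pre-sampling window; they could be removed with `filter(!is.na(Event_Int))` before plotting so they do not appear as an extra "NA" event.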

Schenectady_Rain <- Schenectady_Rain %>% group_by(Event_Int) %>% mutate(Event_Number = row_number())

# Within each event, number the half-hourly rainfall records consecutively. This lets every event be plotted on a common x-axis (position within the 10-day window before Riverkeeper sampling) instead of on actual calendar dates and times.
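Since the records are half-hourly, the record number converts directly to elapsed time: 48 records make one day. A minimal sketch, assuming no missing records; the helper `add_elapsed_days()` and the `Days_Elapsed` column are hypothetical.

```r
library(dplyr)

# Hypothetical helper, assuming one record every 30 minutes (48 per day):
# convert the within-event record number into days since the window began.
add_elapsed_days <- function(df, records_per_day = 48) {
  df %>% mutate(Days_Elapsed = (Event_Number - 1) / records_per_day)
}

example <- data.frame(Event_Int = "May_Event", Event_Number = c(1, 49, 481))
add_elapsed_days(example)$Days_Elapsed   # 0 1 10
```

If some half-hour records are missing, counting rows would drift; computing elapsed time from the timestamps within each event, for example `as.numeric(difftime(Date, min(Date), units = "days"))`, is a more robust alternative.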

Figure 1 is created below. It shows that the rain rate fluctuates over each 10-day interval, which can change how rainfall is absorbed into the ground or routed through the CSO systems. We predicted that the more "spiky" and intense the rainfall, the more likely the enterococcus count is to spike. Further investigation after creating Figure 1 will determine whether this hypothesis holds.
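A minimal sketch of a Figure 1-style plot, with all events overlaid and distinguished by colour. The column name `Rain_Rate` is an assumption (the rain-rate column's real name is not shown here), and the example values are made up.

```r
library(ggplot2)

# Made-up example data; Rain_Rate is an ASSUMED column name for the rain rate
example <- data.frame(
  Event_Int    = rep(c("May_Event", "June_Event"), each = 4),
  Event_Number = rep(1:4, times = 2),
  Rain_Rate    = c(0, 0.2, 0.6, 0.1, 0, 0.05, 0.05, 0)
)

# All events on one set of axes, one coloured line per event
fig1 <- ggplot(example, aes(x = Event_Number, y = Rain_Rate, colour = Event_Int)) +
  geom_line() +
  labs(x = "Half-hour record within the pre-sampling window",
       y = "Rain rate", colour = "Event")
```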

Figure 2 is created below. It shows the same rain-rate data faceted by event, so the rate of rainfall during each event is easier to read and compare.
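Faceting splits the same data into one small panel per event. A minimal sketch, again assuming a `Rain_Rate` column and using made-up values.

```r
library(ggplot2)

# Made-up example data; Rain_Rate is an ASSUMED column name for the rain rate
example <- data.frame(
  Event_Int    = rep(c("May_Event", "June_Event"), each = 4),
  Event_Number = rep(1:4, times = 2),
  Rain_Rate    = c(0, 0.2, 0.6, 0.1, 0, 0.05, 0.05, 0)
)

# One panel per event, sharing the y-axis so intensities are comparable
fig2 <- ggplot(example, aes(x = Event_Number, y = Rain_Rate)) +
  geom_line() +
  facet_wrap(~ Event_Int, ncol = 2) +
  labs(x = "Half-hour record within the pre-sampling window", y = "Rain rate")
```

Keeping the default shared y-axis makes a spiky event stand out against a gradual one; `scales = "free_y"` would instead show each event's internal shape more clearly.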

Figures 3 through 8 are created below. Unlike Figure 2, these graphs are not faceted; each shows how the rain rate fluctuates over a 10-day interval as a separate figure.
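Separate per-event figures can be produced by splitting the data by event and building one plot per piece. A minimal sketch, assuming a `Rain_Rate` column and made-up values; `split()` drops rows whose `Event_Int` is NA, so records outside every window are excluded automatically.

```r
library(ggplot2)

# Made-up example data; Rain_Rate is an ASSUMED column name for the rain rate
example <- data.frame(
  Event_Int    = rep(c("May_Event", "June_Event"), each = 4),
  Event_Number = rep(1:4, times = 2),
  Rain_Rate    = c(0, 0.2, 0.6, 0.1, 0, 0.05, 0.05, 0)
)

# One ggplot object per event, named by event label
plots <- lapply(split(example, example$Event_Int), function(d) {
  ggplot(d, aes(x = Event_Number, y = Rain_Rate)) +
    geom_line() +
    labs(title = d$Event_Int[1],
         x = "Half-hour record within the pre-sampling window", y = "Rain rate")
})

names(plots)   # "June_Event" "May_Event"
```

Each element can then be printed or saved individually, e.g. `print(plots[["May_Event"]])`.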

Schenectady Case Study

Below, the Riverkeeper data are filtered to the 2017 Schenectady samples, keeping the date, location, enterococcus count, and water-quality columns.

# Keep only the 2017 Schenectady samples and the columns used below
Schenectady_Table <- Riverkeeper_Data %>%
  filter(Year == 2017, City == "Schenectady") %>%
  select(Month, Day, Year, Relative_Direction, City, Access_Point, Enterococcus_Count, Quality)

Schenectady_Table

The graph below shows the enterococcus count for each 2017 sample. Beach-advisory water quality is more common earlier in the year, around May and June. This might be due to high rainfall and snowmelt after the winter season adding water to the river system. The graph suggests that high enterococcus counts may follow a seasonal pattern; with more years of weather and Riverkeeper data, a concrete model could potentially be developed.
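The seasonal pattern can also be checked numerically by counting samples in each Quality category per month. A minimal sketch with made-up rows; the Quality labels are placeholders, and converting Month with `as.integer()` assumes the column holds month numbers stored as text (if it holds names, `match(Month, month.name)` would be needed instead).

```r
library(dplyr)

# Made-up example rows; Quality labels are placeholders, and Month is
# ASSUMED to hold month numbers stored as text
example <- data.frame(
  Month   = c("5", "5", "6", "6", "7", "8"),
  Quality = c("Beach Advisory", "Beach Advisory", "Acceptable",
              "Beach Advisory", "Acceptable", "Beach Advisory")
)

quality_by_month <- example %>%
  mutate(Month = as.integer(Month)) %>%   # sort months numerically, not as text
  count(Month, Quality) %>%               # samples per month and quality class
  group_by(Month) %>%
  mutate(Share = n / sum(n)) %>%          # fraction of that month's samples
  ungroup()

quality_by_month
```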

Conclusions

In a 2004 study comparing E. coli, enterococci, and fecal coliform as indicators for brackish water quality assessment in Lake Pontchartrain, Louisiana, Jin et al. studied these organisms in suspended and lake-bottom particles to determine reduction rate constants. The study discusses the attachment of these bacteria to suspended matter, and subsequently to sedimented matter, as a mechanism for their development. Finding that bottom sediment acts as a reservoir for these pathogens, which raises concern about recontamination of overlying waters, Jin et al. concluded that enterococci are a more stable indicator than E. coli and other fecal coliforms in brackish waters. In contrast to Lake Pontchartrain, the site of this study, the Mohawk River, is a northern freshwater tributary of the Hudson River in upstate New York.

Collecting water samples and investigating pathogen levels is key to ensuring that water quality meets health regulations defined at the federal, state, and local levels. By understanding federal standards for water quality assessment, we can identify how New York State is addressing inadequate water testing and unsafe bacteria levels, specifically in the Mohawk River. However, the federal and state governments are at a crossroads in policy as they try to determine how to most effectively and efficiently monitor water quality in New York State. This project used data collected by Riverkeeper, a private nonprofit environmental organization, together with weather data from Union College's weather station, to analyze how enterococcus counts fluctuate with rainfall over time.

As described above, the plots show general trends between rainfall, seasonality, and enterococcus count. With this information, scientists might push for additional sampling and a more in-depth study of fluctuations in fecal indicator bacteria and rainfall rate. Such analyses could support stricter regulation of urban CSOs and improvements to sewage infrastructure in upstate New York cities such as Utica, Schenectady, and Amsterdam.

References

Gullatt K (2015) E. coli and Enterococcus. United States Environmental Protection Agency.

Jin G, Englande A, Bradford H, H-w J (2004) Comparison of E. coli, enterococci, and fecal coliform as indicators for brackish water quality assessment. Water Environ Res 76:245–255.

Korajkic A, et al. (2013) Differential decay of enterococci and Escherichia coli originating from two fecal pollution sources. Appl Environ Microbiol 79(7):2488.

Increases in Fecal Bacterial Indicators with High Precipitation Levels in the Mohawk River

Final Project

Madalyn Borek

14 Mar 2019