Fecal indicator bacteria (FIB) are studied in drinking and recreational waters worldwide as indicators of fecal matter and sewage contamination (Gullatt 2015). While commonly associated with foodborne illness, these bacteria can also be found in freshwater and brackish settings. Outside of foodborne illness, FIB are commonly described as originating from mammals and birds, and human fecal pollution is well understood to be a critical source of contamination. Fecal bacteria such as E. coli and enterococci are associated with severe, sometimes fatal, gastrointestinal illness (Korajkic et al., 2013). Because these organisms signal the presence of disease-causing bacteria and viruses, contaminated water poses a serious health risk to those fishing in, swimming in, or consuming it. The pathogens enter waterways through improperly functioning wastewater treatment plants, leaking septic systems, stormwater runoff, and animal byproducts such as carcasses and manure (Gullatt 2015). This public health issue has led many levels of government to intervene and regulate the quality of public bodies of water.
I want to study how dramatic increases in precipitation (rainfall) influence the amount of enterococcus bacteria at nine different locations within three cities (Utica, Schenectady, and Amsterdam) along the Mohawk River as an assessment of water quality. This is critical because enterococcus bacteria can cause fatal gastrointestinal illness if the organisms are ingested or otherwise enter the human body.
I will investigate whether the relationship between rainfall and enterococcus levels in three cities along the Mohawk River is exponential.
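One common way to check for an exponential relationship is to fit a straight line to the logarithm of the counts: if count = a * exp(b * rain), then log(count) = log(a) + b * rain. The sketch below illustrates the idea only. The values are invented (rounded from 10 * exp(2.5 * rain)) and are not Riverkeeper measurements; the column names Enterococcus_Count and Cum_Rain_4Days follow the dataset summarized later in this report.

```r
# Toy values invented for illustration only; NOT Riverkeeper measurements.
toy <- data.frame(
  Cum_Rain_4Days     = c(0, 0.2, 0.5, 0.8, 1.2, 1.6, 2.0),
  Enterococcus_Count = c(10, 16, 35, 74, 201, 546, 1484)
)

# If count = a * exp(b * rain), then log(count) is linear in rain.
fit <- lm(log(Enterococcus_Count) ~ Cum_Rain_4Days, data = toy)

a_hat <- exp(coef(fit)[["(Intercept)"]])  # estimated count at zero rainfall
b_hat <- coef(fit)[["Cum_Rain_4Days"]]    # estimated growth rate per inch of rain
r2    <- summary(fit)$r.squared           # how well the log-linear form fits

c(a = a_hat, b = b_hat, r.squared = r2)
```

A high R-squared on the log scale, together with residuals that show no curvature, would support the exponential form. Taking logs requires counts greater than zero, which holds for this dataset since the minimum reported count is 1.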
These packages are used to analyze and assess the data so that we can draw conclusions about the relationship between fecal indicator bacteria and high precipitation levels in the Mohawk River.
library(tidyverse)
library(readr)
library(lubridate)
library(plotly)
Below, the data is loaded into RStudio in a table format. This allows us to isolate columns, values, and categories for further analysis. If you view the table below, you will see a dataset with 14 columns and 186 rows.
Below is a summary of the data, identifying the class of each column (character, numeric, etc.) and its length. For columns with numerical values, the minimum, first quartile, median, mean, third quartile, and maximum are calculated. These statistics may be helpful in further analyses.
##     Month               Day                Year
##  Length:186         Length:186         Length:186
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##
##
##
##    Waterway         Relative_Direction     City
##  Length:186         Length:186         Length:186
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##
##
##
##  Access_Point       Enterococcus_Count   Quality            Rain_Day_of
##  Length:186         Min.   :   1.0       Length:186         Min.   :0.0000
##  Class :character   1st Qu.:  14.0       Class :character   1st Qu.:0.0000
##  Mode  :character   Median :  72.5       Mode  :character   Median :0.0000
##                     Mean   : 455.0                          Mean   :0.1409
##                     3rd Qu.: 517.0                          3rd Qu.:0.1000
##                     Max.   :4839.0                          Max.   :1.8000
##  Rain_Day_prior    Rain_Day_2p       Rain_Day_3p        Cum_Rain_4Days
##  Min.   :0.0000    Min.   :0.0000    Min.   :0.00000    Min.   :0.0000
##  1st Qu.:0.0000    1st Qu.:0.0000    1st Qu.:0.00000    1st Qu.:0.0000
##  Median :0.0000    Median :0.0000    Median :0.00000    Median :0.3000
##  Mean   :0.1204    Mean   :0.1516    Mean   :0.08172    Mean   :0.4946
##  3rd Qu.:0.1000    3rd Qu.:0.0000    3rd Qu.:0.10000    3rd Qu.:0.8000
##  Max.   :2.2000    Max.   :2.2000    Max.   :0.80000    Max.   :3.2000
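The loading code itself is not shown in this report. A minimal sketch of how such a table could be read and summarized is given here, under stated assumptions: the file name in the commented line is hypothetical, and the three rows are made up so that the example runs on its own. Only the object name Riverkeeper_Data and the column names come from this report.

```r
library(readr)

# Hypothetical file name; the report does not state the actual path.
# Riverkeeper_Data <- read_csv("Riverkeeper_Data.csv")

# Self-contained stand-in: three made-up rows with a subset of the columns.
Riverkeeper_Data <- read_csv(I(
"Month,Day,Year,City,Enterococcus_Count,Cum_Rain_4Days
5,18,2017,Utica,12,0.0
6,15,2017,Schenectady,517,0.8
7,13,2017,Amsterdam,72,0.3
"), col_types = cols(
  Month = col_character(),
  Day   = col_character(),
  Year  = col_character()
))

dim(Riverkeeper_Data)      # number of rows, then number of columns
summary(Riverkeeper_Data)  # class and length, or quartiles, per column
```

Reading the date parts as character matches the summary above, where Month, Day, and Year are reported with class character.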
Below is a plot showing the enterococcus count at each sampling event, grouped by year. This is important because it can reveal whether enterococcus counts are increasing overall over time. Such a trend may relate to the continued deterioration of combined sewer overflow (CSO) infrastructure, leading to elevated or increasing enterococcus counts.
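The plotting code for this figure is not shown. One possible way to draw such a plot with ggplot2 is sketched here, assuming made-up rows and made-up years. The log-scaled y-axis is a choice of this sketch, motivated by the wide range of counts in the summary above (1 to 4839), and is not necessarily what the original figure used.

```r
library(dplyr)
library(ggplot2)

# Made-up rows for illustration; only the column names follow the summary above.
toy <- data.frame(
  Year               = c("2015", "2015", "2016", "2016", "2017", "2017"),
  Enterococcus_Count = c(14, 120, 30, 517, 72, 4839)
)

# Number the sampling events within each year so that years can be overlaid.
toy <- toy %>%
  group_by(Year) %>%
  mutate(Sample = row_number()) %>%
  ungroup()

p <- ggplot(toy, aes(x = Sample, y = Enterococcus_Count, colour = Year)) +
  geom_point() +
  geom_line() +
  scale_y_log10() +
  labs(x = "Sampling event within year", y = "Enterococcus count (log scale)")
p
```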
Below is a plot of enterococcus count against cumulative rainfall over the 4 days prior to sample collection, for all data at all locations. Adding a trendline shows a general relationship between rainfall and enterococcus count within 4 days of sampling; however, the relationship isn’t fully clear. Therefore, to better understand this relationship, I will later investigate the rainfall over the 10 days prior to sampling to determine whether long-term or large rain events affect the amount of enterococcus in the Mohawk.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
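The message above is printed when geom_smooth() is left to choose its own smoothing method. A sketch of how a scatter plot with a loess trendline could be built is given here; the points are made up for illustration, and only the column names follow the summary above.

```r
library(ggplot2)

# Made-up points for illustration only; NOT Riverkeeper measurements.
rain <- seq(0, 3, by = 0.125)
toy <- data.frame(
  Cum_Rain_4Days     = rain,
  Enterococcus_Count = round(10 * exp(1.5 * rain))
)

p <- ggplot(toy, aes(x = Cum_Rain_4Days, y = Enterococcus_Count)) +
  geom_point(alpha = 0.6) +
  # Naming the method and formula explicitly avoids the geom_smooth() message.
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(x = "Cumulative rainfall over 4 days (in)", y = "Enterococcus count")
p
```

Adding `colour = City` inside `aes()` would draw one trendline per city, provided the data frame has a City column.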
Below is a plot of enterococcus count against cumulative rainfall over the 4 days prior to sample collection, grouped by city of sampling (Utica, Amsterdam, and Schenectady). Adding a trendline for each city shows a general relationship between rainfall and enterococcus count within 4 days of sampling, which helps indicate how the CSO infrastructure is performing in each area. However, the relationship isn’t fully clear. While enterococcus count generally increases with 4-day cumulative rainfall, there is an ambiguous region of the data (around 0.5 to 1 inch of total rainfall) where the enterococcus count doesn’t match the overall trend. Therefore, to better understand this relationship, I will later investigate the rainfall over the 10 days prior to sampling to determine whether long-term or large rain events affect the amount of enterococcus in the Mohawk.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
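One way to look more closely at the ambiguous 0.5 to 1 inch region is to bin the cumulative rainfall and compare median counts per city and bin. The sketch below assumes made-up values, and the bin edges are a choice of this sketch rather than part of the original analysis.

```r
library(dplyr)

# Made-up rows for illustration only; NOT Riverkeeper measurements.
toy <- data.frame(
  City               = rep(c("Utica", "Schenectady", "Amsterdam"), each = 4),
  Cum_Rain_4Days     = rep(c(0.0, 0.3, 0.7, 1.5), times = 3),
  Enterococcus_Count = c(10, 40, 25, 900, 14, 72, 60, 1200, 8, 35, 30, 700)
)

binned <- toy %>%
  mutate(Rain_Bin = cut(
    Cum_Rain_4Days,
    breaks = c(0, 0.5, 1, Inf),
    labels = c("under 0.5 in", "0.5 to 1 in", "1 in or more"),
    right  = FALSE   # bins are [0, 0.5), [0.5, 1), [1, Inf)
  )) %>%
  group_by(City, Rain_Bin) %>%
  summarise(
    n            = n(),
    Median_Count = median(Enterococcus_Count),
    .groups      = "drop"
  )
binned
```

The median is used here instead of the mean because the counts are strongly right-skewed in the summary above (median 72.5, mean 455).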
Below, the weather data is loaded into RStudio in a table format. This allows us to isolate columns, values, and categories for further analysis. If you view the table below, you will see a dataset with 6 columns and 3159 rows. The dataset consists of rainfall (inches) and rain rate over the 10 days prior to each Riverkeeper sampling event. These data are crucial for determining whether rainfall events are extreme or gradual. In this report, an extreme event is defined as a large amount of rainfall over a short period of time, whereas a gradual event is one in which small amounts of rain fall over a longer period and the rain rate stays low. To analyze these events, we create the plots below, which show rainfall signals over time.
At this point, the data needed to be categorized by the time period in which sampling occurred each month. For example, for the June 2017 sampling event, we extracted rainfall data between 6/07/2017 and 6/17/2017 and called that interval int_6. We then used these intervals in a case-when statement that names each event, and we subsequently used that column to plot and create Figure 1 below.
Schenectady_Rain <- Schenectady_Rain %>%
  mutate(Date = mdy_hm(Date_Time))
Schenectady_Rain
# This step creates a new column, Date, that holds the Date_Time values of the original Schenectady_Rain dataset in date-time format, allowing us to use the lubridate package to manage and organize the data.
int_5 <- interval(ymd("2017-05-12"), ymd("2017-05-23"))
int_6 <- interval(ymd("2017-06-07"), ymd("2017-06-18"))
int_7 <- interval(ymd("2017-07-07"), ymd("2017-07-18"))
int_8 <- interval(ymd("2017-08-11"), ymd("2017-08-22"))
int_9 <- interval(ymd("2017-09-07"), ymd("2017-09-18"))
int_10 <- interval(ymd("2017-10-13"), ymd("2017-10-24"))
# This step creates one lubridate interval per sampling event, covering the days of rainfall data leading up to that event. The intervals are named by month number for ease of understanding later in the project.
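A short check of how these intervals treat their endpoints may help reconcile the 6/18 end date in the code with the 6/17 end date in the text. The sketch below reuses int_6 as defined above; the three test timestamps are chosen for illustration.

```r
library(lubridate)

int_6 <- interval(ymd("2017-06-07"), ymd("2017-06-18"))

# ymd() returns dates, so the interval runs from 2017-06-07 00:00 UTC
# to 2017-06-18 00:00 UTC, and %within% includes both endpoints.
last_of_17th  <- ymd_hm("2017-06-17 23:30") %within% int_6  # last half hour of 6/17
closing_point <- ymd_hm("2017-06-18 00:00") %within% int_6  # the closing instant
after_close   <- ymd_hm("2017-06-18 00:30") %within% int_6  # first record after it

c(last_of_17th, closing_point, after_close)
```

In other words, an interval ending at ymd("2017-06-18") covers all of 6/17 plus the single midnight record of 6/18. This relies on the rainfall timestamps being parsed in UTC, which is the default for mdy_hm().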
Schenectady_Rain <- Schenectady_Rain %>%
  mutate(
    Event_Int = case_when(
      Date %within% int_5  ~ "May_Event",
      Date %within% int_6  ~ "June_Event",
      Date %within% int_7  ~ "July_Event",
      Date %within% int_8  ~ "Aug_Event",
      Date %within% int_9  ~ "Sept_Event",
      Date %within% int_10 ~ "Oct_Event"
    )
  )
# The previous step defined one interval per sampling event. This case_when() call uses those intervals to create a new column, Event_Int, that labels each record with one of the 6 monthly events preceding Riverkeeper collection. Records that fall outside all six intervals receive NA. The resulting table is printed below.
Schenectady_Rain
Schenectady_Rain <- Schenectady_Rain %>%
  group_by(Event_Int) %>%
  mutate(Event_Number = row_number())
# Within each event, this gives every half-hour rainfall record a sequential number, following the current row order of the data. This was crucial so that our graphs could be plotted across 10-day intervals instead of real date-time intervals. The x-axis is therefore a function of position within the window prior to Riverkeeper sampling, instead of, for example, May 10 2017 at 11:30pm to May 20 2017 at 11:30pm.
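Numbering rows works well when every half-hour record is present and the rows are in time order. As a hedged alternative, the position within each event can also be computed from the timestamps themselves, which stays correct if a record is missing. The sketch below uses four made-up timestamps with one deliberate gap; Days_Elapsed is a hypothetical column name introduced here.

```r
library(dplyr)
library(lubridate)

# Made-up timestamps for one event; the 01:00 record is deliberately missing.
toy_rain <- data.frame(
  Event_Int = "June_Event",
  Date = ymd_hm(c("2017-06-07 00:00", "2017-06-07 00:30",
                  "2017-06-07 01:30", "2017-06-08 00:00"))
)

toy_rain <- toy_rain %>%
  group_by(Event_Int) %>%
  arrange(Date, .by_group = TRUE) %>%   # make the row order explicit
  mutate(
    Event_Number = row_number(),
    # Days since the first record of the event (hypothetical helper column).
    Days_Elapsed = as.numeric(difftime(Date, min(Date), units = "days"))
  ) %>%
  ungroup()
toy_rain
```

With the gap, Event_Number still counts 1, 2, 3, 4, while Days_Elapsed reflects the true spacing between records.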
Figure 1 is created below. This graph shows that over a 10-day interval, rainfall rate fluctuates and therefore can change how rainfall is absorbed into the ground or carried through the CSO systems. We predicted that the more “spiky” and intense the rainfall events, the more likely the enterococcus count is to spike. Further investigation after the creation of Figure 1 will determine whether our hypothesis was true.
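To make the idea of a "spiky" event more concrete, each event can be reduced to a few numbers: total rainfall, peak rain rate, and the share of the event's rain that fell in its single wettest record. This is a sketch under stated assumptions: Rain_in and Rain_Rate are hypothetical column names (the report does not list the weather columns), Peak_Share is a measure introduced here, and the values are made up.

```r
library(dplyr)

# Made-up half-hourly records for two events; NOT weather station data.
# Rain_in and Rain_Rate are hypothetical column names.
toy_rain <- data.frame(
  Event_Int = rep(c("May_Event", "June_Event"), each = 4),
  Rain_in   = c(0.00, 0.60, 0.10, 0.00,  0.05, 0.05, 0.05, 0.05),
  Rain_Rate = c(0.00, 1.20, 0.20, 0.00,  0.10, 0.10, 0.10, 0.10)
)

event_summary <- toy_rain %>%
  group_by(Event_Int) %>%
  summarise(
    Total_Rain = sum(Rain_in),
    Peak_Rate  = max(Rain_Rate),
    # Share of the event's rain that fell in its single wettest record.
    Peak_Share = max(Rain_in) / sum(Rain_in),
    .groups    = "drop"
  )
event_summary
```

In this toy example the May event is concentrated in one record (a high Peak_Share), while the June event is spread evenly (a low Peak_Share). An event with no rain at all would give a Peak_Share of NaN and would need separate handling.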
Figure 2 is created below. This graph shows that over a 10-day interval, rainfall rate fluctuates and therefore can change how rainfall is absorbed into the ground or carried through the CSO systems. The graphs are faceted by event so that the rates of rainfall during each event are more easily understood.
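The code for the faceted figure is not shown. A sketch of how such a plot could be built with facet_wrap() follows, assuming made-up values and a hypothetical Rain_Rate column; Event_Int and Event_Number are the columns created earlier in this report.

```r
library(ggplot2)

# Made-up records for two events; Rain_Rate is a hypothetical column name.
toy_rain <- data.frame(
  Event_Int    = rep(c("May_Event", "June_Event"), each = 5),
  Event_Number = rep(1:5, times = 2),
  Rain_Rate    = c(0, 1.2, 0.2, 0, 0,  0.1, 0.1, 0.1, 0.1, 0.1)
)

# Keep the panels in calendar order rather than alphabetical order.
toy_rain$Event_Int <- factor(toy_rain$Event_Int,
                             levels = c("May_Event", "June_Event"))

# 48 half-hour records per day; this conversion assumes no missing records.
p <- ggplot(toy_rain, aes(x = (Event_Number - 1) / 48, y = Rain_Rate)) +
  geom_line() +
  facet_wrap(~ Event_Int, ncol = 1) +
  labs(x = "Days since start of event window", y = "Rain rate")
p
```

Stacking the panels in one column keeps the x-axes aligned, which makes it easier to compare when in the window each event's rain arrived.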
Figures 3 through 8 are created below. These graphs show that over a 10-day interval, rainfall rate fluctuates and therefore can change how rainfall is absorbed into the ground or carried through the CSO systems. These graphs are not faceted; each event is plotted separately.
Below, the Riverkeeper data is filtered to the 2017 Schenectady samples, keeping only the date, location, enterococcus count, and quality columns.
Schenectady_Table <- Riverkeeper_Data %>%
  filter(Year == 2017, City == "Schenectady") %>%
  select(Month, Day, Year, Relative_Direction, City, Access_Point, Enterococcus_Count, Quality)
Schenectady_Table
The graph below shows the enterococcus count of each sample throughout 2017. As you can see, water of beach advisory quality is more common earlier in the year, around May and June. This might be due to high rates of rainfall and snowmelt after the winter season adding water to the river system. This graph allows us to visualize a seasonal trend in high enterococcus counts, and with more years of weather and Riverkeeper data we could potentially develop a concrete model.
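The seasonal pattern described here can also be summarized numerically, as the share of samples in each month that fall in the beach advisory category. The sketch below is illustrative only: the rows are made up, the Quality labels "Beach Advisory" and "Acceptable" are assumed spellings, and Month is assumed to hold month numbers stored as text.

```r
library(dplyr)
library(lubridate)

# Made-up rows for illustration only; the Quality labels are assumed spellings.
toy_table <- data.frame(
  Month   = c("5", "5", "6", "6", "8", "8"),
  Day     = c("18", "18", "15", "15", "17", "17"),
  Year    = "2017",
  Enterococcus_Count = c(517, 120, 4839, 72, 14, 30),
  Quality = c("Beach Advisory", "Beach Advisory", "Beach Advisory",
              "Acceptable", "Acceptable", "Acceptable")
)

advisory_share <- toy_table %>%
  # Combine the three text columns into a single date.
  mutate(Sample_Date = mdy(paste(Month, Day, Year))) %>%
  group_by(Sample_Month = month(Sample_Date)) %>%
  summarise(
    n              = n(),
    Share_Advisory = mean(Quality == "Beach Advisory"),
    .groups        = "drop"
  )
advisory_share
```

Building a real date column also allows the counts to be plotted on a continuous time axis rather than by month label.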
In a 2004 comparison of E. coli, enterococci, and fecal coliform as indicators for brackish water quality assessment in Lake Pontchartrain, Louisiana, Jin et al. examine the microbial organisms on suspended and lake-bottom particles to determine reduction rate constants. The study discusses the attachment of these bacteria to suspended matter, and subsequently to sedimented matter, as a mechanism for their development. Because the study found that bottom sediment acts as a reservoir for these pathogens, raising the concern of recontamination of overlying waters, Jin et al. concluded that enterococci are a more stable indicator than E. coli and other fecal coliform in brackish waters. In contrast to Lake Pontchartrain, the site of this study is the Mohawk River, a northern freshwater tributary of the Hudson River located in upstate New York.
Collecting water samples and investigating pathogen levels is key to ensuring that water quality meets the health regulations defined at the federal, state, and local levels. By understanding the standards for water quality assessment at the federal level, we can then identify how New York State is managing inadequate water testing and unsafe levels of bacteria, specifically in the Mohawk River. However, the two entities are at a crossroads in policy as they attempt to determine how to study water quality in New York State most effectively and efficiently. Furthermore, this project drew on sampling data collected by Riverkeeper, a private non-profit environmental organization, and on weather data from Union College’s weather station to analyze how enterococcus fluctuates with rainfall over time.
As annotated above, the plots show general trends between rainfall, seasonality, and enterococcus count. With this information, scientists might be able to push for additional sampling and a more in-depth study of how fecal indicator bacteria fluctuate with rainfall rate. Such analyses would support better-constrained, stricter reform of urban CSOs and the improvement of sewage infrastructure in cities across upstate New York, such as Utica, Schenectady, and Amsterdam.
Gullatt, Kristin. E. Coli and Enterococcus. United States Environmental Protection Agency, 2015. Print.
Jin, G., A. Englande, H. Bradford, and H.-W. Jeng. “Comparison of E. Coli, Enterococci, and Fecal Coliform as Indicators for Brackish Water Quality Assessment.” Water Environment Research 76 (2004): 245–255.
Korajkic, Asja, et al. “Differential Decay of Enterococci and Escherichia Coli Originating from Two Fecal Pollution Sources.” Applied and Environmental Microbiology 79.7 (2013): 2488. Web.