The vast majority (>96%) of available freshwater on Earth exists as groundwater, and as such it plays an important role in environmental and ecological processes as well as in society [1]. Groundwater is a critical water source for much of humanity: more than 50% of the global population relies on it for drinking water. Furthermore, groundwater accounts for more than 40% of the water used for irrigation worldwide [2]. While groundwater can have many advantages over surface water (e.g. it is less likely to be affected by microbial contamination or human/industrial pollution), water quality issues can nonetheless arise and threaten the health of those who consume it.
In South and Southeast Asia, exposure to naturally occurring arsenic (As) in the groundwater has led to what has been called “the largest poisoning of a population in history”, with over 100 million people exposed to groundwater with unsafe levels of arsenic [3,4]. Throughout S-SE Asia, wells often tap groundwater with arsenic concentrations significantly in excess of the 10 \(\mu g/L\) level considered acceptable by the World Health Organization (WHO). One particular challenge faced throughout the region is that, while arsenic concentrations are frequently high, they vary significantly from well to well and between depths. Thus, one family may have a well that exceeds the WHO guideline for arsenic, while their neighbor only 200 meters away has a well that is low in arsenic and safe to drink. This variability in space and with depth into an aquifer makes it difficult to predict where low arsenic groundwater will occur.
While the ability to model/predict exactly where high vs. low arsenic will occur is still somewhat limited, the processes responsible for the release of arsenic into the groundwater are broadly understood. Arsenic is naturally found in the sediments of S-SE Asia (and in sediments throughout the world generally) and can be released into groundwater under the right set of geochemical conditions. Arsenic is generally associated with iron-oxide minerals. When microbes within an aquifer consume organic matter (i.e. their food), they can deplete oxygen. Once oxygen and other oxidizing agents (e.g. nitrate, sulfate, …) are depleted, the microbes can switch to “breathing” iron oxides. This dissolves the solid iron minerals and releases any associated arsenic into the groundwater.
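As a reference point, the iron-oxide "breathing" step described above is often written as a simplified overall reaction. The version below is a textbook idealization rather than something taken from this lab's readings: \(\mathrm{CH_2O}\) stands in for organic matter and \(\mathrm{FeOOH}\) for a generic iron oxyhydroxide.

\[\mathrm{CH_2O + 4\,FeOOH + 7\,H^+ \rightarrow 4\,Fe^{2+} + HCO_3^- + 6\,H_2O}\]

As the solid \(\mathrm{FeOOH}\) is consumed, arsenic bound to the mineral is released to the water along with the dissolved \(\mathrm{Fe^{2+}}\).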
In today’s lab we will examine a dataset from the British Geological Survey (BGS) that was collected in the early 2000s, soon after the issue of widespread arsenic contamination came to light [5]. The BGS, along with the government of Bangladesh (the most heavily affected country), collected and analyzed the chemistry of groundwater samples from several thousand wells throughout Bangladesh. This was the first large dataset available for developing an understanding of the scope of the problem and the scientific issues associated with it.
You will conduct some exploratory data analysis using the BGS data to help develop a better understanding of the variability in arsenic concentration and the factors associated with high levels of groundwater arsenic.
Before writing up your report, you should read the following sources. These readings will give you helpful background info and context and will help guide your analysis. In addition to these sources, you may find it helpful to search out other sources to help you make sense of your analysis.
In addition to the article above, you will also find the BGS report on arsenic in Bangladesh helpful. However, this report is hundreds of pages long, and while you may be interested in flipping through it, the following summary sections are only 1-2 pages each and will give you a nice overview of the report.
Below is a map of Bangladesh showing the regions (Divisions). This map may be helpful when interpreting your data.
Map of Bangladesh (source: Wikipedia)
For today’s lab, imagine that you are part of the initial reconnaissance team from the British Geological Survey and you’ve just finished the field and lab work and are now sitting down to examine the data. Your primary goals include:
First you will need to download the data. The data can be downloaded from our class site using the following code.
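```r
# Load the tidyverse (provides read_csv along with dplyr, ggplot2, etc.)
library(tidyverse)
# Read the DPHE/BGS national survey lab data directly from the class site
bangladesh_gw <- read_csv("https://stahlm.github.io/ENS_215/Data/NationalSurveyData_DPHE_BGS_LabData.csv")
```

A description of the variables in the dataset can be found in the readme file.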
Take a look at the data frame and make sure everything looks good and that you understand the variables contained in the dataset.
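A minimal sketch of this kind of first look is shown below. It runs on a small made-up tibble so that it is self-contained. The name `toy_gw` and its column names are hypothetical stand-ins, not the real variable names, so check the readme and substitute `bangladesh_gw` and its actual columns.

```r
library(dplyr)

# Toy stand-in for the survey data. Column names here are hypothetical.
toy_gw <- tibble(
  division = c("Dhaka", "Khulna", "Rajshahi", "Dhaka"),
  depth_m  = c(18, 45, 210, NA),
  as_ug_l  = c(120, 35, 2, 60)
)

glimpse(toy_gw)   # column names, types, and the first few values
summary(toy_gw)   # ranges of the numeric columns, plus NA counts

# Number of missing values in each column
n_missing <- colSums(is.na(toy_gw))
n_missing
```

Checking for missing values early is worthwhile, since many summary functions return `NA` unless you pass `na.rm = TRUE`.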
In this lab we are going to examine how arsenic concentrations (as well as other geochemical parameters) vary with depth into the aquifer. Let’s create a categorical variable that classifies each observation based on the depth of the well the sample was collected from. This categorical variable will allow us to easily group our data into depth classes.
Add a new variable `depth_cat` to your data frame. This variable will take the values shallow, intermed, or deep based on the sampled well’s depth:
Once you’ve created `depth_cat`, convert it to a factor. Make sure the factor is ordered with the following order:
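One way to sketch the depth classification and factor conversion is with `case_when()` followed by `factor()`. The example uses toy data. The column name `depth_m`, the two cutoff values, and the level order shallow < intermed < deep are all placeholders chosen for illustration, so use the cutoffs and ordering specified for this lab instead.

```r
library(dplyr)

# Toy data. depth_m is a hypothetical column name.
toy_gw <- tibble(depth_m = c(12, 48, 95, 160, 310))

# Placeholder cutoffs in meters (NOT the lab's values)
shallow_max  <- 50
intermed_max <- 150

toy_gw <- toy_gw %>%
  mutate(
    # Conditions are checked in order; the first TRUE one wins.
    # Rows with a missing depth match nothing and stay NA.
    depth_cat = case_when(
      depth_m < shallow_max   ~ "shallow",
      depth_m < intermed_max  ~ "intermed",
      depth_m >= intermed_max ~ "deep"
    ),
    # Assumed level order: shallow < intermed < deep
    depth_cat = factor(depth_cat,
                       levels = c("shallow", "intermed", "deep"),
                       ordered = TRUE)
  )

levels(toy_gw$depth_cat)
table(toy_gw$depth_cat)
```

Setting the levels explicitly matters because `factor()` otherwise orders them alphabetically (deep, intermed, shallow), which scrambles tables and plot axes.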
Also add a categorical variable `As_cat`. This variable will take the values low As, med As, or high As based on the sampled well’s arsenic concentration:
Once you’ve created `As_cat`, convert it to a factor. Make sure the factor is ordered with the following order:
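As an alternative to `case_when()`, base R's `cut()` can bin a numeric variable and return an ordered factor in a single step. The sketch below uses toy data. The column name `as_ug_l` is hypothetical and the cutoffs are placeholders: 10 µg/L is the WHO guideline mentioned above, and 50 µg/L is the Bangladesh national standard, which may or may not match the cutoffs intended for this lab.

```r
library(dplyr)

# Toy data. as_ug_l is a hypothetical column name (arsenic in ug/L).
toy_gw <- tibble(as_ug_l = c(0.5, 8, 10, 37, 50, 240))

toy_gw <- toy_gw %>%
  mutate(
    As_cat = cut(
      as_ug_l,
      breaks = c(-Inf, 10, 50, Inf),              # placeholder cutoffs
      labels = c("low As", "med As", "high As"),
      ordered_result = TRUE                       # low As < med As < high As
    )
  )

# With the default right = TRUE, each bin includes its upper bound,
# so a well at exactly 10 ug/L is classed as "low As".
toy_gw
```

Whichever function you use, decide deliberately which category a value sitting exactly on a cutoff belongs to.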
Now let’s create summary tables that distill the information in the large dataset down to more easily digestible tables. These types of summary tables are great to make when generating papers/reports for others, who won’t necessarily have the time or expertise to go through all of the data. These tables are also a helpful tool for the person investigating the data (i.e. you) when trying to get a handle on a large dataset.
You plan on meeting with some fellow environmental scientists to discuss the scientific questions around arsenic contamination of groundwater in Bangladesh. Before you meet you want to have a few tables that provide a basic overview of the groundwater geochemistry.
Note: Be smart about this and look at your `dplyr` cheat sheet. There is a very efficient way to do this.
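One candidate for an efficient approach (a guess at what the note has in mind, not a confirmed answer) is to combine `group_by()` with `summarize()` and `across()`, which applies the same summary functions to many columns at once. The toy data and the column names `as_ug_l` and `fe_mg_l` are hypothetical.

```r
library(dplyr)

# Toy data. The geochemical column names are hypothetical.
toy_gw <- tibble(
  depth_cat = factor(c("shallow", "shallow", "deep", "deep"),
                     levels = c("shallow", "deep")),
  as_ug_l = c(100, 60, 2, 4),
  fe_mg_l = c(5, 3, 0.2, 0.4)
)

geochem_summary <- toy_gw %>%
  group_by(depth_cat) %>%
  summarize(
    n_wells = n(),
    # Apply mean and median to each listed column in one call
    across(
      c(as_ug_l, fe_mg_l),
      list(mean   = ~ mean(.x, na.rm = TRUE),
           median = ~ median(.x, na.rm = TRUE)),
      .names = "{.col}_{.fn}"
    ),
    .groups = "drop"
  )

geochem_summary
```

Adding another parameter to the table then only requires adding its column name to the `across()` selection.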
The geochemical summary tables above are great for discussions among fellow scientists; however, it is particularly important to convey information about arsenic exposure risk to government officials and policy makers who may not have an understanding of groundwater chemistry. Let’s create some tables that help to assess the exposure risk in Bangladesh.
Division | number of wells | % low As | % medium As | % high As |
---|---|---|---|---|
data here | data here | data here | data here | data here |
Depth category | number of wells | % low As | % medium As | % high As |
---|---|---|---|---|
data here | data here | data here | data here | data here |
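Tables with this layout can be sketched by counting wells in each group and arsenic category, converting the counts to percentages, and then pivoting the categories into columns. The example runs on toy data. The column name `division` is hypothetical, and `As_cat` is assumed to already exist as a factor.

```r
library(dplyr)
library(tidyr)

# Toy data. division is a hypothetical column name.
toy_gw <- tibble(
  division = c("Dhaka", "Dhaka", "Dhaka", "Dhaka", "Khulna", "Khulna"),
  As_cat   = factor(c("low As", "high As", "high As", "med As", "low As", "low As"),
                    levels = c("low As", "med As", "high As"))
)

exposure_by_division <- toy_gw %>%
  count(division, As_cat, name = "n") %>%       # wells per division and category
  group_by(division) %>%
  mutate(n_wells = sum(n),                      # total wells in the division
         pct = 100 * n / n_wells) %>%           # share of wells in each category
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = As_cat,
              values_from = pct,
              names_prefix = "pct_",
              values_fill = 0)                  # categories with no wells become 0%

exposure_by_division
```

Swapping `division` for `depth_cat` gives the depth-based version of the same table.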
For your additional analysis you should continue exploring the dataset. You may decide to expand upon the work above and/or develop completely new lines of inquiry. When conducting your additional analysis, think back to your overall goals that were outlined in the Objectives section. Some ideas for additional analysis include, but are not limited to:
You are free to pursue these and any other ideas that you may think of.
Remember that your lab should be a nicely formatted and organized report that includes your code, output, and discussion. You should have an introduction and conclusion section and you should clearly delineate different sections with headers. Also remember to cite any sources that you rely on or reference.
At this point in the term you should start to think about the organizational logic of your final lab report (i.e. order your analysis sections and discussion in a sensible manner). You don’t need to get bogged down on this point when working in lab, but it is good to consider when you are putting together your final report for submission.
The “additional analysis” section of your lab is a critical component and should be given careful thought and effort.
It is important to realize that my expectations for each of the graded criteria increase as the term progresses and you gain more skills and are better able to develop your own analyses.
Make sure you are familiar with the expectations and evaluation criteria presented in the links below:
Your lab is due prior to the start of next week’s lab. Once you are finished and satisfied with your work you should Knit it to an `.html` file and submit both your `html` and `Rmd` files to Nexus.
Please make sure that the `.html` file you submit is `.html` and not `nb.html`.