At the beginning of the week we worked on some basics of time-series analysis. Before we move on to interactive graphics, I would like to wrap up our introduction to time-series data. In particular, I would like to introduce you all to cycle plots.
Another really useful way to examine data that may have both a seasonal pattern and a long-term trend is the cycle plot. The best way to understand cycle plots is through an example. Let’s consider precipitation in California. There might be a seasonal pattern (as you will see shortly, there is), and there could also be long-term trends in the data. For instance, some months may have been getting wetter (or drier) over time. Both the seasonal pattern and these trends (in particular, trends that apply to only some parts of the year) would be very difficult to see in a traditional time-series plot.
Let’s look at a standard time-series plot to highlight the point.
First load in the NOAA precipitation data that we’ve used in the past.
library(tidyverse)
library(lubridate)
precip_data <- read_csv("https://stahlm.github.io/ENS_215/Data/noaa_cag_state_precipitation.csv")
precip_data <- precip_data %>%
  rename(Precip_inches = Value)

precip_data %>%
  filter(YEAR >= 1980,
         STATE == "California") %>%
  mutate(date = ymd(paste(YEAR, MONTH, 15))) %>%
  ggplot(aes(x = date, y = Precip_inches)) +
  geom_line() +
  geom_point() +
  theme_classic() +
  labs(x = "Year",
       y = "Precip (inches)")
We can see the monthly precipitation for California from 1980 to 2024. However, it is very difficult to see the seasonal pattern (e.g. wet vs. dry seasons) or whether there have been long-term trends in a given month (e.g. is February getting wetter?). A cycle plot allows us to answer these types of questions.
In a cycle plot, we plot a separate time series for each month’s data (e.g. all of the January data ordered from the earliest to the most recent year, all of the February data ordered from the earliest to the most recent year, …).
Below is a cycle plot of the California monthly precipitation data from 1980 through 2024. The first panel (labeled “1”) has the precipitation for every January in that period, with the left-most point being Jan 1980, the next point Jan 1981, …, and the last point Jan 2024. The line in each panel is a linear (least-squares) trend fit to that month’s data.
precip_data %>%
  filter(YEAR >= 1980, YEAR < 2025,
         STATE == "California") %>%
  ggplot(aes(x = YEAR, y = Precip_inches)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm") +  # linear trend within each month
  facet_wrap(~ MONTH, ncol = 12) +          # one panel per month
  theme_classic() +
  theme(axis.text.x = element_blank()) +    # hide the crowded year labels
  labs(x = "",
       y = "Precip (inches)",
       title = "California Monthly Precipitation",
       subtitle = "1980-2024",
       caption = "Data source: NOAA")
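Because each panel’s line is an ordinary least-squares fit of precipitation on year, its slope is that month’s long-term trend. A minimal sketch for computing those slopes as numbers, assuming a data frame with `YEAR`, `MONTH`, and a numeric value column; the helper name `monthly_trends()` and the toy data are hypothetical, for illustration only:

```r
# Hypothetical helper: per-month linear trend (value units per year), i.e. the
# slope of the line drawn by geom_smooth(method = "lm") in each panel.
monthly_trends <- function(df, value_col = "Precip_inches") {
  by_month <- split(df, df$MONTH)  # one data frame per month
  slopes <- vapply(by_month, function(d) {
    x <- d$YEAR
    y <- d[[value_col]]
    coef(lm(y ~ x))[["x"]]         # slope of value vs. year
  }, numeric(1))
  data.frame(MONTH = as.integer(names(slopes)),
             slope_per_year = unname(slopes))
}

# Toy data (made up, not NOAA): January rises 0.1 inch/year, July is flat
toy_precip <- data.frame(
  YEAR = rep(2000:2009, times = 2),
  MONTH = rep(c(1, 7), each = 10),
  Precip_inches = c(1 + 0.1 * (0:9), rep(0.2, 10))
)

monthly_trends(toy_precip)
```

With the real data, something like `precip_data %>% filter(YEAR >= 1980, YEAR < 2025, STATE == "California") %>% monthly_trends()` should return one slope per month in inches per year, where a positive value means that month has been getting wetter. The slope only summarizes a straight-line fit; it says nothing about whether the trend is statistically meaningful.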
Look at the above figure.
The Schoharie Creek streamflow data (USGS streamgage 01351500) can be loaded with the code below.
flow <- read_csv("https://stahlm.github.io/ENS_215/Data/USGS_streamflow_01351500.csv") %>%
  drop_na() %>%
  filter(Year >= 1940 & Year < 2025) %>%      # select years 1940 through 2024
  mutate(Date = make_date(Year, Month, Day))  # create a Date column that has the dates as an R date object
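These streamflow data are daily, so to draw a cycle plot like the precipitation example they first need one value per year and month. A minimal sketch using base R’s `aggregate()`, assuming the `Year`, `Month`, and `flow_cfs` columns used elsewhere in these notes; the helper `monthly_mean_flow()` and the toy values are hypothetical:

```r
# Hypothetical helper: mean daily flow for each (Year, Month) combination,
# sorted so each month's values run from the earliest to the latest year.
monthly_mean_flow <- function(df) {
  out <- aggregate(flow_cfs ~ Year + Month, data = df, FUN = mean)
  out <- out[order(out$Month, out$Year), ]
  rownames(out) <- NULL
  out
}

# Toy daily data (made up, not USGS): two days in each of Jan and Feb, two years
toy_flow <- data.frame(
  Year = rep(c(2000, 2001), each = 4),
  Month = rep(c(1, 1, 2, 2), times = 2),
  Day = rep(c(1, 2), times = 4),
  flow_cfs = c(100, 200, 50, 70, 300, 500, 10, 30)
)

monthly_mean_flow(toy_flow)
```

The result could then go into the same `ggplot()` + `facet_wrap(~ Month, ncol = 12)` pattern used for the precipitation cycle plot, with `x = Year` and `y = flow_cfs`. Taking the monthly mean is one choice; a median would be less sensitive to individual flood days.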
Throughout the term we have been creating visualizations to understand, explain, and communicate our data and findings. However, all of these graphics have been static, presenting a fixed graphical representation. While static graphics serve our purposes much of the time, there are many scenarios where you would like to interact with your graphic dynamically. In particular, during the exploratory phases of data analysis you often want to zoom, pan, select subsets of data, and turn layers on and off; this kind of interactivity can considerably speed up exploratory data analysis. Interactive graphics also let you share a graphic with colleagues so that they can explore the data themselves, without you (the person creating the graphic) having to make a new version every time they want to focus on a particular section of the graphic or subset of the data. Interactive graphics are also an excellent tool for communication and education, since they let the user explore the data and extract insights in ways that are not possible with static graphics.
As a reminder, we have previously learned how to make interactive maps. You can find that material in our lecture notes from 12-Feb-2025.
plotly
The plotly package allows you to easily create a wide variety of interactive graphics in R. Plotly is a very well developed package with a large user base and many detailed examples available on the web. A great feature of plotly is that in addition to using the built-in plotting features, you can also convert ggplot2 graphics into interactive ones – thus allowing you to create interactive graphics without having to learn/master a new package.
Let’s load in the plotly package (if you haven’t installed it yet, do so first from your Packages pane).
library(plotly)
Let’s load the NOAA monthly temperature data for each US state from 1895 through 2024.
state_temps <- read_csv("https://stahlm.github.io/ENS_215/Data/noaa_cag_state_temperatures.csv")
state_temps <- state_temps %>%
  rename(Avg_Temp_F = Value)
Now let’s use ggplot() to create a boxplot summarizing monthly temperatures in the state of California.
fig_1 <- state_temps %>%
  filter(STATE == "California") %>%
  ggplot(aes(x = factor(MONTH), y = Avg_Temp_F)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "CA Monthly temperatures",
       x = "Month",
       y = "Temperature (F)")

fig_1
We have a nice static graphic here, but it would be great to have an interactive version to allow us to dynamically explore this data.
We can easily create an interactive graphic by passing our ggplot2 figure object fig_1 to the function ggplotly() from the plotly package.
This converts your static ggplot2 graphic to an interactive graphic. When you run this code, the interactive graphic will appear in your Viewer pane. You can view and interact with the graphic there, but I recommend opening it in a new window by clicking the icon showing a window with an arrow in the top toolbar of your Viewer pane. This opens the graphic in your web browser, where it is much easier to interact with.
When you create the graphic, spend a few minutes exploring and learning how to use the plotly interface. You will notice that the plotly window has a toolbar with some useful functionality (e.g. zooming, panning, and downloading the plot as a PNG).
ggplotly(fig_1)
For our first example we converted a ggplot2 boxplot to an interactive graphic using the plotly ggplotly() function. We can similarly apply this function to convert most other ggplot2 graphics to interactive versions.
Let’s load in the atmospheric CO2 data from Mauna Loa to use in the following example.
mauna_loa <- read_csv("https://stahlm.github.io/ENS_215/Data/Mauna_loa_CO2_data.csv", skip = 2)
First we will use ggplot() to generate a static figure showing the monthly CO2 concentrations from 2010 to 2024, where each year is its own line.
fig_mauna_loa <- mauna_loa %>%
  filter(Year >= 2010, Year < 2025) %>%
  ggplot(aes(x = Month, y = CO2_ppm, group = Year, color = Year)) +
  geom_line() +
  scale_color_gradient(low = "blue", high = "red") +
  theme_classic() +
  labs(title = "Atmospheric CO2",
       subtitle = "Measured at Mauna Loa, Hawaii",
       x = "Month",
       y = "CO2 (ppm)",
       caption = "Data source: NOAA/ESRL") +
  scale_x_continuous(breaks = 1:12)

fig_mauna_loa
Now let’s generate an interactive graphic using ggplotly().
ggplotly(fig_mauna_loa)
As you can see, we can easily convert any of our ggplot2 graphics to an interactive version.
Let’s see another example to highlight a few additional features of plotly. In this example we’ll use the gapminder data.
library(gapminder)
my_gap <- gapminder
fig_gap <- my_gap %>%
  ggplot(aes(x = gdpPercap, y = lifeExp,
             fill = continent, group = country)) +
  geom_point(shape = 21, alpha = 0.75, color = "black", size = 2) +
  scale_x_log10() +
  theme_classic()

fig_gap
Now let’s use ggplotly() to create an interactive version of the above scatter plot.
Note that we are specifying tooltip = ... in our ggplotly() function call. The tooltip argument lets us control the information that is displayed when we place our mouse cursor over a data point. You can specify the variables that you would like to display, either by name or by the aesthetic they map to (e.g. color, x, y, fill, …).
ggplotly(fig_gap, tooltip = c("group","x","y"))
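If you want full control over the hover text, one common approach is to build the label yourself in a `text` aesthetic and then pass `tooltip = "text"` to `ggplotly()`. A minimal sketch using R’s built-in `mtcars` data rather than gapminder; the label format is only an example:

```r
library(ggplot2)
library(plotly)

# Built-in mtcars data; keep the car names as a column
cars <- mtcars
cars$model <- rownames(mtcars)

# `text` is not used by geom_point() when drawing the static plot,
# but ggplotly() can use it as the hover label ("<br>" is a line break)
fig_cars <- ggplot(cars, aes(x = wt, y = mpg,
                             text = paste0(model,
                                           "<br>Weight: ", wt, " (1000 lbs)",
                                           "<br>MPG: ", mpg))) +
  geom_point() +
  theme_classic()

p_cars <- ggplotly(fig_cars, tooltip = "text")  # show only the custom label
p_cars
```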
Another important feature of plotly is the ability to turn plotted layers on and off. Clicking a continent in the legend of your plotly graphic toggles that layer on or off.
For our final plotly example we will use the BGS Bangladesh groundwater chemistry data.
bangladesh_gw <- read_csv("https://stahlm.github.io/ENS_215/Data/NationalSurveyData_DPHE_BGS_LabData.csv")
For this example we will create our graphic directly using the plot_ly() function, rather than creating a ggplot2 graphic and then converting it to a plotly object.
Note that when mapping a variable to an aesthetic in plot_ly() you use = ~ (e.g. x = ~LONG_DEG). The figure below plots each of the groundwater samples in 3 dimensions, where the x and y coordinates are the longitude and latitude and the z coordinate is the well’s depth (entered as a negative number, so that deeper wells plot lower). This allows us to “see” into the subsurface and view all of the samples. We also color each well by its log10 arsenic concentration.
color_ramp <- colorRamp(c("blue", "yellow", "red"))
plot_ly(bangladesh_gw, x = ~LONG_DEG, y = ~LAT_DEG, z = ~-WELL_DEPTH_m, name = ~DIVISION) %>%
  add_markers(color = ~log10(As_ugL), colors = color_ramp, text = ~paste("As =", As_ugL))
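Note that `color_ramp` is a function rather than a list of colors: `colorRamp()` (from base R’s grDevices package) returns a function that maps numbers between 0 and 1 to RGB values, and `plot_ly()` accepts such a color-interpolation function through its `colors` argument. A quick way to inspect the ramp on its own:

```r
# colorRamp() builds a function that linearly interpolates between the
# given colors over [0, 1]; it returns RGB values on a 0-255 scale
color_ramp <- colorRamp(c("blue", "yellow", "red"))

color_ramp(0)    # one-row matrix: 0, 0, 255 (blue)
color_ramp(0.5)  # 255, 255, 0 (yellow, the middle color)
color_ramp(1)    # 255, 0, 0 (red)

# Convert a few points along the ramp to hex codes to see the gradient
rgb(color_ramp(c(0, 0.25, 0.5, 0.75, 1)), maxColorValue = 255)
```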
Take a minute or two to explore the 3D graphic.
dygraphs
It is often very helpful to add interactivity to time-series data, since we frequently want to zoom in on a particular time period within a longer time series.
The dygraphs package provides excellent functionality for creating interactive time-series graphics.
Let’s first load in the dygraphs package and the xts package, which we will use along with dygraphs (you will probably need to install these packages first).
library(dygraphs)
library(xts)
Now let’s return our focus to the streamflow data from the USGS streamgage at Schoharie Creek (01351500). This will give us a nice time-series dataset to work with in dygraphs.
It is important to point out that for time-series data, dygraph() expects an xts object (or something that can be converted to one). The xts format is an effective way of storing time-series data, and it is easy to convert a data frame with a date column into an xts object using the xts() function from the xts package.
Let’s now convert our flow data frame into an xts object. The syntax is xts(data_to_convert, order.by = dates). Here we use the Date column from flow to order the flow_cfs column from flow when performing the conversion to xts.
flow_ts <- xts(flow[,"flow_cfs"], order.by = flow$Date)
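To see what `xts()` does with the dates, here is a small sketch on made-up values (not the Schoharie Creek data). Two details seem worth noting: `xts()` stores the rows in time order even if the input is not sorted, and keeping the data as a one-column data frame (as `flow[,"flow_cfs"]` does, since `flow` is a tibble) preserves the column name `flow_cfs`, which is the name `dySeries()` refers to later.

```r
library(xts)

# Made-up daily flows, deliberately out of date order
toy_daily <- data.frame(flow_cfs = c(30, 10, 20))
toy_dates <- as.Date(c("2020-01-03", "2020-01-01", "2020-01-02"))

# drop = FALSE keeps a one-column data frame, so the column name survives
toy_ts <- xts(toy_daily[, "flow_cfs", drop = FALSE], order.by = toy_dates)

toy_ts                 # rows now run from 2020-01-01 to 2020-01-03
index(toy_ts)          # the Date index used for ordering
toy_ts["2020-01-02/"]  # date-range subsetting: Jan 2 onward
```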
Now that we have an xts object, flow_ts, we are ready to use the dygraph() function to plot our time-series data. Just like with the plotly graphics, it is easier to view dygraph graphics in your browser (click the icon to pop the graphic out to a new window).
dygraph(flow_ts)
You’ll see that you can zoom by clicking and dragging to select a region of the time series (double-click to zoom back out). Also note that at the top of the graphic there is text that displays the date and y-value of the data at your cursor location. Take a few minutes to explore the graphic and learn about the dygraph functionality.
Let’s create another dygraph using the same data, but this time let’s add more formatting and features. In the code below, we set the y-axis label with ylab and used the dySeries() function with label = to rename the series from "flow_cfs" to "Schoharie Creek". We also added a range selector bar to the bottom of the figure using dyRangeSelector(), and made a few other formatting changes with dyHighlight() and dyOptions(). You can learn more about additional dygraph features in the dygraphs documentation.
dygraph(flow_ts, ylab = "Flow (cfs)") %>%
  dySeries("flow_cfs", label = "Schoharie Creek") %>%
  dyRangeSelector(height = 50) %>%
  dyHighlight(highlightCircleSize = 5,
              highlightSeriesBackgroundAlpha = 0.5,
              hideOnMouseOut = FALSE) %>%
  dyOptions(drawGrid = FALSE)
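Since a dygraph is an htmlwidget, it can also be written to an HTML file and shared, so that colleagues can zoom and pan without running any R code. A minimal sketch using `saveWidget()` from the htmlwidgets package on a made-up series; the file name is arbitrary, and `selfcontained = FALSE` is used here so that pandoc is not required, which means the accompanying `*_files` folder has to be shared along with the HTML file:

```r
library(magrittr)  # for %>%
library(xts)
library(dygraphs)
library(htmlwidgets)

# Made-up smooth daily series (not USGS data), 90 days long
demo_dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 90)
demo_flow <- data.frame(flow_cfs = 200 + 100 * sin(seq_along(demo_dates) / 10))
demo_ts <- xts(demo_flow, order.by = demo_dates)

demo_fig <- dygraph(demo_ts, ylab = "Flow (cfs)") %>%
  dyRangeSelector(height = 50)

# Write the widget to an HTML file in a temporary directory
out_file <- file.path(tempdir(), "demo_flow_dygraph.html")
saveWidget(demo_fig, out_file, selfcontained = FALSE)
```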
gganimate
Animations are another way of presenting data in a more dynamic fashion. We can create animations in R using the gganimate package. We will go through a few basic examples, though you can learn more in the gganimate documentation.
Let’s first load in the gganimate package (you will probably need to install it first). You will also need the gifski package, so install and load that package as well.
library(gganimate)
library(gifski)
The gganimate package lets you take your ggplot2 graphic and create an animation by breaking the graphic up into frames that progress through time (or through some other variable that defines breaks in the data).
First let’s generate a static graphic showing the monthly atmospheric CO2 concentrations for each year since 1980, where each year is represented by its own line.
fig_mauna_loa <- mauna_loa %>%
  filter(Year >= 1980) %>%
  ggplot(aes(x = Month, y = CO2_ppm, group = Year, color = Year)) +
  geom_line() +
  scale_color_gradient(low = "blue", high = "red") +
  theme_classic() +
  labs(title = "Atmospheric CO2",
       subtitle = "Year: {frame_along}",
       x = "Month",
       y = "CO2 (ppm)",
       caption = "Data source: NOAA/ESRL") +
  scale_x_continuous(breaks = 1:12)

fig_mauna_loa
It would be interesting and informative to create an animation of the above graphic where the lines appear one by one, in order of their year. All this requires is a single function from gganimate: transition_reveal(), where we specify that the variable to use when splitting the graphic into frames is the Year variable from the mauna_loa data frame. Note that the “{frame_along}” placeholder in the subtitle is only filled in (with the value for the current frame) once the graphic is animated; in the static figure above it appears literally.
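A minimal, self-contained sketch of this one-function step, using R’s built-in `airquality` data (daily New York temperatures, May to September 1973) in place of the Mauna Loa data, with one line per month so that the lines appear one after another:

```r
library(ggplot2)
library(gganimate)

# Static plot: one line per month of daily temperatures
fig_air <- ggplot(airquality, aes(x = Day, y = Temp, group = Month, color = Month)) +
  geom_line() +
  theme_classic() +
  labs(subtitle = "Month: {frame_along}")  # filled in only when animated

# Adding a transition turns the ggplot into a gganim object; each month's
# line is revealed in order of the Month variable
anim_air <- fig_air + transition_reveal(Month)

# Printing anim_air (or calling animate(anim_air)) renders the frames;
# this step can take a while and needs a renderer such as gifski
```

For the CO2 figure, the corresponding call would presumably be `fig_mauna_loa + transition_reveal(Year)`.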
When you run the code below the animation will be generated. Note that this may take a minute or so. Similar to before, you can pop the graphic out to a new window to allow for easier viewing.
Let’s create essentially the same animation as above, though this time let’s show all the years from 1958 onwards. Let’s also save the animation as a gif so that you have a permanent copy that can be viewed and shared.
mauna_loa %>%
  filter(Year >= 1958) %>%
  mutate(Year = round(Year, 2)) %>%  # round the years to the 2nd decimal place (makes dynamic plot title nicer)
  ggplot(aes(x = Month, y = CO2_ppm, group = Year, color = Year)) +
  geom_line() +
  scale_color_gradient(low = "blue", high = "red") +
  theme_classic() +
  labs(title = "Atmospheric CO2",
       subtitle = "Year: {frame_along}",
       x = "Month",
       y = "CO2 (ppm)",
       caption = "Data source: NOAA/ESRL") +
  scale_x_continuous(breaks = 1:12) +
  transition_reveal(Year)  # create the animation

anim_save("./Mauna_Loa_seasonal.gif")  # save the animation to your current directory
As our final example, let’s create an animation of the famous “hockey stick” graphic of atmospheric CO2 concentrations.
fig_mauna_loa_ts <- mauna_loa %>%
  ggplot(aes(x = make_date(Year, Month, 15), y = CO2_ppm)) +
  geom_line(linewidth = 1) +  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  theme_bw() +
  labs(title = expression("Atmospheric CO"[2]),
       subtitle = "Measured at Mauna Loa, Hawaii",
       x = "",
       y = expression("CO"[2]* " (ppm)"),
       caption = "Data source: NOAA/ESRL")

fig_mauna_loa_ts
Now let’s use gganimate to create an animation. We will use transition_reveal() to have the line appear in monthly time steps (i.e. in each frame the next month of data is added).
fig_mauna_loa_ts + transition_reveal(make_date(Year, Month, 15)) # create the animation
anim_save("./Mauna_Loa_ts.gif") # save the animation to your current directory
There are many more features in plotly, dygraphs, and gganimate. I encourage you to explore more of this functionality at the following links:
If you have additional time today, try making some more graphics that implement some of these additional features. Also think of ways that you might integrate these graphics into your final projects.