When working with data you will commonly need to deal with dates. We’ve already dealt with dates in this class; however, for the most part I’ve done some preliminary data manipulation to get the dates into a format that is easy for you to work with (e.g. separate columns of year, month, and day).

While this approach made the data easier to deal with for learning purposes, it also imposed some limitations. Furthermore, most datasets you encounter will not have had this preliminary manipulation done.

Given the importance of dates in data analysis, we will learn some techniques for handling dates in R. One of the challenges posed by dates is that you will often encounter them in a wide variety of formats, including those that mix both text and numbers. For instance, consider the many ways that you could represent February 20th, 2019:

  • Feb 20th, 2019
  • 20-Feb-2019
  • 2019_02_20
  • 20190220
  • 2/20/2019
  • 20/2/2019
  • 2019/20/2
  • 2019/2/20
  • 2019/02/20

The above list is by no means exhaustive. Furthermore, we could add the time of day to the date, which would add yet more formats to the list.

To allow for dates to be properly interpreted by R, we need to convert the date from its initial format into an R date object. R has built-in functionality to do this, though the package lubridate provides additional functionality and makes dates easier to handle in R.


Creating date/time objects

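Let’s first load in the package lubridate and also tidyverse. You will already have lubridate on your computer since it is included in the tidyverse packages. However, in tidyverse versions before 2.0.0, library(tidyverse) only loads in the core tidyverse packages, which did not yet include lubridate, so you’ll still need to type library(lubridate) to load in lubridate. From tidyverse 2.0.0 onwards lubridate is loaded along with the core packages, and the extra library(lubridate) call is harmless.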

library(tidyverse)
library(lubridate)


Dates from strings

Date/time info can be stored as a date, time within a day (e.g. 15:45 EST) or date-time which has both the date and time information.

When creating dates or date-times, you will typically be starting from one of the following

  • Strings containing the dates/times
  • Numeric components (e.g. numeric columns of year, month, day)
  • Existing date/time object


Let’s first take a look at dealing with dates in string format

dates_string <- c("2019-2-20", "2019-2-21", "2019-2-22")


Let’s check the class of dates_string

class(dates_string)
## [1] "character"


You can see that it is a character vector at the moment. To get this into a date format we’ll rely on the ymd() function in lubridate. The ymd() function converts text in year-month-day format into a date object.

dates_object <- ymd(dates_string)

class(dates_object)
## [1] "Date"

You can see that R now recognizes the data as a date object (the data has been converted from character to a Date object).

Note that lubridate is smart and will easily convert dates that use other separators (e.g. / or . instead of -)

dates_string <- c("2019/2/20", "2019/2/21", "2019/2/22")
class(dates_string)
## [1] "character"
dates_object <- ymd(dates_string)
class(dates_object)
## [1] "Date"
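
It is also worth knowing what happens when a string cannot be parsed. The short sketch below uses a made-up vector (messy_dates is a hypothetical name) containing one valid date, one impossible date, and one string that is not a date at all. Entries that fail to parse come back as NA, along with a warning, so checking for NA values after parsing is a simple way to catch problems.

```r
library(lubridate)

# Hypothetical input: a valid date, an impossible date, and a non-date string
messy_dates <- c("2019-02-20", "2019-02-30", "not a date")

# suppressWarnings() hides the "failed to parse" warning for this demo;
# in real work you would normally want to see that warning
parsed <- suppressWarnings(ymd(messy_dates))

parsed              # entries that could not be parsed are NA
is.na(parsed)       # flags which entries failed
sum(is.na(parsed))  # counts the failures
```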


  • Create a vector with dates as strings, though use periods . to separate the years.months.days. Also “pad” the months and days with zeros so that they are always two digits long (e.g. 02 for February instead of 2). Then use the ymd() function to convert your character vector to a vector of dates
# Your code here


So we’ve seen that we can convert dates in year-month-day format using the ymd() function. Lubridate has additional functions to parse other date configurations

Function   Date format
ymd()      year/month/day
ydm()      year/day/month
mdy()      month/day/year
myd()      month/year/day
dmy()      day/month/year
dym()      day/year/month


These lubridate functions are smart and will properly parse date strings that present the date in different styles – the key thing to ensure is that you specify the function that corresponds to the ordering of the date components.

ymd("2018 Mar 31")
## [1] "2018-03-31"
mdy("July 4th 2016")
## [1] "2016-07-04"
dmy("10-Feb-2020")
## [1] "2020-02-10"
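
To see why the ordering matters, here is a small illustrative example (the strings are made up). An ambiguous string such as "2/3/2019" parses without complaint under either mdy() or dmy(), but the two functions return different dates, so lubridate cannot protect you from choosing the wrong function. Only when the string is impossible under the chosen ordering do you get an NA and a warning.

```r
library(lubridate)

ambiguous <- "2/3/2019"
mdy(ambiguous)  # read as month/day/year: February 3rd, 2019
dmy(ambiguous)  # read as day/month/year: March 2nd, 2019

# "2/20/2019" cannot be day/month/year (there is no month 20),
# so dmy() returns NA; the warning is suppressed here for the demo
suppressWarnings(dmy("2/20/2019"))
```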


  • Test out some more of the lubridate functions on dates of varying styles. Try to test out a bunch of different styles to convince yourself of the flexibility of the lubridate functions.
# Your code here


When your data is in date-time format (i.e. has the date and time specified) then to parse the data into an R date-time format you simply append _hms to the applicable lubridate function. Let’s illustrate this with an example.

my_datetimes_string <- c("2019/02/22 10:30:00", "2019/10/15 13:45:10")
my_datetimes_string
## [1] "2019/02/22 10:30:00" "2019/10/15 13:45:10"
my_datetimes <- ymd_hms(my_datetimes_string)
my_datetimes
## [1] "2019-02-22 10:30:00 UTC" "2019-10-15 13:45:10 UTC"


You can see that the strings were properly parsed into date-times, though note that the time-zone defaults to “UTC”. You can specify the time-zone of interest as follows:

my_datetimes <- ymd_hms(my_datetimes_string, tz = "EST")
my_datetimes
## [1] "2019-02-22 10:30:00 EST" "2019-10-15 13:45:10 EST"
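
One thing to be aware of is that "EST" is a fixed offset (five hours behind UTC) that never switches to daylight saving time. If you want clock times that follow daylight saving, a region-based name such as "America/New_York" is usually the safer choice. The sketch below (with a made-up July timestamp) compares the two, and uses with_tz() to show the same instant on a UTC clock.

```r
library(lubridate)

july_noon <- "2019-07-04 12:00:00"

fixed_est <- ymd_hms(july_noon, tz = "EST")               # fixed UTC-5, no daylight saving
new_york  <- ymd_hms(july_noon, tz = "America/New_York")  # daylight saving applies in July (EDT)

# with_tz() displays the same instant on another clock;
# it does not change the moment in time being described
with_tz(fixed_est, "UTC")  # 17:00 UTC
with_tz(new_york, "UTC")   # 16:00 UTC
```

The two results differ by an hour because noon EDT and noon EST are different instants.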


Let’s also take a quick look at the class of the my_datetimes object

class(my_datetimes)
## [1] "POSIXct" "POSIXt"

We can see that the class is “POSIXct”, which is the R class for date-time objects
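
If you are curious about what R stores under the hood, the following sketch may help. A Date is stored as the number of days since 1970-01-01, and a POSIXct date-time is stored as the number of seconds since 1970-01-01 00:00:00 UTC. The object names below are just for illustration.

```r
library(lubridate)

a_date     <- ymd("1970-01-02")
a_datetime <- ymd_hms("1970-01-01 00:01:00")  # parsed as UTC by default

as.numeric(a_date)      # 1: one day after 1970-01-01
as.numeric(a_datetime)  # 60: sixty seconds after 1970-01-01 00:00:00 UTC
```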

You can also create date-time objects with _h, _hm where the date-time objects would have hours or hours and minutes specified respectively

ymd_h("2019 Feb 3rd 15", tz = "EST")
## [1] "2019-02-03 15:00:00 EST"
ymd_hm("2019 Feb 3rd 10:45", tz = "EST")
## [1] "2019-02-03 10:45:00 EST"


From a date’s individual components

You will commonly come across data that has date information stored across columns, where each column holds part of the date (or date-time) information. For example, you may have three columns with the year in one column, the month in another, and the day in yet another column. While there are certain cases when this is helpful, oftentimes you would like to have a single column that contains all of the date information. This is particularly useful when creating time-series graphics.

The lubridate package has the functions make_date() and make_datetime(), which perform this operation.


First, let’s load in USGS daily streamflow data for the Hudson River (USGS gage 01335754 above Lock 1 near Waterford, NY). This dataset has the date information stored across columns.

flow <- read_csv("https://stahlm.github.io/ENS_215/Data/USGS_gage_01335754.csv") %>% 
  drop_na()


Take a look at the dataset and you’ll see that there is a Year, Month, and Day column. Let’s create a new column called Date that has all of the date information (stored as a date object) in a single column. We’ll use make_date() to do this.

flow <- flow %>% 
  mutate(Date = make_date(Year, Month, Day))

Now we’ve got the date information in a single column. If you also had hour, minute, and seconds information then you could have used the make_datetime() function.
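
As a rough sketch of how make_datetime() would be used, consider the small made-up table below (sensor_log and its column names are hypothetical and are not part of the streamflow data). The arguments are supplied in the order year, month, day, hour, minute, second, and the time-zone defaults to "UTC" unless you set tz.

```r
library(dplyr)
library(lubridate)

# Hypothetical sensor log with one column per date-time component
sensor_log <- tibble(
  Year = c(2019, 2019), Month = c(2, 10), Day = c(22, 15),
  Hour = c(10, 13), Minute = c(30, 45), Second = c(0, 10)
)

# Combine the six component columns into a single date-time column
sensor_log <- sensor_log %>% 
  mutate(DateTime = make_datetime(Year, Month, Day, Hour, Minute, Second))

sensor_log$DateTime
```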


Since we now have a single column that contains all of the date information we can get rid of the Year, Month, and Day columns as they are now redundant

flow <- flow %>% 
  select(Date, flow_cfs)

Now we have a more compact dataframe that still contains all of the original information

head(flow)

Having a single column with the date information is incredibly useful, as we can now create a time-series plot of the daily streamflow data.

FYI, you can get the current date or current date-time by using the today() and now() functions from the lubridate package.

today()
## [1] "2019-02-24"
now()
## [1] "2019-02-24 12:03:01 EST"


Exercises

  1. Create a line plot of the daily streamflow data
    1. Create a version where the y-axis is in linear (default) scale
    2. Create a version where the y-axis is in log10 scale


Extracting components from date-time objects

There are many situations where you will need to extract the individual components from a date. The lubridate package has a number of functions that perform these operations.

To demonstrate this, let’s create a date-time object and then extract its components

First we’ll create the date-time object using the current date and time

datetime_test <- ymd_hms(now(), tz = "EST")
datetime_test
## [1] "2019-02-24 12:03:02 EST"


Now let’s extract the components. The functions year() and month() will extract the year and month respectively. When extracting days we can use a number of functions: mday() will get the day of the month, yday() will get the day of the year (i.e. the number of days into the year, with Jan 1st counted as day 1), and wday() will get the day of the week.

Let’s test out these functions on our datetime_test object

year(datetime_test) # get the year
## [1] 2019
month(datetime_test) # get the month
## [1] 2
mday(datetime_test) # get the day of month
## [1] 24
yday(datetime_test) # get the day of year
## [1] 55
wday(datetime_test) # get the day of week
## [1] 1
hour(datetime_test) # get the hour
## [1] 12
minute(datetime_test) # get the minute
## [1] 3
second(datetime_test) # get the second
## [1] 2
decimal_date(datetime_test) # get the date in decimal year format.  For instance if you are halfway through 2019 then the date would be 2019.50
## [1] 2019.149


For months and days, you can have the functions output the results as a text string containing the abbreviation for the month or day. To do this we simply specify label = TRUE in the wday() or month() function

month(datetime_test, label = TRUE)
## [1] Feb
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(datetime_test, label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
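
A couple of optional arguments are handy here. The sketch below uses a fixed date (the same Sunday as in the output above) and assumes lubridate’s default options. Setting abbr = FALSE gives the full name rather than the abbreviation, and week_start changes which day is counted as day 1 of the week.

```r
library(lubridate)

a_sunday <- ymd("2019-02-24")

wday(a_sunday)                  # 1: weeks start on Sunday by default
wday(a_sunday, week_start = 1)  # 7: weeks start on Monday instead

# Full names instead of abbreviations (the text depends on your locale)
wday(a_sunday, label = TRUE, abbr = FALSE)
month(a_sunday, label = TRUE, abbr = FALSE)
```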


Extracting the date-time components is often extremely helpful. For instance, if you want to filter a dataset by year, you can use the year() function to get the year values from a single date variable (column) without the need to create a column with the year information.

For example, consider the flow data that we’ve loaded in today. We have a single column with all of the date information. To filter the dataframe by year we can use the year() function in our filter statement.

flow %>% 
  filter(year(Date) == 2017)
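
The same idea extends to other components and to comparisons against whole dates. The sketch below uses a tiny made-up table (toy_flow and its values are invented for illustration) so that you can see exactly which rows each filter keeps.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily records (values are made up for illustration)
toy_flow <- tibble(
  Date = ymd(c("2016-12-31", "2017-01-15", "2017-07-04", "2018-07-04")),
  flow_cfs = c(100, 200, 300, 400)
)

# July 2017 only (one row)
july_2017 <- toy_flow %>% 
  filter(year(Date) == 2017, month(Date) == 7)

# June through August of any year (two rows)
summer <- toy_flow %>% 
  filter(month(Date) %in% 6:8)

# Dates can also be compared directly; this keeps the same rows as year(Date) == 2017
in_2017 <- toy_flow %>% 
  filter(Date >= ymd("2017-01-01"), Date < ymd("2018-01-01"))
```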


Exercises

  1. Use the make_datetime() function to create a single column DateTime in the flow dataset that has all of the date and time info stored as a date-time object. You can set the values to 12, 00, and 00 respectively for hours, minutes, and seconds.
flow <- flow %>% 
  mutate(DateTime = make_datetime(year(Date), month(Date), mday(Date), 12, 00, 00))
  2. Create a string that prints “Today is Fri, Feb 22 and the time is 15:30”, but with the information corresponding to the present date and time. You will need to use the paste() function to concatenate strings (we learned this earlier in the term).

  3. Using the flow data, create a summary table with the mean flow for each month (i.e. twelve rows, one for each month)

  4. Using the flow data, create time-series plots of flow vs. time for each year from 2010 onwards. You should use facet_wrap() to create all of the plots at once. On the x-axis you should have the day of the year.
    • Do you see any years where there seem to have been extremely high flows? Any idea what might have caused these high flows in the Hudson River?


Math with dates

There are many situations where you will need to perform mathematical operations (e.g. addition, subtraction, division,…) on dates. Using the base R functionality along with the lubridate package you can carry out these types of operations.

Before moving forward it is helpful to be aware of the three ways time spans are represented in R. They are nicely defined in R4DS, and those definitions are reproduced below:

  • durations, which represent an exact number of seconds.
  • periods, which represent human units like weeks and months.
  • intervals, which represent a starting and ending point.

Durations

Durations are stored as seconds, since this is the only unit of time with a consistent length. For instance, you could not use units of years to store durations as years can have different lengths (e.g. a leap-year vs. a regular year).

Durations are important to use when you are representing physical processes. For instance, imagine that you have a temperature sensor in the ocean and it is powered by a battery. If the battery life remaining was reported in years, there would be some ambiguity (as years can have different lengths), whereas reporting the life remaining in seconds leaves no ambiguity.

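You can create durations using lubridate by using functions such as dyears() and dweeks(). The syntax is straightforward: simply prepend d to the time unit of interest to get the function name. When creating durations, the lubridate functions operate with the following settings (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 365 days in a year). Note that the 365-day year reflects the lubridate version used to produce the output shown here; more recent lubridate releases define a duration year as 365.25 days, so the dyears() results will differ slightly if you run this code with a newer version.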


Let’s create some durations to highlight how they work

dyears(1) # duration of 1 year
## [1] "31536000s (~52.14 weeks)"
dyears(2) # duration of 2 years
## [1] "63072000s (~2 years)"
dweeks(75) # duration of 75 weeks
## [1] "45360000s (~1.44 years)"
ddays(10) # duration of 10 days
## [1] "864000s (~1.43 weeks)"
dminutes(15) # duration of 15 minutes
## [1] "900s (~15 minutes)"


You can add, multiply, and divide durations

dyears(1) + dweeks(10) + ddays(25)
## [1] "39744000s (~1.26 years)"
5 * dyears(1)
## [1] "157680000s (~5 years)"
dyears(20)/10
## [1] "63072000s (~2 years)"
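
Because every duration is just a number of seconds, you can check the arithmetic yourself. The short sketch below sticks to minutes, hours, days, and weeks, whose lengths in seconds do not depend on which lubridate version you have installed.

```r
library(lubridate)

as.numeric(dminutes(15))           # 900: durations are stored in seconds
ddays(1) == dhours(24)             # TRUE: both are 86400 seconds
ddays(1) + dhours(12)              # 129600 seconds, i.e. 1.5 days
dweeks(1) / ddays(1)               # 7: dividing two durations gives a plain number
```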


You can also add durations to, or subtract them from, dates

right_now <- now() # date-time now
right_now
## [1] "2019-02-24 12:03:02 EST"
right_now - dyears(10) # date-time 10 duration years prior to now
## [1] "2009-02-26 12:03:02 EST"


You’ll notice that the result is not today’s date, ten years ago. This is because durations are not calendar (“human”) units, but are instead an exact number of seconds (i.e. 31,536,000 seconds for a year duration). Thus using durations may give results that are unexpected.

Our result above is the date and time that is exactly 10 × 31,536,000 seconds (i.e. 3,650 days) before the moment the code was run. Since the ten calendar years in question include two leap days, the result lands two days later than the calendar date ten years ago.

Durations are useful for determining the time elapsed between two date-times. For example, you might want to know exactly how much time has elapsed since Union College was established. First let’s subtract the date when Union College was established from the current date-time (we’ll assume the time of establishment was 12 noon)

Union_time <- now() - ymd_hm("1795, Feb 25th 12:00")
Union_time
## Time difference of 81813.21 days

Subtracting two dates gives you a difftime object in R and this object records time spans using seconds, minutes, hours, days, or weeks (depending on the situation) and thus can create ambiguity. To remove ambiguity we can convert the difftime object to a duration (which always uses seconds) using the as.duration() function

as.duration(Union_time)
## [1] "7068661382.75484s (~223.99 years)"
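
To make the point about difftime units concrete, here is a small sketch with two made-up timestamps (start_time and end_time are hypothetical). R chooses the units of a difftime automatically, so it is safest to either ask for the units you want explicitly or convert to a duration.

```r
library(lubridate)

start_time <- ymd_hm("2019-02-22 10:30")
end_time   <- ymd_hm("2019-02-24 12:00")

elapsed <- end_time - start_time       # a difftime; R picks the units automatically
units(elapsed)                         # "days" in this case
as.numeric(elapsed, units = "hours")   # 49.5: the same span expressed in hours
as.duration(elapsed)                   # 178200 seconds, regardless of the difftime units
```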


Periods

While durations are very useful when dealing with physical processes where you care about the true time elapsed, there are many situations where we are interested in dealing with calendar (“human”) times.

In these situations we can use periods in lubridate. Periods do not have a fixed length in seconds (whereas durations do), and instead they work with “human” time-spans such as calendar days, months, years.

The functions for periods are similar to those for durations, with the only difference being that we drop the d at the start of the function name. Thus, for example, the functions years(), months(), and days() represent periods of years, months, and days respectively.

Let’s demonstrate how periods in lubridate work.

right_now - years(10)
## [1] "2009-02-24 12:03:02 EST"

You can see that subtracting a period of 10 years from the right_now date-time gives us a result that is exactly 10 calendar years prior to today (i.e. the exact same time and date, but 10 years earlier).
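
The difference between durations and periods is easiest to see side by side. The sketch below uses fixed dates so the results do not depend on when you run it, and it uses ddays() rather than dyears() so that it does not depend on how your lubridate version defines a duration year. The daylight saving example follows the one given in R4DS.

```r
library(lubridate)

# 2016 was a leap year (366 days)
leap_start <- ymd("2016-01-01")
leap_start + ddays(365)   # 2016-12-31: exactly 365 * 86400 seconds later
leap_start + years(1)     # 2017-01-01: the same calendar date one year later

# Clocks in New York moved forward one hour on 2016-03-13
before_dst <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")
before_dst + ddays(1)     # 14:00 EDT the next day: 24 elapsed hours
before_dst + days(1)      # 13:00 EDT the next day: same clock time, one calendar day later
```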


Just like with durations you can add, subtract, multiply, and divide periods.

months(2) + days(5) # addition
## [1] "2m 5d 0H 0M 0S"
5 * (months(2) + days(5)) # addition and multiplication
## [1] "10m 25d 0H 0M 0S"
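
One quirk of month periods is worth a quick illustration. Adding a month to a date like January 31st lands on a day that does not exist, and lubridate returns NA rather than guessing. If you would rather roll back to the last valid day of the month, lubridate provides the %m+% operator. The example date below is made up.

```r
library(lubridate)

jan_31 <- ymd("2019-01-31")

jan_31 + months(1)      # NA: February 31st does not exist
jan_31 %m+% months(1)   # 2019-02-28: rolls back to the last day of February
```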


Formatting date output

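In many cases you will want to output a date-time object as a character string, for instance to annotate a graphic or include in the text of a report. To do this you can use the as.character() function to convert the date-time object to a character. You can specify the format of the output character using the format = argument in the function and specifying one of the many format options listed below. Note that in recent versions of R, as.character() may warn about or ignore the format argument for dates and date-times; base R’s format() function accepts the same format codes and works across versions.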

Format code   Description                                      Example
%a            Weekday name (abbreviated)                       Sun, Tue, Tue
%A            Full weekday name                                Sunday, Tuesday, Tuesday
%m            Month as decimal number                          06, 07, 08
%b            Month name (abbreviated)                         Jun, Jul, Aug
%B            Full month name                                  June, July, August
%c            Date and time, locale-specific                   Tue Aug 12 18:01:59 2014
%d            Day of the month as decimal number               23, 30, 12
%H            Hours as decimal number (00 to 23)               03, 14, 18
%I            Hours as decimal number (01 to 12)               03, 02, 06
%p            AM/PM indicator in the locale                    AM, PM, PM
%j            Day of year as decimal number                    174, 211, 224
%M            Minute as decimal number (00 to 59)              45, 23, 01
%S            Second as decimal number                         23, 00, 59
%U            Week of the year starting on the first Sunday    25, 30, 32
%W            Week of the year starting on the first Monday    24, 30, 32
%w            Weekday as decimal number (Sunday = 0)           0, 2, 2
%x            Date (locale-specific)                           8/12/2014
%X            Time (locale-specific)                           2:23:00 PM, 6:01:59 PM
%Y            4-digit year                                     2013, 2013, 2014
%y            2-digit year                                     13, 13, 14
%Z            Abbreviated time zone                            EST, MST
%z            Time zone                                        -0500, -0700


Let’s test this out.

right_now <- now()

as.character(right_now, format = "%c") # Date and time, locale-specific
## [1] "Sun Feb 24 12:03:02 2019"
as.character(right_now, format = "%B") # Full month name
## [1] "February"
as.character(right_now, format = "%A %B %d") # Full weekday name, full month name, day of month
## [1] "Sunday February 24"
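
The same format codes also work with base R’s format() function, which is sketched below on a fixed, made-up date-time so that the results are reproducible. Only numeric codes are used here, so the output should not depend on your locale.

```r
library(lubridate)

fixed_time <- ymd_hms("2019-02-24 12:03:02", tz = "UTC")

format(fixed_time, "%Y-%m-%d")       # "2019-02-24"
format(fixed_time, "%H:%M")          # "12:03"
format(fixed_time, "Day %j of %Y")   # "Day 055 of 2019"

# The formatted pieces are ordinary strings, so they can be combined with paste()
report_line <- paste("Report generated on", format(fixed_time, "%d/%m/%Y"))
report_line
```

Any text in the format string that is not a % code (such as "Day" and "of" above) is passed through unchanged.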
  • Try out some additional format options
  • There are lots of additional features in the lubridate package. Look at your lubridate cheat sheet and practice with some additional functions that seem useful to you but that we didn’t cover today. A nice place to start might be the intervals functionality.