As you learned in today’s lecture, one of the best tools you have for interpreting and understanding your data is your own eyes! A visual representation of your data can reveal patterns and interesting features that would be difficult or impossible to identify by looking at a data table.
Plots are not only the most effective way for you to understand your data, they are also the most effective way for you to convey your message to an audience.
For a more in-depth look into the motivations and goals of data visualization, refer back to today’s lecture slides.
Before jumping into the ggplot2 package, let’s become familiar with some of the most common types of graphics that you will make and encounter in your work.
Scatter plots are useful for displaying the relationship between two variables (e.g. height vs. weight). If one variable is the independent (controlling) variable, it is generally plotted along the x-axis, while the dependent variable is plotted on the y-axis. For example, if you were looking at how precipitation affects streamflow, you would plot precipitation on the x-axis and streamflow on the y-axis. A third variable can be displayed by scaling or color coding the plotted symbols.
Line graphs are useful for displaying the relationship between two variables, where each x-value corresponds to a single y-value. Line graphs are typically used to display time-series data (e.g. streamflow vs. time, temperature vs. time) where there is a single measurement at each time point.
Column and bar charts are useful for displaying the number of items or values in different classes/groups, where the height of the bar represents the value. For instance, you could display water usage by US state using a bar chart, where each state would have its own bar and the height of the bar would be proportional to the amount of water used in that state.
Histograms are similar to bar charts, but they are used to display the frequency distribution of the data. For example, you could display temperature data for a location using a histogram to understand how the data are distributed.
Rose diagrams are similar to histograms but are used for displaying data that have a directional component (e.g. wind).
Pie charts are useful for displaying proportions of a whole. For instance, you might display the land cover (e.g. forested, developed, grasslands, …) of a given region using a pie chart. Stacked bar charts can also be used to display this type of information, and they are generally easier to read and more compact than pie charts.
Ternary diagrams are used to show proportions when three components sum to 100%. They are commonly used to display grain size information.
Box plots, also called box and whisker plots, are useful for displaying the statistical distribution of a numeric variable, often split by category so that the groups can be compared.
Contour and surface plots are used to display three variables. Typically the x and y variables are the spatial position, and the colored (or contoured) variable is some value that varies in space (e.g. concentration).
Maps are excellent at displaying spatial data. There are tons of different styles and types of maps and this is a whole subject on its own. We will cover some basic mapping techniques in R later in the term.
You can see a bunch of great examples of plot/graphic types (which are all available in R) here
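As a preview of how a few of the plot types above look in code, here is a minimal sketch using ggplot2 (introduced in the next section). It assumes ggplot2 is installed and uses R's built-in airquality dataset (daily measurements from New York, May to September 1973) purely for illustration. The object names (aq, monthly, p_scatter, and so on) are hypothetical, not part of the course material.

```r
library(ggplot2)

# Built-in daily air quality data; build a proper date column for the line graph
aq <- airquality
aq$Date  <- as.Date(ISOdate(1973, aq$Month, aq$Day))
aq$Month <- factor(aq$Month)

# Scatter plot: two variables, with a third shown by color coding the points
p_scatter <- ggplot(data = aq) +
  geom_point(mapping = aes(x = Wind, y = Temp, color = Month))

# Line graph: a single measurement at each time point
p_line <- ggplot(data = aq) +
  geom_line(mapping = aes(x = Date, y = Temp))

# Bar chart: one bar per group, bar height equal to a value (here the monthly mean)
monthly <- aggregate(Temp ~ Month, data = aq, FUN = mean)
p_bar <- ggplot(data = monthly) +
  geom_col(mapping = aes(x = Month, y = Temp))

# Histogram: frequency distribution of a single variable
p_hist <- ggplot(data = aq) +
  geom_histogram(mapping = aes(x = Temp), binwidth = 2)

# Box plot: distribution of a numeric variable within each category
p_box <- ggplot(data = aq) +
  geom_boxplot(mapping = aes(x = Month, y = Temp))

# Print any of the plot objects to draw it
print(p_scatter)
```

Each plot is stored in an object, so you can draw the others the same way, for example print(p_hist).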
ggplot2 package
The ggplot2 package is designed around the idea that a graphic can be decomposed into its fundamental parts, so we can build a graphic much like a sentence, by combining these parts according to grammatical rules.
IMPORTANT CLARIFICATION: the package is called ggplot2; however, when you create a figure you use the ggplot() function (note that there is no 2 in the function call). Also note that the ggplot2 library is loaded when you load tidyverse. If you want to load ggplot2 by itself, you can simply type library(ggplot2).
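The naming distinction can be checked directly in the console. This is a small sketch, assuming ggplot2 is installed; mpg is an example dataset that ships with ggplot2, and the object name p is just a placeholder.

```r
# Option 1: load only ggplot2
library(ggplot2)

# Option 2 (assumes the tidyverse meta-package is installed), which also attaches ggplot2:
# library(tidyverse)

# The plotting function is ggplot(), with no 2 in the name
p <- ggplot(data = mpg)  # mpg is an example dataset bundled with ggplot2

# Inspect what kind of object ggplot() returned
class(p)
```

With no geom added yet, printing p draws only an empty plotting area.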
Recall that when constructing a graphic in ggplot2, the three essential components are:
data: the dataset containing the mapped variables
geom: the geometric object that the data is mapped to (e.g. points, lines, bars, …)
aes: the aesthetic attributes of the geometric object. The aesthetics control how the data variables are mapped to the geometric objects (e.g. x/y position, size, shape, color, …)
The basic template for creating a graphic in ggplot2 is
ggplot(data = DATASET) + GEOM_FUNCTION(mapping = aes(MAPPINGS) )
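where
DATASET is the dataset containing the variables you want to plot
GEOM_FUNCTION is the geom function you want to use (e.g. geom_point())
MAPPINGS are the aesthetic mappings for the geom (e.g. x = gdpPercap, y = lifeExp, color = continent)
You can also set the aesthetics to be global (i.e. they will apply to all of the geoms associated with that ggplot call) by defining the aes right in the ggplot call. For instance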
ggplot(data = DATASET, mapping = aes(MAPPINGS)) + GEOM_FUNCTION()
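To make the two templates concrete, here is a minimal sketch that fills them in. It uses the mpg dataset bundled with ggplot2 rather than the data behind the gdpPercap/lifeExp example, only so that it runs without any extra packages; the object names p_local and p_global are hypothetical.

```r
library(ggplot2)

# Template 1: aesthetics defined inside the geom, so they apply to that geom only
p_local <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))

# Template 2: aesthetics defined in ggplot(), so every geom in the call inherits them
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # one fitted line per drv group

print(p_local)
print(p_global)
```

Both forms draw the same points. The global form saves repetition once a plot has more than one geom: here geom_smooth() picks up the x, y, and color mappings without restating them.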
Search online for an interesting data visualization example and, using the grammar of graphics framework, identify the
geometries
aesthetics
Some good places to look for good examples are:
With your neighbor(s) share and discuss the graphics you’ve chosen. Think about what makes the graphic “work”.
Think about how you might construct this graphic in R. Think about additional graphical approaches to presenting the same data.