Let’s first load the packages we need for today’s work. We will load tidyverse (which includes ggplot2) and gapminder so that we have a nice dataset to work with.
library(tidyverse)
library(gapminder)
my_gap <- gapminder # create your own data object with the gapminder data
Let’s plot per capita GDP versus country in Asia for the year 2007. We’ll also save our figure to an object, fig_1. You’ll see that this object now appears in your environment pane. It is often useful to save a graphic to an object so that we can reuse it later in our code (for example, to make changes or create modified versions of the original graphic).
fig_1 <- my_gap %>%
filter(year == 2007, continent == "Asia") %>%
ggplot(aes(x = reorder(country, gdpPercap), y = gdpPercap)) +
geom_point() +
theme_classic()
fig_1
You can see that the tick labels on the x-axis are overlapping and cluttered. We can fix this.
Let’s adjust the angle of the text as well as the horizontal justification (hjust). hjust = 1 makes the text right-justified and hjust = 0 makes the text left-justified.
fig_1 <- fig_1 +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
fig_1
Now this looks much better. With the tick labels at an angle they no longer overlap one another.
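As a variation on the step above, fully vertical labels usually need a vertical justification (vjust) as well. The sketch below rebuilds the Asia plot under a hypothetical name, fig_vertical, so that it runs on its own; the 90-degree angle and the vjust value are illustrative choices, not part of the original example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Rebuild the 2007 Asia plot so this example is self-contained
fig_vertical <- gapminder %>%
  filter(year == 2007, continent == "Asia") %>%
  ggplot(aes(x = reorder(country, gdpPercap), y = gdpPercap)) +
  geom_point() +
  theme_classic() +
  # angle = 90 turns the labels vertical,
  # hjust = 1 places the end of each label next to the axis,
  # vjust = 0.5 centers each label on its tick mark
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

fig_vertical
```

Vertical labels take more room below the axis than angled ones, so the 60-degree version is often the better compromise for long country names.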
Changing the number formats of the tick labels often helps to make the graphic more readable and attractive. We can use the functionality in the scales package to make these changes.
library(scales)
Look back to fig_1, which you created in the code above. The GDP values would be much easier to read if they had commas separating the digits. We can adjust the labels by passing comma_format() to the labels argument of scale_y_continuous(), the function that controls the styling of the y-axis in this graphic.
fig_1 +
scale_y_continuous(labels=comma_format())
Now the plot looks much nicer. Note that to format a continuous x-axis in the same way, we would use scale_x_continuous() instead.
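To get a feel for what the scales formatters do, it can help to call them directly on a few numbers outside of a plot. The sketch below uses made-up values (gdp_values is a hypothetical vector for illustration only) and shows that a *_format() call returns a function that turns numbers into label strings.

```r
library(scales)

# Made-up numbers, for illustration only
gdp_values <- c(974.58, 12779.38, 47306.99)

comma(gdp_values)    # adds thousands separators
dollar(gdp_values)   # adds a currency symbol as well

# comma_format() returns a function; ggplot2 calls that function on the axis breaks
format_with_commas <- comma_format()
format_with_commas(1234567)  # "1,234,567"
```

Any of these formatter functions can be passed to the labels argument of a continuous scale, just as comma_format() was above.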
When we have log scaling on an axis, we use scale_x_log10() or scale_y_log10() to adjust the axis features. Let’s generate a plot in log scale first.
fig_life <- ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
theme_classic() +
scale_x_log10()
fig_life
We’ve got a graphic with the x-axis in log10 scale. It looks pretty good, but the tick labels are in scientific notation. This is a nice compact way to label the ticks, but it is not very attractive or readable in the current format. Instead of having, for instance, 1e+04, it would look much nicer to have 10^4 (with the exponent shown as a superscript). We can do this using the following code.
fig_life +
scale_x_log10(labels = trans_format('log10', math_format(10^.x)) )
Scale for 'x' is already present. Adding another scale for 'x', which will
replace the existing scale.
Note that we set labels = trans_format(...). This indicates that we will format our labels using a mathematically transformed version of our variable (in this case, our x variable).
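To see what this labeling function produces, we can build it on its own and apply it to a few break values. This is only a sketch: log_labeller and example_breaks are hypothetical names chosen for illustration.

```r
library(scales)

# trans_format() returns a labeling function: it applies log10() to each break,
# then passes the resulting exponent to math_format(), which substitutes it for .x
log_labeller <- trans_format("log10", math_format(10^.x))

example_breaks <- c(1000, 10000, 100000)
example_labels <- log_labeller(example_breaks)

# The result is a vector of expressions, which ggplot2 draws with superscripts
example_labels
```

The message "Scale for 'x' is already present" shown above is informational: fig_life already contains scale_x_log10(), so the new scale replaces the old one.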
In some cases, it is useful to specify the locations where you would like tick marks and labels. We can do this by passing the breaks argument to our scale function. We specify the desired break locations using a vector.
fig_life +
scale_x_log10(breaks = c(250, 1000, 5000, 25000, 50000, 90000))
Scale for 'x' is already present. Adding another scale for 'x', which will
replace the existing scale.
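The breaks and labels arguments can be combined in a single scale call. The sketch below builds the plot from scratch (under the hypothetical name fig_breaks) so that the x scale is only specified once, which also avoids the replacement message.

```r
library(ggplot2)
library(gapminder)
library(scales)

fig_breaks <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_classic() +
  # One scale call sets the log transform, the tick locations, and the label format
  scale_x_log10(breaks = c(250, 1000, 5000, 25000, 50000, 90000),
                labels = comma_format())

fig_breaks
```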
Last class we saw how to add titles, axis labels, and captions using the labs() function. The example below shows how we create these labels.
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
labs(title = "Life expectancy increases with income",
subtitle = "Life expectancy vs. GDP per capita",
x = "GDP per capita",
y = "Life Expectancy",
caption = "Data source: gapminder") +
theme_classic()
The labs() function generates labels using default settings for font style, size, and color. The default settings are nicely chosen, but in many situations you may want to adjust them to meet your needs.
Once the labels are specified with the labs() function, we can adjust the label appearance with the theme() function. The theme() function actually allows us to adjust many features of a graphic beyond just the labels, but we will first learn how to use it for label formatting.
To adjust the text appearance, we will call the element_text() function within theme().
The element_text() function accepts a number of arguments (inputs), including color, size, face, and family, which adjust the font color, size, face (“plain”, “bold”, “italic”, “bold.italic”), and family (“sans”, “serif”, “mono”, “symbols”) respectively.
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
labs(title = "Life expectancy increases with income",
subtitle = "Life expectancy vs. GDP per capita",
x = "GDP per capita",
y = "Life Expectancy",
caption = "Data source: gapminder") +
theme_classic() +
theme(plot.title = element_text(color = "blue", size = 14, face = "bold"),
plot.subtitle = element_text(color = "blue", size = 11),
plot.caption = element_text(face = "italic"),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold")
)
You can see that adjusting properties within theme() gives us a lot of control over the appearance of the text/labels (as well as other graphic features). However, the code can quickly become long and somewhat cumbersome. To make the coding easier, recall that you can save a graphic to an object and then add to this object. This allows you to break up your figure tweaking into several steps and/or code blocks, which makes the code easier to read.
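One way to break the long example above into steps is sketched below. The object names (base_plot, labelled_plot, styled_plot) are hypothetical and only meant to illustrate the idea of building a figure up in stages.

```r
library(ggplot2)
library(gapminder)

# Step 1: the data, geometry, and scales
base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# Step 2: the labels
labelled_plot <- base_plot +
  labs(title = "Life expectancy increases with income",
       subtitle = "Life expectancy vs. GDP per capita",
       x = "GDP per capita",
       y = "Life Expectancy",
       caption = "Data source: gapminder")

# Step 3: the overall theme, followed by the text tweaks
styled_plot <- labelled_plot +
  theme_classic() +
  theme(plot.title = element_text(color = "blue", size = 14, face = "bold"),
        plot.caption = element_text(face = "italic"))

styled_plot
```

Each intermediate object can be printed on its own, which makes it easier to see what a given step changed.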
Also note how I specified theme_classic() and then specified theme(). This first set the graphics properties to those in theme_classic(), and then I modified a few of those theme parameters based on my desired outcome.
Specifying the color scheme used in your graphic can greatly improve its readability and appearance. ggplot2 provides a number of color schemes that choose colors from a nice set of agreeable colors. The table below lists the available color schemes for coloring continuous and categorical (discrete) variables.
Continuous | Categorical |
---|---|
scale_colour_gradient | scale_colour_hue |
scale_colour_gradient2 | scale_colour_grey |
scale_color_distiller | scale_colour_manual |
scale_fill_gradient2 | scale_colour_brewer |
scale_fill_gradient | |
scale_fill_distiller | |
Let’s make a graphic with GDP per capita on the x-axis and country on the y-axis, for the countries in the Americas for year 2007.
Let’s color each point by the country’s life expectancy. Since life expectancy is a continuous variable, we’ll choose one of the available schemes for continuous variables.
my_gap %>%
filter(continent == "Americas" , year == 2007) %>%
ggplot(aes(y = reorder(country, gdpPercap), x = gdpPercap, color = lifeExp)) +
geom_point(size = 3) +
scale_color_gradient(low = "red", high = "green") +
theme_classic()
In the above color scheme, we are able to define the low and high colors.
The scale_color_gradient2() function allows you to declare a low, mid, and high color. You must also declare the value you would like to use as the midpoint of the color scheme.
my_gap %>%
filter(continent == "Americas" , year == 2007) %>%
ggplot(aes(y = reorder(country, gdpPercap), x = gdpPercap, color = lifeExp)) +
geom_point(size = 3) +
scale_color_gradient2(low = "red", mid = "green", high = "blue", midpoint = 70)
Note that the schemes with fill in the name control the fill aesthetic, so they only work with symbols that accept a fill.
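As a sketch of a fill scheme in use, the example below maps life expectancy to fill rather than color and draws the points with shape 21, a circle that has both an outline and a fill. The object name fig_fill and the black outline are illustrative choices, not part of the original example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

fig_fill <- gapminder %>%
  filter(continent == "Americas", year == 2007) %>%
  ggplot(aes(y = reorder(country, gdpPercap), x = gdpPercap, fill = lifeExp)) +
  # shape 21 is a circle with a separate outline (color) and interior (fill)
  geom_point(shape = 21, size = 3, color = "black") +
  # the fill scale controls the interior color of each point
  scale_fill_gradient(low = "red", high = "green") +
  theme_classic()

fig_fill
```

With the default point shape, a fill scale would have no visible effect because that shape has no separate interior to fill.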
Now let’s check out the categorical color schemes by creating a graphic of GDP vs. life expectancy, where the points are color coded by continent. The scale_color_brewer() function has a set of nice pre-defined color palettes. You can look at the help file to learn the available options.
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
scale_x_log10() +
scale_color_brewer(palette = "Dark2")
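Besides the help file, the palette names can be listed from R itself. The sketch below assumes the RColorBrewer package is installed (it normally is, since ggplot2 depends on it indirectly); qualitative_palettes is a hypothetical name used for illustration.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette
head(brewer.pal.info)

# Qualitative palettes are the ones suited to categorical variables such as continent
qualitative_palettes <- rownames(brewer.pal.info[brewer.pal.info$category == "qual", ])
qualitative_palettes

# Draw every available palette in the plot window
display.brewer.all()
```

Any of the listed names can be passed to the palette argument of scale_color_brewer().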
You can also define your own color scheme using the scale_color_manual() function.
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
scale_x_log10() +
scale_color_manual(values = c("red", "blue", "green", "gray", "purple"))
When specifying a color you can use the color’s name (e.g. color = "blue"). R has hundreds of colors that you can select by name. The document linked here lists these available colors.
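The named colors can also be listed directly in R with colors(). The sketch below additionally passes scale_color_manual() a named vector, so that each continent is tied to a specific color rather than relying on the order of the values. The particular colors and the name continent_colors are illustrative assumptions.

```r
library(ggplot2)
library(gapminder)

# Built-in color names
length(colors())
head(colors())

# A named vector maps each continent to a color explicitly;
# hex codes such as "#800080" work alongside color names
continent_colors <- c(Africa   = "firebrick",
                      Americas = "steelblue",
                      Asia     = "forestgreen",
                      Europe   = "gray40",
                      Oceania  = "#800080")

fig_manual <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  scale_color_manual(values = continent_colors)

fig_manual
```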
ggplot themes
The ggplot2 package comes with a number of built-in themes for setting the look/appearance of a graphic. The built-in themes are:
Themes |
---|
theme_gray |
theme_bw |
theme_linedraw |
theme_light |
theme_dark |
theme_minimal |
theme_classic |
Let’s create a graphic of GDP vs. life expectancy and set the theme to theme_classic().
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
theme_classic()
The built-in themes provide a nice base template for your figure’s appearance. Once you’ve specified a theme (e.g. theme_classic()), you can then modify or override certain settings by adding a theme() function call where you adjust the desired settings.
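A minimal sketch of this pattern is shown below, using theme_bw() as the base. The base_size value and the specific overrides are illustrative choices, and fig_theme is a hypothetical name.

```r
library(ggplot2)
library(gapminder)

fig_theme <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  # base_size sets the base font size for the whole theme
  theme_bw(base_size = 14) +
  # theme() comes after the complete theme so that these settings are not overwritten
  theme(panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"))

fig_theme
```

The order matters: a complete theme such as theme_bw() resets all theme settings, so placing it after theme() would undo the overrides.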
The ggthemes package has additional themes that you can use with your graphics. If you haven’t yet installed ggthemes, go to your package window and do so. Once you’ve installed it, you should then load the package.
library(ggthemes)
Themes in the ggthemes package include:
Themes |
---|
theme_wsj |
theme_economist |
theme_economist_white |
theme_fivethirtyeight |
theme_excel_new |
theme_tufte |
Let’s create a graphic of GDP vs. life expectancy and set the theme to theme_economist_white, which mimics the graphic style used in the magazine The Economist.
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
labs(title = "Life expectancy increases with income",
subtitle = "Life expectancy vs. GDP per capita",
x = "GDP per capita",
y = "Life Expectancy",
caption = "Data source: gapminder") +
theme_economist_white()
In some cases, you will want to adjust the aspect ratio of your graphics. This is particularly desirable when your x and y axes have the same units and you would like the scaling to reflect their relative ranges. Let’s go back to a graphic that we created last class, where we examined how each country’s life expectancy in 2007 compares with its life expectancy in 1952.
First we are going to create a dataframe that has the life expectancy in year 1952 and year 2007 for each country.
life_exp_table <- my_gap %>%
filter(year %in% c(1952,2007)) %>%
group_by(country) %>%
arrange(country, year) %>%
summarize(continent = first(continent), life_1952 = first(lifeExp), life_2007 = last(lifeExp))
life_exp_table
country | continent | life_1952 | life_2007 |
---|---|---|---|
Afghanistan | Asia | 28.801 | 43.828 |
Albania | Europe | 55.230 | 76.423 |
Algeria | Africa | 43.077 | 72.301 |
Angola | Africa | 30.015 | 42.731 |
Argentina | Americas | 62.485 | 75.320 |
Australia | Oceania | 69.120 | 81.235 |
Now, let’s make the graphic
ggplot(life_exp_table, aes(x = life_1952, y = life_2007)) +
geom_point()
In the above graphic, one unit does not cover the same physical length on both axes. For instance, 10 years on the x-axis might be drawn as 1 inch, while 10 years on the y-axis might be drawn as 0.5 inches.
To make the axis units equal, we can use the coord_equal function.
ggplot(life_exp_table, aes(x = life_1952, y = life_2007)) +
geom_point() +
coord_equal(ratio = 1)
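With equal axis units, a 1:1 reference line becomes easy to interpret. The sketch below rebuilds a similar table (under the hypothetical name life_exp_wide) so that it runs on its own, then adds a dashed line where life expectancy in 2007 equals life expectancy in 1952. The reference line is an addition for illustration, not part of the original example.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# One row per country with the 1952 and 2007 life expectancy side by side
life_exp_wide <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  group_by(country) %>%
  summarize(life_1952 = lifeExp[year == 1952],
            life_2007 = lifeExp[year == 2007])

fig_equal <- ggplot(life_exp_wide, aes(x = life_1952, y = life_2007)) +
  # Dashed 1:1 line: points above it had a higher life expectancy in 2007 than in 1952
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  geom_point() +
  coord_equal(ratio = 1)

fig_equal
```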
Once you’ve created a really nice figure, you often want to save it to a file so that you can use it outside of your R Notebook (e.g. in a paper, slide presentation, …).
We can use the ggsave() function to save our graphic to a file. You can specify the file name and where you would like to save the file (i.e. file path and name) as well as the file type (e.g. JPEG). There are a number of file types available, including pdf, jpeg, tiff, and png.
Let’s create a figure to save
ggplot(my_gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10() +
labs(title = "Life expectancy increases with income",
subtitle = "Life expectancy vs. GDP per capita",
x = "GDP per capita",
y = "Life Expectancy",
caption = "Data source: gapminder")
Now we’ll use the ggsave() function to save our last graphic to an image file (.png in this example).
ggsave("LifeExpVsGdp.png", width = 10, height = 8, units = "cm")
You can see that we are able to specify the dimensions (width and height) of the output graphic.
ggsave() will save the last graphic that you’ve generated unless you tell it otherwise. For instance, you can pass it an object that stores a graphic and it will save that specified graphic, e.g.
ggsave("MyFigure.png", plot = fig_1)
would save the graphic object fig_1 to the file “MyFigure.png”. Note that R is case-sensitive, so the object name must match fig_1 exactly as it was created earlier.
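The sketch below shows a few more ggsave() options: saving a stored plot object, choosing the resolution with dpi, and writing a PDF simply by changing the file extension. The file names, dimensions, and the use of a temporary folder are assumptions made for illustration; in practice you would use a path in your own project.

```r
library(ggplot2)
library(gapminder)

fig_to_save <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# A temporary folder is used here so the example does not clutter your project
out_dir <- tempdir()

# PNG at 300 dpi; the file type is inferred from the extension
png_path <- file.path(out_dir, "LifeExpVsGdp_print.png")
ggsave(png_path, plot = fig_to_save, width = 6, height = 4, units = "in", dpi = 300)

# PDF (a vector format, so dpi does not apply to the points and text)
pdf_path <- file.path(out_dir, "LifeExpVsGdp.pdf")
ggsave(pdf_path, plot = fig_to_save, width = 6, height = 4, units = "in")

# Check that both files were written
file.exists(c(png_path, pdf_path))
```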
Make the figure as nicely formatted and easily readable as possible. This should be a presentation quality graphic.
Note: you can hide the legend on a figure by supplying legend.position = "none" in your theme() function call.
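A minimal sketch of hiding the legend is shown below. It uses the continent-colored scatterplot from earlier rather than the exercise figure, and fig_no_legend is a hypothetical name.

```r
library(ggplot2)
library(gapminder)

fig_no_legend <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_classic() +
  # Remove the legend; the points keep their colors
  theme(legend.position = "none")

fig_no_legend
```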
Your figure should look like the one below.
my_gap %>%
filter(continent == "Africa") %>%
ggplot(aes(x = year, y = lifeExp, group = country)) +
geom_line() +
geom_point()