Introduction to Data Visualization

23-Jan-2023

Motivation

One of the best tools you have for interpreting and understanding your data is your eyes!

A visual representation of your data can reveal patterns and interesting features that would have been difficult or impossible to identify by looking at a data table.


\(\text{Graphics} + \text{Eyes} + \text{Brain} = \text{Understanding}\)

Motivation

What is the relationship between life expectancy and per capita GDP?

You have 30 seconds to assess the relationship. GO!!

country gdpPercap lifeExp
Afghanistan 974.5803 43.828
Albania 5937.0295 76.423
Algeria 6223.3675 72.301
Angola 4797.2313 42.731
Argentina 12779.3796 75.320
Australia 34435.3674 81.235
Austria 36126.4927 79.829
Bahrain 29796.0483 75.635
Bangladesh 1391.2538 64.062
Belgium 33692.6051 79.441
Benin 1441.2849 56.728
Bolivia 3822.1371 65.554
Bosnia and Herzegovina 7446.2988 74.852
Botswana 12569.8518 50.728
Brazil 9065.8008 72.390
Bulgaria 10680.7928 73.005
Burkina Faso 1217.0330 52.295
Burundi 430.0707 49.580
Cambodia 1713.7787 59.723
Cameroon 2042.0952 50.430
Canada 36319.2350 80.653
Central African Republic 706.0165 44.741
Chad 1704.0637 50.651
Chile 13171.6388 78.553
China 4959.1149 72.961
Colombia 7006.5804 72.889
Comoros 986.1479 65.152
Congo, Dem. Rep.  277.5519 46.462
Congo, Rep.  3632.5578 55.322
Costa Rica 9645.0614 78.782
Cote d’Ivoire 1544.7501 48.328
Croatia 14619.2227 75.748
Cuba 8948.1029 78.273
Czech Republic 22833.3085 76.486
Denmark 35278.4187 78.332
Djibouti 2082.4816 54.791
Dominican Republic 6025.3748 72.235
Ecuador 6873.2623 74.994
Egypt 5581.1810 71.338
El Salvador 5728.3535 71.878
Equatorial Guinea 12154.0897 51.579
Eritrea 641.3695 58.040
Ethiopia 690.8056 52.947
Finland 33207.0844 79.313
France 30470.0167 80.657
Gabon 13206.4845 56.735
Gambia 752.7497 59.448
Germany 32170.3744 79.406
Ghana 1327.6089 60.022
Greece 27538.4119 79.483
Guatemala 5186.0500 70.259
Guinea 942.6542 56.007
Guinea-Bissau 579.2317 46.388
Haiti 1201.6372 60.916
Honduras 3548.3308 70.198
Hong Kong, China 39724.9787 82.208
Hungary 18008.9444 73.338
Iceland 36180.7892 81.757
India 2452.2104 64.698
Indonesia 3540.6516 70.650
Iran 11605.7145 70.964
Iraq 4471.0619 59.545
Ireland 40675.9964 78.885
Israel 25523.2771 80.745
Italy 28569.7197 80.546
Jamaica 7320.8803 72.567
Japan 31656.0681 82.603
Jordan 4519.4612 72.535
Kenya 1463.2493 54.110
Korea, Dem. Rep.  1593.0655 67.297
Korea, Rep.  23348.1397 78.623
Kuwait 47306.9898 77.588
Lebanon 10461.0587 71.993
Lesotho 1569.3314 42.592
Liberia 414.5073 45.678
Libya 12057.4993 73.952
Madagascar 1044.7701 59.443
Malawi 759.3499 48.303
Malaysia 12451.6558 74.241
Mali 1042.5816 54.467
Mauritania 1803.1515 64.164
Mauritius 10956.9911 72.801
Mexico 11977.5750 76.195
Mongolia 3095.7723 66.803
Montenegro 9253.8961 74.543
Morocco 3820.1752 71.164
Mozambique 823.6856 42.082
Myanmar 944.0000 62.069
Namibia 4811.0604 52.906
Nepal 1091.3598 63.785
Netherlands 36797.9333 79.762
New Zealand 25185.0091 80.204
Nicaragua 2749.3210 72.899
Niger 619.6769 56.867
Nigeria 2013.9773 46.859
Norway 49357.1902 80.196
Oman 22316.1929 75.640
Pakistan 2605.9476 65.483
Panama 9809.1856 75.537
Paraguay 4172.8385 71.752
Peru 7408.9056 71.421
Philippines 3190.4810 71.688
Poland 15389.9247 75.563
Portugal 20509.6478 78.098
Puerto Rico 19328.7090 78.746
Reunion 7670.1226 76.442
Romania 10808.4756 72.476
Rwanda 863.0885 46.242
Sao Tome and Principe 1598.4351 65.528
Saudi Arabia 21654.8319 72.777
Senegal 1712.4721 63.062
Serbia 9786.5347 74.002
Sierra Leone 862.5408 42.568
Singapore 47143.1796 79.972
Slovak Republic 18678.3144 74.663
Slovenia 25768.2576 77.926
Somalia 926.1411 48.159
South Africa 9269.6578 49.339
Spain 28821.0637 80.941
Sri Lanka 3970.0954 72.396
Sudan 2602.3950 58.556
Swaziland 4513.4806 39.613
Sweden 33859.7484 80.884
Switzerland 37506.4191 81.701
Syria 4184.5481 74.143
Taiwan 28718.2768 78.400
Tanzania 1107.4822 52.517
Thailand 7458.3963 70.616
Togo 882.9699 58.420
Trinidad and Tobago 18008.5092 69.819
Tunisia 7092.9230 73.923
Turkey 8458.2764 71.777
Uganda 1056.3801 51.542
United Kingdom 33203.2613 79.425
United States 42951.6531 78.242
Uruguay 10611.4630 76.384
Venezuela 11415.8057 73.747
Vietnam 2441.5764 74.249
West Bank and Gaza 3025.3498 73.422
Yemen, Rep.  2280.7699 62.698
Zambia 1271.2116 42.384
Zimbabwe 469.7093 43.487

Motivation

What is the relationship between life expectancy and per capita GDP?

You have 30 seconds again…this time it should be much, much easier!

Motivation

Graphics are critical at all stages of a project, from the initial data acquisition and exploration to the final product that conveys results to others (colleagues, the public, …).

Graphics reveal patterns and features in data that statistics (e.g. mean, median, correlation) may fail to convey/capture.

Consider the four datasets that were constructed by the statistician Francis Anscombe.

All four datasets share essentially the same summary statistics:

Property                    Value
Mean of x                   9
Sample variance of x        11
Mean of y                   7.5
Sample variance of y        4.125
Corr. between x and y       0.816
Linear regression line      y = 3.00 + 0.500x

Based on the above table you would be led to believe that the data look roughly the same.
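
The summary statistics in the table can be checked directly, since R ships with Anscombe's data as the built-in `anscombe` data frame (columns `x1`..`x4` and `y1`..`y4`). The sketch below is one possible way to compute them; the helper name `summarise_set` is made up for this illustration.

```r
# anscombe is part of the datasets package that loads with base R.
data(anscombe)

# Hypothetical helper: summary statistics for dataset i (1 to 4).
summarise_set <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  data.frame(
    set       = i,
    mean_x    = mean(x),
    var_x     = var(x),      # sample variance
    mean_y    = mean(y),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2])
  )
}

# One row per dataset, rounded for display.
anscombe_stats <- do.call(rbind, lapply(1:4, summarise_set))
print(round(anscombe_stats, 3))
```

Each of the four rows should agree with the table to within rounding, which is exactly why the numbers alone cannot tell the datasets apart.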

Motivation

Anscombe’s quartet (source: Wikipedia)

Take home message: You should examine your data graphically!

Data visualization: Goals

You will create graphics for many different purposes throughout this class and your career.

The style, detail, and level of refinement will be a function of your goals.

Data visualization: EDA

Data visualization: Presentation quality figures

Data visualization: Engagement

Good examples of these types of figures are found in:

Graphics in R

R has excellent graphics capabilities that allow you to create figures of the highest quality.

In fact, many figures you see in scientific journals and in the popular press are made in R (many of the graphics in the NY Times are made in R!).

Most of the figures in this class will be made with the ggplot2 package.

Graphics in R

In this and upcoming lectures you will learn how to make static graphics in R.

You will learn fundamental concepts about how to visualize different types of data and how to generate these visualizations in R.

Later in the term we will cover how to make interactive (dynamic) graphics and basic maps in R.

ggplot2 package

The ggplot2 package implements what is called the grammar of graphics, a system for describing how a graph is constructed. Complex graphs are built by combining elements, much as you would combine words to construct a sentence in a natural language.

ggplot2 package

The grammar of graphics is based on the concept that¹:

A graphic is created by mapping the data variables to the aesthetic attributes of geometric objects.

The three essential components of a graphic are:

  1. data: dataset containing the mapped variables
  2. geom: geometric object that the data is mapped to (e.g. points, lines, bars, …)
  3. aes: aesthetic attributes of the geometric object. The aesthetics control how the data variables are mapped to the geometric objects (e.g. x/y position, size, shape, color, …)

Additional components that can be added include:

ggplot2 package

The basic template for creating a graphic in ggplot2 is

ggplot(data = DATASET) + GEOM_FUNCTION(mapping = aes(MAPPINGS))
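
As a minimal sketch of the template in use, assuming ggplot2 is installed, the code below fills in DATASET, GEOM_FUNCTION, and MAPPINGS using a few rows copied from the country table earlier in this lecture. The object names `countries` and `p` are chosen only for this illustration.

```r
library(ggplot2)

# Six rows copied from the country table shown earlier.
countries <- data.frame(
  country   = c("Afghanistan", "Bangladesh", "Brazil", "China", "Japan", "Norway"),
  gdpPercap = c(974.5803, 1391.2538, 9065.8008, 4959.1149, 31656.0681, 49357.1902),
  lifeExp   = c(43.828, 64.062, 72.390, 72.961, 82.603, 80.196)
)

# Template filled in:
#   DATASET       -> countries
#   GEOM_FUNCTION -> geom_point (one point per country)
#   MAPPINGS      -> x = gdpPercap, y = lifeExp
p <- ggplot(data = countries) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))

print(p)
```

Swapping `geom_point` for another geom function, or adding more mappings inside `aes()`, changes the graphic without changing the overall structure of the call.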

Examining graphical components

Minard’s illustration of Napoleon’s March (source: Wikipedia)

Examining graphical components

Let’s see how we would construct Minard’s figure using the grammar of graphics:

Where?         Data variable                          aes()   geom_
top map        longitude                              x       path
top map        latitude                               y       path
top map        army size                              size    path
top map        army direction (forward vs retreat)    color   path
bottom graph   date                                   x       line and text
bottom graph   temperature                            y       line and text

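The mappings in the table translate fairly directly into ggplot2 code. The sketch below assumes ggplot2 version 3.4.0 or later, where the width of a line or path is controlled by the `linewidth` aesthetic rather than `size`. The data frames `troops` and `temps` hold made-up placeholder values, not Minard's actual data, and all object and column names are hypothetical.

```r
library(ggplot2)

# Placeholder values for illustration only (NOT Minard's real data).
troops <- data.frame(
  long      = c(24, 28, 32, 36, 36, 32, 28, 24),
  lat       = c(55.0, 55.2, 55.0, 55.5, 55.3, 54.8, 54.5, 54.6),
  survivors = c(300, 250, 150, 100, 90, 50, 20, 10),
  direction = rep(c("forward", "retreat"), each = 4)
)
temps <- data.frame(
  date = as.Date(c("1812-10-20", "1812-11-10", "1812-12-01")),
  temp = c(0, -10, -25)
)

# Top map: longitude -> x, latitude -> y, army size -> path width,
# army direction -> color, all drawn with geom_path.
march_map <- ggplot(data = troops) +
  geom_path(mapping = aes(x = long, y = lat, linewidth = survivors,
                          color = direction, group = direction))

# Bottom graph: date -> x, temperature -> y, drawn as a line plus text labels.
# The mapping is given once in ggplot() so that both geoms share it.
temp_graph <- ggplot(data = temps, mapping = aes(x = date, y = temp)) +
  geom_line() +
  geom_text(mapping = aes(label = temp), vjust = -0.5)

print(march_map)
print(temp_graph)
```

The two panels are built as separate plots here; stacking them into a single figure as Minard did would need an additional layout step that is not shown.
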
  1. See md chapter 3.