Data Visualization


Plots can be constructed using the base graphics in R, as shown on the Creating Graphics page; however, the functions within that package are not very versatile. The ggplot2 package addresses this problem by building graphs from composable pieces, which makes the graphic more flexible and easier to maintain.

ggplot()

The ggplot2 package can be loaded by itself or by using the multi-package tidyverse approach. With ggplot2, the user can create highly customizable visualizations. Layers of several visual aspects, including points, lines, and facets, can be added together using "+" to create more complex graphics. These visual components are added using the geom_*() family of functions, each of which describes the desired type of plot (e.g., geom_point(), geom_line(), geom_histogram()). The aes() function maps variables in the data to visual properties of the plot, such as position, color, and size.
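As a minimal sketch of how layers combine with "+", the snippet below builds a plot in steps. It uses the mpg data set that ships with ggplot2 rather than the gapminder data used in the rest of this page, and the variable choices (displ, hwy, drv) are only for illustration.

```r
library(ggplot2)  # alternatively: library(tidyverse), which also attaches ggplot2

# Start with the data and the aesthetic mappings (which variable goes where)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: one point per car
  geom_point() +
  # Layer 2: a straight-line fit drawn on top of the points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Split the plot into one panel per drive type
  facet_wrap(~ drv)

# Printing the object draws the plot
print(p)
```

Each "+" adds one component to the plot object, so a layer can be removed or swapped without touching the rest of the code.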

In this example, we will walk through some of the functionality of ggplot2. First, we will begin by creating a simple scatterplot using the gapminder data available in the gapminder package in R.

Note: The data was filtered using dplyr to reduce the number of data points and keep the example simple (for more on the filter() function from dplyr, visit the Data Manipulation page).

library(ggplot2)
library(dplyr)
library(gapminder)

# Keep only the 1997 observations for European countries
gapminder_1997 <- gapminder %>% 
  filter(year == 1997, continent == "Europe")

# Scatterplot of GDP per capita by country
ggplot(gapminder_1997, aes(x = country, y = gdpPercap)) +
  geom_point() +
  theme_minimal()

this is a scatterplot of the countries on the x axis and the GDP per cap on the y

As shown above, the plot neither offers much insight nor is visually appealing.

An advantage of using this package is how easy it is to show multiple variables on the same plot by mapping them to additional visual properties. Let's suppose we were interested in population and in specific countries. The code below shows how simple it is to add these variables to the graph by mapping country to color and population to point size.

ggplot(gapminder_1997, aes(x = country, y = gdpPercap, color=country, size=pop)) +
  geom_point() +
  guides(color = "none", size = "none") +
  theme_minimal()

this is the same plot as before but the size correlates to population and the countries are all colored differently

The plot shown above is arguably not much better. The colors are too similar, so no insight can really be drawn from adding this dimension. Additionally, there are too many labels to fit in a legend, so the key is hidden and it is impossible to match colors to the names of the countries. Later in this example we'll show one way to fix the messy x-axis, but first let's try highlighting a few specific countries (Spain, Germany, France, and United Kingdom) with the code below:

# Subset of countries to highlight
gapminder_subset <- gapminder_1997 %>% 
  filter(country %in% c("Germany", "France", "Spain", "United Kingdom"))

# Plot all countries in black, then overlay the highlighted subset in blue
ggplot() + 
  geom_point(aes(x = country, y = gdpPercap, size=pop), data=gapminder_1997) +
  geom_point(aes(x = country, y = gdpPercap, size=pop), color="blue", data=gapminder_subset) +
  theme_minimal() +
  guides(size = "none")
  

this is the same plot as before but Germany, France, Spain, and the UK have blue dots and the rest are black
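The highlighting above works by drawing a second layer on top of the first, with color set to a constant outside of aes(). As a sketch of one alternative (not the approach used on this page), the same effect can be reached with a single layer by adding a logical column and mapping it to color. The column name highlighted and the object names below are made up for illustration.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Countries to emphasize (same four as in the example above)
highlight <- c("Germany", "France", "Spain", "United Kingdom")

# Filter to Europe in 1997 and flag the highlighted countries;
# `highlighted` is a hypothetical helper column, not part of gapminder
europe_1997 <- gapminder %>%
  filter(year == 1997, continent == "Europe") %>%
  mutate(highlighted = country %in% highlight)

# A single layer: color is now mapped inside aes(), and
# scale_color_manual() decides which color each value receives
p <- ggplot(europe_1997,
            aes(x = country, y = gdpPercap, size = pop, color = highlighted)) +
  geom_point() +
  scale_color_manual(values = c("FALSE" = "black", "TRUE" = "blue")) +
  guides(size = "none", color = "none") +
  theme_minimal()

print(p)
```

The two-layer version keeps the highlighted points on top and lets each layer be styled separately, while the single-layer version keeps all of the data in one data frame. Which one reads better depends on the plot.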

This graph leads the viewer to some insight, but it is still not very appealing. ggplot2 also allows for easy manipulation of visual aspects, such as turning down the opacity of points. As shown below, lowering the opacity (alpha) of the black points makes them appear gray, and this lighter color better complements the blue points. Further, notice how the axis labels have also been modified to produce a more appealing plot: the x-axis title is removed, the country names are rotated so they no longer overlap, and the y-axis is given a readable label.

ggplot() + 
  geom_point(aes(x = country, y = gdpPercap, size=pop), alpha=0.3, data=gapminder_1997) +
  geom_point(aes(x = country, y = gdpPercap, size=pop), color="blue", data=gapminder_subset) +
  theme_minimal() +
  guides(size = "none") +
  theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 60, hjust = 1)) + 
  labs(y= "GDP per Cap")

this is the same plot as before but the black dots are now gray and the x axis labels are tilted 60 degrees
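Rotating the labels is one way to tidy the country axis. Another option, sketched here as an assumption rather than as part of the original example, is to place the countries on the y-axis and order them by GDP per capita with reorder(), so the names read horizontally and the ranking is visible at a glance.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Same filtered data as in the examples above
europe_1997 <- gapminder %>%
  filter(year == 1997, continent == "Europe")

# reorder() sorts the country factor by gdpPercap, so the points
# run from the lowest GDP per capita at the bottom to the highest at the top
p <- ggplot(europe_1997,
            aes(x = gdpPercap, y = reorder(country, gdpPercap))) +
  geom_point() +
  labs(x = "GDP per Cap", y = NULL) +
  theme_minimal()

print(p)
```

With the countries sorted, the plot no longer depends on alphabetical order, which rarely carries any meaning for the reader.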

Keep in mind that creating visualizations is an iterative process. In order to construct a complete graphic, you should align it to the story you are trying to tell. The end result of the above example likely isn't the final plot, but the illustrated steps should get you started with using ggplot2. For more functions within ggplot2, visit the ggplot2 Cheat Sheet provided by RStudio.
