Artwork by @allison_horst
ggplot2 is part of the tidyverse.
So, at the top of your script type:
library(tidyverse)
Let’s explore a perennial challenge for the NHS:
The dataset we loaded earlier, capacity_ae, shows changes in the capacity of A&E departments from 2017 to 2018.
It is closely based on datasets collected by the NHS Benchmarking Network.
The object named capacity_ae is a data frame
A data frame stores tabular data:
Artwork by @allison_horst
In the tidyverse you may see the term "tibble"
We’ll take "tibble" to be synonymous with "data frame"
A tibble... is a modern reimagining of the data.frame...
Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist).
This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
emphasis added to quote
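The "no partial matching" point in the quote can be seen in a small sketch. The toy data below is hypothetical (it is not the capacity_ae dataset), and the column name waiting_time is made up for illustration.

```r
library(tibble)

# Hypothetical toy data, not the capacity_ae dataset
df  <- data.frame(waiting_time = c(10, 20, 30))
tbl <- as_tibble(df)

# A data.frame silently completes the partial name "wait" to "waiting_time"
df_result <- df$wait

# A tibble does not guess: it returns NULL and raises a warning
# (the warning is suppressed here only to keep the output tidy)
tbl_result <- suppressWarnings(tbl$wait)

print(df_result)
print(is.null(tbl_result))
```

The data frame hands back a column you did not ask for by its full name, while the tibble makes the mistake visible straight away.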
This brings up a view of the data in a new tab:
Click here to show the data frame in a new window
Useful when using multiple monitors
Type the name of the dataset in editor/console, and run the line (shortcut Ctrl + Enter)
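As a rough sketch of these ways of inspecting a data frame, the code below uses a small hypothetical stand-in called toy_ae (an assumed name with made-up values, not the real capacity_ae). View() is shown only as a comment because it needs an interactive session.

```r
# Hypothetical stand-in for capacity_ae, with made-up values
toy_ae <- data.frame(
  site           = c("A", "B", "C"),
  dwait          = c(5, -3, 12),
  staff_increase = c(TRUE, FALSE, TRUE)
)

# Typing the name and running the line prints the data in the console
toy_ae

# str() gives a compact summary: one line per column with its type
str(toy_ae)

# View(toy_ae)  # interactive only: opens the data viewer in RStudio

n_rows    <- nrow(toy_ae)
col_names <- names(toy_ae)
```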
(and what they mean)
John Tukey, quoted in R for Data Science
We begin our plot with ggplot2
ggplot() +
Inside ggplot() we can specify the dataset
ggplot(data = capacity_ae)
Next, we add one or more layers, each joined to the plot with +:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait))
There are choices to make about which chart to use and about the details of that chart:
1. What shape will represent the data points? A geometric object, or geom.
2. What visual (aesthetic) attributes do we give to the geom?
Here the shape, colour, and size of the geom are all left at their defaults:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait))
ggplot(), geom_point(), and aes() are functions
Running a function does something
Functions are given zero or more inputs (arguments)
Arguments of a function are separated by commas
You can explicitly name arguments:
ggplot(data = capacity_ae) +
Or not:
ggplot(capacity_ae) +
Other arguments, such as x and y in aes(), are expected in a particular order:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait))
Because the arguments are named, it is also possible to swap them:
ggplot(data = capacity_ae) + geom_point(aes(y = dwait, x = dcubicles))
But this could be confusing.
Here, we have provided ggplot() with one named argument
ggplot(data = capacity_ae) +
geom_point(aes(x = dcubicles, y = dwait))
And given aes() two named arguments
Unspecified (yet required) arguments will often revert to default values
Since ggplot2 knows the order of essential arguments, it is not necessary to name arguments:
data = can be omitted, and inside aes() x goes first and y goes second:
ggplot(capacity_ae) + geom_point(aes(dcubicles, dwait))
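The same rules about named and positional arguments hold for any R function, not just ggplot2. As a minimal sketch, the base R function seq() (chosen here purely for illustration) gives the same result whether its arguments are named, given by position, or named in a different order.

```r
# Named arguments, in the documented order
named <- seq(from = 1, to = 10, by = 3)

# Positional arguments: R matches them to from, to, by in that order
positional <- seq(1, 10, 3)

# Named arguments can be given in any order
reordered <- seq(by = 3, to = 10, from = 1)

print(named)
print(identical(named, positional))
print(identical(named, reordered))
```

Dropping the names saves typing, but keeping them makes the call easier to read when the order is not obvious.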
We tend to describe plots in terms of the geom used:
We can display more than one geom in a plot. Use + to add a layer, then specify another geom:
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  geom_smooth(aes(x = dcubicles, y = dwait))
This is our current plot:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait))
Add a geom_smooth layer (to help identify patterns)
Hint: Don't forget the + and aes() values in the new layer
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait)) + geom_smooth(aes(x = dcubicles, y = dwait))
We'd probably prefer a linear fit to a non-linear one:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait)) + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
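With method = "lm" and the default formula, geom_smooth() draws the straight line that lm() fits to y against x. The sketch below fits that model directly on hypothetical toy data, deliberately contrived so that every point lies on the line dwait = 1 + 2 * dcubicles. The column names match the slides, but the values are made up.

```r
# Hypothetical toy data lying exactly on a straight line
toy <- data.frame(
  dcubicles = c(0, 1, 2, 3, 4),
  dwait     = c(1, 3, 5, 7, 9)
)

# The same kind of model geom_smooth(method = "lm") fits behind the scenes
fit   <- lm(dwait ~ dcubicles, data = toy)
coefs <- coef(fit)

# Intercept and slope of the fitted line
print(coefs)
```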
Have the two sites seen staffing increases?
We can map point colour (aesthetic attribute) to the staff_increase variable to find out
We will add colour to the chart depending on the value of staff_increase (TRUE or FALSE, 1 or 0)
Put an argument inside aes() if you want a visual attribute to change with different values of a variable.
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait, colour = staff_increase)) + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
We could equally have chosen size or shape, but these make the graphic less clear.
The two sites have indeed seen an increase in staff levels, which appears to have affected dwait even though dcubicles is relatively low.
If you want a visual attribute to be applied across the whole plot, the argument goes outside aes():
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait), colour = "red") + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
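Putting colour = "red" inside aes() also runs, but ggplot2 then treats "red" as a data value rather than a colour name: the points take the first colour of the default palette, and a legend with the single entry "red" appears: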
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait, colour = "red")) + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
Or set a fixed size for every point:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait), size = 4) + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
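One way to check the difference between an argument outside aes() and one inside it is to ask ggplot2 which colours it actually computed for the layer. This is a sketch on hypothetical toy data (the values are made up), using layer_data() from ggplot2 to inspect the built layer.

```r
library(ggplot2)

# Hypothetical toy data, not the capacity_ae dataset
toy <- data.frame(dcubicles = c(1, 2, 3), dwait = c(4, 2, 6))

# colour outside aes(): a fixed value applied to every point
p_set <- ggplot(toy) +
  geom_point(aes(dcubicles, dwait), colour = "red")

# colour inside aes(): "red" is treated as data and mapped through a colour scale
p_mapped <- ggplot(toy) +
  geom_point(aes(dcubicles, dwait, colour = "red"))

set_colours    <- unique(layer_data(p_set, 1)$colour)
mapped_colours <- unique(layer_data(p_mapped, 1)$colour)

print(set_colours)     # the literal colour that was requested
print(mapped_colours)  # a colour chosen by the default scale instead
```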
To avoid duplication, we can pass the common local aes() arguments to ggplot to make them global. Instead of duplicating the same aes(dcubicles, dwait):
ggplot(data = capacity_ae) + geom_point(aes(dcubicles, dwait)) + geom_smooth(aes(dcubicles, dwait))
Move aes() into ggplot(), where it applies to every layer:
ggplot(data = capacity_ae, aes(dcubicles, dwait)) + geom_point() + geom_smooth()
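A quick way to convince yourself that the local and global forms are equivalent is to compare the data ggplot2 computes for the point layer in each case. This sketch assumes hypothetical toy data and uses layer_data() from ggplot2.

```r
library(ggplot2)

# Hypothetical toy data, not the capacity_ae dataset
toy <- data.frame(dcubicles = c(1, 2, 3), dwait = c(4, 2, 6))

# aes() supplied locally, inside the geom
p_local <- ggplot(toy) +
  geom_point(aes(dcubicles, dwait))

# aes() supplied globally, inside ggplot(), and inherited by the geom
p_global <- ggplot(toy, aes(dcubicles, dwait)) +
  geom_point()

# Compare the computed data for the first (and only) layer of each plot
same_layer <- isTRUE(all.equal(layer_data(p_local, 1), layer_data(p_global, 1)))
print(same_layer)
```

A local aes() in a geom still overrides or extends the global one, so layer-specific mappings such as colour can stay inside the geom that needs them.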
Another way to visualise the relationship between multiple variables is with a facet_wrap() layer:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait)) + facet_wrap(~ staff_increase)
Setting ncol = 1 stacks the panels in a single column:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait)) + facet_wrap(~ staff_increase, ncol = 1)
(note: these are simple, unpolished graphics)
ggplot(data = capacity_ae) + geom_histogram(aes(dwait))
With binwidth set so that the bins are more evenly spread:
ggplot(data = capacity_ae) + geom_histogram(aes(dwait), binwidth = 10)
ggplot(data = capacity_ae) + geom_col(aes(x = site, y = attendance2018))
Reorder site by attendances
ggplot(data = capacity_ae) + geom_col(aes(x = reorder(site, attendance2018), y = attendance2018))
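To see what reorder() does on its own, the sketch below applies it to hypothetical site names and attendance figures (made-up values). It returns a factor whose levels are sorted by the second argument, and ggplot2 uses that level order to arrange the bars.

```r
# Hypothetical sites and attendances, not taken from capacity_ae
site       <- c("A", "B", "C")
attendance <- c(300, 100, 200)

# Levels are ordered by increasing attendance
ascending <- reorder(site, attendance)

# Negating the values reverses the order (largest first)
descending <- reorder(site, -attendance)

print(levels(ascending))
print(levels(descending))
```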
ggplot(data = capacity_ae) + geom_boxplot(aes(staff_increase, dwait))
Labels, added with labs(), can be applied to all types of charts:
ggplot(data = capacity_ae) + geom_boxplot(aes(staff_increase, dwait)) + labs(title = "Do changes in staffing...", y = "Waiting")
Call ggsave() on its own line after the plot, rather than adding it with +:
ggplot(data = capacity_ae) + geom_point(aes(x = dcubicles, y = dwait)) + geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
ggsave("plot_name.png")
By default, ggsave() saves the most recent plot at the dimensions of the plot window.
To specify the plot dimensions, add the height, width, and units arguments:
ggsave("plot_name.png", units = "cm", height = 10, width = 8)
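If you would rather not rely on ggsave() picking up the most recent plot, you can store the plot in an object and pass it through the plot argument. This is a minimal sketch on hypothetical toy data; the object names p and out_file are illustrative, and the file is written to a temporary directory so nothing in your project is overwritten.

```r
library(ggplot2)

# Hypothetical toy data, not the capacity_ae dataset
toy <- data.frame(dcubicles = c(1, 2, 3), dwait = c(4, 2, 6))

# Store the plot in an object instead of only drawing it
p <- ggplot(toy, aes(dcubicles, dwait)) +
  geom_point()

# Write to a temporary location with explicit dimensions
out_file <- file.path(tempdir(), "plot_name.png")
ggsave(out_file, plot = p, width = 8, height = 10, units = "cm")

saved <- file.exists(out_file)
print(saved)
```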
Creative Commons Attribution ShareAlike 4.0 International
To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/