10 Colour scales and legends
After position, the most commonly used aesthetics are those based on colour, and there are many ways to map values to colours in ggplot2. Before we look at the details, it’s useful to learn a little bit of colour theory. Colour theory is complex because the underlying biology of the eye and brain is complex, and this introduction will only touch on some of the more important issues. An excellent and more detailed exposition is available online at http://tinyurl.com/clrdtls.
10.1 A little colour theory
At the physical level, colour is produced by a mixture of wavelengths of light. To characterise a colour completely, we need to know the complete mixture of wavelengths. Fortunately for us the human eye only has three different colour receptors, and so we can summarise the perception of any colour with just three numbers. You may be familiar with the RGB encoding of colour space, which defines a colour by the intensities of red, green and blue light needed to produce it. One problem with this space is that it is not perceptually uniform: two colours that are one unit apart may look similar or very different depending on where they are in the colour space. This makes it difficult to create a mapping from a continuous variable to a set of colours. There have been many attempts to come up with colour spaces that are more perceptually uniform. We’ll use a modern attempt called the HCL colour space, which has three components of hue, chroma and luminance:
- Hue ranges from 0 to 360 (an angle) and gives the “colour” of the colour (blue, red, orange, etc).
- Chroma is the “purity” of a colour, ranging from 0 (grey) to a maximum that varies with luminance.
- Luminance is the lightness of the colour, ranging from 0 (black) to 1 (white).
The three dimensions have different properties. Hues are arranged around a colour wheel and are not perceived as ordered: e.g. green does not seem “larger” than red, and blue does not seem to be “in between” green and red. In contrast, both chroma and luminance are perceived as ordered: pink is perceived as lying between red and white, and grey is seen to fall between black and white.
The combination of these three components does not produce a simple geometric shape. Figure 10.1 attempts to show the 3d shape of the space. Each slice is a constant luminance (brightness) with hue mapped to angle and chroma to radius. You can see the centre of each slice is grey and the colours get more intense as they get closer to the edge.
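As a rough illustration of the three components, base R's hcl() function (in grDevices) builds colours from a hue, a chroma and a luminance. This is only a sketch: hcl() expects luminance on a 0 to 100 scale rather than 0 to 1, and the specific numbers below are arbitrary choices made for illustration.

```r
# Fixed hue, varying luminance: perceived as an ordered sequence.
single_hue <- hcl(h = 260, c = 50, l = seq(30, 90, length.out = 5))

# Fixed chroma and luminance, varying hue: perceived as unordered categories.
hue_wheel <- hcl(h = seq(0, 300, by = 60), c = 60, l = 65)

# Zero chroma gives greys, whatever the hue.
greys <- hcl(h = 0, c = 0, l = c(20, 50, 80))

print(single_hue)
print(hue_wheel)
print(greys)
```

Each call returns a character vector of hex codes that can be passed to any ggplot2 scale that accepts colours.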
An additional complication is that many people (~10% of men) do not possess the normal complement of colour receptors and so can distinguish fewer colours than usual. In brief, it’s best to avoid red-green contrasts, and to check your plots with systems that simulate colour blindness. Vischeck is one online solution. Another alternative is the dichromat package (Lumley 2007), which provides tools for simulating colour blindness, and a set of colour schemes known to work well for colour-blind people. You can also help people with colour blindness in the same way that you can help people with black-and-white printers: by providing redundant mappings to other aesthetics like size, line type or shape.
10.2 Continuous colour scales
Colour gradients are often used to show the height of a 2d surface. The plots in this section use the surface of a 2d density estimate of the faithful dataset (Azzalini and Bowman 1990), which records the waiting time between eruptions and the duration of each eruption for the Old Faithful geyser in Yellowstone Park. I hide the legends and set expand to 0, to focus on the appearance of the data. Remember: although I use the erupt plot to illustrate concepts with a fill aesthetic, the same ideas apply to colour scales. Any time I refer to scale_fill_*() in this section there is a corresponding scale_colour_*() for the colour aesthetic (or scale_color_*() if you prefer US spelling).
erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_x_continuous(NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, expand = c(0, 0)) +
  theme(legend.position = "none")
10.2.1 Particular palettes
There are multiple ways to specify continuous colour scales. Later I’ll talk about general purpose tools that you can use to construct your own palette, but this is often unnecessary as there are many “hand picked” palettes available. For example, ggplot2 supplies two scale functions that bundle pre-specified palettes, scale_fill_viridis_c() and scale_fill_distiller(). The viridis scales (Garnier 2018) are designed to be perceptually uniform in both colour and when reduced to black and white, and to be perceptible to people with various forms of colour blindness.
erupt
erupt + scale_fill_viridis_c()
erupt + scale_fill_viridis_c(option = "magma")
The second group of continuous colour scales built in to ggplot2 are derived from the ColorBrewer scales: scale_fill_brewer() provides these colours as discrete palettes, while scale_fill_distiller() and scale_fill_fermenter() are the continuous and binned analogs. I discuss these scales in Section 10.3, but for illustrative purposes include some examples here:
erupt + scale_fill_distiller()
erupt + scale_fill_distiller(palette = "RdPu")
erupt + scale_fill_distiller(palette = "YlOrBr")
There are many other packages that provide useful colour palettes. For example, scico (Pedersen and Crameri 2020) provides more palettes that are perceptually uniform and suitable for scientific visualisation:
# the default
erupt + scico::scale_fill_scico(palette = "bilbao")
erupt + scico::scale_fill_scico(palette = "vik")
erupt + scico::scale_fill_scico(palette = "lajolla")
However, as there are a great many palette packages in R, a particularly useful package is paletteer (Hvitfeldt 2020), which aims to provide a common interface:
erupt + paletteer::scale_fill_paletteer_c("viridis::plasma")
erupt + paletteer::scale_fill_paletteer_c("scico::tokyo")
erupt + paletteer::scale_fill_paletteer_c("gameofthrones::targaryen")
10.2.2 Robust recipes
The default scale for continuous fill scales is scale_fill_continuous(), which in turn defaults to scale_fill_gradient(). As a consequence, these three commands produce the same plot using a gradient scale:
erupt
erupt + scale_fill_continuous()
erupt + scale_fill_gradient()
Gradient scales provide a robust method for creating any colour scheme you like. All you need to do is specify two or more reference colours, and ggplot2 will interpolate linearly between them. There are three functions that you can use for this purpose:
- scale_fill_gradient() produces a two-colour gradient
- scale_fill_gradient2() produces a three-colour gradient with specified midpoint
- scale_fill_gradientn() produces an n-colour gradient
The use of gradient scales is illustrated below. The first plot uses a scale that linearly interpolates from grey (hex code: "#bebebe") at the low end of the scale limits to brown ("#a52a2a") at the high end. The second plot has the same endpoints but uses scale_fill_gradient2() to interpolate first from grey to white ("#ffffff") and then from white to brown. Note that the mid argument specifies the colour to be shown at the intermediate point, and midpoint is the value in the data at which this colour is used (the default is midpoint = 0). The third method is to use scale_fill_gradientn(), which takes a vector of reference colours as its argument, and constructs a scale that linearly interpolates between the specified values. By default, the colours are presumed to be equally spaced along the scale, but if you prefer you can specify a vector of values that correspond to each of the reference colours.
erupt + scale_fill_gradient(low = "grey", high = "brown")
erupt + scale_fill_gradient2(low = "grey", mid = "white", high = "brown", midpoint = .02)
erupt + scale_fill_gradientn(colours = terrain.colors(7))
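The values argument mentioned above is not used in these plots, so here is a minimal sketch of it. It assumes that values are given as positions between 0 and 1 along the scale limits, one per reference colour. The three colours and the 0.25 position are arbitrary choices for illustration.

```r
library(ggplot2)

erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  theme(legend.position = "none")

# Reference colours placed at 0%, 25% and 100% of the scale limits, so the
# grey-to-white transition is compressed into the lowest quarter of the range.
uneven <- erupt + scale_fill_gradientn(
  colours = c("grey", "white", "brown"),
  values = c(0, 0.25, 1)
)
uneven
```

Omitting values would instead place white exactly halfway between the two ends of the scale.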
Creating good colour palettes requires some care. Generally, for a two-point gradient scale you want to convey the perceptual impression that the values are sequentially ordered, so you want to keep hue constant, and vary chroma and luminance. The Munsell colour system is useful for this as it provides an easy way of specifying colours based on their hue, chroma and luminance. The munsell package (Wickham 2018) provides easy access to the Munsell colours, which can then be used to specify a gradient scale:
# generate a ggplot with hue_slice()
munsell::hue_slice("5P") +
  # add arrows for annotation
  annotate(
    geom = "segment",
    x = c(7, 7),
    y = c(1, 10),
    xend = c(7, 7),
    yend = c(2, 9),
    arrow = arrow(length = unit(2, "mm"))
  )
#> Warning: Removed 31 rows containing missing values (geom_text).

# construct scale
erupt + scale_fill_gradient(
  low = munsell::mnsl("5P 2/12"),
  high = munsell::mnsl("5P 7/12")
)
The labels on the left plot are a little difficult to read at this scale, so I have used annotate() to add arrows highlighting the column used to construct the scale on the right. For more information on the munsell package see https://github.com/cwickham/munsell/.
Three-point gradient scales have slightly different design criteria. Typically the goal in such a scale is to convey the perceptual impression that there is a natural midpoint (often a zero value) from which the other values diverge. The left plot below shows how to create a divergent “yellow/blue” scale, though it is a little artificial in this example.
Finally, if you have colours that are meaningful for your data (e.g., black body colours or standard terrain colours), or you’d like to use a palette produced by another package, you may wish to use an n-point gradient. As an illustration, the middle and right plots below use the colorspace package (Zeileis, Hornik, and Murrell 2008). For more information on the colorspace package see https://colorspace.r-forge.r-project.org/.
# munsell example
erupt + scale_fill_gradient2(
  low = munsell::mnsl("5B 7/8"),
  high = munsell::mnsl("5Y 7/8"),
  mid = munsell::mnsl("N 7/0"),
  midpoint = .02
)

# colorspace examples
erupt + scale_fill_gradientn(colours = colorspace::heat_hcl(7))
erupt + scale_fill_gradientn(colours = colorspace::diverge_hcl(7))
10.2.3 Missing values
All continuous colour scales have an na.value parameter that controls what colour is used for missing values (including values outside the range of the scale limits). By default it is set to grey, which will stand out when you use a colourful scale. If you use a black and white scale, you might want to set it to something else to make it more obvious. You can set na.value = NA to make missing values invisible, or choose a specific colour if you prefer:
df <- data.frame(x = 1, y = 1:5, z = c(1, 3, 2, NA, 5))
base <- ggplot(df, aes(x, y)) +
  geom_tile(aes(fill = z), size = 5) +
  labs(x = NULL, y = NULL)

base
base + scale_fill_gradient(na.value = NA)
base + scale_fill_gradient(na.value = "yellow")
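The remark about values outside the scale limits can be checked with a small sketch. It reuses the same toy data, narrows the limits to c(1, 3) (an arbitrary choice for illustration), and relies on the scale's default out-of-bounds handling.

```r
library(ggplot2)

df <- data.frame(x = 1, y = 1:5, z = c(1, 3, 2, NA, 5))

# z = 5 lies outside limits = c(1, 3), so with the default out-of-bounds
# handling it is drawn in the na.value colour, just like the NA in row 4.
clipped <- ggplot(df, aes(x, y)) +
  geom_tile(aes(fill = z)) +
  scale_fill_gradient(limits = c(1, 3), na.value = "yellow")

# Inspect the fill colours that ggplot2 computed for each tile.
layer_data(clipped)$fill
```

layer_data() is used here only to look at the computed fills; printing clipped draws the plot as usual.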
10.2.4 Limits, breaks, and labels
You can suppress the breaks entirely by setting them to NULL. For axes, this removes the tick marks, grid lines, and labels; and for legends this removes the keys and labels.
toy <- data.frame(
  const = 1,
  up = 1:4,
  txt = letters[1:4],
  big = (1:4)*1000,
  log = c(2, 5, 10, 2000)
)
leg <- ggplot(toy, aes(up, up, fill = big)) +
  geom_tile() +
  labs(x = NULL, y = NULL)

leg + scale_fill_continuous(breaks = NULL)
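Beyond suppressing breaks, the limits, breaks and labels arguments can also be set explicitly. The sketch below rebuilds a cut-down version of the toy data so that it runs on its own. The particular limits, break positions and label strings are made up for illustration.

```r
library(ggplot2)

toy <- data.frame(up = 1:4, big = (1:4) * 1000)

leg <- ggplot(toy, aes(up, up, fill = big)) +
  geom_tile() +
  labs(x = NULL, y = NULL)

# Widen the limits, show only two breaks, and relabel them.
relabelled <- leg + scale_fill_continuous(
  limits = c(0, 5000),
  breaks = c(1000, 4000),
  labels = c("1k", "4k")
)
relabelled
```

When labels is given as a character vector it needs one entry per break, in the same order.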
10.3 Discrete colour scales
Discrete colour and fill scales occur in many situations. A typical example is a barchart that encodes both position and fill to the same variable. Many concepts from Section 10.2 apply to discrete scales, which I will illustrate using this barchart as the running example:
df <- data.frame(x = c("a", "b", "c", "d"), y = c(3, 4, 1, 2))
bars <- ggplot(df, aes(x, y, fill = x)) +
  geom_bar(stat = "identity") +
  labs(x = NULL, y = NULL) +
  theme(legend.position = "none")
The default scale for discrete colours is scale_fill_discrete(), which in turn defaults to scale_fill_hue(), so these are identical plots:
bars
bars + scale_fill_discrete()
bars + scale_fill_hue()
This default scale has some limitations (discussed shortly) so I’ll begin by discussing tools for producing nicer discrete palettes.
10.3.1 Brewer scales
scale_colour_brewer() is a discrete colour scale that, along with the continuous analog scale_colour_distiller() and binned analog scale_colour_fermenter(), uses handpicked “ColorBrewer” colours taken from http://colorbrewer2.org/. These colours have been designed to work well in a wide variety of situations, although the focus is on maps and so the colours tend to work better when displayed in large areas. There are many different options:
The first group of palettes are sequential scales that are useful when your discrete scale is ordered (e.g., rank data), and are available for continuous data using scale_colour_distiller(). For unordered categorical data, the palettes of most interest are those in the second group. ‘Set1’ and ‘Dark2’ are particularly good for points, and ‘Set2’, ‘Pastel1’, ‘Pastel2’ and ‘Accent’ work well for areas.
bars + scale_fill_brewer(palette = "Set1")
bars + scale_fill_brewer(palette = "Set2")
bars + scale_fill_brewer(palette = "Accent")
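To see which palettes belong to which group, one option is to look at the metadata table in the RColorBrewer package. This is a sketch that assumes RColorBrewer is installed (it is normally pulled in alongside ggplot2). The "qual" and "seq" category codes are RColorBrewer's own labels for qualitative and sequential palettes.

```r
# Palette metadata shipped with RColorBrewer: one row per palette.
info <- RColorBrewer::brewer.pal.info

# Names of the qualitative palettes (for unordered categories) ...
rownames(info)[info$category == "qual"]
# ... and of the sequential palettes (for ordered data).
rownames(info)[info$category == "seq"]

# Largest number of colours each of the palettes used above can supply.
info[c("Set1", "Set2", "Accent"), "maxcolors"]
```

The maxcolors column is worth checking before mapping a variable with many levels to a brewer palette.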
Note that no palette is uniformly good for all purposes. Scatter plots typically use small plot markers, and bright colours tend to work better than subtle ones:
# scatter plot
df <- data.frame(x = 1:3 + runif(30), y = runif(30), z = c("a", "b", "c"))
point <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z)) +
  theme(legend.position = "none") +
  labs(x = NULL, y = NULL)

# three palettes
point + scale_colour_brewer(palette = "Set1")
point + scale_colour_brewer(palette = "Set2")
point + scale_colour_brewer(palette = "Pastel1")
Bar plots usually contain large patches of colour, and bright colours can be overwhelming. Subtle colours tend to work better in this situation:
# bar plot
df <- data.frame(x = 1:3, y = 3:1, z = c("a", "b", "c"))
area <- ggplot(df, aes(x, y)) +
  geom_bar(aes(fill = z), stat = "identity") +
  theme(legend.position = "none") +
  labs(x = NULL, y = NULL)

# three palettes
area + scale_fill_brewer(palette = "Set1")
area + scale_fill_brewer(palette = "Set2")
area + scale_fill_brewer(palette = "Pastel1")
10.3.2 Hue and grey scales
The default colour scheme picks evenly spaced hues around the HCL colour wheel. This works well for up to about eight colours, but after that it becomes hard to tell the different colours apart. You can control the default chroma and luminance, and the range of hues, with the h, c and l arguments:
bars
bars + scale_fill_hue(c = 40)
bars + scale_fill_hue(h = c(180, 300))
One disadvantage of the default colour scheme is that because the colours all have the same luminance and chroma, when you print them in black and white, they all appear as an identical shade of grey. If you intend a discrete colour scale to be printed in black and white, it is therefore better to use scale_fill_grey(), which maps discrete data to greys, from dark to light:
bars + scale_fill_grey()
bars + scale_fill_grey(start = 0.5, end = 1)
bars + scale_fill_grey(start = 0, end = 0.5)
10.3.3 Manual scales
If none of the hand-picked palettes is suitable, or if you have your own preferred colours, you can use scale_fill_manual() to set the colours manually. This can be useful if you wish to choose colours that highlight a secondary grouping structure or draw attention to different comparisons:
bars + scale_fill_manual(values = c("sienna1", "sienna4", "hotpink1", "hotpink4"))
bars + scale_fill_manual(values = c("tomato1", "tomato2", "tomato3", "tomato4"))
bars + scale_fill_manual(values = c("grey", "black", "grey", "grey"))
You can also use a named vector to specify the colours assigned to each level, which allows you to specify the levels in any order you like:
bars + scale_fill_manual(values = c(
  "d" = "grey",
  "c" = "grey",
  "b" = "black",
  "a" = "grey"
))
Recreate the following plot:
10.4 Binned colour scales
Colour scales also come in binned versions. The default scale is scale_fill_binned(), which in turn defaults to scale_fill_steps(). As with the binned position scales discussed in Section 9.4, these scales have an n.breaks argument that controls the number of discrete colour categories created by the scale. Counterintuitively, because the human visual system is very good at detecting edges, this can sometimes make a continuous colour gradient easier to perceive:
erupt + scale_fill_steps(n.breaks = 8)
In other respects scale_fill_steps() is analogous to scale_fill_gradient(), and allows you to construct your own two-colour gradients. There is also a three-colour variant scale_fill_steps2() and an n-colour variant scale_fill_stepsn() that behave similarly to their continuous counterparts:
erupt + scale_fill_steps(low = "grey", high = "brown")
erupt + scale_fill_steps2(low = "grey", mid = "white", high = "brown", midpoint = .02)
erupt + scale_fill_stepsn(n.breaks = 12, colours = terrain.colors(12))
A brewer analog for binned scales also exists, and is called scale_fill_fermenter():
erupt + scale_fill_fermenter(n.breaks = 9)
erupt + scale_fill_fermenter(n.breaks = 9, palette = "Oranges")
erupt + scale_fill_fermenter(n.breaks = 9, palette = "PuOr")
Note that like the discrete scale_fill_brewer(), and unlike the continuous scale_fill_distiller(), the binned function scale_fill_fermenter() does not interpolate between the brewer colours. If you set n.breaks larger than the number of colours in the palette, a warning message will appear and some colours will not be displayed.
10.5 Alpha scales
Alpha scales map the transparency of a shade to a value in the data. They are not often useful, but can be a convenient way to visually down-weight less important observations.
scale_alpha() is an alias for scale_alpha_continuous() since that is the most common use of alpha, and it saves a bit of typing.
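As a minimal sketch of an alpha scale, the plot below maps density in the faithfuld data to transparency instead of fill. The fill colour and the range passed to scale_alpha() are arbitrary choices for illustration.

```r
library(ggplot2)

# Map density to transparency instead of fill; `range` sets the alpha values
# used for the smallest and largest density.
faded <- ggplot(faithfuld, aes(waiting, eruptions, alpha = density)) +
  geom_raster(fill = "maroon") +
  scale_alpha(range = c(0.2, 1)) +
  scale_x_continuous(NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, expand = c(0, 0))
faded
```

Low-density regions fade towards the panel background, which visually down-weights them.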
10.6 Legend position
A number of settings that affect the overall display of the legends are controlled through the theme system. You’ll learn more about that in Section 17.2, but for now, all you need to know is that you modify theme settings with the theme() function.
The position and justification of legends are controlled by the theme setting legend.position, which takes values “right”, “left”, “top”, “bottom”, or “none” (no legend).
base <- ggplot(toy, aes(up, up)) +
  geom_point(aes(colour = txt), size = 3) +
  xlab(NULL) +
  ylab(NULL)

base + theme(legend.position = "left")
base + theme(legend.position = "right") # the default
base + theme(legend.position = "bottom")
base + theme(legend.position = "none")
Switching between left/right and top/bottom modifies how the keys in each legend are laid out (horizontally or vertically), and how multiple legends are stacked (horizontally or vertically). If needed, you can adjust those options independently:
- legend.direction: layout of items in legends (“horizontal” or “vertical”).
- legend.box: arrangement of multiple legends (“horizontal” or “vertical”).
- legend.box.just: justification of each legend within the overall bounding box, when there are multiple legends (“top”, “bottom”, “left”, or “right”).
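The three settings can be combined in a single theme() call. The sketch below uses a cut-down version of the toy data and maps both colour and size, so that there are two legends for legend.box to arrange. The specific combination of values is just one arbitrary example.

```r
library(ggplot2)

toy <- data.frame(up = 1:4, txt = letters[1:4])

# Two legends (colour and size) so that legend.box has something to arrange.
base <- ggplot(toy, aes(up, up)) +
  geom_point(aes(colour = txt, size = up))

# Legends below the plot, but keys still stacked vertically within each
# legend, and the two legends placed side by side and aligned at the top.
stacked <- base + theme(
  legend.position = "bottom",
  legend.direction = "vertical",
  legend.box = "horizontal",
  legend.box.just = "top"
)
stacked
```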
Alternatively, if there’s a lot of blank space in your plot you might want to place the legend inside the plot. You can do this by setting legend.position to a numeric vector of length two. The numbers represent a relative location in the panel area: c(0, 1) is the top-left corner and c(1, 0) is the bottom-right corner. You control which corner of the legend the legend.position refers to with legend.justification, which is specified in a similar way. Unfortunately, positioning the legend exactly where you want it requires a lot of trial and error.
base <- ggplot(toy, aes(up, up)) +
  geom_point(aes(colour = txt), size = 3)

base + theme(legend.position = c(0, 1), legend.justification = c(0, 1))
base + theme(legend.position = c(0.5, 0.5), legend.justification = c(0.5, 0.5))
base + theme(legend.position = c(1, 0), legend.justification = c(1, 0))
There’s also a margin around the legends, which you can suppress with legend.margin = unit(0, "mm").
10.7 Legend key glyphs
In most cases the default glyphs shown in the legend key will be appropriate to the layer and the aesthetic. Line plots of different colours will show up as lines of different colours in the legend, boxplots will appear as small boxplots in the legend, and so on. Should you need to override this behaviour, the key_glyph argument can be used to associate a particular layer with a different kind of glyph. For example:
base <- ggplot(economics, aes(date, psavert, color = "savings"))

base + geom_line()
base + geom_line(key_glyph = "timeseries")
More precisely, each geom is associated with a function such as draw_key_path() which is responsible for drawing the key when the legend is created. You can pass the desired key drawing function directly: for example, base + geom_line(key_glyph = draw_key_timeseries) would also produce the plot shown above right.
The legend guide displays individual keys in a table. The most useful options are:
- nrow and ncol, which specify the dimensions of the table.
- byrow controls how the table is filled: FALSE fills it by column (the default), TRUE fills it by row.
base <- ggplot(mpg, aes(drv, fill = factor(cyl))) + geom_bar()

base
base + guides(fill = guide_legend(ncol = 2))
base + guides(fill = guide_legend(ncol = 2, byrow = TRUE))
- reverse reverses the order of the keys:
base
base + guides(fill = guide_legend(reverse = TRUE))
- override.aes is useful when you want the elements in the legend to display differently to the geoms in the plot. This is often required when you’ve used transparency or size to deal with moderate overplotting and also used colour in the plot.
base <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(size = 4, alpha = .2, stroke = 0)

base + guides(colour = guide_legend())
base + guides(colour = guide_legend(override.aes = list(alpha = 1)))
- keywidth and keyheight (along with default.unit) allow you to specify the size of the keys. These are grid units, e.g. unit(1, "cm").
guide_bins() is suited to the situation when a continuous variable is binned and then mapped to an aesthetic that produces a legend, such as size, colour and fill. For instance, in the mpg data we could use scale_size_binned() to create a binned version of the continuous variable hwy:
base <- ggplot(mpg, aes(displ, manufacturer, size = hwy)) +
  geom_point(alpha = .2) +
  scale_size_binned()
Unlike guide_legend(), the guide created for a binned scale by guide_bins() does not organise the individual keys into a table. Instead they are arranged in a column (or row) along a single vertical (or horizontal) axis, which by default is displayed with its own axis. The important arguments to guide_bins() are listed below:
- axis indicates whether the axis should be drawn (default is TRUE):
base
base + guides(size = guide_bins(axis = FALSE))
- direction is a character string specifying the direction of the guide:
base + guides(size = guide_bins(direction = "vertical"))
base + guides(size = guide_bins(direction = "horizontal"))
- show.limits specifies whether tick marks are shown at the ends of the guide axis
- axis.colour and axis.arrow are used to control the guide axis that is displayed alongside the legend keys
base + guides(size = guide_bins(show.limits = TRUE))
base + guides(
  size = guide_bins(
    axis.colour = "red",
    axis.arrow = arrow(
      length = unit(.1, "inches"),
      ends = "first",
      type = "closed"
    )
  )
)
- reverse and override.aes have the same behaviour as in guide_legend()
The colour bar guide is designed for continuous ranges of colors—as its name implies, it outputs a rectangle over which the color gradient varies. The most important arguments are:
- barwidth and barheight allow you to specify the size of the bar. These are grid units, e.g. unit(1, "cm").
- nbin controls the number of slices. You may want to increase this from the default value of 20 if you draw a very long bar.
- reverse flips the colour bar to put the lowest values at the top.
These options are illustrated below:
base <- ggplot(mpg, aes(cyl, displ, colour = hwy)) +
  geom_point(size = 2)

base
base + guides(colour = guide_colourbar(reverse = TRUE))
base + guides(colour = guide_colourbar(barheight = unit(2, "cm")))
This “colour steps” guide is a version of guide_colourbar() appropriate for binned colour and fill scales. It shows the area between breaks as a single constant colour, rather than displaying a colour gradient that varies smoothly along the bar. Arguments mostly mirror those for guide_colourbar(). The additional arguments are as follows:
- show.limits indicates whether values should be shown at the ends of the stepped colour bar (analogous to the corresponding argument in guide_bins()):
base <- ggplot(mpg, aes(displ, hwy, colour = cyl)) +
  geom_point() +
  scale_color_binned()

base + guides(colour = guide_coloursteps(show.limits = TRUE))
base + guides(colour = guide_coloursteps(show.limits = FALSE))
- ticks is a logical variable indicating whether tick marks should be displayed adjacent to the legend labels (default is NULL, in which case the value is inherited from the scale)
- even.steps is a logical variable indicating whether bins should be drawn at equal sizes (default is TRUE) or in proportion to the width of each bin in data space
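To compare the two even.steps settings, it helps to use a scale whose breaks are not evenly spaced. The sketch below does that with a binned colour scale on cyl. The break positions are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Unevenly spaced breaks make the effect of even.steps visible.
base <- ggplot(mpg, aes(displ, hwy, colour = cyl)) +
  geom_point() +
  scale_colour_binned(breaks = c(4.5, 5, 7))

equal_bins <- base + guides(colour = guide_coloursteps(even.steps = TRUE))
scaled_bins <- base + guides(colour = guide_coloursteps(even.steps = FALSE))

equal_bins
scaled_bins
```

Drawing the two plots side by side makes the difference in how the bins are sized easy to see.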
How do you make legends appear to the left of the plot?
What’s gone wrong with this plot? How could you fix it?
ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = drv, shape = drv)) + scale_colour_discrete("Drive train")
Can you recreate the code for this plot?
#> `geom_smooth()` using formula 'y ~ x'
Azzalini, A., and A. W. Bowman. 1990. “A Look at Some Data on the Old Faithful Geyser.” Applied Statistics 39: 357–65.
Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.
Hvitfeldt, Emil. 2020. Paletteer: Comprehensive Collection of Color Palettes. https://CRAN.R-project.org/package=paletteer.
Lumley, Thomas. 2007. Dichromat: Color Schemes for Dichromats.
Pedersen, Thomas Lin, and Fabio Crameri. 2020. Scico: Colour Palettes Based on the Scientific Colour-Maps. https://CRAN.R-project.org/package=scico.
Wickham, Charlotte. 2018. Munsell: Utilities for Using Munsell Colours. https://CRAN.R-project.org/package=munsell.
Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2008. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics & Data Analysis. http://statmath.wu-wien.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2008.pdf.