Welcome to the second edition of “ggplot2: elegant graphics for data analysis”. I’m so excited to have an updated book that shows off all the latest and greatest ggplot2 features, as well as the great things that have been happening in R and in the ggplot2 community the last five years. The ggplot2 community is vibrant: the ggplot2 mailing list has over 7,000 members and there is a very active Stack Overflow community, with nearly 10,000 questions tagged with ggplot2. While most of my development effort is no longer going into ggplot2 (more on that below), there’s never been a better time to learn it and use it.
I am tremendously grateful for the success of ggplot2. It’s one of the most commonly downloaded R packages (over a million downloads in the last year!) and has influenced the design of graphics packages for other languages. Personally, ggplot2 has bought me many exciting opportunities to travel the world and meet interesting people. I love hearing how people are using R and ggplot2 to understand the data that they care about.
A big thanks for this edition goes to Carson Sievert, who helped me modernise the code, including converting the sources to R Markdown. He also updated many of the examples and helped me proofread the book.
I’ve spent a lot of effort ensuring that this edition is a true upgrade over the first. As well as updating the code everywhere to make sure it’s fully compatible with the latest version of ggplot2, I have:
Shown much more code in the book, so it’s easier to use as a reference. Overall the book has a more “knitr”-ish sensibility: there are fewer floating figures and tables, and more inline code. This makes the layout a little less pretty but keeps related items closer together.
Published the complete source online at https://github.com/hadley/ggplot2-book.
ggplot()in the introduction. Feedback indicated that
qplot()was a crutch: it makes simple plots a little easier, but it doesn’t help with mastering the grammar.
Added practice exercises throughout the book so you can practice new techniques immediately after learning about them.
Added pointers to the rich ecosystem of packages that have built up around ggplot2. You’ll now see a number of other packages highlighted in the book, and get pointers to other packages I think are particularly useful.
Overhauled the toolbox chapter to cover all the new geoms. I’ve added a completely new section on text labels, since it’s important and not covered in detail elsewhere. The mapping section has been considerably expanded to talk more about the different types of map data, and where you might find them.
Completely rewritten the scales chapter to focus on the most important tasks. It also discusses the new features that give finer control over legend appearance, and shows off some of the new scales added to ggplot2.
Split the data analysis chapter into three pieces: data tidying (with tidyr), data manipulation (with dplyr), and model visualisation (with broom). I discuss the latest iteration of my data manipulation tools, and introduce the fantastic broom package by David Robinson.
The book is accompanied by a new version of ggplot2: version 2.0.0. This includes a number of minor tweaks and improvements, and considerable improvements to the documentation. Coming back to ggplot2 development after a considerable pause has helped me to see many problems that previously escaped notice. ggplot2 2.0.0 (finally!) contains an official extension mechanism so that others can contribute new ggplot2 components in their own packages. This is documented in a new vignette,
ggplot2 is now stable, and is unlikely to change much in the future. There will be bug fixes and there may be new geoms, but there will be no large changes to how ggplot2 works. The next iteration of ggplot2 is ggvis. ggvis is significantly more ambitious because it aims to provide a grammar of interactive graphics. ggvis is still young, and lacks many of the features of ggplot2 (most notably it currently lacks faceting and has no way to make static graphics), but over the coming years the goal is to make ggvis better than ggplot2.
The syntax of ggvis is a little different to ggplot2. You won’t be able to trivially convert your ggplot2 plots to ggvis, but we think the cost is worth it: the new syntax is considerably more consistent, and will be easier for newcomers to learn. If you’ve mastered ggplot2, you’ll find your skills transfer very well to ggvis and after struggling with the syntax for a while, it will start to feel quite natural. The important skills you learn when mastering ggplot2 are not the programmatic details of describing a plot in code, but the much harder challenge of thinking about how to turn data into effective visualisations.
Many people have contributed to this book with high-level structural insights, spelling and grammar corrections and bug reports. I’d particularly like to thank William E. J. Doane, Alexander Forrence, Devin Pastoor, David Robinson, and Guangchuang Yu, for their detailed technical reviews of the book.
Many others have contributed over the (now quite long!) lifetime of ggplot2. I would like to thank: Leland Wilkinson, for discussions and comments that cemented my understanding of the grammar; Gabor Grothendieck, for early helpful comments; Heike Hofmann and Di Cook, for being great advisors and supporting the development of ggplot2 during my PhD; Charlotte Wickham; the students of stat480 and stat503 at ISU, for trying it out when it was very young; Debby Swayne, for masses of helpful feedback and advice; Bob Muenchen, Reinhold Kliegl, Philipp Pagel, Richard Stahlhut, Baptiste Auguie, Jean-Olivier Irisson, Thierry Onkelinx and the many others who have read draft versions of the book and given me feedback; and last, but not least, the members of R-help and the ggplot2 mailing list, for providing the many interesting and challenging graphics problems that have helped motivate this book.