My Most Favourite ggplot Plot – Powerful Bar Plot for Presentations
April 23, 2019 By Pascal Schmidt R Tidyverse Tutorial
Today I will be talking about my favorite ggplot bar plot. I use it a lot at work and in presentations because it gives a great overview of both proportions and counts in a single plot without being overwhelming.
As always, we will be using the Pokemon data set which can be found here.
Loading Data and Forming an Idea of the Plot
library(dplyr) # provides the %>% pipe used throughout this post
poke <- read.csv("Pokemon.csv") %>% dplyr::select(-X.)
Before creating the bar plot, we have to do some data manipulation so that the summarized data can be passed as the first argument to ggplot().
Let’s say we want to know how many Grass, Fire, and Water Pokemon there are in each generation. First, let us keep only the Pokemon types we are interested in.
poke <- poke %>% dplyr::filter(stringr::str_detect(Type.1, "(Grass|Fire|Water)"))
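As a side note, the regular expression above matches any Type.1 value that contains one of the three words. If the column only holds exact type names, an exact match with %in% should select the same rows. The sketch below checks this on a small made-up data frame (the toy data and its Name column are hypothetical, not taken from the Pokemon file):

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Type.1 = c("Grass", "Bug", "Water", "Fire"),
  stringsAsFactors = FALSE
)

# Filter with a regular expression, as in the post
by_regex <- toy %>%
  filter(stringr::str_detect(Type.1, "(Grass|Fire|Water)"))

# Filter with an exact match on the type names
by_match <- toy %>%
  filter(Type.1 %in% c("Grass", "Fire", "Water"))

# Both approaches keep the same rows on this toy data
same_rows <- identical(by_regex$Name, by_match$Name)
```

Which variant to prefer is a matter of taste; the exact match is a little stricter because it cannot pick up a type name that merely contains one of the words.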
After that, we can summarize our data for the bar plot. We want the counts and percentages of each Pokemon type within each generation, and we also want to label the bars with the type of Pokemon.
Data Manipulation and Shaping for ggplot
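poke %>%
  dplyr::group_by(Generation, Type.1) %>%
  # count the rows in each Generation / Type.1 combination
  dplyr::summarise(n = dplyr::n()) %>%
  # still grouped by Generation here, so sum(n) is the total per generation
  dplyr::mutate(prop = n / sum(n)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(Generation = as.factor(Generation)) -> data_ggplot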
## # A tibble: 18 x 4
##    Generation Type.1     n  prop
##    <fct>      <fct>  <int> <dbl>
##  1 1          Fire      14 0.241
##  2 1          Grass     13 0.224
##  3 1          Water     31 0.534
##  4 2          Fire       8 0.229
##  5 2          Grass      9 0.257
##  6 2          Water     18 0.514
##  7 3          Fire       8 0.167
##  8 3          Grass     13 0.271
##  9 3          Water     27 0.562
## 10 4          Fire       5 0.152
## 11 4          Grass     15 0.455
## 12 4          Water     13 0.394
## 13 5          Fire       9 0.214
## 14 5          Grass     15 0.357
## 15 5          Water     18 0.429
## 16 6          Fire       8 0.444
## 17 6          Grass      5 0.278
## 18 6          Water      5 0.278
The data frame above shows the counts and percentages of each Pokemon type by generation. Let’s go through the pipe step by step:
- First, we group by generation and type because we want to see the percentages and counts of Grass, Fire, and Water Pokemon by generation.
- Then we get the counts with summarise().
- After that, we get the proportion of each type within each generation with the mutate() function. One important fact to note is that our data is still grouped at this point: summarise() only drops the last grouping variable (Type.1), so the data stays grouped by Generation. You can think of it as 6 different data frames, one for each generation. Hence, when we do n / sum(n), we get the proportion within the generation rather than within the entire data set.
- Then we ungroup() everything so we can turn the Generation column into a factor variable. Note that we cannot do that as long as Generation is still a grouping variable.
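To see why the grouping matters for the proportion step, here is a minimal sketch on a made-up data frame (the toy values are hypothetical and not taken from the Pokemon data). It computes n / sum(n) once on grouped data and once on ungrouped data:

```r
library(dplyr)

# Hypothetical toy counts, only for illustration
toy <- data.frame(
  Generation = c(1, 1, 2, 2),
  Type.1     = c("Fire", "Water", "Fire", "Water"),
  n          = c(1, 3, 2, 2)
)

# Grouped: sum(n) is computed separately for each generation
grouped <- toy %>%
  group_by(Generation) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Ungrouped: sum(n) is the total over all rows
ungrouped <- toy %>%
  mutate(prop = n / sum(n))

grouped$prop    # 0.25 0.75 0.50 0.50
ungrouped$prop  # 0.125 0.375 0.250 0.250
```

In the grouped version the proportions add up to 1 within each generation, which is what the stacked bars need. In the ungrouped version they only add up to 1 across the whole data frame.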
Then we go from there and construct the plot.
Constructing my Favorite ggplot
library(ggplot2)
ggplot(data_ggplot, aes(x = Generation, y = prop, fill = Type.1)) +
  geom_bar(stat = "identity") +
  # label each segment with type, count, and percentage, centered in the segment
  geom_text(
    aes(label = paste0(Type.1, "\n", n, "\n", round(prop, 4) * 100, "%")),
    position = position_stack(vjust = 0.5)
  ) +
  theme(
    legend.position = "none",
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank(),
    axis.title.y = element_blank()
  )
I like this bar plot the most because we can see percentages and counts in one plot. When you only use geom_bar(position = "fill"), you do not see the counts, which are very important when we want to compare Pokemon types across different generations.
Sometimes, proportions across different groups look very different only because some groups contain just a few data points. With the bar plot above, there is no ambiguity and everything can be assessed at a glance.
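The following sketch illustrates the point with made-up numbers (the toy data is hypothetical): one generation has 100 observations and the other only 2. With position = "fill" both bars look identical, while the labelled version shows the difference in counts:

```r
library(ggplot2)

# Hypothetical summary data: generation 2 has far fewer observations
toy <- data.frame(
  Generation = factor(c(1, 1, 2, 2)),
  Type.1     = c("Fire", "Water", "Fire", "Water"),
  n          = c(50, 50, 1, 1)
)

# Proportions only: both bars are split 50/50 and the counts are hidden
p_fill <- ggplot(toy, aes(x = Generation, y = n, fill = Type.1)) +
  geom_bar(stat = "identity", position = "fill")

# Proportion within each generation, computed with base R for brevity
toy$prop <- ave(toy$n, toy$Generation, FUN = function(x) x / sum(x))

# Same bars, but each segment is labelled with its count and percentage
p_labelled <- ggplot(toy, aes(x = Generation, y = prop, fill = Type.1)) +
  geom_bar(stat = "identity") +
  geom_text(
    aes(label = paste0(n, "\n", prop * 100, "%")),
    position = position_stack(vjust = 0.5)
  )
```

Printing p_fill and p_labelled side by side should make it clear that the 50% in generation 2 rests on a single observation per type.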
I also removed the entire y axis (ticks, labels, and title) because the graph is self-explanatory. The less people have to look at to understand what the plot is trying to say, the better. For the same reason, the legend is gone too: the type is already printed inside each bar.
More Resources:
If you are interested in more resources and material, check out the following blog posts:
- ggplot tutorial part 1
- ggplot tutorial part 2
- common dplyr verbs
- more advanced dplyr
- magrittr’s pipe and placeholders
I hope you have enjoyed this short post on how to construct my favorite bar plot with ggplot. If you have any questions or suggestions, leave a comment below. Thank you.