My Most Favourite ggplot Plot – Powerful Bar Plot for Presentations

April 23, 2019 By Pascal Schmidt R Tidyverse Tutorial

Today I will talk about my favorite ggplot bar plot. I use it a lot at work and in presentations because it gives a good overview of both proportions and counts in a single plot without being overwhelming.

As always, we will be using the Pokemon data set which can be found here.

[Figure: favourite ggplot bar plot]

Loading Data and Forming an Idea of the Plot

library(dplyr)  # provides the %>% pipe used throughout

poke <- read.csv("Pokemon.csv") %>%
  dplyr::select(-X.)  # drop the X. index column

Before creating the bar plot, we have to do some data manipulation so that everything the plot needs is in one data frame, which we then pass as the first argument to ggplot().

Let’s say we want to know how many Grass, Fire, and Water Pokemon there are in each generation. First, let us keep only the Pokemon types we are interested in.

poke <- poke %>%
  dplyr::filter(stringr::str_detect(Type.1, "(Grass|Fire|Water)"))
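
One detail worth knowing: str_detect() matches the pattern anywhere in the string, so it would also keep a type whose name merely contains one of the three words. That should make no difference here, but an exact match with %in% is a slightly stricter alternative. A minimal sketch on a made-up data frame (toy, keep, and the "Firefly" type are invented for illustration and are not part of Pokemon.csv):

```r
library(dplyr)

# Hypothetical toy data; "Firefly" is an invented type used only to show the difference
toy <- data.frame(
  Name   = c("A", "B", "C", "D"),
  Type.1 = c("Grass", "Fire", "Firefly", "Rock"),
  stringsAsFactors = FALSE
)

keep <- c("Grass", "Fire", "Water")

# Regex match: also keeps "Firefly", because it contains "Fire"
by_regex <- toy %>%
  dplyr::filter(stringr::str_detect(Type.1, "(Grass|Fire|Water)"))

# Exact match: keeps only rows whose type is exactly one of the three
by_exact <- toy %>%
  dplyr::filter(Type.1 %in% keep)
```

Either version works for this post; the regex version is kept in the rest of the code.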

With the data filtered, we can summarize it for the bar plot. We want the count and the percentage of each Pokemon type within each generation, and we also want to label each bar segment with the type of Pokemon.

Data Manipulation and Shaping for ggplot

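data_ggplot <- poke %>%
  dplyr::group_by(Generation, Type.1) %>%
  # one row per generation/type; the result stays grouped by Generation
  dplyr::summarise(n = dplyr::n(), .groups = "drop_last") %>%
  # share of each type within its generation
  dplyr::mutate(prop = n / sum(n)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(Generation = as.factor(Generation))

data_ggplot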
## # A tibble: 18 x 4
##    Generation Type.1     n  prop
##    <fct>      <fct>  <int> <dbl>
##  1 1          Fire      14 0.241
##  2 1          Grass     13 0.224
##  3 1          Water     31 0.534
##  4 2          Fire       8 0.229
##  5 2          Grass      9 0.257
##  6 2          Water     18 0.514
##  7 3          Fire       8 0.167
##  8 3          Grass     13 0.271
##  9 3          Water     27 0.562
## 10 4          Fire       5 0.152
## 11 4          Grass     15 0.455
## 12 4          Water     13 0.394
## 13 5          Fire       9 0.214
## 14 5          Grass     15 0.357
## 15 5          Water     18 0.429
## 16 6          Fire       8 0.444
## 17 6          Grass      5 0.278
## 18 6          Water      5 0.278

The data frame above shows the count and proportion of each Pokemon type by generation. Let’s go through the pipe step by step:

  • First, we group by generation and type because we want to see the percentages and counts of Grass, Fire, and Water Pokemon by generation.
  • Then we get the counts with summarise().
  • After that, we compute the proportion of each type within each generation with the mutate() function. The important point is that the data is still grouped at this stage: summarise() removes only the last grouping variable (Type.1), so the result remains grouped by Generation. You can think of it as 6 separate data frames, one per generation, so n / sum(n) gives the proportion within the generation rather than within the entire data set.
  • Finally, we ungroup() everything so we can turn the Generation column into a factor variable. Note that we cannot do that as long as Generation is still a grouping variable.
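
The effect of grouping on n / sum(n) is easy to check on a tiny example. The sketch below uses made-up counts (toy, gen, and type are illustrative names, not columns of the Pokemon data) and compares a grouped mutate() with an ungrouped one:

```r
library(dplyr)

# Made-up counts for two groups, just to illustrate the grouping behaviour
toy <- data.frame(
  gen  = c(1, 1, 2, 2),
  type = c("Fire", "Water", "Fire", "Water"),
  n    = c(1, 3, 2, 2)
)

# Grouped: sum(n) is computed per generation, so each generation sums to 1
within_gen <- toy %>%
  dplyr::group_by(gen) %>%
  dplyr::mutate(prop = n / sum(n)) %>%
  dplyr::ungroup()

# Ungrouped: sum(n) is the grand total, so the whole column sums to 1
overall <- toy %>%
  dplyr::mutate(prop = n / sum(n))

within_gen$prop  # 0.25 0.75 0.50 0.50
overall$prop     # 0.125 0.375 0.250 0.250
```

The grouped version is the one the stacked bar plot needs, because each bar should add up to 100%.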

With the summarized data in place, we can construct the plot.

Constructing my Favorite ggplot

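library(ggplot2)

ggplot(data_ggplot, aes(x = Generation, y = prop, fill = Type.1)) +
  # bar heights come straight from the prop column
  geom_bar(stat = "identity") +
  # label each segment with type, count, and percentage, centered in the segment
  geom_text(aes(label = paste0(Type.1, "\n", n, "\n", round(prop, 4) * 100, "%")),
            position = position_stack(vjust = 0.5)) +
  # the labels make the legend and the y axis redundant
  theme(legend.position = "none",
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank())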

[Figure: favourite ggplot bar plot]
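
The label string is the fiddliest part of this plot, so it can help to build it outside of ggplot first. Here is a small sketch that uses base R's sprintf() as an alternative to paste0() and round(). The make_label() helper is hypothetical and not part of the code above. Unlike round(), sprintf() always prints two decimals (for example 50.00%).

```r
# Hypothetical helper: type, count, and percentage on three separate lines
make_label <- function(type, n, prop) {
  sprintf("%s\n%d\n%.2f%%", as.character(type), as.integer(n), prop * 100)
}

# First row of the summary table: 14 Fire Pokemon out of 58 in generation 1
make_label("Fire", 14, 14 / 58)
# "Fire\n14\n24.14%"
```

Assuming the column names used in this post, the helper could be dropped into the plot as aes(label = make_label(Type.1, n, prop)).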

I like this bar plot the most because it shows percentages and counts in one plot. If you only use geom_bar(position = "fill"), you see the proportions but not the counts, which are very important when we want to compare Pokemon types across generations.

Sometimes, proportions look very different across groups simply because some groups contain only a few data points. With the bar plot above, there is no ambiguity, and everything can be assessed at a glance.
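
The summary table above already contains an example of this. A quick check, with the counts copied from the generation 1 and generation 6 rows (gen1 and gen6 are just illustrative variable names):

```r
# Counts copied from the summary table (Fire, Grass, Water)
gen1 <- c(Fire = 14, Grass = 13, Water = 31)
gen6 <- c(Fire = 8,  Grass = 5,  Water = 5)

sum(gen1)  # 58 Pokemon in generation 1
sum(gen6)  # 18 Pokemon in generation 6

# Fire share per generation
fire_share_gen1 <- unname(gen1["Fire"] / sum(gen1))
fire_share_gen6 <- unname(gen6["Fire"] / sum(gen6))

round(fire_share_gen1, 3)  # 0.241
round(fire_share_gen6, 3)  # 0.444
```

Generation 6 has the largest Fire share even though it has fewer Fire Pokemon than generation 1 (8 versus 14), simply because the group is much smaller. A plot that shows only proportions would hide that.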

I also removed the entire y axis (ticks, labels, and title) because the graph is self-explanatory. The less people have to look at to understand what the plot is saying, the better. For the same reason the legend is gone too: every bar segment is already labeled with its type.

I hope you have enjoyed this short post on how to construct my favorite bar plot with ggplot. If you have any questions or suggestions, leave a comment below. Thank you.