Creating Beautiful and Flexible Summary Statistics Tables in R With gtsummary
January 24, 2021 By Pascal Schmidt R Statistics
gtsummary is a great package for building summary statistics tables in R. It offers a lot of functionality and flexibility, and tables take very little code to build. I like it almost as much as the arsenal package. Only almost, because gtsummary is not as mature yet, but I expect it to become as good as or better than arsenal for summary statistics tables in R.
The great thing about gtsummary is that it creates not only summary statistics tables but also other tables, such as regression tables. It takes only a few lines of code to produce a nice-looking table.
First, we will load some libraries. We will build the tables from the gapminder data set. Many summary table packages make it difficult to display missing values properly, and often there is only one default method that cannot be changed. With the gtsummary package, there are many options for customizing a summary statistics table, including how missing values are shown. So that there are missing values to display, the setup code randomly replaces about 20% of the values in each column with NA.
Let’s get started.
Creating A Basic Summary Statistics Table in R
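    library(tidyverse)
    library(gtsummary)
    library(gapminder)

    # Seed added so the random missingness is reproducible (any value works)
    set.seed(123)

    gap <- gapminder %>%
      # Keep each value with probability 0.8, otherwise replace it with NA;
      # as.character() keeps factor labels instead of their integer codes
      dplyr::mutate_all(~ ifelse(
        sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
        as.character(.),
        NA
      )) %>%
      # Convert year, lifeExp, pop, and gdpPercap back to numeric
      dplyr::mutate_at(vars(year:gdpPercap), ~ as.numeric(.)) %>%
      # Dichotomize GDP per capita at its median
      dplyr::mutate(gdpPercap = ifelse(gdpPercap > median(gdpPercap, na.rm = TRUE), "high", "low"))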
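The masking step relies on two details of base R that are easy to miss, so here is a minimal sketch of them in isolation. The vector names and the seed are made up for illustration and are not part of the original setup.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 10000
x <- seq_len(n)

# TRUE (keep) with probability 0.8, FALSE (mask) with probability 0.2
keep <- sample(c(TRUE, FALSE), size = n, replace = TRUE, prob = c(0.8, 0.2))
x_masked <- ifelse(keep, x, NA)

# Share of masked values; should be close to 0.2
share_missing <- mean(is.na(x_masked))
share_missing

# ifelse() drops the factor class and returns the underlying integer codes,
# which is why the setup code converts to character first
f <- factor(c("Asia", "Europe", "Asia"))
codes <- ifelse(c(TRUE, TRUE, FALSE), f, NA)
labels <- ifelse(c(TRUE, TRUE, FALSE), as.character(f), NA)
codes
labels
```

Because each column draws its own random mask, the missing values fall in different rows for different variables.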
The default summary statistics table looks pretty good for only one line of code.
    # Drop the country column before summarizing
    gap <- gap %>%
      select(-country)

    table1 <- tbl_summary(gap)
    table1
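As a small illustration of the missing-value options mentioned earlier, the sketch below toggles the missing and missing_text arguments of tbl_summary(). To stay self-contained it uses the trial data set bundled with gtsummary rather than gap, and the text "(Missing)" is an arbitrary choice.

```r
library(gtsummary)

# Always print a missing-value row, with custom row text
tbl_always <- tbl_summary(
  trial,
  include = c(age, response),
  missing = "always",
  missing_text = "(Missing)"
)

# Suppress the missing-value rows entirely
tbl_none <- tbl_summary(
  trial,
  include = c(age, response),
  missing = "no"
)

tbl_always
tbl_none
```

The third accepted value, "ifany", prints a missing row only for variables that actually contain missing values.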
Customizing a Summary Statistics Table in R
We can also customize the table. In the example below, we add more summary statistics, rename the variables, make the labels bold, and modify the header.
    gap %>%
      gtsummary::tbl_summary(
        # Display names for the variables
        label = list(
          continent ~ "Continent",
          year ~ "Year",
          lifeExp ~ "Life Expectancy",
          pop ~ "Population",
          gdpPercap ~ "GDP per Capita"
        ),
        # "continuous2" allows several statistic rows per continuous variable
        type = all_continuous() ~ "continuous2",
        statistic = all_continuous() ~ c(
          "{median} ({p25}, {p75})",
          "{min}, {max}"
        )
      ) %>%
      add_n() %>%
      bold_labels() %>%
      modify_header(label ~ "**Variable**")
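The statistic strings are glue-style templates in which each name in curly braces is replaced by the computed value. A minimal sketch, again on the bundled trial data rather than gap, that sets different templates for continuous and categorical variables and controls rounding with digits:

```r
library(gtsummary)

tbl_stats <- tbl_summary(
  trial,
  include = c(age, marker, grade),
  statistic = list(
    all_continuous() ~ "{mean} ({sd})",      # mean and standard deviation
    all_categorical() ~ "{n} / {N} ({p}%)"   # count, total, and percent
  ),
  digits = all_continuous() ~ 1              # one decimal place
)

tbl_stats
```

Selectors such as all_continuous() and all_categorical() apply a template to every variable of that type, so the same call keeps working when columns are added.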
Setting Themes for Summary Statistics Tables in R and Creating A Table By Group
With the gtsummary package, we can also set a theme for our tables. This is convenient when we have to create a lot of tables, because the theme controls their settings globally. With the theme below, I choose which summary statistics are reported and how the numbers are formatted. We then activate the theme with gtsummary::set_gtsummary_theme(my_theme). Next, we display the summary table by a group, continent.
    my_theme <- list(
      # Continuous variables get multi-row summaries by default
      "tbl_summary-str:default_con_type" = "continuous2",
      "tbl_summary-str:continuous_stat" = c(
        "{median} ({p25} - {p75})",
        "{mean} ({sd})",
        "{min} - {max}"
      ),
      "tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
      # No thousands separator
      "style_number-arg:big.mark" = "",
      "tbl_summary-fn:percent_fun" = function(x) style_percent(x, digits = 3)
    )

    gtsummary::set_gtsummary_theme(my_theme)

    gap %>%
      gtsummary::tbl_summary(
        by = continent,
        missing = "always",
        missing_text = "Missing",
        # Named explicitly instead of relying on argument position
        label = list(
          year ~ "Year",
          lifeExp ~ "Life Expectancy",
          pop ~ "Population",
          gdpPercap ~ "GDP per Capita"
        )
      ) %>%
      add_n() %>%
      bold_labels() %>%
      modify_header(label ~ "**Variable**") %>%
      add_p()
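Because a theme is set globally, it stays active for every table created afterwards in the same session. The sketch below shows one way to switch it off again with reset_gtsummary_theme(). For brevity it uses the built-in theme_gtsummary_mean_sd() theme and the bundled trial data instead of my_theme and gap.

```r
library(gtsummary)

# Activate a built-in theme: continuous variables are reported as mean (SD)
theme_gtsummary_mean_sd()
tbl_themed <- tbl_summary(trial, include = age)

# Restore the package defaults for everything created afterwards
reset_gtsummary_theme()
tbl_default <- tbl_summary(trial, include = age)

tbl_themed
tbl_default
```

Resetting at the end of a script is a cheap way to keep a theme from leaking into unrelated tables.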
If we decide to always have bold labels and p-values in our summary statistics tables, we can write our own function to do so.
    # Wrapper that forwards all arguments to tbl_summary() and then
    # applies the same finishing steps to every table
    my_modified_gtsummary_tbl <- function(...) {
      gtsummary::tbl_summary(...) %>%
        add_n() %>%
        bold_labels() %>%
        modify_header(label ~ "**Variable**") %>%
        add_p()
    }

    gap %>%
      my_modified_gtsummary_tbl(
        by = continent,
        missing = "always",
        missing_text = "Missing",
        # Named explicitly instead of relying on argument position
        label = list(
          year ~ "Year",
          lifeExp ~ "Life Expectancy",
          pop ~ "Population",
          gdpPercap ~ "GDP per Capita"
        )
      )
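One limitation of a wrapper that always calls add_p() is that p-values only make sense when the table is split by a group. A possible variation, sketched here with a hypothetical function name and a hypothetical with_p argument, makes that step optional. It uses the bundled trial data so that it runs on its own.

```r
library(gtsummary)

# Hypothetical wrapper: `summary_tbl` and `with_p` are illustrative names
summary_tbl <- function(data, ..., with_p = FALSE) {
  tbl <- tbl_summary(data, ...)
  tbl <- add_n(tbl)
  tbl <- bold_labels(tbl)
  tbl <- modify_header(tbl, label ~ "**Variable**")
  # Only compare groups when the caller asks for it
  if (with_p) {
    tbl <- add_p(tbl)
  }
  tbl
}

# Overall table, no grouping and no p-values
tbl_overall <- summary_tbl(trial, include = c(age, grade))

# Table split by treatment arm, with p-values
tbl_by_trt <- summary_tbl(trial, by = trt, include = c(age, grade), with_p = TRUE)

tbl_overall
tbl_by_trt
```

The same function then serves both grouped and ungrouped tables.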
This is another way to extend your theme for summary statistics tables with the gtsummary package.
Summary Statistics Regression Tables in R
The gtsummary package also includes tables for summarizing regression models (linear or logistic) as well as survival output. The code below produces a linear regression table.
    gap %>%
      # Regress life expectancy on all remaining variables
      lm(lifeExp ~ ., data = .) %>%
      gtsummary::tbl_regression()
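For the logistic case, a minimal sketch might look like the following. It fits the model on the bundled trial data rather than gap, and the choice of predictors is arbitrary. The exponentiate argument of tbl_regression() reports odds ratios instead of log-odds coefficients.

```r
library(gtsummary)

# Logistic regression on the trial data bundled with gtsummary
fit <- glm(response ~ age + grade, data = trial, family = binomial)

# exponentiate = TRUE shows odds ratios rather than log-odds
tbl_logit <- tbl_regression(fit, exponentiate = TRUE)

tbl_logit
```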
Additional Resources
- Check out my blog post about many more summary statistics tables
- More documentation about the gtsummary package
I hope you enjoyed this short tutorial about summary statistics tables in R with the gtsummary package.