The Tidyverse Packages – R for Data Science Book Review

I recently got my first internship in data science and I am learning a ton. I am exposed to new things every single day. I have studied statistics at university for about three years, so I already knew a fair amount. Still, what I have learned so far at my internship greatly exceeds what I learned at university. At my current job I use the tidyverse so much that I bought the book “R for Data Science” by Hadley Wickham and Garrett Grolemund. I enjoyed reading it so much that I decided to write a review of it.

I am currently working in cancer research. At my workplace we use R for our analysis, and I am amazed by what can be accomplished with the tidyverse. If you do not know what the tidyverse is, here is a brief overview. The tidyverse was developed by RStudio to make data analysis as easy as possible for analysts. It is a package that bundles other packages used for working with data. So, with a single command (library(tidyverse)) you’ll load the following packages:

The Tidyverse Core Packages

  • ggplot2, for data visualization
  • dplyr, for data manipulation
  • tidyr, for data tidying
  • readr, for data import
  • purrr, for functional programming
  • tibble, for a more modern representation of data frames

The packages above form the core of the tidyverse. Many more packages are included in the tidyverse, but they are less commonly used. If you want to look at the other packages as well, check out this blog post from RStudio about the tidyverse.
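To give a feel for what loading the tidyverse gets you, here is a minimal sketch. It uses the built-in mtcars data set, which is my own choice for illustration and not an example taken from the book, and it assumes the tidyverse is already installed.

```r
# Load the core tidyverse packages with a single command
library(tidyverse)

# tibble: turn the built-in mtcars data frame into a tibble,
# keeping the car names as a regular column called "model"
cars <- as_tibble(mtcars, rownames = "model")

# dplyr: keep the 4-cylinder cars, add a column, and summarise by gear
summary_by_gear <- cars %>%
  filter(cyl == 4) %>%
  mutate(kpl = mpg * 0.425) %>%  # rough conversion from miles/gallon to km/litre
  group_by(gear) %>%
  summarise(n = n(), mean_kpl = mean(kpl))

print(summary_by_gear)
```

Each step in the pipeline is one dplyr verb, which is the style the book teaches from the first chapters onward.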

Why I Bought R for Data Science

As already mentioned, I am amazed by the tidyverse, and I use it around 95% of the time when programming at work. Every time I want to shape my data in a certain way, there is a tidyverse package or verb that can accomplish what I want to do. This is why I am reviewing R for Data Science. The book is all about the tidyverse and is a structured guide to becoming a good analyst and programmer. I picked it up in the first few months of my internship, and it has helped me a lot to become better at working with data.

Get From 0 to Hero

One great advantage of R for Data Science is that anyone with zero programming experience can start learning. You don’t have to know about loops, if-else statements, recursion, or even subsetting. The book introduces you to the tidyverse packages and explains each one. It does not go into great depth, but it introduces you to the main functions. After you have read the book, you can do roughly 80% of what you want to accomplish.

Learn SQL and Workflows with R for Data Science

Besides teaching you a lot of packages that are indispensable for data analysis, the book also has a chapter about relational data. The dplyr package has some SQL-like commands, and the chapter introduces you to the concepts of primary keys and foreign keys and to several kinds of joins. There are also some chapters about R projects and how to create a good workflow. These chapters introduce you to how and where to store your code. This might not seem important at first, but it becomes crucial when you want to share your analysis with people on your team.
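The SQL-like joins in dplyr can be sketched as follows. The two tables and their column names (patients, visits, patient_id) are hypothetical and made up for this post; they are not from the book.

```r
library(dplyr)
library(tibble)

# Hypothetical tables, invented for illustration only
patients <- tibble(
  patient_id = c(1, 2, 3),        # primary key of `patients`
  name = c("Ana", "Ben", "Cleo")
)

visits <- tibble(
  visit_id = c(10, 11, 12, 13),   # primary key of `visits`
  patient_id = c(1, 1, 2, 4)      # foreign key referring to `patients`
)

# inner_join: keep only visits that have a matching patient
inner <- inner_join(visits, patients, by = "patient_id")

# left_join: keep every visit; visits without a matching patient get NA for name
left <- left_join(visits, patients, by = "patient_id")

# anti_join: patients that have no visit at all
no_visits <- anti_join(patients, visits, by = "patient_id")
```

If you know SQL, inner_join() and left_join() behave like INNER JOIN and LEFT JOIN, with the by argument naming the key column.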

Communicate Your Analysis with R For Data Science

On top of the workflow and project chapters, there is also a section about how you can communicate your analysis effectively. This is one of my favourite parts of the book. It shows you how to work with R Markdown and introduces you to its format. In addition, the book presents other cool ways to present your analysis. It mentions what kind of dashboards you can use to present your results and how to make your visualizations more interactive.

Make Your Analysis Reproducible with R for Data Science

At the end of the book there is some great advice about reproducible research/analysis and what kind of things you can do to make your code work on other machines at any point in time.

In conclusion, the book gives a great overview of the different tidyverse packages. On top of that, it shows you how to work efficiently and effectively with R and how to structure your projects. Moreover, it gives you an overview of the tools you can use to better communicate your results. The communication section in the book is a bit short and does not teach you a lot. However, it is one of my favourite sections because it is a great overview of the tools available. If anything from the communication section piqued your interest, such as making interactive visualizations with Shiny, then I would encourage you to read up on it.

The Tidyverse is Opinionated

The tidyverse is also great because its functions take their arguments in a consistent order, typically with the data as the first argument. So, if you are unfamiliar with a function from a tidyverse package but you know how others work, you can quickly figure out where to place the arguments. This also makes it an opinionated set of packages, because it nudges you to do data analysis in a certain way. All the packages share the same underlying design philosophy, grammar, and data structures.
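Here is a small sketch of that consistency, using a made-up scores table (the names are hypothetical). In each call the data goes first, which is also why the verbs chain so naturally with the pipe.

```r
library(dplyr)
library(tidyr)

# Hypothetical data, invented for illustration only
scores <- tibble::tibble(
  student = c("Ann", "Ben"),
  math = c(90, 75),
  art = c(80, 95)
)

# Called directly: the data frame is the first argument every time
high_math <- filter(scores, math > 80)
sorted <- arrange(scores, desc(art))

# Because the data comes first, verbs from different packages chain with the pipe
long <- scores %>%
  pivot_longer(cols = c(math, art), names_to = "subject", values_to = "score") %>%
  arrange(desc(score))
```

Here filter() and arrange() come from dplyr and pivot_longer() comes from tidyr, yet they all follow the same pattern.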

Who Should Read R for Data Science?

I would recommend the book to beginner and intermediate R users. It is a great reference and I think university students and professionals who have been working with R for up to 3 years can greatly benefit.

If you have read the book already, then you can let me know what you think about it in the comments below. If not, then tell me if you are thinking about reading it.

A free version is available here. Enjoy!
