Data Science is a hot field, and Data Scientist has been called the sexiest job of the 21st century. A lot of people are transitioning into the field and want to work with data. But what does it take to become a Data Scientist?
In this blog post we want to answer that question. To do that, we scraped job data from Indeed’s website with a scraper we developed in our last post. We looked for the job title Data Scientist in Canada. The cities we scraped data from are Vancouver, Toronto, and Montreal.
Although we only looked at Canada, the skill set should transfer to other countries as well.
In order to answer the question in as much detail as possible, we posed the following questions:
- What level of education is required for a Data Scientist? (Bachelor’s, Master’s, or PhD?)
- What majors are best suited to become a Data Scientist? (Computer Science, Statistics, Physics etc.?)
- What technical skills does a Data Scientist need Part 1? (Python vs. R vs. SQL etc.?)
- What technical skills does a Data Scientist need Part 2? (Machine learning, AI, statistics etc.?)
- What soft skills do Data Scientists need? (communication, teamwork etc.?)
- How many years of experience are needed to enter the field of Data Science?
The last two questions we posed are not directly related to “How to become a Data Scientist” but are still interesting.
- What companies are hiring Data Scientists?
- What job titles were the most common ones?
All analysis was done in R. If anyone wants to reproduce the results, the code is on my GitHub account. Every section starts with the personal experience I have gained in the Data Science field. Afterwards, we’ll take a more objective look and see what Indeed has to say.
Let’s jump into our analysis.
What Level of Education is Required to Become a Data Scientist?
Usually, people think that Data Scientists have to have a PhD in order to get the job done. In my experience, that is not the case.
It also depends on your area of study. For a Computer Science major, a Bachelor’s degree is in most cases enough. For other majors like Statistics, people often want to see a Master’s degree.
One of my co-workers has a PhD in Statistics and once said that having a PhD did actually hurt them. After their Master’s, they felt very comfortable with all the skills required in data science. Their skill set was very broad and my co-worker felt well equipped. However, during their PhD my co-worker went really deep into one specific area.
So, they could answer all questions about their particular research subject, but all other topics in Statistics and Computer Science were left behind. That also meant that there was no time to explore new methods or technologies while doing their PhD. Therefore, I think that doing a PhD should be well thought out. You won’t necessarily become a better Data Scientist; you’ll only become an expert in one specific area.
Obviously, if you want to work in academia, a PhD is a requirement. However, almost all jobs outside of academia do not necessarily require or need a PhD.
Of course, this is only one data point, but I heard the same opinion from my TA at university, who was a PhD student. He is developing statistical methods for high-dimensional spatiotemporal data in order to better understand problems related to basketball and other sports. He told me that he wished he could do more machine learning in order to stay on top of the game. However, he is so involved in his area of research that he does not get to practice other areas of Statistics. Again, he is going deep into one area and becoming an expert while other statistical methods are neglected.
Enough of my experience. Let’s see what Indeed has to say.
It looks like the most desired degree level is a Master’s degree. A PhD comes right after, and a Bachelor’s degree is in last place.
Job descriptions often ask for a Master’s degree or higher. Therefore, we did not count a PhD when the same job description also mentioned a Master’s degree. Let’s see how that changes the picture.
Wow, a lot of job descriptions mention that having a Master’s degree is enough. Now we have PhD degrees in last place.
It looks like Bachelor’s and Master’s degrees are sufficient to become a Data Scientist.
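The adjustment described above (not counting a PhD when the same posting also mentions a Master’s degree) could be sketched roughly as follows. The original analysis was done in R; this Python version, its search patterns, and the three sample postings are illustrative assumptions and not the code or data behind the results in this post.

```python
import re
from collections import Counter

# Hypothetical search patterns; the post does not list the exact terms used.
DEGREE_PATTERNS = {
    "Bachelor": re.compile(r"\bbachelor", re.IGNORECASE),
    "Master": re.compile(r"\bmaster", re.IGNORECASE),
    "PhD": re.compile(r"\bph\.?\s?d\b", re.IGNORECASE),
}


def degree_counts(postings, drop_phd_if_master=False):
    """Count how many postings mention each degree level at least once."""
    counts = Counter()
    for text in postings:
        found = {name for name, pat in DEGREE_PATTERNS.items() if pat.search(text)}
        # "Master's or higher" postings are treated as Master's only.
        if drop_phd_if_master and {"Master", "PhD"} <= found:
            found.discard("PhD")
        counts.update(found)
    return counts


# Made-up example postings, for illustration only.
sample_postings = [
    "Master's or PhD in Statistics required.",
    "PhD in Computer Science preferred.",
    "Bachelor's degree in a quantitative field.",
]

raw = degree_counts(sample_postings)
adjusted = degree_counts(sample_postings, drop_phd_if_master=True)
print(raw["PhD"], adjusted["PhD"])  # 2 1
```

A simple pattern like this will also match unrelated phrases such as "master data", so real postings would need more careful patterns.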
What Should You Study/Major in to Become a Data Scientist?
I don’t know what the best major is to become a Data Scientist. I only know that all the people I know in the field have a quantitative background (e.g., Statistics, Computer Science, Mathematics).
In my opinion, a quantitative degree, a good work ethic, and a learner’s mindset suffice for a career in Data Science. There are so many different methods that it is impossible for any degree to cover the depth of Data Science. Therefore, it is more important to be adaptable and to be a quick learner.
Let’s see what kind of degrees Data Science positions demand.
I wouldn’t have expected to see engineering up there. It is definitely overrepresented because we did not account for phrases like feature engineering or reverse engineering. However, having gone through around 20 job descriptions myself, I can say that a lot of jobs mention that a degree in some kind of engineering field is desired. So it is still a degree that most job descriptions mention.
Computer science is mentioned second most often, followed by statistics and mathematics. Interestingly, Finance was mentioned 39 times and Economics 28 times out of around 600 job postings. That suggests that almost any degree with some quantitative content can lead to a career as a Data Scientist. I almost majored in Economics, and that degree was far from quantitative in comparison to Statistics, the subject I majored in at university.
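One way to reduce the overcounting of engineering mentioned above would be to strip phrases like feature engineering or reverse engineering before searching for the word. The following is a minimal Python sketch under that assumption; the phrase list, function name, and example sentences are hypothetical and not part of the original R analysis.

```python
import re

# Hypothetical list of phrases that contain "engineering" but do not name a degree.
NON_DEGREE_PHRASES = re.compile(r"\b(?:feature|reverse)\s+engineering\b", re.IGNORECASE)
ENGINEERING = re.compile(r"\bengineering\b", re.IGNORECASE)


def mentions_engineering(text):
    """Return True if 'engineering' appears outside the excluded phrases."""
    cleaned = NON_DEGREE_PHRASES.sub(" ", text)
    return bool(ENGINEERING.search(cleaned))


# Made-up example sentences, for illustration only.
examples = [
    "Experience with feature engineering and model tuning.",
    "Degree in Engineering, Computer Science or a related field.",
    "Engineering degree; feature engineering experience is a plus.",
]
flags = [mentions_engineering(text) for text in examples]
print(flags)  # [False, True, True]
```

This still counts other non-degree uses such as "software engineering team", so the exclusion list would have to grow after reading through real job descriptions.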
What Technical Skill Set Do You Need to Become a Data Scientist? (Part 1)
This is a really hard question to answer since there are so many different job requirements in Data Science. Hence, every job requires different technical skills. When you are applying for jobs as a Data Engineer, you probably want to be familiar with SQL, Hadoop, Java, and Spark. As a Data Scientist, it is important to know R or Python (best to know both), SQL, and maybe Java or C++.
At my workplace, we solely use R, Terminal, and Version Control for our projects.
Let’s see what Indeed says:
Programming in Python is by far the most desired skill. I think this is because it is a multipurpose language and a lot of software engineers who are transitioning into Data Science feel more comfortable using Python than R. R is mostly used by academics and people with a statistics background. Both languages have great functionality when it comes to working with data.
SQL is in third place and Version Control (Git, Bitbucket) in fourth place. When you are just starting out, I would recommend picking either R or Python, then learning Git, and then a bit of SQL. The most difficult part will be becoming a good programmer and using R or Python well. Git and SQL can be picked up within a few weeks.
If we had looked for Data Analyst positions, then I believe that Excel would have been in first place. If we had looked for Data Engineer positions, then Hadoop, Scala or Java would be way up there. In fact, I looked at the differences between a Data Analyst, a Data Scientist, and a Data Engineer in this blog post.
What Technical Skill Set Do You Need to Become a Data Scientist? (Part 2)
The top skill every Data Scientist needs is to be able to program well. In second place is algorithms. This skill covers machine learning algorithms and problem-solving skills in general. It is followed by machine learning, modeling, deep learning, and neural networks.
Not only does a Data Scientist need to be good at using certain technologies, they also have to be excellent programmers and know the theory behind machine learning algorithms. This is what makes the job so hard. Excellent programming and statistics knowledge is hard to acquire and has to be built up over time.
What Soft Skills Do You Need to Become a Data Scientist?
In my experience, communication is the most important soft skill of a Data Scientist. Presenting data and explaining what conclusions follow from it is very important. When talking to a more technical audience, the interpretation of models and other statistical output is also crucial. Furthermore, explaining why one statistical method is used over another and why it works better is also important. In short, without good communication skills a Data Scientist won’t ever reach their full potential (money and position wise).
Let’s have a look at Indeed’s posts.
Being creative and being able to solve problems were the two most desired skills. They probably mean the same thing and can be grouped together as problem solving.
The skill that I ranked number 1 only made third place. It seems employers care more about solving problems in a creative way than about communicating them.
How Many Years of Experience Are Necessary Before Getting a Job in Data Science?
We only asked this question out of interest. Many job descriptions include a sentence stating how many years of experience are desired. It rarely has any significance for applicants. So remember, as long as you get the job done, you are qualified.
Indeed says that 3+ years of experience is most common in job descriptions, followed by 5+ years, 2+ years, and 10+ years. The 100+ years of experience was probably a typo in the description.
In conclusion, to become a successful Data Scientist, one needs a lot of experience. However, even people with only a little experience can contribute greatly to projects. So don’t be discouraged if you haven’t had a lot of exposure to data yet. Keep learning!
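Pulling the years of experience out of job descriptions, and dropping implausible values like the 100+ years mentioned above, could be sketched in Python along these lines. The regular expression, the `max_plausible` cutoff, and the sample postings are assumptions made for illustration; they are not taken from the original R code.

```python
import re
from collections import Counter

# Matches forms like "3+ years", "5 years", "2 yrs" (hypothetical pattern).
YEARS = re.compile(r"(\d+)\s*\+?\s*(?:years?|yrs?)\b", re.IGNORECASE)


def years_mentioned(postings, max_plausible=40):
    """Count every 'N years' mention, ignoring values above max_plausible."""
    counts = Counter()
    for text in postings:
        for match in YEARS.finditer(text):
            years = int(match.group(1))
            if years <= max_plausible:
                counts[years] += 1
    return counts


# Made-up example postings, for illustration only.
sample_postings = [
    "3+ years of experience with Python",
    "At least 5 years in analytics; 3+ years of SQL",
    "100+ years of experience",
]
year_counts = years_mentioned(sample_postings)
print(year_counts[3], year_counts[5], year_counts[100])  # 2 1 0
```

A pattern this simple also picks up phrases that are not requirements (for example "over the last 10 years") and reads a range like "2-3 years" as 3, so the results would need a manual sanity check.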
What Kind of Companies Are Hiring Data Scientists?
Because we are in Canada, this graph does not include Facebook, Apple, Microsoft, or Google. Especially in Vancouver there are a lot of health-related Data Science jobs, so it makes sense to see STEMCELL Technologies as number 1. Amazon and IBM came in 8th and 9th place.
Most Mentioned Data Science Job Titles
The most mentioned job title was, unsurprisingly, Data Scientist, followed by Data Engineer and Senior Data Scientist.
Lastly, we created a word cloud of the most common words in the job descriptions, excluding the words we had already searched for.
I hope you have enjoyed this blog post. If you have any feedback or questions, please let me know in the comment section below.
You might also be interested in my Data Science internship experience and what kind of technologies I used at my workplace.