Note: the opinions and views expressed in this article are completely my own and do not reflect those of my employer or past employers.
In the last couple of years, data science has grown tremendously at UC Berkeley. More than 500 students have graduated with a data science major within 2 years of the major’s launch in Fall 2018, likely putting it among the top 5 most popular majors at Cal. At the same time, data science’s footprint has spread across campus, with Data 8 becoming a prerequisite for other majors. Data 8 and Data 100 are also some of the largest classes; their waitlists are so large that they alone could fill Dwinelle 155.
I thought I’d write this article to discuss some personal and stylized thoughts about pursuing the data science major. This is something I’m often asked about, and I’ve noticed that I end up giving more or less the same spiel every time. That being said, you should take everything I say with a huge grain of salt: everyone has a different take on this, and I’m not even a data science major, after all. I further want to caution that not everything I say will apply to you; exceptional people will always be, well, exceptions to the conventional wisdom.
Data science is meant to be a double major
I remember that my primary takeaway from Data 8 was that data science could be applied to so many different domains. In the class, we conducted hypothesis tests to validate ‘deflate-gate’, explored the age of the universe through basic linear regression, and predicted breast cancer through the k-Nearest Neighbors algorithm. The multitude of domain emphases offered by the data science major is further testament to this.
One of the coolest things about data science is that it can complement virtually any field of study, from English to Political Science to Public Health (surprised? check out the different modules and connectors that the Data Science Education Program works with different departments to offer). Many academic disciplines are very quantitative in nature; in Economics, you begin to realize that the entire field is grounded in empirical evidence and thus relies heavily on mathematical and technical expertise (side note, that’s why we created Data 88: Economic Models). As a student looking for research positions, knowing data science and having some coding background will really set you apart from other applicants.
Professor DeNero likes to show a Venn diagram at the start of Data 8 that puts data science at the intersection of Computer Science, Statistics, and Domain Knowledge. Only with the relevant domain knowledge does the data science you do become meaningful and applicable. For example, in business applications, data scientists don’t just optimize to reduce some loss function but tie it to business value, such as how much more revenue is generated or how much time is saved. Data science without domain knowledge is arguably irresponsible; models lose their much-needed interpretability and people become just statistics.
Sure, there are a bunch of domain emphases as part of the data science major that force data science majors to pick a domain of specialization. However, taking 2 courses will likely only scratch the surface of an entire domain.
Data Science vs Computer Science
Many people quip that data science is the “new cognitive science” in that it is the new backup major for students who failed to declare CS. As mean as this may sound, there is some truth to this statement.
If you are set on becoming a data scientist or machine learning scientist, major in CS and statistics, math, data science, economics, or another heavily quantitative subject. You’ll probably also need at least a master’s degree along the way.
One thing I would like to advise all freshmen who are interested in being data scientists is to at least try to major in computer science. Having a solid engineering and computer science background will allow you to become a better data scientist, and the CS curriculum at Cal has plenty of leeway for you to develop as a data scientist. Also, computer science is a much more valuable and versatile degree in industry, especially in tech. More on that later.
Give CS 61A your absolute best: grind out practice exams, spend a lot of time truly understanding recursion, work with a reliable group… you’ve probably heard this before. After taking CS 61A, you’ll have a better sense of whether programming is for you and how comfortable you would be declaring CS. CS 61A may give you a run for your money as the hardest course you’ll have taken (so far), but even if things do not work out, it will at least count as a major declaration requirement for data science. Plus, taking a course isn’t always about the GPA; you really do learn a lot from the 2 intense midterms, 3 programming languages, and 4 interesting projects (please do not put these on your resume, if you can help it).
Do not be discouraged by the reputation of how hard Berkeley’s CS program is. I have a few friends who were intimidated by CS 61A and decided to avoid it in their freshman and sophomore years; only when they decided to add a CS major later on did they realize that CS 61A/B and 70 weren’t actually that bad. In my experience, if you are willing to put in the time and effort to understand things, you can do well in these courses.
That being said, if you know you’re more interested in something else (like economics) and mainly want to utilize data science in your field of study, a data science major is perfect for you.
Finding a job as a data science major
Finding a job as a data science major in tech is pretty tough.
All else equal, data science majors are generally considered less favorably than CS majors for software engineering roles. In fact, I wouldn’t even be surprised if data science majors were considered less favorably than CS majors for data science roles.
Overall, there are far fewer data scientist positions than software engineering ones. Most tech companies simply do not need to employ as many data scientists as they do engineers; last year, a certain Bay Area company had around 20 data science interns but 100+ engineering interns, while another had only 1 data science intern out of more than 50 interns. Without any priors, these odds do not look good.
On top of that, data scientists have traditionally required much more rigorous training and schooling: you’ve probably heard that a master’s degree or perhaps even a PhD is a common requirement. A lot of the data science being done, especially at the forefront of a field, requires a non-trivial amount of mathematical and research maturity to be conducted thoughtfully. Data science teams are often more academic in nature, reading papers and going to conferences. In the same company example above, only 1 of the 20 or so data science interns was actually an undergrad.
As a side note, I’d recommend pursuing a master’s degree if you really want to do data science in industry. This is not so much for the learning as for the credential. Lots of well-known universities have decided to cash in on this opportunity, so you’ll have plenty of options to pick from. Ultimately, do you need a master’s? Absolutely not; but having one probably would open a few doors.
That being said, many data scientist responsibilities in industry are nowhere near as advanced as those I’ve described above; think more along the lines of visualization, data analysis, and basic modeling. In theory, a data science degree from UC Berkeley would be more than adequate to succeed in these roles, but my impression is that the labor market is very saturated, with relatively low demand but high supply. People from all sorts of industries and disciplines are entering this field deemed by the Harvard Business Review as “the sexiest job of the 21st century”; some with PhDs, many with master’s degrees, and some even from bootcamps. On top of that, the data science major is so new that the industry is not yet prepared, nor does it have the need, to hire data science majors to do data science.
The outlook is not dim, though. I believe that there will be more data scientists in the future, just as CS degrees used to be much more niche and required a graduate degree 30 years ago. In fact, even today, there are plenty of data science roles in non-tech companies that a data science major would have a good chance at getting. These roles are not as sexy or on the frontiers of data science, nor do they pay as well as those in tech companies. For some companies, these roles have always existed, just under different names like ‘business analyst’ or ‘industrial engineer’. For others, the exponential increase in the amount of data these companies have collected and stored now leads to a wealth of potential untapped data insight. Non-tech companies in fact often struggle to attract good talent in engineering and data analysis.
I don’t want to imply that these roles are by any means inferior to data science roles in tech. Data science is essential to probably every company, and I’d argue that one’s impact on one of these teams may be much higher than it would be in tech.
An Afterword
This article is a culmination of experiences and thoughts I’ve picked up over the last couple of years of being involved in data science as an undergraduate and graduate student at UC Berkeley. None of the things stated here should be treated as facts; they’re more like ‘takes’ that you should not accept wholesale.
I do not intend to imply, in any way, that the data science major is inferior to the computer science major. Rather, the status quo of computer science at Berkeley and in tech has created this unfortunate juxtaposition.
Ultimately, compared to many other majors at Cal, the data science major is far ahead in both career prospects and general applicability. Berkeley has some of the most interesting, innovative, and straight-up brilliant data science courses in the world, and I cannot recommend them enough. However, I’ve come to realize that a data science major alone is not necessarily enough to do data science.