What exactly is Data Science?
Having worked professionally in Data Science for a year (well, humans are natural data scientists), I have seen and examined many different perceptions of the field. There is a certain hype around the Data Science positions available in the market, and many of us do not yet understand what Data Science truly means. I have seen companies misguided by that hype, and many of my peers give in to it as well.
Well, there is indeed hype around Data Science. According to Harvard Business Review, Data Scientist is the sexiest job of the 21st century. Data Science is a highly interdisciplinary field with applications in logistics, recommendation systems, entertainment, and more. Essentially, Data Science shows up in every industry in one way or another.
In this article, I would like to share my views on what Data Science is and, most importantly, why it has always been an important problem-solving tool. I will show how, instead of a field, we can think of Data Science as an art and a way of thinking. To that end, I would like to define a new term, "data science thinking", which is a combination of first-principles and second-order thinking.
To start, let us base this article on the assumption that when I talk about Data Science, I include all the related roles available in the market, such as:
- Data Analysts
- Data Scientists (Applied and Analytics)
- Data Curators/Data Annotators
- AI Research Scientists
- ML Engineers/AI Engineers/Analytics Engineers
Another assumption is that we will look at Data Science from a problem-solving perspective as opposed to an educational perspective.
What is Data Science?
Data Science is an art, but more specifically, anyone doing data science is a problem solver. Broadly, problems can be divided into two types:
- Real World Problem: A real world problem is a problem that is solved to directly impact an end-user. Solutions to these problems have to be highly efficient and scalable. These problems generally do not require new ideas or new formulas, but they may require a new way of solving things; in data science, this is known as applied data science. (Examples include using data science techniques to increase grocery sales or to increase the speed of an algorithm.) These problems require technical as well as business skills.
- Research/Scientific Problem: A scientific problem is a problem that is solved to prove something. Typically, these problems do not have a direct business impact and they do not always solve a business problem. Often, once such a problem is solved, the solution has to be iterated on before the technology can be put to use. The solutions are not always scalable and efficient. (Examples include a novel algorithm, a new type of neural network, or a formula.) These problems require extremely good technical skills.
A data scientist is a problem solver who may work on either type of problem or a combination of the two. A data scientist's job is to increase impact in an efficient and scalable manner.
Highly valuable skills for a problem solver
Apart from technical skills, an extremely good data scientist needs the following attributes:
- Curiosity: A data scientist should be extremely curious and should always seek out the knowledge a problem demands.
- Passion: An extreme passion for the field.
- Critical Thinking: An ability to critically analyze and ask questions. If the question is correct, then the answer is always correct.
- Business or Domain Knowledge: Knowledge of the business or domain in which they are solving problems.
- Organizational skills: Good organizational skills, with a good command of structuring notes, code, data, etc.
- Humility: Always being humble
- A focus on failing fast and iterating rather than on perfection: Not obsessing over perfection, but failing fast (without any fear) and trying new and bold things.
- Second Order Thinking: The ability to ask "then what" about a given situation.
While a good command of technical skills is extremely valuable, those skills can be taught, whereas the non-technical attributes listed above are difficult to teach.
How to Adopt “Data Science Thinking”?
Data Science Thinking is a combination of second-order and first-principles thinking. Simply put, we break a problem down to its first principles and then apply second-order thinking to each subproblem. This step ensures that we put extra emphasis on the questions.
“If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask… for once I know the proper question, I could solve the problem in less than five minutes.” (commonly attributed to Einstein)
This way of thinking also helps us become better problem solvers. While I have always used first-principles and second-order thinking, I have also evaluated use cases for combining the two.
What is Data Science? — According to me and what I think would be extremely beneficial for the universe
Data Science to me is all about solving real world problems in a scalable manner. It is about adopting “data science thinking”. To me, Data Science is all about the business impact and solving new, difficult problems in a creative manner.
To me, failing fast and iterating is extremely important because that is how we humans innovate. Imagination is something that makes us human and we should use that to solve real-world problems quickly.
Solving a Data Science problem using “Data Science Thinking”
Let us solve a data science problem (a real world problem) using “Data Science Thinking”.
Problem Statement: What defines a popular song?
Available Data/Research: Given the problem statement, our job is to find a way to quantitatively define what makes a song popular. Assuming we have the data available, we first break down the problem into several subproblems:
- What is a Song?
- How to define a song?
- How to define a popular song? (Number of views, singer, lyrics, etc)
- Does the song consist of lyrics or is it just a tune?
- What is the platform of origin?
- Why do people play a song again and again? (Motivation, lyrics, or emotions)
- What is the emotional setting?
Data Science thinking involves asking yourself a lot of questions, using both first-principles and second-order thinking.
This is similar to the process of identifying the most important parts of a story or movie and then making an effort to understand them.
From a data science perspective, we have to optimize for the most general solution, and this involves second-order thinking: asking the question that seeks the “then what” of a given subproblem.
This is an example of how we can use Data Science thinking to solve a particular problem. Note that there may be many variations, but the idea is to ask a lot of questions and narrow them down to the one that best encapsulates the problem, while performing second-order thinking.
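One way to make the question breakdown concrete is to write it down as a small tree, with each subproblem holding its own "then what" follow-ups. The sketch below is only an illustration: the `Question` class, the `walk` helper, the grouping of subproblems, and the wording of the follow-up questions are all assumptions made for this example, not a prescribed method.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Question:
    """One question in the breakdown (hypothetical structure for illustration).

    `then_what` holds the second-order follow-ups asked about this question.
    """
    text: str
    then_what: List["Question"] = field(default_factory=list)


def walk(question: Question, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, text) pairs depth-first, so follow-ups appear under their parent."""
    yield depth, question.text
    for follow_up in question.then_what:
        yield from walk(follow_up, depth + 1)


# First-principles breakdown of the problem statement, reusing some of the
# subproblems listed earlier. The nesting and the "then what" wording are
# illustrative assumptions.
problem = Question(
    "What defines a popular song?",
    [
        Question("How to define a song?", [
            Question("Does the song consist of lyrics or is it just a tune?"),
        ]),
        Question("How to define a popular song?", [
            Question("If popularity means number of views, then what counts as a view?"),
        ]),
        Question("Why do people play a song again and again?", [
            Question("If the reason is emotion, then what is the emotional setting?"),
        ]),
    ],
)

# Print the breakdown as an indented outline.
outline = list(walk(problem))
for depth, text in outline:
    print("  " * depth + "- " + text)
```

Writing the questions down this way makes it easy to see which subproblems still lack a "then what" follow-up, and to keep adding questions until one of them captures the problem well enough to measure.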
We examined what exactly Data Science is and, more specifically, we divided problems into two categories. While Data Science can be broad, the greatest benefit, the kind that could help push the human race forward, comes from building solutions for real-world problems in a creative manner (typically by building on research papers, existing solutions, etc.). This requires an attitude of failing fast and iterating, with a clear focus on the business impact/metric.