What exactly is Data Science?



Well, there is indeed hype around Data Science. According to Harvard Business Review, Data Scientist is the sexiest job of the 21st century. Data Science is a very interdisciplinary field with a lot of applications in logistics, recommendation systems, entertainment, etc. Essentially, Data Science can be seen in every industry one way or another.

In this article, I would like to give my views on what I think Data Science is and, most importantly, why it has always been an important problem-solving tool. I will show my readers how, instead of a field, we can think of Data Science as an art and a way of thinking. With this, I would like to define a new term, data science thinking, which is a combination of first-principles and second-order thinking.


  • Data Analysts
  • Data Scientists (Applied and Analytics)
  • Data Curator/Data Annotator
  • A.I Research Scientist
  • ML Engineers/A.I Engineers/Analytics Engineer

Another assumption is that we will look at Data Science from a problem-solving perspective as opposed to an educational perspective.

What is Data Science?

  1. Real World Problem: A real-world problem is one that is solved to directly impact an end user. Solutions to these problems have to be efficient and scalable. They generally do not require new ideas or new formulas, but they may require a new way of solving things; in data science, this is known as applied data science. (Examples include using data science techniques to increase grocery sales or to increase the speed of an algorithm.) These problems require technical as well as business skills.
  2. Research/Scientific Problem: A scientific problem is one that is solved to prove something. Typically, these problems do not have a direct business impact, and they do not always solve a business problem. Often, once solved, the solution has to be iterated on before the technology can be put to use. The solutions are not always scalable or efficient. (Examples include a novel algorithm, a new type of neural network, or a formula.) These problems require extremely good technical skills.

A data scientist is a problem solver who could be working on either of these problem types or a combination of the two. A data scientist’s job is to increase impact in an efficient and scalable manner.

Highly valuable skills for a problem solver

  • Curiosity: A data scientist should be extremely curious and should always seek out knowledge on demand.
  • Passion: An extreme passion for the field.
  • Critical Thinking: An ability to critically analyze and ask questions. If the question is right, the answer is far more likely to be right.
  • Business or Domain Knowledge: Knowledge of the business or domain in which they are solving problems.
  • Organizational Skills: Good organizational skills and a good command of structuring notes, code, data, etc.
  • Humility: Always being humble.
  • A focus on failing fast and iterating instead of being obsessed with perfection: Not aiming for perfection, but failing fast (without fear) and trying new and bold things.
  • Second Order Thinking: The ability to ask “then what?” for a given situation.

While a good command of technical skills is extremely valuable, those skills can be taught, whereas the non-technical skills listed above are difficult to teach.

How to Adopt the “Data Science Thinking”?

“If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask… for once I know the proper question, I could solve the problem in less than five minutes.” (attributed to Einstein)

This thinking also helps us become better problem solvers. While I have always used first-principles and second-order thinking, I evaluated the use cases for the combination of the two.

What is Data Science? — According to me and what I think would be extremely beneficial for the universe

To me, failing fast and iterating is extremely important because that is how we humans innovate. Imagination is something that makes us human, and we should use it to solve real-world problems quickly.

Solving a Data Science problem using “Data Science Thinking”

Problem Statement: What defines a popular song?

Available Data/Research: Given the problem statement, our job is to find a way to quantitatively define what makes a song popular. Assuming we have the data available, we first break the problem down into several subproblems:

Subproblems:

  1. What is a Song?
  2. How to define a song?
  3. How to define a popular song? (Number of views, singer, lyrics, etc.)
  4. Does the song consist of lyrics or is it just a tune?
  5. What is the platform of origin?
  6. Why do people play a song again and again? (Motivation, lyrics, or emotions)
  7. What is the emotional setting?

Data Science thinking involves asking yourself a lot of questions, using both first-principles and second-order thinking.

This is similar to how we identify the most important parts of a story or movie and then make an effort to understand them.

From a data science perspective, we have to optimize for the most generic solution, and this involves second-order thinking: asking “then what?” of a given subproblem.

This is an example of how we can use Data Science thinking to solve a particular problem. Note that there may be many variations but the idea is to ask a lot of questions and narrow down to the most encapsulated one while performing second-order thinking.
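
As a minimal sketch of how subproblems 3 and 6 could be made quantitative, the Python snippet below counts total plays, unique listeners, and plays per listener for each song. Everything in it is an assumption made for illustration: the play log, the listener and song identifiers, and the `popularity_summary` helper are hypothetical, and these three numbers are only one of many possible ways to measure popularity.

```python
from collections import Counter

# Hypothetical play log: one (listener_id, song_id) pair per play.
# The data is made up purely for illustration.
plays = [
    ("u1", "song_a"), ("u1", "song_a"), ("u1", "song_a"),
    ("u2", "song_a"),
    ("u1", "song_b"), ("u2", "song_b"), ("u3", "song_b"),
]


def popularity_summary(play_log):
    """Return {song_id: (total_plays, unique_listeners, plays_per_listener)}."""
    # Subproblem 3: raw reach, measured as the number of plays per song.
    total_plays = Counter(song for _, song in play_log)
    # Deduplicate (listener, song) pairs to count distinct listeners per song.
    unique_listeners = Counter(song for _, song in set(play_log))
    # Subproblem 6: replay behaviour, measured as average plays per listener.
    return {
        song: (
            total_plays[song],
            unique_listeners[song],
            total_plays[song] / unique_listeners[song],
        )
        for song in total_plays
    }


summary = popularity_summary(plays)
for song, (total, listeners, replay) in sorted(summary.items()):
    print(f"{song}: plays={total}, listeners={listeners}, plays_per_listener={replay}")
```

In this toy log, song_b reaches more listeners while song_a is replayed more often. The "then what?" question is which of the two a definition of "popular" should reward.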


LinkedIn: https://www.linkedin.com/in/aaditkapoor1201/


