Using data to determine a COVID state similarity matrix to estimate policy effects on similar states and “high risk” states

Data has always played an integral part in analyzing and proving hypotheses. With the advent of highly optimized, easy-to-use frameworks, data is now collected almost every second. The world’s most valuable resource is no longer oil, but data (The Economist). It is estimated that by 2025, about 463 exabytes of data (one exabyte can hold roughly 50,000 years of DVD-quality video) will be generated each day.

John Snow, an English physician and one of the founders of modern epidemiology, discovered in 1854 that the source of a cholera outbreak in London was a contaminated public water pump. …

Data Science, Data Visualization

Finding common ground: US Presidents State Analysis


We aim to find correlations between US Presidents and their home states. Pulling data from multiple sources, we build a comprehensive dataset of Presidents, their home states, and additional facts about each state. We then find the most common home state and ultimately plot the results on a live interactive map.


  • Python 3.8
  • Pandas
  • NumPy
  • Folium
  • Seaborn

Data Preparation and Preprocessing

We start by preparing the data: combining the datasets from the different sources and performing some basic preprocessing on them.

  • We change…
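As a sketch of the combine-and-count step, the sources can be merged with pandas and the most common home state read off the merged frame. The column names and the handful of rows below are hypothetical stand-ins for the real scraped tables:

```python
import pandas as pd

# Hypothetical source tables (column names assumed for illustration):
# one listing Presidents and home states, one with facts about each state.
presidents = pd.DataFrame({
    "president": ["Washington", "Jefferson", "Madison", "Van Buren"],
    "home_state": ["Virginia", "Virginia", "Virginia", "New York"],
})
state_facts = pd.DataFrame({
    "state": ["Virginia", "New York"],
    "capital": ["Richmond", "Albany"],
})

# Combine the sources into one comprehensive dataset.
combined = presidents.merge(
    state_facts, left_on="home_state", right_on="state", how="left"
)

# The most common home state across Presidents.
most_common = combined["home_state"].value_counts().idxmax()
print(most_common)  # Virginia
```

From here, the per-state counts can be handed to a Folium choropleth for the live interactive map.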

Debate Analysis using Data Science: Using YouTube Comments to find the true intent of voters



I believe Data Science allows me to express my curiosity in ways I’d never have imagined. The coolest thing about Data Science is that I see data not as numbers but as opportunity (a business problem), insight (predictive modeling, statistics, and data wrangling), and improvement (metrics). With this thought in mind, I decided to analyze the YouTube comments on the VP and Presidential debates.

After getting mixed results from the news sources, I decided to analyze the Vice Presidential and Presidential debates using Data Science.
The idea is to use YouTube comments as a medium to gauge sentiment regarding the debates and…
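To make the idea concrete, here is a minimal sketch of scoring comments for sentiment. The hand-rolled word lists are a toy stand-in for a real sentiment model (such as VADER), and the comments are invented examples, not real YouTube data:

```python
# Toy word lists: a stand-in for a real sentiment model such as VADER.
POSITIVE = {"great", "won", "strong", "good", "best"}
NEGATIVE = {"terrible", "lost", "weak", "bad", "worst"}

def comment_score(comment: str) -> int:
    """+1 per positive word, -1 per negative word; the sign gives the lean."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Invented example comments, standing in for scraped YouTube comments.
comments = [
    "He was great tonight and won the debate",
    "That was a terrible, weak performance",
    "No clear winner",
]
scores = [comment_score(c) for c in comments]
print(scores)  # [2, -2, 0]
```

Aggregating these scores per candidate or per debate gives a crude but transparent read on the audience’s mood.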

Why is your Facebook data so valuable? A method to predict human traits (gender, political preference, age) through Facebook Likes.

As election time approaches, we will see how and why our Facebook data is so valuable to advertisers and politicians. Facebook is the world's largest social network, with over 2.5 billion active users. It processes data at a scale never seen before: highly sophisticated Facebook A.I. algorithms curate, categorize, and predict associations between data in an almost human way.

How and Why

Why: Given this vast influx of data and processing power, we will explore how human traits can be predicted from nothing more than a collection of Facebook Likes. To achieve our results, we will try…
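One simple way to frame the prediction (a sketch only; published work on this typically reduces the Like matrix with SVD first) is to treat each user as a binary vector over pages they have Liked and fit a classifier against a known trait. The pages, users, and labels below are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical pages; each row is one user's binary Like vector.
pages = ["PageA", "PageB", "PageC", "PageD"]
likes = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
# Known labels for a trait of interest (1 = trait present), invented here.
trait = np.array([1, 1, 0, 0, 1, 0])

# Fit a classifier mapping Like patterns to the trait.
model = LogisticRegression().fit(likes, trait)
pred = model.predict(likes)
```

The same shape of model extends to gender, age bands, or political preference, given enough labeled users.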

SigNet (Detecting Signature Similarity using Machine Learning/Deep Learning): Is this the end of Human Forensic Analysis?

My grandfather was an expert in handwriting analysis. He spent his whole life analyzing documents for the CBI (Central Bureau of Investigation) and other organizations. His unique way of analyzing documents with a magnifying glass and various tools required huge amounts of time and patience for a single document. This was back when computers were not fast enough. I remember vividly that he would photocopy the same document multiple times and arrange the copies on the table to get a closer look at the handwriting style.

Handwriting analysis involves a comprehensive comparative analysis between a questioned document and known handwriting samples.
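A Siamese approach like SigNet replaces this manual comparison with learned embeddings: two signatures are mapped into a vector space and their distance is trained with a contrastive loss. Here is a minimal sketch of one common formulation of that loss (Hadsell et al.), with made-up 2-D embeddings standing in for the network’s outputs:

```python
import numpy as np

def contrastive_loss(emb1, emb2, label, margin=1.0):
    """Contrastive loss for a pair of signature embeddings.

    label = 1 for a genuine pair (same writer), 0 for a different pair.
    Genuine pairs are pulled together; different pairs are pushed at
    least `margin` apart before the penalty vanishes.
    """
    d = np.linalg.norm(emb1 - emb2)
    return label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2

# Made-up 2-D embeddings standing in for the network's outputs.
a = np.array([0.1, 0.9])
b = np.array([0.1, 0.9])
c = np.array([0.9, 0.1])

print(contrastive_loss(a, b, 1))  # 0.0 (identical genuine pair)
print(contrastive_loss(a, c, 0))  # 0.0 (different pair already beyond the margin)
```

At inference time, a distance threshold on the embeddings decides whether a questioned signature matches the known one.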

Quora Similar Questions: Detecting Text Similarity using Siamese networks.

Ever wondered how to calculate text similarity using Deep Learning? We aim to develop a model that detects semantic similarity between pairs of texts, using the Quora Question Pairs dataset.


  • Python 3.8
  • Scikit-Learn
  • TensorFlow
  • Gensim
  • NLTK


Let us first start by exploring the dataset. Our dataset consists of:

  • id: the ID of a training-set pair
  • qid1, qid2: unique IDs of each question
  • question1: the full text of Question One
  • question2: the full text of Question Two
  • is_duplicate: 1 if question1 and question2 have the same meaning, else 0
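Before training a Siamese network, a simple baseline is worth having: represent each question with TF-IDF and compare a pair by cosine similarity. A sketch with two invented pairs (one duplicate-like, one unrelated), not drawn from the actual dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two invented pairs: the first reads like a duplicate, the second does not.
q_pairs = [
    ("How do I learn Python?", "What is the best way to learn Python?"),
    ("How do I learn Python?", "What is the capital of France?"),
]

vectorizer = TfidfVectorizer().fit([q for pair in q_pairs for q in pair])

def pair_similarity(q1, q2):
    m = vectorizer.transform([q1, q2])
    return cosine_similarity(m[0], m[1])[0, 0]

dup_sim = pair_similarity(*q_pairs[0])        # high: shares "learn", "Python"
unrelated_sim = pair_similarity(*q_pairs[1])  # 0: no shared content words
```

A Siamese network improves on this by learning embeddings where paraphrases land close together even without word overlap.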

A supervised or semi-supervised ULMFiT model for the Twitter US Airlines Sentiment Dataset

Our task is to apply a supervised/semi-supervised technique like ULMFiT (Howard & Ruder, 2018) to the Twitter US Airlines sentiment analysis data.
The reason this approach is semi-supervised is that training happens in two stages: the language model is first trained in an unsupervised way, and the network is then fine-tuned for the task by adding a classifier on top.

We use the Twitter US Airlines dataset (…)

We will start by:

  • Exploring the dataset, preprocessing and preparing it for the model
  • Exploring a bit of history in sentiment analysis
  • Exploring Language Models and why they are important
  • Setting the baseline model
  • Exploring the techniques…
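For the baseline-model step above, a reasonable starting point (before any language-model pretraining) is a bag-of-words classifier. The tweets below are invented stand-ins for the real dataset, whose labels are negative/neutral/positive:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-ins for the real airline tweets and their sentiment labels.
tweets = [
    "@airline thanks for the great flight",
    "@airline my flight was delayed again, awful service",
    "@airline loved the friendly crew",
    "@airline lost my luggage, never flying with you again",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words baseline: TF-IDF features into a linear classifier.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(tweets, labels)
```

ULMFiT should beat this baseline precisely because its pretrained language model captures context that bag-of-words features throw away.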

OmniNet: What if Ben’s Omnitrix had a better Machine Learning/Artificial Intelligence system built in?

I am a big fan of the Ben 10 series, and I have always wondered why Ben’s Omnitrix fails to change into the alien Ben chooses (largely due to the weak A.I. system built into the watch). To help Ben, we will devise “OmniNet”, a neural network capable of predicting an appropriate alien for any given situation.

As discussed on the show, the Omnitrix is basically a server that connects to the planet Primus to harness the DNA of around 10,000 aliens! …
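At its core, such a network is a mapping from situation features to a probability distribution over aliens. A toy forward pass with untrained weights; the feature names are hypothetical, and the four aliens listed stand in for the full ~10,000:

```python
import numpy as np

rng = np.random.default_rng(0)

# A few of the ~10,000 aliens, for illustration.
ALIENS = ["Heatblast", "XLR8", "Four Arms", "Diamondhead"]

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical situation features: [fire_hazard, need_speed, need_strength].
situation = np.array([1.0, 0.0, 0.5])

# Untrained weights; training would learn these from labeled situations.
W = rng.normal(size=(len(ALIENS), situation.size))

probs = softmax(W @ situation)        # probability over aliens
best = ALIENS[int(np.argmax(probs))]  # OmniNet's suggested alien
```

Training would replace the random weights with ones learned from episodes where the right alien for a situation is known.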

Privacy and Security, Technology


It is the 21st century: technology is on the rise, and the internet has superseded paper texts. We live in an interconnected world where data is created rapidly, every second. Algorithms and statistical measures allow us to capture each movement in a form suitable for predictive modeling.

Big data refers to the huge amounts of data accumulated over time through the use of internet services. Traditional econometric methods fail when analyzing data at this scale, and we require a host of new algorithms that can crunch this data and…

My biggest mistake in the last two years has been ignoring the fact that all great things start from the fundamentals. In this fast-paced world, we tend to forget the most basic, minute details about a technology, or about whatever we are creating. I have also encountered the strange habit of “over-innovating”: creating a product just for the sake of putting something out there. In my humble opinion, need drives innovation, which in turn drives an idea to be cultivated into a product. Another mistake I have made in these years is following a trend…

Aadit Kapoor

Imagination is more important than knowledge — Einstein
