How to Use Data Science to Predict Whether an H1B Petition Will Be Certified, Withdrawn, or Denied

This case study explores how Data Science can be applied to predict whether an H1B petition will be certified, withdrawn, or denied.

This case study was brought to you by Case Studies @ Data Kapoor™. Explore one click implementation catered to your use case only at Case Studies @ Data Kapoor.

Photo by Scott Graham on Unsplash
  • Employers must first attest, on a labor condition application (LCA) certified by the Department of Labor (DOL), that employment of the H-1B worker will not adversely affect the wages and working conditions of similarly employed U.S. workers.
  • Employers must also provide existing workers with notice of their intention to hire an H-1B worker.

Problem Statement

Tools Required

  • Python 3
  • PySpark
  • Pandas
  • Seaborn
  • Scikit-Learn
  • NumPy

Data Sources

  • CASE_STATUS: The case status
  • EMPLOYER_NAME: Employer Name
  • SOC_NAME: Job Title Category
  • JOB_TITLE: Job Title
  • FULL_TIME_POSITION: Is this a full-time position
  • PREVAILING_WAGE: The prevailing wage
  • YEAR: Year in which the petition was filed
  • WORKSITE: Work location
  • lon: Longitude
  • lat: Latitude
Dataset Image

Exploratory Data Analysis

Data Preprocessing

  • CASE_STATUS
  • EMPLOYER_NAME
  • SOC_NAME
  • JOB_TITLE
  • FULL_TIME_POSITION
  • WORKSITE
Demonstration of the label encoding for one of the columns
Final Dataset
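
The label-encoding step for the categorical columns listed above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's LabelEncoder is applied column by column; the sample rows are made up, and the original notebook may have used a different encoder or a PySpark equivalent.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "CASE_STATUS": ["CERTIFIED", "DENIED", "CERTIFIED", "WITHDRAWN"],
    "FULL_TIME_POSITION": ["Y", "N", "Y", "Y"],
    "WORKSITE": ["NEW YORK, NEW YORK", "AUSTIN, TEXAS",
                 "AUSTIN, TEXAS", "SEATTLE, WASHINGTON"],
})

# Assumption: only a subset of the categorical columns is shown here.
categorical_columns = ["CASE_STATUS", "FULL_TIME_POSITION", "WORKSITE"]

# Keep one fitted encoder per column so integer codes can be mapped back later.
encoders = {}
for column in categorical_columns:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
print(list(encoders["CASE_STATUS"].classes_))
```

Keeping the fitted encoders makes it possible to call `inverse_transform` on predictions and recover the original status labels.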

Guess what? Case Studies @ Data Kapoor™ frequently handles client use cases with a major class imbalance issue like this one. If you have a similar use case and are struggling, we are always there for you!

Class Count
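
A class count like the one referenced here can be computed with pandas. The sketch below uses a made-up status column, so the numbers are not the real dataset's counts; it only shows how the imbalance could be measured.

```python
import pandas as pd

# Made-up labels: the real dataset is far larger and its proportions differ.
status = pd.Series(
    ["CERTIFIED"] * 8 + ["WITHDRAWN"] + ["DENIED"],
    name="CASE_STATUS",
)

# Absolute count and relative share of each class.
class_counts = status.value_counts()
class_share = status.value_counts(normalize=True)

# Ratio of the largest class to the smallest one as a rough imbalance measure.
imbalance_ratio = class_counts.max() / class_counts.min()

print(class_counts)
print(class_share)
print(imbalance_ratio)
```
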
The prevailing wage has gone down over the years. One possible explanation is abuse of the H1B program by certain companies, although the data alone does not confirm this.
Over the years, the number of petitions with CASE_STATUS (Certified) has gone up.
Over the years, the number of petitions with CASE_STATUS (Denied) has gone down. For CASE_STATUS (Rejected), we only have one case.

There is a difference between “rejection” and “denial” in the immigration world. A rejection simply means that there was an error with your filing or fee payment that can be corrected. A denial occurs when either you or your employer are not considered qualified for an H-1B — Reference
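
The yearly trends described above (prevailing wage and case status over time) could be derived with a group-by and a cross-tabulation. This is a sketch on made-up rows; the choice of the median as the wage summary is an assumption, not necessarily what the original plots used.

```python
import pandas as pd

# Made-up rows for illustration; they do not reproduce the article's trends.
df = pd.DataFrame({
    "YEAR": [2011, 2011, 2012, 2012, 2012],
    "CASE_STATUS": ["CERTIFIED", "DENIED", "CERTIFIED", "CERTIFIED", "DENIED"],
    "PREVAILING_WAGE": [60000.0, 50000.0, 70000.0, 80000.0, 45000.0],
})

# Summary of the prevailing wage per year (median is less sensitive to outliers).
median_wage_by_year = df.groupby("YEAR")["PREVAILING_WAGE"].median()

# Number of petitions per year and case status.
status_by_year = pd.crosstab(df["YEAR"], df["CASE_STATUS"])

print(median_wage_by_year)
print(status_by_year)
```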

Prediction Models

  • Random Forest
  • XGBoost

Feature Selection

(2895144, 9) -> (2895144, 3)
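
The article does not say which selection method produced the reduction to three columns. As one possible approach, the sketch below uses scikit-learn's SelectKBest with the ANOVA F-score on synthetic data; both the method and the data are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the encoded dataset: 200 rows, 9 numeric columns.
rng = np.random.default_rng(0)
n_rows = 200
y = rng.integers(0, 3, size=n_rows)   # three classes, like the case statuses
X = rng.normal(size=(n_rows, 9))

# Make a few columns depend on the label so that they carry signal.
X[:, 0] += y
X[:, 4] += 2 * y
X[:, 7] -= y

# Keep the three columns with the highest F-score against the label.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)
kept_columns = selector.get_support(indices=True)

print(X.shape, "->", X_selected.shape)
print(kept_columns)
```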

Train, Test, Validation Splits
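
A three-way split can be obtained by calling scikit-learn's train_test_split twice. The 60/20/20 proportions and the stratification below are assumptions chosen for this sketch; the article does not state the ratios it used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and imbalanced labels (80 / 10 / 10), for illustration only.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 10 + [2] * 10)

# First split: 60% train, 40% held out. Stratify to preserve class shares.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)

# Second split: divide the held-out part evenly into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying matters with imbalanced labels, because a plain random split can leave a rare class almost absent from the validation or test set.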

Random Forest

Note: The Random Forest model shows slight overfitting.
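
One common way to check for overfitting is to compare accuracy on the training split with accuracy on the validation split. The sketch below does this for a Random Forest on synthetic data; the hyperparameters and the data are assumptions and do not reproduce the article's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic three-class problem with three features (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical settings; the original model's parameters are not given.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
val_accuracy = accuracy_score(y_val, model.predict(X_val))

# A large positive gap suggests the model memorizes the training data.
gap = train_accuracy - val_accuracy
print(f"train={train_accuracy:.3f} val={val_accuracy:.3f} gap={gap:.3f}")
```

If the gap is large, limiting `max_depth` or raising `min_samples_leaf` are typical ways to reduce it.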

XGBoost

Note: The XGBoost model shows no overfitting.

Case Study Observations and Inferences

Why is this prediction problem relevant?

This prediction problem addresses anticipation: it lets applicants and employers estimate the likely outcome of a petition in advance.

Code
