Clean and validate the data extracted from the USCIS website
Create a data model based on the dataset
Create a database in Neo4j and load the data using Cypher queries
Create a data pipeline for connecting Neo4j to Python
Build an interactive dashboard for better insights
Extract metadata from the Neo4j database and load it into a SQL Server database
Run integration and acceptance tests for data validation
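The Neo4j loading and Python pipeline steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the node labels (`Employer`, `Application`), the `FILED` relationship, and the column names (`case_number`, `employer_name`, `class_of_admission`, `case_status`) are all assumptions made for the example, as are the helper names `batches` and `load_rows`. It assumes the official `neo4j` Python driver is installed and that rows with missing key fields were already dropped during cleaning, since `MERGE` fails on null property values.

```python
# Hypothetical graph model: (Employer)-[:FILED]->(Application).
# UNWIND turns the $rows parameter (a list of dicts) into one row per dict,
# and MERGE keeps the load idempotent if a batch is replayed.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (e:Employer {name: row.employer_name})
MERGE (a:Application {case_number: row.case_number})
SET a.visa_class = row.class_of_admission,
    a.status = row.case_status
MERGE (e)-[:FILED]->(a)
"""


def batches(rows, size=1000):
    """Yield lists of at most `size` rows so each query stays small."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_rows(uri, user, password, rows, batch_size=1000):
    """Send cleaned rows (dicts) to Neo4j in batches; return the row count."""
    # Imported here so the helpers above can be used without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    loaded = 0
    try:
        with driver.session() as session:
            for batch in batches(rows, batch_size):
                session.run(LOAD_QUERY, rows=batch).consume()
                loaded += len(batch)
    finally:
        driver.close()
    return loaded
```

A call might look like `load_rows("bolt://localhost:7687", "neo4j", "<password>", cleaned_rows)`, where the URI and credentials are placeholders for a local instance.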
Data Overview using Pandas Profiling
This dataset gives detailed information on around 374K visa applications and their decisions.
Data covers 2011-2016 and includes information on employer, position, wage offered, job posting history, employee education and past visa history, and final decision.
The profiling report shows that the dataset has 374,362 observations, of which 373,025 are unique. It has 154 variables, of which only 21 have more than 330,000 non-missing observations.
The dataset has:
116 categorical variables
2 date-time variables
10 numerical variables
26 Boolean variables
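The headline figures above (observations, unique observations, variables, and variables with enough non-missing values) can be recomputed with a few lines of standard-library Python. This is only a rough sketch of that kind of summary and is not how Pandas Profiling works internally; the function name `overview` and the sample column names are hypothetical.

```python
import csv
import io


def overview(csv_file, threshold):
    """Summarize a CSV: row counts, unique rows, and well-populated columns.

    A column counts as well populated when it has more than `threshold`
    non-empty values.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    non_missing = [0] * len(header)
    seen = set()
    total = 0
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        total += 1
        seen.add(tuple(row))  # identical rows collapse into one entry
        for i, value in enumerate(row[:len(header)]):
            if value.strip():
                non_missing[i] += 1
    return {
        "observations": total,
        "unique_observations": len(seen),
        "variables": len(header),
        "well_populated": [
            name for name, n in zip(header, non_missing) if n > threshold
        ],
    }


# Tiny made-up sample: one duplicated row and one missing wage.
sample = io.StringIO(
    "case_number,employer_name,wage\n"
    "1,Acme,100\n"
    "2,Acme,\n"
    "1,Acme,100\n"
)
summary = overview(sample, threshold=2)
```

For the real dataset the file would be opened with `open(path, newline="")` and the threshold set to 330000 to match the figure quoted above.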
Technical Vision Diagram
Graph Data Model
Database Schema in Neo4j
Interactive Dashboard
Target Audience
US Citizenship and Immigration Services
Corporates of different sectors
Immigrants applying for US Visa
Dashboard Insights
H-1B is the visa type that companies apply for most often, and it has the most approved visas.
Amazon is among the top 5 companies that file the highest number of visa applications.
Computer Engineering is the job for which companies file the most visa applications, and it has the highest approval rate.
India is the country with the most visa applications filed worldwide and the most approved cases.
About
In this project we cleaned and processed the data extracted from the USCIS website, which includes the details of US visa applications from 2011 to 2016. We then created a data model based on the dataset and used it to build a database in Neo4j, a graph database well suited to problem-solving and analysis. The…