Project for ODC / Instant Data Analysis Hackathon
This repository contains the work our team completed during the hackathon. We combined and enriched an Airbnb dataset for European cities, cleaned and filled missing values using web scraping, organized the cleaned data in a normalized database, created interactive dashboards, and built prediction models to recommend cities for listings.
- Data consolidation & feature enrichment (final_df2.csv)
- Data cleaning & null-filling via web scraping
- Database schema, DDL & bulk loading (Database/)
- Dashboards (Tableau + Power BI)
- Prediction notebooks using LightGBM and a neural network (predictor.ipynb)
- Correlation & exploratory visualizations (images/)
- final_df2.csv: Final cleaned and enriched CSV (output of Day 1)
- Copy_of_ODC_&_Instant_Data_analysis_Hackathon.ipynb: Day 1 exploratory notebook and preprocessing
- predictor.ipynb: Prediction notebook (LGBMRegressor + neural network)
- Database/: SQL DDL, ERD, bulk import scripts, and SQL queries (Day 2)
- images/: Visualizations, correlation matrices, and dashboard screenshots
Goal: Combine all files from the Kaggle dataset and enrich the dataset with external features and geolocation improvements.
What we did:
- Combined all files from the Kaggle dataset: Airbnb prices in European cities (Kaggle)
- Improved location accuracy using reverse_geocoder with the existing longitude and latitude, and standardized country codes using pycountry.
- Added external country-level features:
  - Cost of Living indicators from "myrios/cost-of-living-index-by-country-by-number-2024": Cost of Living Index, Groceries Index, Restaurant Price Index
  - Happiness & economy metrics from "unsdsn/world-happiness": happiness score, Economy (GDP per capita)
- Filled null/missing values using web scraping with BeautifulSoup from the World Happiness Report pages.
- Final combined and cleaned output: final_df2.csv
- Exploratory visualizations and two correlation matrices are shown below.
Correlation with guest satisfaction:

Effect of boolean features on numeric features:

➡️ See Copy_of_ODC_&_Instant_Data_analysis_Hackathon.ipynb for step-by-step preprocessing, enrichment, and cleaning code.
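The enrichment and null-filling steps can be sketched roughly as follows. This is a minimal, dependency-free illustration in plain Python, not the notebook's actual code. The sample rows, the column names `country_code` and `happiness_score`, and the `scraped_fallback` dictionary (standing in for values scraped from the World Happiness Report pages) are all assumptions made for the example, and the numbers are made up.

```python
# Hypothetical listing rows; the real data lives in final_df2.csv.
listings = [
    {"city": "Paris", "country_code": "FR", "real_sum": 250.0},
    {"city": "Vienna", "country_code": "AT", "real_sum": 180.0},
    {"city": "Athens", "country_code": "GR", "real_sum": 120.0},
]

# Hypothetical country-level features keyed by ISO alpha-2 code.
# None marks a value that is missing from the external dataset.
country_features = {
    "FR": {"happiness_score": 6.6},
    "AT": {"happiness_score": None},
}

# Stand-in for values recovered by web scraping (made-up numbers).
scraped_fallback = {"AT": 7.1, "GR": 5.9}


def enrich(rows, features, fallback):
    """Left-join country features onto listings and fill gaps from fallback."""
    enriched = []
    for row in rows:
        code = row["country_code"]
        score = features.get(code, {}).get("happiness_score")
        if score is None:
            # Missing in the external dataset: use the scraped value, if any.
            score = fallback.get(code)
        enriched.append({**row, "happiness_score": score})
    return enriched


enriched_rows = enrich(listings, country_features, scraped_fallback)
for r in enriched_rows:
    print(r["city"], r["happiness_score"])
```

Keeping the scraped values in a separate fallback mapping makes it easy to see which entries came from the original datasets and which were filled in afterwards.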
Goal: Organize the enriched dataset into a normalized relational schema and run analytical SQL.
What we did:
- Designed an unnormalized ERD to analyze redundancy and normalization benefits.
- Created a normalized database schema (DDL) and implemented tables with appropriate constraints:
- Primary keys
- Foreign keys
- Unique constraints
- Not-null constraints where applicable
- Performed bulk insertion of final_df2.csv into the database to handle its large size and mixed data types reliably.
- Implemented joins and wrote the requested analytical SQL queries:
  - Created tables with constraints
  - Integrated datasets using correct joins
  - Wrote analytical SQL queries to answer business questions
  - Created at least one VIEW
  - Used Common Table Expressions (CTEs)
  - Used window functions for advanced analysis where applicable
- All database-related files, DDL, ERD diagrams, and SQL query scripts are located in the Database/ directory.
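For readers who want a feel for these pieces without opening the Database/ directory, here is a small sketch using Python's built-in `sqlite3` module. It is not the project's schema: the table names (`country`, `city`, `listing`), the column names, the view `listing_overview`, and the sample rows are all hypothetical, and the project's actual database engine may differ. It shows primary key, foreign key, unique, and not-null constraints, a bulk insert, a VIEW built on joins, and a query combining a CTE with a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE country (
    country_code TEXT PRIMARY KEY,
    country_name TEXT NOT NULL UNIQUE
);
CREATE TABLE city (
    city_id      INTEGER PRIMARY KEY,
    city_name    TEXT NOT NULL UNIQUE,
    country_code TEXT NOT NULL REFERENCES country(country_code)
);
CREATE TABLE listing (
    listing_id      INTEGER PRIMARY KEY,
    city_id         INTEGER NOT NULL REFERENCES city(city_id),
    real_sum        REAL NOT NULL,
    bedrooms        INTEGER,
    person_capacity INTEGER
);
-- A view that joins the normalized tables back together.
CREATE VIEW listing_overview AS
SELECT l.listing_id, c.city_name, k.country_name, l.real_sum
FROM listing AS l
JOIN city AS c ON c.city_id = l.city_id
JOIN country AS k ON k.country_code = c.country_code;
""")

# Parent rows first, so the foreign keys are satisfied.
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("FR", "France"), ("AT", "Austria")])
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [(1, "Paris", "FR"), (2, "Vienna", "AT")])
# Bulk insert of made-up listing rows.
conn.executemany(
    "INSERT INTO listing VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 300.0, 2, 4), (2, 1, 200.0, 1, 2),
     (3, 2, 150.0, 1, 2), (4, 2, 250.0, 2, 4)],
)

# CTE for the per-city average price, then a window function to rank cities.
# Window functions need SQLite 3.25 or newer.
ranked = conn.execute("""
WITH city_avg AS (
    SELECT city_name, AVG(real_sum) AS avg_price
    FROM listing_overview
    GROUP BY city_name
)
SELECT city_name, avg_price,
       RANK() OVER (ORDER BY avg_price DESC) AS price_rank
FROM city_avg
ORDER BY price_rank
""").fetchall()
print(ranked)
```

Splitting countries and cities into their own tables means each name is stored once, which is the kind of redundancy the unnormalized ERD is meant to expose.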
Goal: Visualize insights and build prediction models.
predictor.ipynb contains two modeling approaches:
- Regression using LGBMRegressor to predict real sum (price).
- Neural network (Adam optimizer) using real sum, number of bedrooms, and person capacity to produce top-5 cities where an Airbnb with those specs exists (recommender-style output).
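The recommender-style output step can be illustrated with a short sketch. It assumes the network emits one raw score (logit) per city for a given query, which is a common setup but not confirmed by the notebook. The function name `top_k_cities`, the city scores, and the use of a softmax are all assumptions for illustration only.

```python
import math


def top_k_cities(logits, k=5):
    """Turn per-city scores into probabilities and return the k most likely."""
    # Softmax with max-subtraction for numerical stability.
    peak = max(logits.values())
    exps = {city: math.exp(score - peak) for city, score in logits.items()}
    total = sum(exps.values())
    probs = {city: value / total for city, value in exps.items()}
    # Sort by probability, highest first, and keep the first k entries.
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical raw scores for one query (price, bedrooms, capacity).
example_logits = {
    "Amsterdam": 2.1, "Athens": 0.3, "Barcelona": 1.7, "Berlin": 1.2,
    "Budapest": 0.9, "Lisbon": 1.5, "Paris": 2.4,
}

top5 = top_k_cities(example_logits)
for city, prob in top5:
    print(f"{city}: {prob:.3f}")
```

Returning probabilities alongside the city names lets a caller show how confident each suggestion is, rather than only an ordered list.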
- final_df2.csv: Final cleaned dataset (Day 1 output)
- Copy_of_ODC_&_Instant_Data_analysis_Hackathon.ipynb: Main processing & EDA notebook
- predictor.ipynb: Modeling and recommender
- Database/: Schema, DDL, bulk import, SQL queries
- images/: All visualization screenshots (displayed above)
- Original Airbnb dataset: Airbnb prices in European cities (Kaggle)
- Cost of living indicators: "myrios/cost-of-living-index-by-country-by-number-2024"
- World happiness metrics: "unsdsn/world-happiness"
- World Happiness Report pages used for scraping: World Happiness Report (Wikipedia)
Please ensure you comply with the terms of use for each data source when sharing or publishing results.
- Added country-level economic and quality-of-life features that provide additional predictive power for price and guest satisfaction analyses.
- Correlation matrices (images/) show relationships between price, guest satisfaction, and the influence of boolean features on numeric features.
- The recommender model (neural network) can list the top 5 candidate cities for a given price, number of bedrooms, and person capacity.

