Welcome to my GitHub repository for The Knowledge House 2025 Data Science Innovative Fellowship! I am honored to be part of Cohort A in the Data Science Track and will use this space to document my journey over the 9-month program through hands-on projects, data pipelines, and analytical problem-solving.
I was recently accepted into The Knowledge House's highly selective Innovative Fellowship, where I'll spend the next 9 months mastering Data Science, AI/ML, and Business Analytics. This repository serves as both a living portfolio and an engineering journal, reflecting my technical growth, creativity, and problem-solving mindset.
- Program: The Knowledge House Innovative Fellowship
- Track: Data Science
- Cohort: A
- Timeline: 2024 – 2025
- Location: Remote + NYC-Based Initiatives
Goal: Build an automated sentiment analysis tool using the gpt-4o-mini model to label reviews as positive, neutral, negative, or irrelevant.
Skills Used:
- OpenAI API
- JSON parsing
- Test-Driven Development (TDD)
- Prompt Engineering
- Data Visualization with `matplotlib`
Highlights:
- Created a flexible pipeline using `label.py`, `main.py`, and `visualize.py`
- Developed a custom system prompt for accuracy across human tone variance
- Used bar charts to summarize review sentiment distribution
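A minimal sketch of how the JSON parsing step in a pipeline like this might look. It is not the actual contents of `label.py`: the `"label"` response key and the function names `parse_label` and `count_labels` are hypothetical, and the sketch assumes the model is prompted to reply with a small JSON object. Only the four label names come from the project goal above.

```python
import json

# The four labels named in the project goal.
VALID_LABELS = {"positive", "neutral", "negative", "irrelevant"}


def parse_label(raw_response: str) -> str:
    """Extract a sentiment label from a JSON reply such as '{"label": "positive"}'.

    The "label" key is an assumed response schema chosen for illustration.
    Raises ValueError if the reply is not valid JSON or the label is unknown.
    """
    payload = json.loads(raw_response)  # json.JSONDecodeError is a ValueError
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    label = str(payload.get("label", "")).strip().lower()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


def count_labels(labels):
    """Tally labels into a dict suitable for a bar chart of the distribution."""
    counts = {label: 0 for label in sorted(VALID_LABELS)}
    for label in labels:
        counts[label] += 1
    return counts
```

Because `parse_label` takes a plain string, it can be unit tested without calling the OpenAI API, which fits a test-driven workflow. The dictionary returned by `count_labels` could then be passed to a bar-chart call in the visualization step.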
Goal: Analyze 8-hour heart-rate samples from wearable devices to identify sleep and exercise patterns.
Skills Used:
- Data Cleaning with For-Loops
- Manual Statistical Calculations (avg, max, std dev, variance)
- Matplotlib Visualization
- File I/O and error handling
- Real-world simulation of a health-tech data pipeline
Highlights:
- Cleaned inconsistent and non-digit heart-rate values
- Built metrics without using NumPy or SciPy
- Created time-series plots of heart-rate for different activity phases
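The cleaning and manual statistics steps could be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual code: the function names `clean_heart_rates` and `summarize` are hypothetical, raw samples are assumed to arrive as strings, and the variance shown is the population variance (dividing by `n`), which may differ from the formula used in the project.

```python
import math  # standard library only; no NumPy or SciPy


def clean_heart_rates(raw_values):
    """Keep only entries made up entirely of ASCII digits, converted to int."""
    cleaned = []
    for value in raw_values:
        text = str(value).strip()
        # isascii() guards against characters like superscript digits,
        # which pass isdigit() but cannot be converted by int().
        if text.isascii() and text.isdigit():
            cleaned.append(int(text))
    return cleaned


def summarize(samples):
    """Compute average, maximum, population variance, and standard deviation."""
    if not samples:
        raise ValueError("no valid heart-rate samples")
    total = 0
    maximum = samples[0]
    for bpm in samples:
        total += bpm
        if bpm > maximum:
            maximum = bpm
    average = total / len(samples)

    squared_diff_sum = 0.0
    for bpm in samples:
        squared_diff_sum += (bpm - average) ** 2
    variance = squared_diff_sum / len(samples)  # assumption: population variance

    return {
        "avg": average,
        "max": maximum,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }
```

For example, `clean_heart_rates(["60", " 70 ", "NO DATA", "", "80"])` drops the two non-digit entries and returns `[60, 70, 80]`, which `summarize` can then reduce to the four metrics.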
- Python (core + data science)
- Data Cleaning & Wrangling
- Exploratory Data Analysis (EDA)
- Statistical Modeling
- Prompt Engineering & LLMs
- Data Visualization (Matplotlib, Seaborn, Plotly, Tableau)
- APIs (OpenAI, Kaggle, Polygon.io)
- Git/GitHub + Conda Environments
- TDD (Test-Driven Development)
- Build a robust, public-facing data science portfolio
- Master applied AI/ML pipelines and statistical problem-solving
- Publish a technical blog and newsletter (coming soon!)
- Contribute to social impact through data storytelling
- 🌐 Portfolio Website
Thank you for visiting my repository! I look forward to growing as a data scientist and community-minded innovator during this fellowship. 🚀
