Task 5 - Train-Test Split & Model Evaluation

Overview

This repository contains the implementation of model training and evaluation on the Heart Disease dataset, completed as part of an AI & ML Internship task. The goal is to understand how machine learning models are evaluated using proper data splitting and performance metrics.

Dataset Information

Dataset: Heart Disease Dataset
Problem Type: Binary Classification
Target Variable: Indicates presence (1) or absence (0) of heart disease
Features: Medical attributes such as age, cholesterol, blood pressure, etc.

Objective

The objective of this task is to:
Split the dataset into training and testing sets
Train a classification model
Evaluate performance using accuracy, precision, recall, and a confusion matrix

Tools & Libraries Used

Python
Pandas
NumPy
Scikit-learn

Steps Performed

Loaded the dataset using Pandas
Separated features (X) and target (y)
Split data into 80% training and 20% testing
Trained a Logistic Regression model
Made predictions on test data
Evaluated model using:
Accuracy
Precision
Recall
Confusion Matrix
Classification Report
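
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the target column name `target`, the helper names `load_heart_data` and `train_and_evaluate`, the fixed random seed, the stratified split, and `max_iter=1000` are all assumptions introduced here.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split


def load_heart_data(path="heart.csv"):
    """Load the dataset with Pandas (file name taken from the repository tree)."""
    return pd.read_csv(path)


def train_and_evaluate(df, target_col="target", test_size=0.2, seed=42):
    """Split the data, fit Logistic Regression, and return test-set metrics.

    `target_col` is an assumed column name; adjust it to match the CSV.
    """
    # Separate features (X) and target (y).
    X = df.drop(columns=[target_col])
    y = df[target_col]

    # 80% training / 20% testing. Stratifying on y keeps the class ratio
    # similar in both parts (an assumption, not stated in the README).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # A higher max_iter gives the solver room to converge on unscaled features.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Predict on the held-out test data only.
    y_pred = model.predict(X_test)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred, zero_division=0),
        "n_train": len(X_train),
        "n_test": len(X_test),
    }
```

A typical call would be `results = train_and_evaluate(load_heart_data())`, followed by `print(results["report"])`.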

Evaluation Metrics

| Metric | Meaning |
| --- | --- |
| Accuracy | Overall correctness of predictions |
| Precision | How many predicted positive cases were actually positive |
| Recall | How many actual positive cases were correctly identified |
| Confusion Matrix | Shows TP, TN, FP, FN values |
| F1-score | Balance between precision and recall |
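
To make these definitions concrete, the sketch below computes each metric directly from the four confusion-matrix counts. The function name `metrics_from_counts` and the example counts are hypothetical, chosen only for illustration; they are not results from this project.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn

    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total

    # Precision: of the cases predicted positive, how many really were positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Recall: of the cases that really were positive, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a 100-patient test set (illustration only).
example = metrics_from_counts(tp=40, tn=45, fp=5, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With scikit-learn, the counts for a binary problem with labels 0 and 1 can be read from `confusion_matrix(y_test, y_pred)`, whose layout is `[[TN, FP], [FN, TP]]`.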

Key Insights

Logistic Regression performed well on the held-out test set
The model shows balanced precision and recall
The confusion matrix shows which kinds of errors the model makes (false positives vs. false negatives)
Evaluating on a held-out test set gives an estimate of how well the model generalizes to unseen data
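
One simple way to check the generalization point is to compare accuracy on the training data with accuracy on the test data: a large gap suggests overfitting. The sketch below is an illustration under stated assumptions. It uses synthetic data from `make_classification` as a stand-in so that it runs without `heart.csv`, and the helper name `accuracy_gap` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def accuracy_gap(model, X_train, y_train, X_test, y_test):
    """Return (train accuracy, test accuracy, train minus test)."""
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    return train_acc, test_acc, train_acc - test_acc


# Synthetic stand-in data (not the Heart Disease dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Same 80/20 split as in the task; the seed and stratification are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_acc, test_acc, gap = accuracy_gap(model, X_train, y_train, X_test, y_test)
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
```

A small gap is consistent with a model that generalizes, though a single split is only one estimate and can vary with the random seed.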

Repository Structure

Task-5-Model-Evaluation/
│
├── heart.csv
├── Heart_Model.ipynb
└── README.md

Concepts Learned

Importance of train-test split
Model evaluation techniques
Understanding classification metrics
Avoiding overfitting

Conclusion

The Logistic Regression model was successfully trained and evaluated on the Heart Disease dataset. The evaluation metrics on the held-out test set suggest that the model can predict the presence of heart disease from patient data with balanced precision and recall.