Titanic Survival Prediction
A Titanic survival prediction model that uses machine learning to classify passengers as survivors or non-survivors.
Overview
This project is a Kaggle notebook focused on the Titanic survival prediction challenge. The goal is to apply machine learning techniques to classify passengers based on their likelihood of survival.
Dataset
The dataset consists of:
- PassengerId: Unique identifier for each passenger
- Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- Name, Sex, Age: Personal details
- SibSp, Parch: Number of siblings/spouses and parents/children aboard
- Ticket, Fare: Ticket number and fare paid
- Cabin: Cabin number (often missing)
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
- Survived: Target variable (1 = survived, 0 = did not survive)
Exploratory Data Analysis (EDA)
The project begins with data visualization and preprocessing:
- Checked missing values and handled them appropriately (e.g., imputing median age)
- Analyzed survival rates by gender, class, and fare price using Seaborn and Matplotlib
- Explored feature correlations using heatmaps
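The missing-value check, median age imputation, and grouped survival rates described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the six rows are hand-made toy data that only reuse the Kaggle column names, and the variable names (missing, median_age, rate_by_sex, rate_by_class) are assumptions.

```python
import pandas as pd

# Toy rows that reuse the Kaggle column names.
# Illustrative only: these are NOT real passenger records.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 3, 1, 2],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, None, 54.0],
})

# Count missing values per column.
missing = df.isna().sum()

# Impute missing ages with the median of the observed ages.
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)

# The survival rate of a group is the mean of the 0/1 target within it.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Pairwise correlations of the numeric columns (the input to a heatmap).
corr = df[["Survived", "Pclass", "Age"]].corr()
```

On the real data, the same grouped rates could be plotted with Seaborn, for example sns.barplot(data=df, x="Sex", y="Survived"), and the correlation matrix passed to sns.heatmap(corr, annot=True).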
Feature Engineering
To improve model performance, various feature engineering steps were applied:
- Created new features (e.g., FamilySize, Title extraction from names)
- Converted categorical variables into numerical representations
- Scaled numerical features for better model performance
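As a rough sketch of the steps above, the snippet below derives FamilySize and Title, encodes the categorical columns, and standardizes two numeric columns. The passenger names and values are invented for illustration, and the specific choices (a regex for the title, a 0/1 mapping for Sex, one-hot encoding for Embarked, StandardScaler for scaling) are assumptions; the notebook may have used different encodings.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented rows in the Kaggle column layout (not real passengers).
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Jones, Mrs. Anna (Mary Lee)", "Brown, Miss. Ella"],
    "Sex": ["male", "female", "female"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 2, 0],
    "Fare": [7.25, 71.0, 8.05],
    "Embarked": ["S", "C", "Q"],
})

# FamilySize: siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Title: the text between the comma and the first period in Name.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Encode Sex as 0/1 and one-hot encode Embarked.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")

# Standardize numeric columns to zero mean and unit variance.
num_cols = ["Fare", "FamilySize"]
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[num_cols].astype(float)),
    columns=num_cols,
    index=df.index,
)
```

Keeping the scaled values in a separate frame leaves the raw Fare and FamilySize columns available for inspection. Tree-based models such as Random Forest do not need scaling, so this step matters mainly for Logistic Regression.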
Model Training & Evaluation
Multiple machine learning models were trained and compared:
- Logistic Regression – A baseline model for classification
- Random Forest Classifier – A more powerful ensemble method
- XGBoost Classifier – Gradient boosting model for better accuracy
Model performance was evaluated using:
- Accuracy score
- Confusion matrix
- Cross-validation scores
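The comparison above might look something like the following sketch, which evaluates two of the three models with accuracy, a confusion matrix, and 5-fold cross-validation. Several things here are assumptions: the data is a synthetic stand-in generated with make_classification rather than the Titanic features, the split ratio and hyperparameters are arbitrary, and XGBoost is left out to keep the dependencies to scikit-learn (xgboost.XGBClassifier follows the same fit/predict interface and could be added to the models dictionary).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the engineered Titanic feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy on the training split.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    # Fit on the full training split, then score on the held-out split.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(f"{name}: cv={r['cv_mean']:.3f} test={r['test_accuracy']:.3f}")
```

In the confusion matrix returned by scikit-learn, rows are true labels and columns are predicted labels, so the diagonal holds the correctly classified counts.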
Results & Findings
- Random Forest and XGBoost outperformed Logistic Regression, suggesting that tree-based methods capture non-linear patterns and feature interactions better than a linear baseline.
- Gender was the most important feature, with females having a significantly higher survival rate.
- First-class passengers had a much higher chance of survival than lower classes.
Insights
This project demonstrates an end-to-end machine learning workflow on tabular data, including:
- Data preprocessing and feature engineering
- Model training, evaluation, and comparison
- Insights gained from data exploration
Future improvements could involve further hyperparameter tuning, ensemble stacking, or deep learning approaches.
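One common way to approach the hyperparameter tuning mentioned above is a cross-validated grid search. The sketch below uses scikit-learn's GridSearchCV on a Random Forest; the parameter grid, fold count, and synthetic data are all assumptions chosen to keep the example small, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered Titanic feature matrix.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation per candidate
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_  # best combination found in the grid
best_score = search.best_score_    # its mean cross-validated accuracy
print(best_params, round(best_score, 3))
```

After fitting, search.best_estimator_ holds a model refit on all of X and y with the best parameters, ready for prediction on the Kaggle test set.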
Key Features
- Performed extensive Exploratory Data Analysis (EDA) on the Titanic dataset
- Engineered features to improve model performance
- Trained multiple classification models (Logistic Regression, Random Forest, XGBoost)
- Compared model performance using accuracy, confusion matrices, and cross-validation scores
- Optimized hyperparameters for best-performing models
Achievements
- Successfully preprocessed and visualized the Titanic dataset
- Achieved competitive accuracy on the Kaggle Titanic competition
- Showcased different ML techniques for structured tabular data