A Titanic survival prediction model using machine learning techniques to classify passengers based on survival probability.

Titanic Survival Prediction

Overview

This project is a Kaggle notebook focused on the Titanic survival prediction challenge. The goal is to apply machine learning techniques to classify passengers based on their likelihood of survival.

Dataset

The dataset consists of:

PassengerId: Unique identifier for each passenger
Pclass: Ticket class (1st, 2nd, 3rd)
Name, Sex, Age: Personal details
SibSp, Parch: Number of siblings/spouses and parents/children aboard
Ticket, Fare: Ticket number and fare paid
Cabin: Cabin number (often missing)
Embarked: Port of embarkation
Survived: Target variable (1 = survived, 0 = not survived)
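
As a minimal sketch of how a file with these columns might be loaded and inspected with pandas: the two rows below are made up for illustration, and the helper name load_passengers is hypothetical rather than something taken from the notebook.

```python
import io

import pandas as pd

# Two made-up rows in the column layout listed above (illustrative only).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 0000,71.28,C85,C
"""


def load_passengers(source):
    """Read a Titanic-style CSV from a path or a file-like object."""
    return pd.read_csv(source)


df = load_passengers(io.StringIO(SAMPLE_CSV))
print(df.dtypes)        # column types inferred by pandas
print(df.isna().sum())  # missing values per column (Cabin is empty in row 1)
```

With the real competition data, the same helper would presumably be called with the path to the downloaded training file instead of the in-memory sample.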

Exploratory Data Analysis (EDA)

The project begins with data visualization and preprocessing:

Checked missing values and handled them appropriately (e.g., imputing median age)
Analyzed survival rates by gender, class, and fare using Seaborn and Matplotlib
Explored feature correlations using heatmaps
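
The missing-value check, median imputation, and grouped survival rates described above could look roughly like this. The six-row frame is invented for illustration, and plotting is left out; the resulting series are the kind of data a Seaborn or Matplotlib bar chart would be drawn from.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; real values would come from the Kaggle training file.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
})

# Count missing values per column.
missing = df.isna().sum()

# Impute missing ages with the median of the observed ages.
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)

# The survival rate of a group is the mean of its 0/1 target values.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

print(missing)
print(rate_by_sex)
print(rate_by_class)
```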

Feature Engineering

To improve model performance, various feature engineering steps were applied:

Created new features (e.g., FamilySize, Title extraction from names)
Converted categorical variables into numerical representations
Scaled numerical features for better model performance
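
A sketch of these three steps under stated assumptions: FamilySize is taken to be SibSp + Parch + 1 (counting the passenger), Title is taken to be the text between the comma and the first period in Name, and scaling is done as plain standardization. The notebook may define any of these differently, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows using the dataset's column names.
df = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann"],
    "Sex": ["male", "female", "female"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 2, 0],
    "Fare": [7.25, 71.28, 8.05],
    "Embarked": ["S", "C", "S"],
})

# Assumed definition: siblings/spouses + parents/children + the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Assumed definition: the text between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Categorical to numeric: binary map for Sex, one-hot for Embarked and Title.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked", "Title"])

# Standardize Fare to zero mean and unit variance (population std, ddof=0).
df["Fare"] = (df["Fare"] - df["Fare"].mean()) / df["Fare"].std(ddof=0)

print(df.drop(columns="Name"))
```

In a full pipeline the scaling statistics would normally be computed on the training split only and then reused on the test split, so that no information leaks from the held-out data.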

Model Training & Evaluation

Multiple machine learning models were trained and compared:

Logistic Regression – A baseline model for classification
Random Forest Classifier – A more powerful ensemble method
XGBoost Classifier – Gradient boosting model for better accuracy

Model performance was evaluated using:

Accuracy score
Confusion matrix
Cross-validation scores
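
The comparison loop can be sketched with scikit-learn as follows. This is not the notebook's code: the feature matrix is synthetic, the hyperparameters are arbitrary, and XGBoost is left out so the example depends only on scikit-learn. Since xgboost.XGBClassifier follows the scikit-learn estimator interface, it could be added to the models dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the engineered feature matrix (not Titanic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, pred),
        # 5-fold cross-validated accuracy on the training split.
        "cv_scores": cross_val_score(model, X_train, y_train, cv=5),
    }

for name, r in results.items():
    print(name, round(r["accuracy"], 3), round(r["cv_scores"].mean(), 3))
```

Reporting the cross-validation mean alongside the single held-out accuracy helps show whether a gap between two models is stable or an artifact of one particular split.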

Results & Findings

Random Forest and XGBoost outperformed Logistic Regression, suggesting that tree-based ensembles capture non-linear patterns and feature interactions better than a linear baseline.
Gender was the most important feature, with females having a significantly higher survival rate.
First-class passengers had a much higher chance of survival than lower classes.

Insights

This project demonstrates an end-to-end machine learning workflow on tabular data, including:

Data preprocessing and feature engineering
Model training, evaluation, and comparison
Insights gained from data exploration

Future improvements could involve further hyperparameter tuning, ensemble stacking, or deep learning approaches.

Key Features

Performed extensive Exploratory Data Analysis (EDA) on the Titanic dataset
Engineered features to improve model performance
Trained multiple classification models (Logistic Regression, Random Forest, XGBoost)
Compared model performance using accuracy and confusion matrices
Optimized hyperparameters for best-performing models

Achievements

Successfully preprocessed and visualized the Titanic dataset
Achieved competitive accuracy on the Kaggle Titanic competition
Showcased different ML techniques for structured tabular data