data-engineering

Supermarket Sales - Data Warehouse Project

This project builds an end-to-end data pipeline for supermarket sales data, from ingestion to dashboarding and ML model deployment. It covers a 3NF model in PostgreSQL, a star-schema data warehouse in Hive, ETL processes to transform the data, and dashboards to visualize insights.

pipeline

The 3NF model in the PostgreSQL database

3NF model
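
To make the idea of a 3NF layout concrete, here is a minimal sketch. It is not the project's actual schema: the table names (`branch`, `product_line`, `sale`), the column names, and the sample rows are all hypothetical, and the in-memory `sqlite3` database only stands in for PostgreSQL so the example runs without a server. It also assumes that each branch maps to exactly one city, which is why the city lives in the `branch` table rather than on every sale.

```python
import sqlite3

# Hypothetical 3NF layout; table and column names are illustrative only.
DDL = """
CREATE TABLE branch (
    branch_id   INTEGER PRIMARY KEY,
    branch_code TEXT NOT NULL UNIQUE,
    city        TEXT NOT NULL
);
CREATE TABLE product_line (
    product_line_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
CREATE TABLE sale (
    invoice_id      TEXT PRIMARY KEY,
    branch_id       INTEGER NOT NULL REFERENCES branch(branch_id),
    product_line_id INTEGER NOT NULL REFERENCES product_line(product_line_id),
    unit_price      REAL NOT NULL,
    quantity        INTEGER NOT NULL,
    sold_on         TEXT NOT NULL
);
"""

# Made-up rows shaped like a flat sales export (not real data).
SAMPLE_ROWS = [
    {"invoice_id": "INV-001", "branch": "A", "city": "Yangon",
     "product_line": "Health and beauty", "unit_price": 10.0,
     "quantity": 3, "sold_on": "2019-01-05"},
    {"invoice_id": "INV-002", "branch": "B", "city": "Mandalay",
     "product_line": "Electronic accessories", "unit_price": 20.5,
     "quantity": 2, "sold_on": "2019-03-08"},
    {"invoice_id": "INV-003", "branch": "A", "city": "Yangon",
     "product_line": "Electronic accessories", "unit_price": 5.0,
     "quantity": 4, "sold_on": "2019-03-08"},
]


def build_3nf(rows):
    """Split flat rows into lookup tables plus a sale table that references them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    for r in rows:
        # Each branch and product line is stored once; repeats are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO branch (branch_code, city) VALUES (?, ?)",
            (r["branch"], r["city"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO product_line (name) VALUES (?)",
            (r["product_line"],),
        )
        # The sale row keeps only foreign keys to the lookup tables.
        conn.execute(
            "INSERT INTO sale (invoice_id, branch_id, product_line_id, "
            "unit_price, quantity, sold_on) "
            "SELECT ?, b.branch_id, p.product_line_id, ?, ?, ? "
            "FROM branch AS b, product_line AS p "
            "WHERE b.branch_code = ? AND p.name = ?",
            (r["invoice_id"], r["unit_price"], r["quantity"], r["sold_on"],
             r["branch"], r["product_line"]),
        )
    conn.commit()
    return conn
```

Calling `build_3nf(SAMPLE_ROWS)` returns a connection whose `sale` table can be joined back to `branch` and `product_line` to recover the flat view.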

The star schema for the Hive data warehouse

DWR model
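
As a rough illustration of how a star schema differs from the 3NF model, the sketch below reshapes flat sales rows into one fact table surrounded by dimension tables with surrogate keys. Everything here is an assumption made for illustration: the dimension names (`dim_date`, `dim_branch`, `dim_product`), the `fact_sales` measures, and the sample rows are hypothetical, and plain Python dictionaries stand in for Hive tables.

```python
from datetime import date

# Made-up rows shaped like a flat sales export (not real data).
SAMPLE_ROWS = [
    {"invoice_id": "INV-001", "branch": "A", "city": "Yangon",
     "product_line": "Health and beauty", "unit_price": 10.0,
     "quantity": 3, "sold_on": "2019-01-05"},
    {"invoice_id": "INV-002", "branch": "B", "city": "Mandalay",
     "product_line": "Electronic accessories", "unit_price": 20.5,
     "quantity": 2, "sold_on": "2019-03-08"},
    {"invoice_id": "INV-003", "branch": "A", "city": "Yangon",
     "product_line": "Electronic accessories", "unit_price": 5.0,
     "quantity": 4, "sold_on": "2019-03-08"},
]


def surrogate_key(dim, natural_key, attrs):
    """Return the surrogate key for natural_key, assigning the next one if new."""
    if natural_key not in dim:
        dim[natural_key] = {"key": len(dim) + 1, **attrs}
    return dim[natural_key]["key"]


def to_star(rows):
    """Build hypothetical dimensions and a fact table from flat rows."""
    dim_date, dim_branch, dim_product = {}, {}, {}
    fact_sales = []
    for r in rows:
        d = date.fromisoformat(r["sold_on"])
        fact_sales.append({
            "invoice_id": r["invoice_id"],
            # Dimension attributes are denormalized: the date dimension
            # carries year, month, and ISO weekday for easy grouping.
            "date_key": surrogate_key(
                dim_date, r["sold_on"],
                {"year": d.year, "month": d.month, "weekday": d.isoweekday()},
            ),
            "branch_key": surrogate_key(
                dim_branch, r["branch"], {"city": r["city"]}
            ),
            "product_key": surrogate_key(
                dim_product, r["product_line"], {}
            ),
            # Measures stay on the fact row.
            "quantity": r["quantity"],
            "revenue": round(r["unit_price"] * r["quantity"], 2),
        })
    return {
        "dim_date": dim_date,
        "dim_branch": dim_branch,
        "dim_product": dim_product,
        "fact_sales": fact_sales,
    }
```

The design choice to note is that the fact rows hold only keys and numeric measures, while descriptive attributes sit in the dimensions, so analytical queries need at most one join per dimension.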

The data source

Supermarket sales: https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales

Key Features

  • ETL Pipeline for data extraction, transformation, and loading
  • 3NF database modeling in PostgreSQL
  • Star Schema design for Hive data warehouse
  • Data visualization through dashboards
  • Machine Learning model deployment
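
The extract, transform, and load steps listed above can be sketched as three small functions. This is a simplified illustration under stated assumptions, not the project's pipeline: the CSV header names and the month/day/year date format are assumed to match the Kaggle export, the sample rows are made up, the `staging_sales` table is hypothetical, and an in-memory `sqlite3` database stands in for the real target.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Made-up sample; header names are assumed, not taken from the project.
RAW_CSV = """Invoice ID,Branch,City,Product line,Unit price,Quantity,Date
INV-001,A,Yangon,Health and beauty,10.00,3,1/5/2019
INV-002,B,Mandalay,Electronic accessories,20.50,2,3/8/2019
INV-003,A,Yangon,Health and beauty,not-a-number,1,3/9/2019
"""


def extract(text):
    """Read CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Cast types and normalize dates to ISO format, skipping malformed rows."""
    clean = []
    for rec in records:
        try:
            clean.append((
                rec["Invoice ID"].strip(),
                rec["Branch"].strip(),
                rec["Product line"].strip(),
                float(rec["Unit price"]),
                int(rec["Quantity"]),
                datetime.strptime(rec["Date"], "%m/%d/%Y").date().isoformat(),
            ))
        except (KeyError, ValueError):
            # A real pipeline would log or quarantine the row instead.
            continue
    return clean


def load(conn, rows):
    """Write cleaned rows to a hypothetical staging table; return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_sales ("
        "invoice_id TEXT PRIMARY KEY, branch TEXT, product_line TEXT, "
        "unit_price REAL, quantity INTEGER, sold_on TEXT)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO staging_sales VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]


def run_pipeline(text):
    """Run extract, transform, and load in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn
```

Running `run_pipeline(RAW_CSV)` loads the two valid rows and drops the one whose unit price cannot be parsed.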

Achievements

  • Designed a scalable data warehouse architecture
  • Implemented an automated ETL pipeline
  • Optimized SQL queries for better performance
  • Successfully deployed a predictive ML model