data-engineering
Supermarket Sales - Data Warehouse Project
2023
This project involves building a data warehouse using PostgreSQL and Hive, transforming data through ETL processes, and visualizing insights through dashboards.
The pipeline runs end to end: data ingestion, ETL into the warehouse, dashboarding, and deployment of an ML model.
The 3NF model in the PostgreSQL database
The star schema for the Hive data warehouse
The data source
Supermarket sales: https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales
Key Features
- ETL Pipeline for data extraction, transformation, and loading
- 3NF database modeling in PostgreSQL
- Star Schema design for Hive data warehouse
- Data visualization through dashboards
- Machine Learning model deployment
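To illustrate the step from flat sales records to a star schema, here is a minimal sketch of the transform stage in plain Python. It is not the project's actual code: the column names, the sample rows, and the helper names `surrogate_key` and `build_star_schema` are all assumptions made for illustration, and a real pipeline would write the resulting tables to Hive rather than keep them in memory.

```python
import csv
import io

# Made-up sample rows. Column names are assumed for illustration and may not
# match the headers of the Kaggle dataset or the project's tables.
SAMPLE_CSV = """invoice_id,branch,city,product_line,unit_price,quantity,date
INV-001,A,Yangon,Health and beauty,74.69,7,2019-01-05
INV-002,C,Naypyitaw,Electronic accessories,15.28,5,2019-03-08
INV-003,A,Yangon,Home and lifestyle,46.33,7,2019-03-03
"""


def surrogate_key(dimension, natural_key):
    """Return the integer key for natural_key, assigning the next one if it is new."""
    if natural_key not in dimension:
        dimension[natural_key] = len(dimension) + 1
    return dimension[natural_key]


def build_star_schema(rows):
    """Split flat sales rows into three dimension lookups and one fact table."""
    dim_branch = {}   # (branch, city) -> branch_key
    dim_product = {}  # product_line -> product_key
    dim_date = {}     # ISO date string -> date_key
    fact_sales = []

    for row in rows:
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        fact_sales.append({
            "invoice_id": row["invoice_id"],
            "branch_key": surrogate_key(dim_branch, (row["branch"], row["city"])),
            "product_key": surrogate_key(dim_product, row["product_line"]),
            "date_key": surrogate_key(dim_date, row["date"]),
            "quantity": quantity,
            "revenue": round(unit_price * quantity, 2),
        })

    return {
        "dim_branch": dim_branch,
        "dim_product": dim_product,
        "dim_date": dim_date,
        "fact_sales": fact_sales,
    }


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    schema = build_star_schema(rows)
    for fact in schema["fact_sales"]:
        print(fact)
```

Each fact row keeps only measures and integer keys, while descriptive attributes such as city or product line live once in their dimension. That separation is what lets dashboard queries aggregate the fact table and join to small dimensions.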
Achievements
- Designed a scalable data warehouse architecture
- Implemented an automated ETL pipeline
- Optimized SQL queries for performance improvements
- Successfully deployed a predictive ML model