data-engineering
Big Data Project - Learning: Fatalities in the Palestinian-Israeli Conflict
2023
A hands-on big data project leveraging Hadoop, Pig, and Hive to process and analyze fatality data in the Palestinian-Israeli conflict. Data is processed using Pig, stored in HDFS, and queried with Hive for insights.
This project explores big data technologies such as Hadoop, Pig, Hive, and Docker to analyze fatalities in the Palestinian-Israeli conflict. The workflow includes data ingestion, transformation, storage in HDFS, and querying using HiveQL. The analysis generates insights into fatalities over time and by age.
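To make the transformation step of this workflow more concrete, here is a minimal sketch in plain Python. It is only a stand-in for what the Pig script does in the project, not the project's actual code. The column names (`id`, `date_of_event`, `age`), the date format, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample input; the real dataset's schema may differ.
RAW_CSV = """id,date_of_event,age
1,2014-07-20,23
2,2014-07-21,
3,not-a-date,40
4,2021-05-12, 7
"""


def clean_rows(text):
    """Yield records with a parsed year and an integer age (or None)."""
    for row in csv.DictReader(io.StringIO(text)):
        try:
            event_date = datetime.strptime(row["date_of_event"].strip(), "%Y-%m-%d")
        except ValueError:
            continue  # drop rows whose date cannot be parsed
        age_text = row["age"].strip()
        # Keep rows with a missing age, but mark the age as unknown.
        age = int(age_text) if age_text.isdigit() else None
        yield {"id": row["id"].strip(), "year": event_date.year, "age": age}


cleaned = list(clean_rows(RAW_CSV))
for record in cleaned:
    print(record)
```

Rows with an unparseable date are dropped, while rows with a missing age are kept so they still count toward per-year totals. That choice is a judgment call for this sketch, not something taken from the project.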
The results are also visualized with Python.
I was hoping to explore Apache Superset, but there was no time. I hope you get to try it :)
Key Features
- Data ingestion using Hadoop and Pig
- Transformation and cleaning of data
- Storage and querying with Hive
- Analysis of fatalities by year and age
- Visualization using Python
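The analysis by year and age listed above boils down to grouped counts. In the project this is done with HiveQL; the sketch below shows the same idea in plain Python so the logic is easy to follow. The sample records and the age group boundaries are assumptions chosen for illustration, not values from the project.

```python
from collections import Counter

# Hypothetical cleaned records as (year, age) pairs; None means the age is unknown.
RECORDS = [(2014, 23), (2014, 7), (2014, None), (2021, 34), (2021, 68), (2021, 15)]


def age_group(age):
    """Map an age to a coarse bucket (boundaries are an assumption)."""
    if age is None:
        return "unknown"
    if age < 18:
        return "0-17"
    if age < 40:
        return "18-39"
    if age < 65:
        return "40-64"
    return "65+"


# Equivalent in spirit to a GROUP BY with COUNT(*) on each key.
by_year = Counter(year for year, _ in RECORDS)
by_age_group = Counter(age_group(age) for _, age in RECORDS)

for year in sorted(by_year):
    print(year, by_year[year])
for group, count in sorted(by_age_group.items()):
    print(group, count)
```

The resulting counters can be passed straight to a plotting library for the visualization step.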
Achievements
- Implemented a full big data pipeline using Hadoop, Pig, and Hive
- Stored structured data in HDFS and queried it efficiently with Hive
- Generated insights into fatality trends over time
- Explored visualization techniques to present the results