This project involves the analysis of a Black Friday sales dataset using Python and various data analysis libraries, such as Pandas and Seaborn.
The goal is to gain insights into customer behavior, purchasing patterns, and the impact of different demographic factors on sales. Some example questions include:
Were purchasing patterns dependent on gender and age group?
Does a consumer's occupation have a measurable impact on sales?
-- Loading & Cleaning the Dataset
-- Performing basic analysis on columns
-- Analysis (Based on Gender, Age, Marital Status)
-- Multicolumn Analysis
-- Analysis (Based on Occupation & Gender)
-- Combining Gender and Marital Status
- User_ID
- Product_ID
- Gender
- Age
- Occupation
- City_Category
- Stay_In_Current_City_Years
- Marital_Status
- Product_Category_1
- Product_Category_2
- Product_Category_3
- Purchase
The dataset is loaded using Pandas, and basic information about its structure is displayed using df.info() and df.head(). Null values in the dataset are identified and handled, with columns containing excessive null values dropped to maintain data integrity.
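The loading and cleaning step could look roughly like the sketch below. The inline CSV text is made-up sample data standing in for the real file, and the 50% null threshold is an assumption; the project itself only states that columns with excessive nulls are dropped.

```python
import io
import pandas as pd

# Made-up rows standing in for the real CSV file. With the actual data this
# would be a plain pd.read_csv(<path to the dataset>) call.
csv_text = """User_ID,Product_ID,Gender,Age,Purchase,Product_Category_2,Product_Category_3
1,P1,F,0-17,100,,
1,P2,F,0-17,250,6.0,14.0
2,P1,M,55+,400,,
3,P3,M,26-35,250,2.0,
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()         # column dtypes and non-null counts
print(df.head())  # first rows of the table

# Fraction of missing values in each column.
null_share = df.isnull().mean()
print(null_share)

# Drop columns whose null share exceeds the assumed threshold.
threshold = 0.5  # assumption, not taken from the project
cols_to_drop = null_share[null_share > threshold].index.tolist()
df_clean = df.drop(columns=cols_to_drop)
```

In this sample only Product_Category_3 is dropped, because three of its four values are missing; Product_Category_2 sits exactly at the threshold and is kept.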
Basic statistics and visualizations are produced for various columns, including the count of unique values, summary statistics, and the total purchase amount.
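A minimal sketch of these basic checks, using a small made-up sample rather than the real dataset:

```python
import pandas as pd

# Made-up sample with a few of the dataset's columns.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 3],
    "Product_ID": ["P1", "P2", "P1", "P3"],
    "City_Category": ["A", "A", "C", "B"],
    "Purchase": [100, 250, 400, 250],
})

unique_counts = df.nunique()           # distinct values per column
summary = df["Purchase"].describe()    # count, mean, std, min, quartiles, max
total_purchase = df["Purchase"].sum()  # total amount spent

print(unique_counts)
print(summary)
print("Total purchase amount:", total_purchase)
```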
Analysis is conducted on the 'Gender' column, including the count of each gender, purchasing patterns, and visualizations representing the gender distribution.
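One way to compute the gender counts and purchasing figures is sketched below. The sample values are invented for illustration, and the choice of sum and mean as the purchasing measures is an assumption.

```python
import pandas as pd

# Made-up sample; the real dataset has one row per purchase.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M"],
    "Purchase": [100, 200, 300, 500, 400],
})

gender_counts = df["Gender"].value_counts()                # rows per gender
gender_share = df["Gender"].value_counts(normalize=True)   # same, as fractions
gender_purchase = df.groupby("Gender")["Purchase"].agg(["sum", "mean"])

print(gender_counts)
print(gender_share)
print(gender_purchase)
```

The counts can then be plotted, for example with `sns.countplot(data=df, x="Gender")` or `gender_counts.plot(kind="pie")`.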
The 'Age' and 'Marital_Status' columns are analyzed individually, including visualizations depicting the age distribution, amount spent by age group, and the marital status distribution.
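The three plots mentioned above might be drawn with Seaborn as in the following sketch. The age bins stored as strings and the 0/1 coding of Marital_Status are assumptions about the data, and the sample rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample; age bins and 0/1 marital status coding are assumptions.
df = pd.DataFrame({
    "Age": ["0-17", "26-35", "26-35", "55+", "26-35"],
    "Marital_Status": [0, 1, 0, 1, 1],
    "Purchase": [100, 200, 300, 400, 500],
})

age_order = ["0-17", "26-35", "55+"]

# Total amount spent per age group, as a two-column DataFrame.
spent_by_age = df.groupby("Age", as_index=False)["Purchase"].sum()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
sns.countplot(data=df, x="Age", order=age_order, ax=axes[0])
axes[0].set_title("Age distribution")
sns.barplot(data=spent_by_age, x="Age", y="Purchase", order=age_order, ax=axes[1])
axes[1].set_title("Amount spent by age group")
sns.countplot(data=df, x="Marital_Status", ax=axes[2])
axes[2].set_title("Marital status distribution")
fig.tight_layout()
```

In a notebook, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` displays the figure inline.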
Relationships between multiple columns are explored through visualizations, such as the interaction between 'Gender' and 'Age', 'Gender' and 'Marital_Status', 'City_Category' and 'Age', and more.
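The numbers behind such two-column plots can be inspected with a cross-tabulation. This is a sketch on made-up data, shown for the 'Gender' and 'Age' pair only; `pd.crosstab` is one possible tool here, not necessarily the one used in the project.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Age": ["26-35", "26-35", "55+", "55+", "26-35", "55+"],
})

# Row counts for every (Age, Gender) combination.
gender_by_age = pd.crosstab(df["Age"], df["Gender"])

# The same table with each age group's row normalized to sum to 1.
gender_share_by_age = pd.crosstab(df["Age"], df["Gender"], normalize="index")

print(gender_by_age)
print(gender_share_by_age)
```

The equivalent Seaborn plot is `sns.countplot(data=df, x="Age", hue="Gender")`; swapping in other column pairs, such as 'City_Category' and 'Age', follows the same pattern.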
Analysis is performed on the 'Occupation' column, including count, purchase amount, and mean purchase amount for each occupation. The 'Product_Category_1' column is analyzed, showing the count, purchase amount, and mean purchase amount for each category. Top products based on purchase amount and mean purchase amount are visualized.
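The count, total, and mean per occupation or product category can be computed with a single grouped aggregation, as sketched below. The `summarize` helper is hypothetical, the data is made up, and limiting the top products to two is only to suit the tiny sample.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "Occupation": [0, 0, 4, 4, 7],
    "Product_ID": ["P1", "P2", "P1", "P3", "P1"],
    "Product_Category_1": [1, 5, 1, 8, 1],
    "Purchase": [100, 250, 200, 400, 600],
})

def summarize(frame, column):
    """Hypothetical helper: count, total and mean Purchase per value of `column`."""
    return (
        frame.groupby(column)["Purchase"]
        .agg(count="count", total="sum", mean="mean")
        .sort_values("total", ascending=False)
    )

occupation_summary = summarize(df, "Occupation")
category_summary = summarize(df, "Product_Category_1")

# Top products by total and by mean purchase amount.
top_products_by_total = df.groupby("Product_ID")["Purchase"].sum().nlargest(2)
top_products_by_mean = df.groupby("Product_ID")["Purchase"].mean().nlargest(2)

print(occupation_summary)
print(category_summary)
print(top_products_by_total)
print(top_products_by_mean)
```

Ranking by total favors frequently bought products, while ranking by mean favors expensive ones, so the two lists can differ, as they do in this sample.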
A combined analysis is performed to understand the relationship between 'Gender' and 'Marital_Status', including visualizations depicting the distribution.
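One plausible way to combine the two columns is to build a single label per row and count it, as in this sketch. The column name `Gender_MaritalStatus` is hypothetical, and the 0/1 coding of Marital_Status is an assumption.

```python
import pandas as pd

# Made-up sample; Marital_Status coded as 0/1 is an assumption.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M"],
    "Marital_Status": [0, 1, 0, 1, 1],
})

# Hypothetical combined column, e.g. "M_1" for Gender M with Marital_Status 1.
df["Gender_MaritalStatus"] = df["Gender"] + "_" + df["Marital_Status"].astype(str)

combined_counts = df["Gender_MaritalStatus"].value_counts()
print(combined_counts)
```

The combined column can be plotted like any other categorical column, for instance with `sns.countplot(data=df, x="Gender_MaritalStatus")`.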
The detailed exploration of various columns and their interactions contributes to a better understanding of the factors influencing sales during Black Friday.