Week 1

15/12/25

Our first week of development greatly focused on exploratory data analysis, cleaning data and laying the foundations for our web application. Our first task was to clean our car insurance premium dataset and ensure everything was ready for the EDA. During this process we found that there were many missing values for variables such as the vehicle’s fuel type as well as the number of doors the vehicle has.

Machine Learning

As the ultimate goal of our project is to make a program that can estimate car insurance premiums, we began our EDA by plotting the premium values against various other metrics to try and find correlations and give us a better idea of what machine learning model would be the best to create an insurance estimator. Since insurance premiums are calculated based on many different important factors and each case can be vastly different from another, it was difficult to find easy correlation between insurance premiums and one other factor, this was also an indicator that a random forest or gradient boosting model may be a good choice.

One discovery that we found was that the “Type_risk” variable that was an integer ranging from 1-4 split our dataset into 4 different types of vehicles that are passenger cars, motorbikes, vans and agricultural vehicles. In the interest of our project, we chose to filter the dataset to only contain passenger cars.

Our main references in getting started and getting used to working with pandas dataframes was the pandas documentation https://pandas.pydata.org/docs/

And our mentor helping us explore the fundamentals of data analysis and machine learning.