16 years of UK Accidents: The Good, The Bad, and the preventable — Part 2
A comprehensive study on Vehicle and Casualties data (2005–2021)
Accidents on the road can have devastating consequences, affecting not only the individuals involved but also their families and society at large. The UK, like many developed countries, has made significant strides in road safety over the years, but accidents still occur. In Part 1, we examined accidents that took place on UK roads from 2005 to 2021, highlighting the factors that led to them and the measures that have been put in place to prevent similar accidents in the future. Now, by analyzing two more datasets related to this case study, I hope to shed light on the challenges and complexities of road safety in the UK and the importance of taking a proactive approach to reducing the risk of accidents.

Here is an outline of this post:
1. Introduction
2. Understanding the data
3. Applying Feature Engineering
4. Exploratory Data Analysis
5. Applying Machine Learning Algorithms
6. Final Result
7. Conclusion
Here we go………🚀
I. Introduction
The UK accident case study provides a comprehensive analysis of three crucial aspects of traffic accidents and road safety in the UK: accidents, vehicles, and casualties. Part 1 focused on the accident data. Here I will focus on the vehicle and casualty data from 2005 to 2021, which, together with Part 1, gives a fuller view of the trends, patterns, and factors contributing to road accidents in the UK. Through this analysis, I hope to identify key areas for improvement and inform strategies to reduce the incidence of road accidents and enhance road safety in the UK.
II. Understanding the data
For this case study, I collected the data under the Open Government Licence. Let’s explore our new data.

This is what our combined data looks like. First, I ensured that the dataset is clean and free of missing values. Then, I performed feature engineering to reduce the complexity of the data and identify the most relevant features. This should improve the performance of our machine learning models and enable more accurate predictions.
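As a rough illustration of the combining and cleaning step, here is a minimal sketch with pandas. It assumes the vehicle and casualty tables share the keys `accident_index` and `vehicle_reference` and that missing values are coded as -1. The tiny tables and the `combine_and_clean` helper are hypothetical and are not the code used for this case study.

```python
import pandas as pd

# Hypothetical miniature versions of the vehicle and casualty tables.
vehicles = pd.DataFrame({
    "accident_index": ["A1", "A1", "A2", "A3"],
    "vehicle_reference": [1, 2, 1, 1],
    "vehicle_type": [9, 9, 1, -1],  # -1 is assumed to mean "missing"
    "age_band_of_driver": [6, 7, 5, 6],
})
casualties = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3"],
    "vehicle_reference": [1, 1, 1],
    "casualty_type": [9, 1, 9],
    "age_of_casualty": [27, 34, 19],
})


def combine_and_clean(vehicles, casualties, missing_code=-1):
    """Join each casualty to its vehicle, then drop rows with missing values."""
    combined = casualties.merge(
        vehicles, on=["accident_index", "vehicle_reference"], how="inner"
    )
    # Keep only rows where no numeric column holds the assumed missing code.
    numeric = combined.select_dtypes(include="number")
    keep = (numeric != missing_code).all(axis=1)
    return combined[keep].dropna().reset_index(drop=True)


combined = combine_and_clean(vehicles, casualties)
print(combined)
```

In this toy example the row for accident A3 is dropped because its vehicle type is missing. Whether to drop or impute such rows is a design choice that depends on how much data is lost.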
III. Applying Feature Engineering: Extra Trees Classifier
An Extra Trees classifier is useful for feature selection because it helps identify the most important features in a dataset. It is an ensemble method that combines many randomized decision trees and assigns each feature an importance score, which we can use to rank the features. By removing the less important features, we reduce the complexity of the data, lower the risk of overfitting, and make the resulting machine learning models easier to interpret.

Above are the best features selected from the vehicle and casualty data. Now, let’s explore these features one by one.
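To make the ranking step in this section concrete, here is a minimal sketch using scikit-learn's `ExtraTreesClassifier` and its `feature_importances_` attribute. The data is synthetic, and the feature names and the choice of `top_k = 2` are assumptions made for illustration, so this is not the exact setup used for the results above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(42)
n_samples = 600

# Synthetic stand-in data: two informative columns and two pure-noise columns.
feature_names = ["informative_a", "informative_b", "noise_a", "noise_b"]
X = rng.normal(size=(n_samples, 4))
# The label depends only on the first two columns.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = ExtraTreesClassifier(n_estimators=200, random_state=42)
model.fit(X, y)

# feature_importances_ holds one impurity-based score per column; they sum to 1.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Keep the top-k features (k is an arbitrary choice here).
top_k = 2
selected = [name for name, _ in ranking[:top_k]]
print("Selected:", selected)
```

On real data the cut-off is a judgment call: a fixed number of features, a minimum importance score, or a visible drop in the ranked scores are all common choices.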
IV. Exploratory Data Analysis
1. age_of_casualty
It refers to the age of the person injured or killed in a road accident. This variable describes the age distribution of casualties and can be used to identify age-specific risk factors and patterns of injury.

Here, we can see that the majority of casualties in road accidents fall within the age range of 16 to 35 years. The highest incidence of casualties is observed within this range, which makes this age group a priority for road safety measures.
2. driver_imd_decile
It refers to the Index of Multiple Deprivation (IMD) decile of the driver involved in a road accident. The IMD is a measure of relative deprivation in small areas across England, based on factors such as income, employment, education, health, and crime. The decile places the driver's home area on a scale from 1 to 10, with 1 being the most deprived 10% of areas and 10 being the least deprived 10%. This variable is useful for understanding the relationship between deprivation and road accidents and can help inform policies and interventions aimed at reducing road accidents in deprived areas. In our case, most values for this feature are missing, but the available records are spread fairly evenly from the most to the least deprived decile.
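A quick way to check how much of a coded column like this is missing, and how the known values are distributed, is sketched below. It assumes missing values are coded as -1; the sample values and the `summarise_deciles` helper are hypothetical.

```python
from collections import Counter

MISSING = -1  # assumed code for "data missing or out of range"

# Hypothetical driver_imd_decile values.
driver_imd_decile = [-1, 3, -1, 10, 1, -1, 5, -1, 7, -1, 2, -1]


def summarise_deciles(values, missing_code=MISSING):
    """Return the share of missing values and the counts per known decile."""
    known = [v for v in values if v != missing_code]
    missing_share = 1 - len(known) / len(values)
    counts = Counter(known)
    return missing_share, dict(sorted(counts.items()))


missing_share, counts = summarise_deciles(driver_imd_decile)
print(f"Missing: {missing_share:.0%}")
print(counts)
```

When the missing share is this large, any pattern in the known values should be read with caution, since the records with a known decile may not be representative.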

3. accident_year

When we looked at accidents per year in Part 1, we saw a decline from 2005 to 2021. Here, we can see that 2016 had more casualties than any other year.
4. vehicle_manoeuvre
It describes the type of manoeuvre that a vehicle was undertaking at the time of an accident. This variable provides information on the actions of the driver and the circumstances surrounding the accident. Examples of manoeuvres include changing lanes, turning left or right, overtaking, and reversing. Understanding the types of manoeuvres involved in road accidents can help to identify high-risk behaviours and inform the development of targeted interventions and policies to reduce the incidence of accidents associated with these manoeuvres. We can see that the most frequently observed manoeuvre is simply going ahead.

5. junction_location
It refers to the location of the vehicle relative to a road junction at the time of the accident. In the UK, a road junction is a location where two or more roads meet or intersect, and it can take many different forms, including roundabouts, T-junctions, crossroads, and more. Here, most accidents happened within 20 metres of a junction.

6. first_point_of_impact
When two or more vehicles are involved in an accident, the point of impact can be an important factor in determining how the accident occurred, who was at fault, and the severity of the injuries sustained. The first point of impact refers to the specific location on the vehicle where the collision occurred, such as the front bumper, rear bumper, side door, or roof. Here, we can observe that in most cases the impact occurred at the front of the vehicle, which suggests that the vehicle was moving forward at the time of the collision.

7. age_band_of_driver
It refers to the age range of the driver involved in a collision. The age band of the drivers in most accidents is 26–35. This age band is still relatively young, and drivers in this age group may still be prone to accidents due to other factors, such as driver fatigue, speeding, or alcohol or drug use.

8. vehicle_type
It refers to the type of vehicle involved in a collision. By identifying the vehicle type involved in an accident, investigators can better understand the factors that may have contributed to the collision and develop strategies to reduce the risk of similar accidents in the future. Here, cars are involved in far more accidents than any other vehicle type.

9. casualty_type
It refers to the type of person who was injured or killed in a collision. Here, we can see that car occupants (drivers or passengers) are the most common casualty type, which is consistent with cars being the vehicle type involved in most accidents.

10. age_band_of_casualty
It refers to the age range of the injured person involved in a collision. Unfortunately, accidents affect individuals of all age groups, and no age band is exempt. The 26–35 band is affected most, possibly because it covers people who travel every day for work and similar activities.
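The link between a raw age and its age band can be sketched as follows. The band boundaries below are an assumption based on the 26–35 band mentioned in this post, and the sample ages and the `age_band` helper are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper bounds of each band; ages above the last bound fall in "Over 75".
UPPER_BOUNDS = [5, 10, 15, 20, 25, 35, 45, 55, 65, 75]
LABELS = ["0-5", "6-10", "11-15", "16-20", "21-25", "26-35",
          "36-45", "46-55", "56-65", "66-75", "Over 75"]


def age_band(age):
    """Map an age in years to its band label."""
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


# Hypothetical casualty ages.
ages = [4, 17, 23, 26, 29, 31, 35, 36, 52, 80]
band_counts = Counter(age_band(a) for a in ages)
print(band_counts.most_common(3))
```

Note that the bands are not equally wide, so a ten-year band such as 26–35 will naturally collect more casualties than a five-year band such as 21–25, all else being equal.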

Now that we have gone through the selected features and understood them, we are ready to apply machine learning algorithms and see what they reveal.
V. Applying Machine Learning Algorithms
I have already explained why I selected these algorithms for the UK accidents case study, so we will move directly to the results.
1. Random Forest
The results I found using Random Forest:



The Random Forest model correctly classified 86% of the instances in both the training and test sets.
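For readers who want to see how training and test accuracy are typically measured, here is a minimal sketch with scikit-learn. It uses synthetic data from `make_classification` instead of the accident data, and the split ratio and hyperparameters (`n_estimators`, `max_depth`) are assumptions, not the settings behind the 86% figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected vehicle and casualty features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Limiting tree depth is one (assumed) way to keep train and test scores close.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Accuracy is the fraction of rows whose predicted class matches the true class.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Similar training and test scores, as reported above, suggest that the model generalizes to unseen data about as well as it fits the training data.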
2. XGBoost
The results I have found are:



XGBoost gave nearly the same results as Random Forest, just as in Part 1.
3. K-Nearest Neighbors
The results are:



Unlike in Part 1, KNN performed very well on the vehicle and casualty data and is able to distinguish between the classes.
4. AdaBoost Classifier
The results I found using AdaBoost:



The AdaBoost results are about the same as those of XGBoost and Random Forest. It can also differentiate the classes well.
5. Bagging Classifier
The results are:



The bagging classifier appears to be overfitting the training data, as the training accuracy (95%) is much higher than the test accuracy (84%).
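The overfitting check used here boils down to comparing training and test accuracy. A minimal sketch with the figures reported in this post is shown below; the 0.05 threshold and the `generalisation_gap` helper are hypothetical choices for illustration.

```python
# Accuracies as reported in this post (fractions of correctly classified rows).
reported = {
    "Random Forest": {"train": 0.86, "test": 0.86},
    "Bagging": {"train": 0.95, "test": 0.84},
}

GAP_THRESHOLD = 0.05  # hypothetical cut-off for flagging overfitting


def generalisation_gap(scores):
    """Train accuracy minus test accuracy, rounded to avoid float noise."""
    return round(scores["train"] - scores["test"], 4)


for name, scores in reported.items():
    gap = generalisation_gap(scores)
    flag = "possible overfitting" if gap > GAP_THRESHOLD else "ok"
    print(f"{name}: gap = {gap:.2f} ({flag})")
```

A gap of 11 percentage points for bagging, against none for Random Forest, is what motivates the overfitting remark. There is no universal threshold, so the cut-off should be chosen with the problem in mind.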
VI. Final Result

VII. Conclusion
Woah!!!! 🥹 I am in tears (of joy, of course). This case study offers a wealth of valuable lessons, not only about machine learning but also about being an alert citizen. I have gained insights into the features that contribute to road accidents and casualties. These 20 features are:
1. time
2. accident_year
3. local_authority_district
4. month
5. local_authority_highway
6. first_road_number
7. day_of_week
8. police_force
9. number_of_vehicles
10. number_of_casualties
11. age_of_casualty
12. driver_imd_decile
13. year (casualties)
14. vehicle_manoeuvre
15. junction_location
16. first_point_of_impact
17. age_band_of_driver
18. vehicle_type
19. casualty_type
20. age_band_of_casualty
By working together, governments, organizations, and individuals can make a significant contribution to improving road safety. This not only helps prevent accidents and injuries, but also enhances the overall transportation experience for everyone, making it more efficient, comfortable, and enjoyable.
You can find the Python code on GitHub.
You can reach me on LinkedIn.
Stay tuned!