Road Safety Analysis


In this notebook I present the process a data scientist or analyst should follow to extract useful information from a dataset. As an example, I use the provided Acc.csv file, which records accidents in the United Kingdom in 2017. The analysis is split into the steps needed to produce meaningful insights.

Importing the data


Initial Exploration of the Dataset


I will not use these columns in the following steps, so there is no need to handle their null values.
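
A minimal sketch of how the null values per column might be counted. The DataFrame below is a toy stand-in for the accidents table, and its column names (including `UnusedColumn`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the accidents table; names and values are hypothetical.
df = pd.DataFrame({
    "Accident_Severity": [3, 3, 2, 1],
    "Speed_limit": [30, 30, 60, 70],
    "UnusedColumn": ["a", np.nan, np.nan, "b"],
})

# Count missing values per column and keep only the columns that have any.
null_counts = df.isnull().sum()
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```

If the only columns listed are ones that the later steps never touch, leaving their nulls in place does not affect the results.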


The basic statistics of our DataFrame are presented above, but most of them are meaningless because pandas read the corresponding attributes with the wrong data type. For example, the attribute Road_Type should be a category and not an integer, as the output of describe() makes clear.
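
The difference can be illustrated with a small sketch. The values below are toy data, not rows from Acc.csv; only the column name Road_Type comes from the text above.

```python
import pandas as pd

# Toy integer codes standing in for the Road_Type column.
df = pd.DataFrame({"Road_Type": [6, 6, 3, 1, 6]})

# As integers, describe() reports a mean and quartiles, which mean nothing for codes.
print(df["Road_Type"].describe())

# As a category, describe() reports count, unique, top and freq instead.
road_type = df["Road_Type"].astype("category")
summary = road_type.describe()
print(summary)
```

With the category dtype, the summary answers the questions that make sense for a coded attribute: how many distinct values there are and which one is the most frequent.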

Renaming columns of the Dataframe

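
A minimal sketch of how columns might be renamed with `DataFrame.rename`. The original column names on the left of the mapping are assumptions about the raw file; the new names are the ones used later in this notebook.

```python
import pandas as pd

# Empty toy DataFrame; the raw column names are assumed, not read from Acc.csv.
df = pd.DataFrame(columns=[
    "Accident_Severity", "Number_of_Vehicles", "Speed_limit", "Day_of_Week",
])

# Map each raw name to a name without underscores.
rename_map = {
    "Accident_Severity": "AccidentSeverity",
    "Number_of_Vehicles": "NumberOfVehicles",
    "Speed_limit": "SpeedLimit",
    "Day_of_Week": "DayOfWeek",
}
df = df.rename(columns=rename_map)
print(list(df.columns))
```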

Subsetting the Dataframe


I will select only the columns that I will use for the analysis.
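
Selecting a subset of columns could look like the sketch below. The toy data and the particular list of columns are assumptions made for illustration.

```python
import pandas as pd

# Toy DataFrame; "Longitude" stands for any column the analysis does not need.
df = pd.DataFrame({
    "AccidentSeverity": [3, 2, 3],
    "SpeedLimit": [30, 60, 30],
    "Longitude": [-0.17, -2.24, -1.55],
})

# Keep only the columns of interest; copy() avoids modifying a view of df later.
columns_to_keep = ["AccidentSeverity", "SpeedLimit"]
subset = df[columns_to_keep].copy()
print(subset.shape)
```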


Changing the name of the needed elements


To understand what the coded values in each column of the dataset represent, I referred to the provided metadata Excel file and replaced the codes with their descriptive labels.
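
A sketch of how codes might be replaced with labels using `Series.map`. The code-to-label mapping below is an assumption and should be checked against the metadata file before use.

```python
import pandas as pd

# Toy severity codes.
df = pd.DataFrame({"AccidentSeverity": [3, 1, 2, 3]})

# Assumed mapping; the authoritative one lives in the metadata Excel file.
severity_labels = {1: "Fatal", 2: "Serious", 3: "Slight"}

# map() looks up each code in the dictionary; unmapped codes would become NaN.
df["AccidentSeverity"] = df["AccidentSeverity"].map(severity_labels)
print(df["AccidentSeverity"].tolist())
```

Because `map` turns any code missing from the dictionary into NaN, a null count after the mapping is a quick way to spot codes the dictionary forgot.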


I will convert some attributes from object to category.
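
One way such a conversion might be written is shown below, on toy data. Which columns to convert is an assumption here.

```python
import pandas as pd

# Toy DataFrame with two label columns (object dtype) and one numeric column.
df = pd.DataFrame({
    "AccidentSeverity": ["Slight", "Fatal", "Slight"],
    "DayOfWeek": ["Friday", "Saturday", "Friday"],
    "SpeedLimit": [30, 60, 30],
})

# Convert only the label columns; numeric columns keep their dtype.
categorical_columns = ["AccidentSeverity", "DayOfWeek"]
df = df.astype({column: "category" for column in categorical_columns})
print(df.dtypes)
```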

Basic Operations with the dataset


An important step in getting a general picture of the dataset is to extract basic statistics from the numeric attributes. The table above shows the mean, standard deviation, min, max and quartiles for the two numeric attributes, NumberOfVehicles and SpeedLimit.
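
A table of this kind can be produced with `describe()` restricted to the numeric columns. The sketch below uses toy values, so its numbers are not those of the real dataset.

```python
import pandas as pd

# Toy data for the two numeric attributes plus one label column.
df = pd.DataFrame({
    "NumberOfVehicles": [1, 2, 2, 3],
    "SpeedLimit": [30, 30, 60, 70],
    "AccidentSeverity": ["Slight", "Slight", "Serious", "Fatal"],
})

# describe() returns count, mean, std, min, quartiles and max per column.
stats = df[["NumberOfVehicles", "SpeedLimit"]].describe()
print(stats)
```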

Filtering the Dataset


The dataset is now clean and ready for analysis. I will carry out the analysis by answering a few queries on the dataset and drawing insights from the results.

****

Find the percentage of all the accidents that are Fatal and occur on Saturday

So, 0.216% of all the accidents are Fatal and occurred on a Saturday.
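
The choice of denominator matters for a percentage like this one. The sketch below, on toy data, computes the share of fatal Saturday accidents both among all accidents and among Saturday accidents only; the column names and labels are assumed to be the ones used after the renaming step.

```python
import pandas as pd

# Toy data: 8 accidents, 3 of them on Saturday, 1 of those fatal.
df = pd.DataFrame({
    "AccidentSeverity": ["Fatal", "Slight", "Slight", "Fatal",
                         "Serious", "Slight", "Slight", "Slight"],
    "DayOfWeek": ["Saturday", "Saturday", "Friday", "Monday",
                  "Saturday", "Sunday", "Friday", "Friday"],
})

is_saturday = df["DayOfWeek"] == "Saturday"
is_fatal_saturday = (df["AccidentSeverity"] == "Fatal") & is_saturday

# Share among all accidents: 1 out of 8.
share_of_all = 100 * is_fatal_saturday.sum() / len(df)

# Share among Saturday accidents only: 1 out of 3.
share_of_saturday = 100 * is_fatal_saturday.sum() / is_saturday.sum()

print(share_of_all, share_of_saturday)
```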

****

Find the number of accidents that happened in Greater Manchester and occurred when it was snowing


So, 25 accidents happened in Greater Manchester when it was snowing.
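
A count like this might be obtained by combining two boolean masks. In the sketch below, the column names `PoliceForce` and `WeatherConditions` and the weather labels are assumptions, and the data is a toy sample.

```python
import pandas as pd

# Toy data; column names and labels are hypothetical.
df = pd.DataFrame({
    "PoliceForce": ["Greater Manchester", "Greater Manchester",
                    "Greater Manchester", "Kent"],
    "WeatherConditions": ["Snowing no high winds", "Fine no high winds",
                          "Snowing + high winds", "Snowing no high winds"],
})

# Match every weather label that mentions snow, with or without high winds.
is_snowing = df["WeatherConditions"].str.contains("Snowing")
in_manchester = df["PoliceForce"] == "Greater Manchester"

snow_accidents_in_manchester = int((is_snowing & in_manchester).sum())
print(snow_accidents_in_manchester)
```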

Find the number and the percentage of accidents that occurred in urban areas with a speed limit greater than 30


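So, 10% of the accidents that happened in urban areas occurred on roads with a speed limit above 30 miles per hour. Note that SpeedLimit records the limit of the road and not the speed of the vehicles, so this figure does not show that the drivers were speeding.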
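
A sketch of this query on toy data follows. The column name `UrbanOrRural` and its labels are assumptions; the percentage is taken over urban accidents only.

```python
import pandas as pd

# Toy data: 5 urban accidents (one on a 40 mph road) and 2 rural ones.
df = pd.DataFrame({
    "UrbanOrRural": ["Urban", "Urban", "Urban", "Urban", "Urban", "Rural", "Rural"],
    "SpeedLimit": [30, 30, 30, 40, 30, 60, 70],
})

is_urban = df["UrbanOrRural"] == "Urban"
over_30 = df["SpeedLimit"] > 30

# Number of urban accidents on roads whose limit is above 30 mph.
urban_over_30 = int((is_urban & over_30).sum())

# Percentage relative to all urban accidents.
pct_of_urban = 100 * urban_over_30 / is_urban.sum()
print(urban_over_30, pct_of_urban)
```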

Analysis using Visualizations


Distribution of the SpeedLimit attribute


This graph shows that most of the accidents happened on roads with a speed limit of 30 miles per hour. There is also a significant number of accidents on roads with a speed limit greater than 60 miles per hour.

Note: the distribution has this shape because SpeedLimit takes only a few distinct values, so it is really a categorical attribute rather than a numeric one, as the plot shows. I nevertheless treated it as numeric in order to extract other statistical information from the dataset.
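
Treating SpeedLimit as categorical amounts to counting accidents per distinct value, which is what a bar chart of the attribute would display. A minimal sketch on toy data:

```python
import pandas as pd

# Toy speed limits; the real column has many more rows.
df = pd.DataFrame({"SpeedLimit": [30, 30, 30, 40, 60, 60, 70]})

# Count accidents per distinct limit and order the result by the limit itself.
counts = df["SpeedLimit"].value_counts().sort_index()
print(counts)
```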

Boxplot of the AccidentSeverity and SpeedLimit of the accidents


This plot shows that the Fatal accidents have a large interquartile range, so an accident can be fatal at any speed limit. Moreover, the Slight accidents occur mostly at low speed limits, with some outliers.
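
The interquartile range that a boxplot draws can also be computed directly per severity class. The sketch below uses toy values chosen so that the Fatal group is spread out and the Slight group is concentrated.

```python
import pandas as pd

# Toy data: fatal accidents spread across limits, slight ones mostly at 30.
df = pd.DataFrame({
    "AccidentSeverity": ["Fatal"] * 4 + ["Slight"] * 4,
    "SpeedLimit": [30, 40, 60, 70, 30, 30, 30, 40],
})

# Interquartile range = 75th percentile minus 25th percentile, per group.
grouped = df.groupby("AccidentSeverity")["SpeedLimit"]
iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
print(iqr)
```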

Stacked Histograms for the severity of accidents and the speed limit


This plot leads to the same conclusions as the previous one.

Plot of the NumberOfVehicles and the SpeedLimit on specific DayOfWeek


The accidents involving the most vehicles happened on roads with a speed limit of 40 or 50 miles per hour on Sundays. One possible explanation is that many people return from weekend trips on that day.

Barplot of the day when accidents happened


Most accidents occurred on Fridays and were labeled as Slight.
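
The counts behind a bar plot of this kind can be tabulated with `pd.crosstab`. The sketch below is on toy data and assumes the column names used after renaming.

```python
import pandas as pd

# Toy data: three accidents on Friday, two on Saturday, one on Monday.
df = pd.DataFrame({
    "DayOfWeek": ["Friday", "Friday", "Friday", "Saturday", "Saturday", "Monday"],
    "AccidentSeverity": ["Slight", "Slight", "Serious", "Slight", "Fatal", "Slight"],
})

# Rows are days, columns are severity classes, cells are accident counts.
table = pd.crosstab(df["DayOfWeek"], df["AccidentSeverity"])

# The day with the largest row total has the most accidents overall.
busiest_day = table.sum(axis=1).idxmax()
print(table)
print(busiest_day)
```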

Barplot of the frequency of the Accident Severity of accidents


So, most of the accidents are Slight and happened on a Dry surface.
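
A statement about the most common combination of two attributes can be checked by counting rows per pair of values. In the sketch below the column name `RoadSurface` and its labels are assumptions, and the data is a toy sample.

```python
import pandas as pd

# Toy data; "RoadSurface" is a hypothetical column name.
df = pd.DataFrame({
    "AccidentSeverity": ["Slight", "Slight", "Slight", "Slight", "Serious", "Fatal"],
    "RoadSurface": ["Dry", "Dry", "Dry", "Wet", "Dry", "Wet"],
})

# Count accidents for each (severity, surface) pair and pick the largest group.
pair_counts = df.groupby(["AccidentSeverity", "RoadSurface"]).size()
most_common_pair = pair_counts.idxmax()
print(pair_counts)
print(most_common_pair)
```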