Analysis of NYPD Motor Collision in New York City (2012–2022)

Aditya Agarwal
4 min readJun 14, 2022

--

On any given day, the NYPD receives between 200 and 400 reports of motor vehicle collisions in New York City. The NYC, Open Data project has made this collision data publicly available. This article will formulate some questions and analyze the NYPD motor Collision dataset.

All the project files used for our analysis are in my Github repository.

Overview

We used Google Cloud Platform and Tableau for this project to conduct our analysis and make recommendations to reduce the number of accidents in Brooklyn, a borough in New York. The whole dataset, which spans 2012 to the present, contains over 1.7 million records and may be accessed here.

Methodology

* We accessed the data from “nycopendata.com” using `Open Data API` (OData API) and performed Data Connection with Tableau to conduct our Analysis.

* Then, we cleaned the data using Python and stored it in “Google Cloud Storage” as a Bucket to create a virtual instance.

* We performed our Analysis using “Google’s Big Query” in Google Cloud Platform and stored the query results in CSV files.

* After our Analysis, we have generated a report using Google Slides to share our Insights and give Recommendations.

Approach for Analysis (Image by Author)

Approach for Analysis

Here is a detailed explanation of each step of our analysis.

1. Understanding the Data

A clear understanding of your data is essential for any case study to guarantee that your analysis does not deviate from the original objective. Before going the additional mile, following the guidelines is critical. In this case study, I almost missed the part in the brief where I was required to analyze only data from 2012 to 2022.

2. Preparing the Data

We have used Google big Query and Python to prepare our data for analysis. To do this, we need to clean the data thoroughly by checking for “Null values,” removing “Duplicate values/Irrelevant values,” removing “Outliers,” handle “Missing values,” and lastly, validate our data.

The code for “data cleaning” can be found in my Github repository.

3. Performing analysis

We have used “Google Big Query” and “Tableau” to analyze the dataset.

  • To check the highest collision rates were caused due to which factor:
  • To check the Type of collision concerning Borough:
  • Collision Analysis: This analysis identifies the most common causes of crashes and fatalities. The majority of deaths were caused by driver inattention/distraction, as shown below.
Detecting Collision Prone Areas (Image by Author)
  • Time Series Analysis: Determine what time of day or week has the highest collisions. The graphic below shows that most crashes happened during rush hour (4–5 pm). We also see many people in the early hours of the day.
Time Series Analysis (Image by Author)

4. Getting Insights

* Between 4 pm to 5 pm was the peak time of the day when the maximum number of people got injured.

* The number of people getting injured was rising from 2012 and was at its peak in 2018 with a value of 123,859 injuries.

* In 2018, the total number of injured people decreased to 29,604 injuries in 2022.

* A lack of Driver’s attention majorly caused the highest number of deaths and injuries. The other factors also point toward the Driver’s lack of driving skills.

* Most accidents were caused by Sports utility/Station wagon vehicles, followed by Sedan and Passenger vehicles.

* Also, 4 — wheeled vehicles were more prone to accidents than 2 — wheeled vehicles.

5. Giving Recommendations

* Increase the number of Traffic Officers between 4 pm and 6 pm on days with the highest accident rates.

* Raise the availability of ambulances between 1 pm to 5 pm in collision-prone areas.

* Provide a more robust and efficient Public transit system to encourage usage by commuters.

* Focus on high collision-prone areas such as 11236, 11207, and 11234 in prioritizing new projects like traffic lights or street signs.

* Increase the frequency of driver re-training and more strict fines for repeat offenders.

* Increase the awareness about the use of public transport the commuters instead of walking or using personal vehicles to reduce accidents.

* Among all the boroughs, BROOKLYN and QUEENS had the highest number of deaths in New York City.

If you found this interesting or helpful, please help others find it by sharing and clapping.

You can connect with me on Linkedin — Aditya Agarwal

Relevant Links

References

I used many courses and videos to learn how to analyze data using Google Cloud Platform and connect the dataset using `Open Data API` with Tableau. Some of them are listed below:

--

--