Data Analytics Project: Analyzing Crime Data to Justify Police Funding

Added on  2023/04/25

AI Summary
This project analyzes crime incidents to determine whether a police department needs funding. It examines crime incidents by sector, time, and type, using linear regression to analyze the association between crime incidents and the number of officers at the scene. The data was prepared by removing errors, duplicates, and unnecessary columns, followed by imputation of missing values. The analysis reveals that most crimes occurred on March 27th, with disturbances, traffic-related calls, and suspicious circumstances being the most reported crime groups. Regression analysis, conducted with and without outliers, demonstrates a statistically significant relationship between the number of incidents and the number of officers required. Residual analysis further validates the linear model's fit. The study concludes that the police department is understaffed relative to the number of incidents and justifies the need for increased funding to meet the required officer-to-incident ratio, emphasizing data privacy and security measures throughout the process.
RUNNING HEADER: DATA ANALYTICS
Data Analytics
Student’s name:
Student’s ID:
Institution:
Summary
This study determines whether the police department needs additional funding. To do so, it
analyzes crime incidents by sector, time, and type. The association between the number of crime
incidents and the number of officers at the scene was analyzed using linear regression.
Data Preparation
The first step was to remove potential errors and outliers. One example of an error is the
District/Sector column, which had blank fields. These were filled by imputing the missing value
from a nearby column in the same row. Imputation was used because the bias it introduces is
small relative to the benefit of retaining the affected rows, and it helps reduce
omitted-variable bias. No duplicate records were found, as confirmed by the “CAD CDW ID”
column. However, the longitude and latitude columns duplicated information already contained in
the incident location column, so they were removed.
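The imputation and duplicate check described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the report does not say which nearby column supplied the missing sector, so the "Zone/Beat" column and the rule of taking its first letter are assumptions, and all row values are invented.

```python
# Hypothetical rows: the values are invented, and "Zone/Beat" is an assumed
# neighbouring column that the report does not name.
rows = [
    {"CAD CDW ID": 101, "District/Sector": "H", "Zone/Beat": "H2"},
    {"CAD CDW ID": 102, "District/Sector": "", "Zone/Beat": "M3"},
    {"CAD CDW ID": 103, "District/Sector": "O", "Zone/Beat": "O1"},
]


def impute_sector(row):
    """Fill a blank sector from the first letter of the beat (assumed rule)."""
    if not row["District/Sector"].strip():
        return {**row, "District/Sector": row["Zone/Beat"][:1]}
    return row


def has_duplicate_ids(records, key="CAD CDW ID"):
    """Return True if any value of the ID column appears more than once."""
    ids = [record[key] for record in records]
    return len(ids) != len(set(ids))


cleaned = [impute_sector(row) for row in rows]
duplicates_found = has_duplicate_ids(cleaned)

print(cleaned[1]["District/Sector"], duplicates_found)
```

Returning a new dictionary instead of modifying the row in place keeps the raw data intact, which makes it easier to audit which values were imputed.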
Data Analysis
A number of columns were also removed because they were not needed for this project: Event
Clearance Code, CAD CDW ID, General Offense Number, CAD Event Number, Event Clearance SubGroup,
Initial Type Description, Hundred Block Location, Initial Type Subgroup, Census Tract, At Scene
Time, Initial Type Group, and Incident Locations. Once the missing values were imputed and the
errors, duplicates, and unnecessary columns were removed, the remaining dataset was saved in the
worksheet Clean Data and used for the analysis.
Table 1: Events by date
Date      Total
26-Mar    244
27-Mar    583
28-Mar    219
Figure 1: Events by date
Table 1 and Figure 1 show that most of the crimes occurred on the 27th of March (583 crimes),
while the 26th and 28th of March had the fewest, with 244 and 219 crimes respectively.
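A per-date tally like the one in Table 1 can be produced with a simple frequency count. The sketch below uses a handful of invented event dates rather than the report's dataset, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical event dates, one entry per reported incident (not the real data).
event_dates = ["26-Mar", "27-Mar", "27-Mar", "28-Mar", "27-Mar", "26-Mar"]

# Count how many incidents fall on each date.
counts = Counter(event_dates)

# most_common(1) returns a list holding the single most frequent (date, count) pair.
busiest_day, busiest_count = counts.most_common(1)[0]

for day in sorted(counts):
    print(day, counts[day])
print("Busiest day:", busiest_day, busiest_count)
```

The same counting approach applies to the breakdowns by crime group and by sector; only the column being counted changes.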
The crime groups that were reported most often are shown below.
Table 2: Events by type
Figure 2: Events by type
The most frequently reported crime groups over the three days were disturbances (167 crimes),
traffic-related calls (165 crimes), and suspicious circumstances (150 crimes). The least
reported were harbor calls and weapons calls, with a count of 1 each.
Table 3: Events by sector
Figure 3: Events by sector
Most of the crimes were reported in sector H (125) and sector M (91). Sector O had the fewest,
with only 31 crimes reported.
Regression Analysis
Figure 4: Scatterplot with outliers
Table 4: Regression analysis with outliers
With the outliers present, 87% of the variation in the number of officers at the scene is
explained by the model, while 13% is attributable to factors not in the model. The regression
model is statistically significant (p < 0.05). The constant was not significant (p > 0.05), but
the number of incidents was statistically significant. A one-unit increase in the number of
incidents is associated with a 1.491-unit increase in the number of officers at the scene.
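The slope and the share of explained variation reported above come from an ordinary least squares fit with one predictor. The sketch below shows how those two quantities are computed. The observations are invented, so the output will not match the report's coefficients, and the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical observations, not the report's data.
incidents = [1, 2, 3, 4, 5]
officers = [2, 3, 5, 6, 9]

intercept, slope, r_squared = fit_line(incidents, officers)
print(f"officers = {intercept:.3f} + {slope:.3f} * incidents, R^2 = {r_squared:.3f}")
```

The slope is read the same way as in the report: each additional incident is associated with `slope` additional officers at the scene. Significance tests (the p-values) are not computed in this sketch.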
Figure 5: Scatterplot without outliers
Table 5: Regression analysis without outliers
Outliers are influential when they have a large effect on the regression (Alma, 2011), so the
outliers were removed and the model was refitted. Without the outliers, 96% of the variation in
the number of officers at the scene is explained by the model, while 4% is attributable to
factors not in the model; the adjusted R-squared increased from 0.87 to 0.96. The regression
model remained statistically significant (p < 0.05). The constant was again not significant
(p > 0.05), but the number of incidents was statistically significant. A one-unit increase in
the number of incidents is associated with a 1.83-unit increase in the number of officers at the
scene.
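The effect of removing outliers on the adjusted R-squared can be illustrated with a small example. The report does not state how its outliers were identified, so the 1.5 × IQR rule applied to the response variable below is an assumption chosen only for illustration. The data and function names are likewise hypothetical.

```python
import statistics


def fit_slope_r2(x, y):
    """Return (slope, R-squared) of a one-predictor least squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, predictors=1):
    """Adjusted R-squared penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


def drop_iqr_outliers(x, y):
    """Keep points whose y lies within 1.5 * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(y, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [(xi, yi) for xi, yi in zip(x, y) if low <= yi <= high]


# Hypothetical data with one extreme observation (the last point).
incidents = [1, 2, 3, 4, 5, 6, 7, 8]
officers = [2, 4, 7, 8, 10, 13, 14, 40]

slope_before, r2_before = fit_slope_r2(incidents, officers)
adj_before = adjusted_r2(r2_before, len(incidents))

kept = drop_iqr_outliers(incidents, officers)
x_kept, y_kept = zip(*kept)
slope_after, r2_after = fit_slope_r2(x_kept, y_kept)
adj_after = adjusted_r2(r2_after, len(x_kept))

print(f"with outliers:    slope={slope_before:.3f}, adj. R^2={adj_before:.3f}")
print(f"without outliers: slope={slope_after:.3f}, adj. R^2={adj_after:.3f}")
```

In this toy data the fit improves once the extreme point is dropped, which mirrors the direction of the change reported above. Whether removing a point is justified depends on why it is extreme, not only on the improvement in fit.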
Residual Analysis
The residual plots for the regression are shown below:
Figure 6: Residuals with outliers
In Figure 6, the residual pattern is non-random, which suggests that a non-linear model would
fit the data better (Draper & Smith, 2014).
The residual plot without the outliers is shown below.
Figure 7: Residuals without outliers
When the outliers were removed, the pattern was more random, which indicates that a linear model
provides a decent fit to the data (Draper & Smith, 2014).
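The residuals plotted in Figures 6 and 7 are the differences between the observed and fitted numbers of officers. A minimal sketch of that calculation, using invented data and a hypothetical helper name, is:

```python
def residuals_of_fit(x, y):
    """Fit y = intercept + slope * x by least squares and return the residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus fitted value.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical observations, not the report's data.
incidents = [1, 2, 3, 4, 5]
officers = [2, 3, 5, 6, 9]

residuals = residuals_of_fit(incidents, officers)
for xi, ri in zip(incidents, residuals):
    print(f"incidents={xi}, residual={ri:+.2f}")
```

For a least squares fit with an intercept, the residuals always sum to zero, so the sum alone says nothing about fit quality. What matters is whether they show a pattern, such as a curve or a funnel, when plotted against the predictor or the fitted values.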
Data Privacy and Security
The data was stored and accessed following best IT security practices to ensure confidentiality,
integrity, and accessibility. The precaution taken by the Seattle Police Department when working
with the data was to omit or generalize confidential data so that the risk of invasion of
privacy and breach of confidentiality was avoided. For instance, exact location data, which
could breach the privacy of victims or suspects, was omitted. The general offense number and the
event clearance code could also breach privacy, so they were not included in the written report.
Hence, the data was processed in a lawful, fair, and transparent manner, ensuring that the data
used is relevant, adequate, and not excessive for the purpose of the study.
Conclusion
The linear regression established that there is a significant relationship between the number
of crime incidents and the number of police officers at the scene. Evidently, the number of
police officers
depends on the number of crime incidents, and so does the department's need for funding. The
average number of police officers at the scene of an incident was observed to be 1.89. However,
according to the regression analysis, one incident requires the presence of 9.14 officers. The
department is therefore operating below the required ratio of police officers per scene and
needs funding to increase the number of officers. The shortfall is 7.25 officers per scene (the
difference between the required average and the current average number of officers). Since the
model explains 96 percent of the variability, the department's request for funding is justified.
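The shortfall quoted above is a simple difference between the two averages stated in the conclusion. The short check below reproduces that arithmetic; the variable names are hypothetical.

```python
# Figures taken from the conclusion above.
current_avg_officers = 1.89  # observed average officers at a scene
required_officers = 9.14     # officers per incident according to the regression

# Round to two decimals to match the precision of the inputs.
shortfall = round(required_officers - current_avg_officers, 2)
print(shortfall)
```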
References
Alma, Ö. G. (2011). Comparison of robust regression methods in linear regression. Int. J. Contemp. Math. Sciences, 6(9), 409-421.
Draper, N. R., & Smith, H. (2014). Applied regression analysis (Vol. 326). John Wiley & Sons.