Applying Data Mining & Web Analytics to Crime Data in Chicago
Added on 2023/04/21

University
Semester
DATA MINING AND WEB ANALYTICS
Student ID
Student Name
Submission Date

Table of Contents
1. Problem Identification
2. Analysis
2.1 Data Mining
2.1.1 K-Means
2.1.2 C & R Tree
2.1.3 Neural Network
2.2 Text Mining
3. Recommendation
4. Conclusion
References
Appendix

1. Problem Identification
The identified problem is that crime incidents potentially affect public safety in Chicago; the aim is to identify the zones and timelines of delinquency in the city to support better risk management. The crimes concerned include physical attacks or fights with a weapon, sexual battery, robbery or shoplifting, and vandalism. It was also assumed that requiring a benchmark of law enforcement contact could reduce the subjective judgment associated with the incidents. Crimes in community and domestic areas include those that took place in community and domestic area buildings, on grounds, on buses, and at community-, school- and domestic-area-sponsored events or activities. We therefore developed a model that addresses the identified problem using data mining and text mining algorithms. The purpose of this assignment is to solve this specific problem: determining the zones and times of crime in order to improve the management of crime risk.
2. Analysis
2.1 Data Mining
Data mining techniques operate on structured data. Our analysis uses the crime incidents data, which contains the following fields: ID, Date, Primary Type, Block, Case Number, Year, Description, IUCR, Location Description, Domestic, Beat, Arrest, District, FBI Code, Community Area, Ward, X Coordinate, Y Coordinate, Latitude, Longitude, Location and Updated On.
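As a minimal sketch of preparing this structured data outside IBM Modeler, the snippet below reads a CSV export whose headers follow the field list above and keeps the four fields used by the K-Means and C&R Tree models (Arrest, Domestic, District, Year). It assumes Arrest and Domestic are exported as the strings "true"/"false"; the helper name `load_model_fields` and the sample rows are hypothetical, not taken from the report's data set.

```python
import csv
import io

def load_model_fields(csv_file):
    """Keep only Arrest, Domestic, District and Year from a crime-incidents CSV.

    Hypothetical helper: assumes Arrest/Domestic are exported as "true"/"false".
    """
    rows = []
    for record in csv.DictReader(csv_file):
        if not record["District"].strip() or not record["Year"].strip():
            continue  # skip incomplete rows rather than guessing a value
        rows.append({
            "Arrest": record["Arrest"].strip().lower() == "true",
            "Domestic": record["Domestic"].strip().lower() == "true",
            "District": int(record["District"]),
            "Year": int(record["Year"]),
        })
    return rows


# Hypothetical sample rows using column names from the field list above.
SAMPLE = """ID,Primary Type,Arrest,Domestic,District,Year
1,THEFT,false,false,18,2017
2,BATTERY,true,true,7,2018
3,ROBBERY,false,false,,2018
"""

rows = load_model_fields(io.StringIO(SAMPLE))
print(rows[1])  # {'Arrest': True, 'Domestic': True, 'District': 7, 'Year': 2018}
```

The third sample row has no District value and is skipped, so only two records are returned.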
Data mining offers various classification and clustering techniques; in our analysis we chose K-Means as the clustering method and the C&R Tree as the classification method. A Neural Network was also used for better prediction (Blunch and Blunch, 2013).
2.1.1 K-Means
K-Means analysis is used to group the crime incidents by the zones and timelines of delinquency in Chicago in order to address the identified problem. As an unsupervised clustering method, it recognizes patterns in the crime data without needing an exact match to any stored pattern. It should be distinguished from k-nearest neighbours (KNN) analysis, which can compute values of a continuous target: in that case, the average or median target value of a new case's nearest neighbours is used as its predicted value.
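For contrast with K-Means, the nearest-neighbour prediction just described can be sketched in a few lines: a new case's predicted value is the mean or median target of its k closest training cases. The Euclidean distance, the default `k=3`, the function name `knn_predict` and the toy points are assumptions for illustration; this is not IBM Modeler's KNN node.

```python
import math
import statistics

def knn_predict(train, query, k=3, use_median=False):
    """Predict a continuous target for `query` from its k nearest training cases.

    train: list of (feature_tuple, target_value) pairs (hypothetical format).
    """
    # Rank training cases by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    targets = [target for _, target in nearest]
    # The neighbours' average (or median) target is the predicted value.
    return statistics.median(targets) if use_median else statistics.mean(targets)


# Toy data: 2-D points with a made-up continuous target.
train = [((0, 0), 1.0), ((1, 0), 2.0), ((0, 1), 9.0), ((5, 5), 100.0)]
print(knn_predict(train, (0.2, 0.2)))                   # 4.0 (mean of 1.0, 2.0, 9.0)
print(knn_predict(train, (0.2, 0.2), use_median=True))  # 2.0
```

The median option is less sensitive to a single extreme neighbour, which is why the two calls above differ.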
In our analysis, we use the fields Arrest, Domestic, District and Year. These fields are used to examine arrests in domestic and district areas year by year, and the model's output groups the crime incidents by the zones and timelines of delinquency in Chicago. This supports better risk management for Chicago crime incidents. Screenshots of the K-Means output are shown in the Appendix.
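The clustering step can be illustrated with a minimal pure-Python version of Lloyd's K-Means algorithm, assuming the four fields have already been encoded as numbers (Arrest and Domestic as 0/1). The function name, the deterministic initialisation from the first k points and the toy rows are assumptions; IBM Modeler's K-Means node uses its own initialisation and field scaling.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Deterministic initialisation with the first k points (an assumption for clarity).
    centroids = [list(map(float, p)) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Toy encoded rows: (Arrest, Domestic, District, Year); values are hypothetical.
points = [(1, 1, 7, 2018), (1, 1, 8, 2018), (0, 0, 18, 2017), (0, 0, 19, 2017)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[1.0, 1.0, 7.5, 2018.0], [0.0, 0.0, 18.5, 2017.0]]
```

Because District and Year are on much larger scales than the 0/1 flags, they dominate the distance in this toy run; in practice the fields would normally be standardised before clustering.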
2.1.2 C & R Tree
C&R Tree analysis is applied to the crime incidents to address the identified problem of locating the zones and timelines of delinquency in Chicago. C&R stands for Classification and Regression. The C&R Tree node is a tree-based classification and prediction method (Garson, 2012). It uses recursive partitioning to split the training records into segments with similar values in the output field. The node begins by examining the input fields to find the best split, measured by the reduction in an impurity index that results from the split.
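The split search described above can be sketched as follows, using the Gini index, which is one of the impurity measures the C&R Tree node offers for categorical targets. This is a simplified illustration, not the node's implementation: the function names and toy records are hypothetical, and only binary "field == value" splits are tried. For each candidate, the code measures how far the weighted Gini impurity of the two child segments falls below that of the parent and keeps the largest reduction.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(records, target, fields):
    """Return (field, value, reduction) for the best 'field == value' split."""
    parent_impurity = gini([r[target] for r in records])
    n = len(records)
    best = (None, None, 0.0)
    for field in fields:
        for value in sorted({r[field] for r in records}):
            left = [r[target] for r in records if r[field] == value]
            right = [r[target] for r in records if r[field] != value]
            if not left or not right:
                continue  # a split must produce two non-empty segments
            child_impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
            reduction = parent_impurity - child_impurity
            if reduction > best[2]:
                best = (field, value, reduction)
    return best


# Hypothetical toy records: does Domestic or District separate Arrest better?
records = [
    {"Domestic": True, "District": 7, "Arrest": True},
    {"Domestic": True, "District": 18, "Arrest": True},
    {"Domestic": False, "District": 7, "Arrest": False},
    {"Domestic": False, "District": 18, "Arrest": False},
]
print(best_split(records, "Arrest", ["Domestic", "District"]))  # ('Domestic', False, 0.5)
```

Splitting on `Domestic == False` produces the same two segments as `Domestic == True`; the tie is resolved by keeping the first value in sorted order. A full tree would apply this search recursively to each child segment until a stopping rule is met.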
In our analysis, we use the fields Arrest, Domestic, District and Year to predict arrests in domestic and district areas by year. We also predicted values for the zones and timelines of delinquency in Chicago by year, domestic area, community area and primary type. Screenshots of the Classification and Regression (C&R) Tree output are shown in the Appendix.
2.1.3 Neural Network
A neural network is applied to the crime incidents to identify the zones and timelines of delinquency in Chicago, using operations such as classification, feature mining, clustering, pattern recognition and prediction. Neural networks model complex relationships between inputs and outputs or find patterns in data, and they can store, recognize and retrieve patterns or database entries (Stahlbock, Abou-Nasr and Weiss, 2018). They are also used to solve combinatorial optimization problems, to filter noise from measurement data and to control ill-defined problems. In addition, they can estimate sampled functions when the form of the function is not known. Pattern recognition and function estimation are the two abilities that make artificial neural networks (ANNs) a common tool in data mining (Perner, 2015).
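To make the idea of a network learning an input-output mapping concrete, below is a minimal one-hidden-layer network trained by gradient descent on a binary target such as Arrest. It is a hypothetical pure-Python sketch: the layer size, learning rate, sigmoid activations, log-loss objective, function name `train_mlp` and toy inputs are all assumptions, and it does not reproduce the training procedure of IBM Modeler's Neural Net node.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=3, lr=0.5, epochs=2000, seed=0):
    """One hidden layer of sigmoid units, trained by full-batch gradient descent on log-loss."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    def log_loss():
        total = 0.0
        for x, t in zip(X, y):
            p = forward(x)[1]
            total -= t * math.log(p + 1e-12) + (1 - t) * math.log(1 - p + 1e-12)
        return total / len(X)

    start = log_loss()
    m = len(X)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in zip(X, y):  # backpropagation, one record at a time
            h, p = forward(x)
            d_out = p - t  # d(log-loss)/d(output pre-activation) for a sigmoid output
            gb2 += d_out
            for j in range(hidden):
                gW2[j] += d_out * h[j]
                d_h = d_out * W2[j] * h[j] * (1 - h[j])
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        # Gradient-descent step using the averaged gradients.
        b2 -= lr * gb2 / m
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / m
            b1[j] -= lr * gb1[j] / m
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / m
    return (lambda x: forward(x)[1]), start, log_loss()


# Toy inputs (Domestic flag, Community-area flag) and a made-up Arrest target.
X = [(1, 1), (1, 0), (0, 1), (0, 0)]
y = [1, 1, 0, 0]
predict, start_loss, end_loss = train_mlp(X, y)
print(f"log-loss {start_loss:.3f} -> {end_loss:.3f}")
print([round(predict(x), 2) for x in X])
```

The hidden layer is what lets such a network model non-linear relationships between inputs and output; with a single output unit and no hidden layer, the same code would reduce to logistic regression.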
In our analysis, we use the fields Arrest, Domestic, Community Area and Primary Type. These fields are used to predict arrests in domestic and community areas for each primary type, such as school, building, public and others. We also predicted values for the zones and timelines of delinquency in Chicago using Primary Type, Community Area, Arrest and Domestic. Screenshots of the Neural Network output are shown in the Appendix.
Using the developed models, we predicted values for the identified issues (Grotenhuis and Matthijssen, 2016). Our analysis predicts that arrests are strongly associated with primary type, district, community area and domestic status. The neural network is used to predict arrests in domestic areas, which potentially affect public safety. It predicted arrests in community and domestic areas with a reported accuracy of 86.9%, as illustrated below.
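Accuracy here means the share of records whose predicted arrest flag matches the actual one. A minimal sketch of that calculation, with a confusion-matrix breakdown, is shown below; the function name and the eight example labels are hypothetical and are not the report's data, so the printed figure is illustrative only.

```python
from collections import Counter

def accuracy_report(actual, predicted):
    """Return overall accuracy and counts of (actual, predicted) pairs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in pairs.items() if a == p)
    return correct / len(actual), pairs


# Hypothetical Arrest flags: 7 of 8 predictions match.
actual    = [True, True, False, False, True, False, False, True]
predicted = [True, True, False, False, False, False, False, True]
acc, matrix = accuracy_report(actual, predicted)
print(f"accuracy = {acc:.1%}")  # accuracy = 87.5%
print(matrix[(True, False)])    # 1 (one missed arrest)
```

By the same arithmetic, if the 86.9% figure is an overall accuracy, it corresponds to roughly 869 correct predictions in every 1,000 records scored.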
We also obtained better predictions from the crime data and identified the zones and timelines of delinquency in Chicago for better risk management (Han and Kamber, 2012). Analysing the results from both structured and unstructured data sources addresses the identified problem: the analysis specifies the zones and timelines of delinquency in Chicago, which occurred in schools, public places and buildings. Such findings are valuable in shaping our understanding of the influences on antisocial and delinquent behaviour. Of the three models we used, the neural network gives the best predictions and the highest accuracy (Mcgrath, 2018).
2.2 Text Mining
The Text Mining node uses frequency-based and linguistic techniques to extract key concepts from text, and it creates categories based on these concepts and other data. Text mining is used to explore the contents of text data, which helps in deciding whether to build a concept model (Heck, Thomas and Tabata, 2012).
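The frequency side of concept extraction can be approximated by counting terms across a text field such as Description. The sketch below is a deliberately simple stand-in for the Text Mining node's linguistic extraction: it lower-cases each record, tokenises on letters, drops a small hypothetical stop-word list, and ranks terms by the number of records that mention them (document frequency), breaking ties alphabetically. The helper name `top_concepts` and the sample strings are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real extractor uses much richer linguistic resources.
STOP_WORDS = {"of", "on", "the", "and", "in", "to", "a", "or", "from"}

def top_concepts(texts, n=3):
    """Return the n terms that appear in the most records (document frequency)."""
    doc_freq = Counter()
    for text in texts:
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        doc_freq.update(terms)  # each term counts at most once per record
    # Highest count first; alphabetical order breaks ties deterministically.
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical strings in the style of the Description field.
descriptions = [
    "RETAIL THEFT",
    "THEFT FROM BUILDING",
    "DOMESTIC BATTERY SIMPLE",
    "SIMPLE BATTERY",
    "THEFT OF LABOR/SERVICES",
]
print(top_concepts(descriptions, n=2))  # [('theft', 3), ('battery', 2)]
```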

In our analysis, we ran the Text Mining node in its exploratory "Build interactively" mode, because it allows concepts and categories to be extracted and refined, and it also supports text link analysis and clustering. The text mining output is shown in the Appendix.
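Concept clustering in the interactive workbench groups concepts that tend to appear together in the same records. A rough, hypothetical approximation is to count how often pairs of terms co-occur within a record, as below; the function name, the `min_count` threshold and the sample texts are assumptions, and the node's text link analysis relies on linguistic patterns rather than raw co-occurrence counts.

```python
import itertools
import re
from collections import Counter

def cooccurring_pairs(texts, min_count=2):
    """List term pairs that appear together in at least `min_count` records."""
    pair_counts = Counter()
    for text in texts:
        terms = sorted(set(re.findall(r"[a-z]+", text.lower())))
        # Each unordered pair is counted once per record; sorting makes pairs canonical.
        pair_counts.update(itertools.combinations(terms, 2))
    return sorted(
        ((pair, n) for pair, n in pair_counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical location strings.
locations = [
    "school public building",
    "public school grounds",
    "residence porch",
    "school public building",
]
print(cooccurring_pairs(locations))
# [(('public', 'school'), 3), (('building', 'public'), 2), (('building', 'school'), 2)]
```

Pairs that clear the threshold are candidates for grouping into the same cluster; a real clustering step would also weigh each pair against how often the individual terms occur.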
Based on our analysis, our recommendation for policy makers starts from the premise that predicting and preventing crime depends on understanding its causes (Witten et al., 2017). However, as with any human behaviour, accounting for the causes of crime is an extraordinarily complex task. The complexity lies partly in the many factors that may influence the onset, course and desistance of the behaviour within individuals. Community safety and violence have been found to be associated with antisocial and delinquent behaviour. Moreover, research suggests that middle childhood may be a particularly vulnerable period for the negative influence of community safety factors on the onset of delinquency (Li, Ogihara and Tzanetakis, 2012).
3. Recommendation
Research over the past few decades on the development of delinquent behaviour has shown that individual, social and community conditions, as well as their interactions, influence behaviour. Based on our analysis, we recommend the following areas as needing particular research attention to increase understanding of the development of delinquency:
Research on ways to increase children's and adolescents' protective factors.
Research on the development of physical aggression regulation in early childhood.
4. Conclusion
This project identified a problem in the provided real-world data and addressed it using data mining and text mining methods. New models were developed to resolve the identified problem, and they can support useful business decisions based on the real data. The models use the data mining techniques K-Means, C&R Tree and Neural Network, together with text mining techniques. The developed model predicts that arrests occur most often in domestic and community areas, which potentially affects public safety.

References
Blunch, N. and Blunch, N. (2013). Introduction to structural equation modeling using IBM
SPSS statistics and AMOS. Los Angeles, Calif.: SAGE.
Garson, G. D. (2012). Hierarchical Linear Modeling: Guide and Applications. Sage Publications.
Grotenhuis, M. and Matthijssen, A. (2016). Basic SPSS tutorial.
Han, J. and Kamber, M. (2012). Data mining. Haryana, India: Elsevier.
Heck, R., Thomas, S. and Tabata, L. (2012). Multilevel modeling of categorical outcomes
using IBM SPSS. New York: Routledge.
Li, T., Ogihara, M. and Tzanetakis, G. (2012). Music data mining. Boca Raton: CRC Press.
Mcgrath, R. (2018). SPSS Statistics. US: Tritech Digital Media.
Perner, P. (2015). Machine Learning and Data Mining in Pattern Recognition. Cham:
Springer International Publishing.
Stahlbock, R., Abou-Nasr, M. and Weiss, G. (2018). Data Mining. Bloomfield: C.S.R.E.A.
Witten, I., Frank, E., Hall, M. and Pal, C. (2017). Data mining. Amsterdam: Morgan
Kaufmann.

Appendix
Data Mining
The figure below displays the overall model developed for the crime incidents data.
K-Means
The K-Means output is illustrated below.
The figure below displays the model summary.

The figure below displays the cluster information for the created model.

The figure below displays the overall information for the created K-Means model.
Neural Network
The Neural Network output is illustrated below.
The figure below displays the model summary.
The figure below displays the importance view of the created model, giving an overview of the model's predictions.

The figure below displays the observed and predicted values.
The figure below displays the overall prediction in network form.

The figure below displays the overall information for the created Neural Network model.
Text Mining
The figure below displays the overall text mining model developed for the crime incidents data.

The output of the text analysis is illustrated below.