Data Mining Assignment: Cleaning, Assessing Messy Data Examples


Added on  2023/06/10

Discussion Board Post
AI Summary
This discussion post addresses the topic of messy data within the context of data mining. The assignment requires the identification of real-world examples of messy data, such as incomplete or unstructured datasets, and proposes practical steps for cleaning and organizing this data. The post explores data quality assessment, focusing on accuracy, completeness, and consistency, and how these factors can vary based on the intended use of the data, providing relevant examples. The author discusses the importance of data cleaning, highlighting its critical role in ensuring the reliability and usability of data. Furthermore, the post includes a response to a peer's post, providing constructive feedback and suggesting improvements to their analysis, demonstrating an understanding of data mining principles and the ability to apply them to practical scenarios. The post refers to data mining in manufacturing and aviation, and discusses software such as PRISM and Winpure.
Running head: DATA MINING ASSIGNMENT_KEIL CRONAUER
DATA MINING ASSIGNMENT_KEIL CRONAUER
Name of Student
Name of University
Author Note
Dear Keil,
I immensely enjoyed reading your post. The two examples you discussed were
fascinating. Messy, unstructured data is by no means lacking in information; as you
explained in your example of daily completion report data, it is simply in a format that is
difficult to read and analyse. You correctly underlined the key requirement for making
use of this data, which is to find a way to segregate and arrange the information so as to
reduce redundancy and keep only the relevant part in a structured form (Alekseev
et al., 2016). I had not known about the software PRISM. I will surely look into the details
of its pros and cons later. Software of this nature is now thankfully becoming
available, designed to suit the specific requirements of each field where messy data
matters and catering to the different nature of the data in each. I had pointed out
how EMR is specialized software for dealing with the issue in the medical field (Singer et al.,
2015). Winpure, as you may already know, is yet another popular data cleaning program
(Wickham, 2014). NoSQL is a class of databases capable of storing unstructured data and
accessing it in a meaningful manner (Mutz, Pemantle & Pham, 2017). You may want to
consider the specifics of such databases for any possible applicability to your issues. Now,
aviation data is something that I found to be of particular interest. Though I do know how
important and crucial a role data has to play in the field, it was an eye-opener to read about
the nature of the data and the specifics of some of the issues.
References
Alekseev, A. A., Osipova, V. V., Ivanov, M. A., Klimentov, A., Grigorieva, N. V., &
Nalamwar, H. S. (2016). Efficient data management tools for the heterogeneous big
data warehouse. Physics of Particles and Nuclei Letters, 13(5), 689-692.
Mutz, D. C., Pemantle, R., & Pham, P. (2017). The perils of balance testing in
experimental design: Messy analyses of clean data. Forthcoming in The American Statistician.
Singer, J. M., André, C. D., Rocha, F. M., & Zerbini, T. (2015). Fitting non-linear mixed
models to messy longitudinal data: an example.
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.
Halloran, K. M., Murdoch, J. D., & Becker, M. S. (2015). Applying computer-aided
photo-identification to messy datasets: a case study of Thornicroft's giraffe (Giraffa
camelopardalis thornicrofti). African Journal of Ecology, 53(2), 147-155.