Review of Data Mining Concepts and Challenges: Week 8 Reflection

Added on 2023/06/08

Post: Week 8 Review

Over the last few weeks, I have learnt some very important lessons about data mining. I
gained an in-depth understanding of its applications in various industries, such as aviation
and healthcare. Data mining is an important tool used to search, collect, filter, and analyze
data, and it is used by government bodies and by private organizations both large and small.
In the previous weeks, I learned data mining concepts such as decision trees, sequence mining,
logistic regression, kernel machines and support vector machines, clustering, and outlier
detection. The most important topics I learnt in class are clustering and data cleaning and
preparation, and I focused more on these topics because of their wide range of applications
in various industries. I revisited all of these concepts in the eighth week, when I applied
data mining procedures to solve the case studies (NPTEL, 2014).
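Since clustering is one of the topics singled out above, a minimal sketch of the idea may help.
It is written in Python with only the standard library rather than in R, and it is an
illustration, not the method used in the course. The function name `kmeans_1d`, the sample
values, and the starting centers are all assumptions made up for this example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means (Lloyd's algorithm) on 1-D data.

    `centers` holds caller-supplied starting centers; all inputs here are
    hypothetical and chosen only to keep the example small.
    """
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

The two steps inside the loop (assign each point to its nearest center, then recompute the
centers) are the whole algorithm; real datasets would need more dimensions, a stopping test,
and a sensible way to choose the starting centers.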
I still marvel at the wide range of applications of data mining and the ease with which it
can handle tremendous amounts of data. However, I sometimes confuse the steps of the data
mining process. R is a programming language commonly used for data analysis and for
developing statistical software. The sequence of commands, procedures, and algorithms used
in R sometimes confuses me, but a little more practice should clear my doubts. Switching
between algorithms to solve different data mining problems also causes me difficulty
(Romero & Ventura, 2013).
I also faced a few broader challenges during the course. The biggest challenge was
integrating redundant or conflicting data obtained from various forms and sources of
information, such as geographic data, multimedia files, text, numeric data, and social
data. Poor data quality, such as dirty data, noisy data, incorrect or inexact values,
an inadequate amount of data, and unrepresentative sampling, can lead to data being
collected and interpreted wrongly (Wideskills, 2018). Sometimes the data needed for data
mining are not easily available. I am also often unsure which algorithm should be used in
which situation. Large datasets create problems as well, because the information extracted
from them is not always what I expect, and the approaches used to handle large datasets
sometimes do not give appropriate results. I also struggle when I have to collect
information from heterogeneous databases (Crayon Data, 2015).
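The integration problem described above can be sketched in a few lines. This is a minimal
illustration in Python (standard library only) rather than R, under stated assumptions: the
two sources, the field names `id`, `city`, and `age`, the 0 to 120 plausibility range for
age, and the function name `integrate` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical records from two sources; every name and value is made up.
source_a = [
    {"id": 1, "city": "Delhi", "age": 34},
    {"id": 2, "city": "Mumbai", "age": None},  # missing value
    {"id": 3, "city": "Pune", "age": 29},
]
source_b = [
    {"id": 1, "city": "Delhi", "age": 34},     # redundant: same as in source_a
    {"id": 3, "city": "Pune", "age": 92},      # conflicting: age disagrees
    {"id": 4, "city": "Chennai", "age": -5},   # incorrect: impossible age
]


def integrate(*sources):
    """Merge sources by id, dropping duplicates and reporting conflicts."""
    by_id = defaultdict(list)
    for source in sources:
        for record in source:
            # Skip exact duplicates of a record already seen for this id.
            if record not in by_id[record["id"]]:
                by_id[record["id"]].append(record)

    clean, conflicts = [], []
    for rec_id, versions in sorted(by_id.items()):
        if len(versions) > 1:
            # Sources disagree; set the id aside for manual review.
            conflicts.append(rec_id)
            continue
        age = versions[0]["age"]
        if age is None or not 0 <= age <= 120:
            continue  # drop records with a missing or implausible age
        clean.append(versions[0])
    return clean, conflicts


clean, conflicts = integrate(source_a, source_b)
print(clean)
print(conflicts)
```

Dropping incomplete records and setting conflicts aside is only one possible policy; imputing
missing values or preferring a trusted source are equally valid choices, depending on the
case study.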
This assessment has helped me identify the challenges I face while completing my data
mining assignments. I will use this information to correct the flaws in my practice
schedule. I will revise the theory related to my weak areas and turn them into strengths,
and I will approach my teachers whenever I have doubts. R programming in particular needs
intensive practice, so I will solve as many case studies as possible to apply my
theoretical knowledge of data mining and R in practical situations. I will focus more on
the basic concepts of clustering and of data cleaning and preparation, as they have a wide
range of practical applications and could give me the opportunity to work with a leading
organization.
References
Crayon Data. (2015). 12 common problems in Data Mining. Retrieved from Bigdata-madesimple.com:
http://bigdata-madesimple.com/12-common-problems-in-data-mining/
NPTEL. (2014). Data Mining. Retrieved 2018, from Onlinecourses.nptel.ac.in:
https://onlinecourses.nptel.ac.in/noc18_cs14/preview
Romero, C., & Ventura, S. (2013). Data mining in education. Wiley Interdisciplinary Reviews:
Data Mining and Knowledge Discovery, 3(1), 12-27.
Wideskills. (2018). Challenges in Data Mining. Retrieved from Wideskills.com:
http://www.wideskills.com/data-mining/challenges-in-data-mining