Challenging Problems in Data Mining Research: A Comprehensive Analysis

Data Mining
Challenging Problems In Data Mining Research
AUTHORS
Qiang Yang, Xindong Wu
Name
ID

CHALLENGING PROBLEMS IN DATA MINING RESEARCH
Introduction
By some estimates, the amount of data doubles every year. Many scientific, government and
corporate information systems are inundated with data that are generated and stored routinely,
and these grow into massive databases holding gigabytes or even terabytes of data.
These databases contain a goldmine of potentially valuable information, but it is beyond human
ability to analyze such vast amounts of data and discover meaningful patterns. Given a particular
data analysis goal, it has been common practice either to design a database application on online
data or to use a statistical or analytical package together with a domain expert who interprets
the results. Even setting aside the problems of standard statistical packages, such as their
limited power for knowledge discovery and the need for statisticians and domain experts to apply
statistical methods and refine the results, the analyst must state the goal in advance and gather
the data relevant to it. There is therefore still a strong risk that significant and meaningful
patterns in the data, waiting to be discovered, are missed.
One of the most important problems in data mining relates to the size of the data. Many
knowledge discovery (KD) methods involve an exhaustive search over the instance space, so they
are highly sensitive to the size of the data, both in time complexity and in their ability to
induce compact patterns. (Jitender S. Deogun, n.d.)
We set out a number of challenging problems in data mining research, drawing on some of the most
active researchers in machine learning and data mining for their views on what they consider
important and worthwhile topics for future research in data mining. We hope their insights will
motivate new research efforts and give young researchers a high-level guide to where the main
problems in data mining lie. This short article summarizes the most challenging problems received
from this survey. The order of the list does not reflect level of importance.
1. Mining Sequence Data and Time Series Data

Mining sequence and time series data remains an important problem. Despite progress in related
fields, how to efficiently cluster, classify and predict the trends of such data is still an
important open question.
A particularly challenging problem is noise in time series data, which remains an important open
issue. Many time series used for forecasting are contaminated by noise, which makes accurate
short-term and long-term predictions difficult. (QIANG YANG, 2005)
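As a rough illustration of why noise matters for prediction, the sketch below smooths a short
series with a simple moving average. This is a generic smoothing technique chosen for
illustration, not a method taken from the survey. The function name `moving_average` and the
sample values are assumptions.

```python
def moving_average(series, window):
    """Return the simple moving average of `series` over `window` points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    # Keep a running sum so each step costs O(1) instead of O(window).
    running_sum = sum(series[:window])
    smoothed = [running_sum / window]
    for i in range(window, len(series)):
        running_sum += series[i] - series[i - window]
        smoothed.append(running_sum / window)
    return smoothed


# Hypothetical readings with one noisy spike (30) among values near 10.
noisy = [10, 12, 9, 11, 30, 10, 12, 11]
smoothed = moving_average(noisy, 4)
print(smoothed)
```

The spike is damped but also smeared across several outputs, which hints at the trade-off:
smoothing reduces noise at the cost of blurring genuine short-term changes.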
The simplest discretization method divides the range of continuous values into equal-width
intervals, with the number of intervals set by the user. A variation of that method uses
Shannon's entropy theory: the entropy scheme sets the interval boundaries so that the total
information gained from the observed occurrences in every interval is equal. This is known as the
even information intervals quantization method. The obvious disadvantage of such procedures is
that a large amount of information may be lost, because the cut points do not necessarily fall on
the boundaries of predefined classes.
(Jitender S. Deogun, n.d.)
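A minimal sketch of the equal-width method, assuming numeric input and a user-chosen number of
intervals. The names `equal_width_bins`, `temps` and `classes` are illustrative only. The last
lines show the information loss mentioned above: a cut point falls inside a predefined class, so
one interval mixes two classes.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of `n_bins` equal-width intervals (0-based)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # all values identical: a single interval
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        idx = int((v - lo) / width)
        labels.append(min(idx, n_bins - 1))  # the maximum falls in the last bin
    return labels


# Hypothetical temperatures with hypothetical predefined classes.
temps = [0, 2, 4, 5, 6, 10]
classes = ["cold", "cold", "mild", "mild", "mild", "mild"]
labels = equal_width_bins(temps, 2)

# Intervals that contain more than one predefined class.
mixed = {
    b for b in set(labels)
    if len({c for l, c in zip(labels, classes) if l == b}) > 1
}
print(labels, mixed)
```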
2. Data Mining in a Network Setting
Today's world is interconnected through many types of links, including websites, emails and
blogs. Many survey respondents consider community mining and the mining of social networks to be
important topics. Community structures are important properties of social networks, and
identifying them is a challenging problem. First, it is critical to have the right
characterization of the notion of community that is to be detected. Second, the entities involved
are distributed in real-life applications, so distributed means of identification are desired.
Third, a snapshot-based data set may not capture the real picture, because what matters most lies
in the local relationships.
Under these circumstances, our task is to understand
(1) The network's static structures (topologies and clusters)
(2) Its dynamic behavior (growth factors and functional efficiency) (QIANG YANG, 2005)
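To make "static structures" concrete, the sketch below builds an undirected graph from a list of
links and extracts two simple structural facts: each node's degree and the connected components.
Connected components are only a crude stand-in for communities and are an assumption of this
example, not the technique proposed in the survey. All names and the sample edges are
hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components as a list of node sets (BFS)."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical snapshot of who has emailed whom.
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
adjacency = build_adjacency(edges)
clusters = connected_components(adjacency)
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
print(clusters, degrees)
```

Because this works on a single snapshot, it says nothing about dynamic behavior, which is exactly
the limitation the third point above raises.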
3. Null Values

A null value in a DBMS, also referred to as a missing value, may appear as the value of any
attribute that is not part of the primary key. It is treated as a symbol distinct from every other
symbol, including other occurrences of null. A null value does not only mean an unknown value; it
can also mean an inapplicable one. The problem occurs frequently in relational databases because
the relational model dictates that all tuples in a relation must have the same number of
attributes, even though the values of some attributes are inapplicable for some tuples.
One approach extends the relational database model to handle uncertain and imprecise data by
dividing the traditional null value into three cases, such as inapplicable. Apart from this work,
which does not offer a solution for existing data, we have not come across work that deals with
null values, although there are some earlier studies on missing values.
When the data contain missing attribute values, either the affected data are discarded or an
attempt is made to replace the missing values with the most likely ones. These ideas have been
used for inductive decision trees. It has also been suggested to build rules that predict a
missing value from the values of the other attributes. (Jitender S. Deogun, n.d.)
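The two basic strategies above, discarding or replacing with the most likely value, can be
sketched as follows. This assumes rows are plain dictionaries, that `None` marks a missing value,
and that "most likely" is approximated by the most frequent observed value. The function names
and sample rows are hypothetical.

```python
from collections import Counter


def drop_missing(rows, attribute):
    """Discard every row whose `attribute` is missing (None)."""
    return [row for row in rows if row[attribute] is not None]


def impute_most_frequent(rows, attribute):
    """Return copies of `rows` with missing `attribute` values replaced
    by the most frequent observed value."""
    observed = [row[attribute] for row in rows if row[attribute] is not None]
    if not observed:
        return [dict(row) for row in rows]  # nothing to learn from
    most_common_value = Counter(observed).most_common(1)[0][0]
    return [
        {**row, attribute: most_common_value if row[attribute] is None else row[attribute]}
        for row in rows
    ]


rows = [
    {"id": 1, "colour": "red"},
    {"id": 2, "colour": None},
    {"id": 3, "colour": "red"},
    {"id": 4, "colour": "blue"},
]
kept = drop_missing(rows, "colour")
filled = impute_most_frequent(rows, "colour")
print(len(kept), filled[1])
```

Note that this sketch cannot tell an unknown value from an inapplicable one; both are stored as
`None`, which is the ambiguity the section describes.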
4. Dynamic Data
Most online databases are dynamic: their contents are constantly changing. This has significant
implications for knowledge discovery (KD) methods. If a knowledge discovery model is implemented
as a database application, the run-time efficiency of the KD method within the model and its use
of the DBMS retrieval functions become important factors. This is because KD methods are strictly
read-only, long-running transactions (Jitender S. Deogun, n.d.)
5. High-Speed Mining of Streams
Network mining poses key challenges. Network links are increasing in speed, and service providers
are now deploying 1 Gigabit and 10 Gigabit Ethernet links. To detect anomalies, providers need to
capture IP packets at high link speeds and analyze large volumes of data every day. This calls
for highly scalable solutions. In addition, good algorithms are needed to detect whether
Denial-of-Service attacks are taking place.

Once an attack has been identified, it is important to distinguish legitimate traffic from the
attacker's traffic. To make this achievable, the following techniques need to be developed and
applied:
(1) Detect Denial-of-Service attacks.
(2) Trace back to the attackers.
(3) Drop the packets that belong to the attacker's traffic (QIANG YANG, 2005).
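The three steps above can be sketched, in a very simplified form, as a per-source sliding-window
rate limit over a packet stream. This is an illustrative assumption rather than the survey's
method: it treats any source exceeding a threshold as an attacker and assumes source addresses
are not spoofed, which real traceback cannot rely on. The class `RateMonitor`, its parameters and
the sample packets are hypothetical.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag sources that send more than `max_packets` within `window_seconds`."""

    def __init__(self, window_seconds, max_packets):
        self.window_seconds = window_seconds
        self.max_packets = max_packets
        self.recent = defaultdict(deque)  # source -> timestamps inside the window

    def allow(self, source, timestamp):
        """Return True to forward the packet, False to drop it."""
        times = self.recent[source]
        # Forget packets that have slid out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        times.append(timestamp)
        return len(times) <= self.max_packets


monitor = RateMonitor(window_seconds=1.0, max_packets=3)
# (source address, arrival time in seconds); addresses are documentation examples.
packets = [
    ("192.0.2.1", 0.0),
    ("192.0.2.9", 0.1),
    ("192.0.2.9", 0.2),
    ("192.0.2.9", 0.3),
    ("192.0.2.9", 0.4),
    ("192.0.2.1", 0.5),
]
decisions = [monitor.allow(source, ts) for source, ts in packets]
print(decisions)
```

Each packet is processed once with a small amount of state per source, which is the kind of
single-pass behavior a stream setting requires, although memory still grows with the number of
distinct sources.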
6. Privacy, Security, and Integrity of Data
Researchers believe that privacy must be protected in data mining and consider it an important
research topic, namely how to keep users' data private while it is being mined. A related topic is
the use of data mining itself to protect security and privacy. One researcher also states that if
the privacy issue cannot be solved for the public, there is no way to make the case for data
mining's advantages.
Turning to data integrity, two key challenges stand out:
(1) Efficient algorithms need to be developed to compare data before and after security
clearance.
(2) Impact estimation algorithms must be developed that can bound the effect of possible
modifications of the data on the individual patterns obtainable by broad classes of data mining
algorithms.
The first challenge therefore requires efficient algorithms and data structures for evaluating
the validity of the data. The second challenge is to develop algorithms that measure the impact of
data modifications on individual patterns, although it may not be feasible to develop a single
measure that works across all data mining algorithms (QIANG YANG, 2005).
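One very simple way to see the effect of a modification on an individual pattern is to compare
the pattern's support before and after the change. The sketch below does this for one itemset.
Support is used here as an assumed stand-in for the significance measures the survey has in mind,
and the function name and transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


# Hypothetical data set before and after one record was modified.
before = [{"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"}]
after = [{"bread", "milk"}, {"bread"}, {"bread"}, {"milk"}]

pattern = {"bread", "milk"}
change = support(after, pattern) - support(before, pattern)
print(change)
```

A single edited record halves the support of this pattern, which shows why bounding the impact of
modifications matters even for small changes.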
Concluding Author’s Remarks
Data mining provides techniques and algorithms for making sense of and understanding large
volumes of data. Matheus states that, in evaluating a KDD system, there is an important
relationship between versatility and autonomy. On this view, a KDD system
can handle several domains as well as the discovery task. An ideal KDD algorithm would handle
data autonomously. Progress is being made on automating the guidance and control of the data
discovery process, although the ideal has not yet been reached. Morer's analysis suggests two key
features: one is deriving data from databases, and the other is handling domain data and
knowledge in a consistent manner.
Of the pattern mining techniques discussed in class, we believe that data mining is the most
important for deriving theoretical observations from rough set theory, and studying it helps us
understand its advantages and drawbacks. Knowledge of artificial intelligence is becoming ever
more powerful, and those who hold that knowledge hold the power, so we need to work hard to gain
a sound understanding of data discovery techniques.
Conclusion
Data mining has achieved great success in AI and the wider IT world, but many new concerns have
also arisen, which data mining researchers are trying to address. There is a lack of timely
communication between researchers and the wider community about data mining techniques, and this
gap fuels negative attitudes towards the use of AI. This report summarizes the survey paper,
which highlights and analyzes the most pressing problems in data mining. These topics will remain
important for discussion in data mining. In this context, researchers need to cooperate with the
IT industry in order to keep the bigger picture clear.

References
Jitender S. Deogun, V. V. Raghavan and H. Sever, n.d. Data Mining: Research Trends, Challenges,
and Applications, USA: s.n.
Qiang Yang and Xindong Wu, 2005. 10 Challenging Problems in Data Mining Research.
International Journal of Information Technology & Decision Making, p. 08.