Report on Data Association Analysis in Computer Information System

Added on 2023/06/12

AI Summary

This report provides an analysis of data association within computer information systems, focusing on the concepts of equitability and maximal information coefficient (MIC). It critically examines the perspectives of various authors regarding the use of MIC for detecting novel associations in large datasets, particularly in comparison to mutual information. The report discusses the debate surrounding the mathematical formalization of equitability, the validity of claims made about MIC's performance, and the potential for misinterpretations. It also addresses sensible criticisms of MIC and explores its benefits and limitations in data mining. Furthermore, the report touches on the implications of recent research and the ongoing evolution of scientific understanding in this field. Finally, the report briefly assesses the practicality of using Google as a compression algorithm.

Running head: COMPUTER INFORMATION SYSTEM
Computer Information System
Name of the Student
Name of the University
Author's note

Summary
Which of the three authors is right
The authors of [3] are right, because the authors of [1] provide no clear evidence of a
definitive mathematical formalization of equitability, the heuristic property that the maximal
information coefficient is supposed to satisfy.
False claims and over-selling the result
The claims presented by the authors of [1] are not actually false or over-sold, as they report
the performance of the maximal information coefficient based on large-scale simulations. The
authors of [1] confidently characterized the maximal information coefficient through simulation
evidence and stated that it satisfies the heuristic property of equitability. However, the
authors of [1] are not fully accurate in their research, as they also acknowledge in [4] that
whether the R² equitability introduced in [1] is attainable under limited noise models is still
an open question. They also state that the maximal information coefficient and equitability are
two concepts that will improve with time as better methods evolve. The authors of [1] have not
objected to the mathematical definition of equitability proposed by the authors of [3].
This shows that they introduced equitability without explaining its mathematical formulation,
and that the topic still needs to be researched thoroughly. The authors of [1] do not provide a
rigorous treatment of equitability, dependence measures and the maximal information
coefficient. The authors of [5] state that they have proved that R² equitability is not
satisfied by the maximal information coefficient or by any other dependence measure.
Purposeful misinterpretation and ignoring claims
The authors of [3] are not purposefully misinterpreting the authors of [1] or ignoring their
claims. This is because the authors of [3] argue that equitability has a definition that
contradicts the description proposed by the authors of [1]. The authors of [3] have understood
the point the authors of [1] make in describing relationships in large data sets through the
maximal information coefficient as a measure of dependence. The authors of [3] give a complete
illustration of what exactly equitability is and how it can be established with a mathematical
formulation. The authors of [3] use mutual information as a dependence measure for two random
variables. They argue through mathematical formulations that mutual information is far better
than the maximal information coefficient in terms of statistical power.
Mutual information is a natural way to provide equitability when quantifying relationships in
large datasets. However, they finally conclude in [3] that although mutual information has a
theoretical advantage, it is also limited by the unknown properties of noise. Mutual information
is better for theoretical purposes, but it can also be beneficial for practical purposes through
estimators such as the k-nearest-neighbor (KNN) estimator. The authors of [3] do not completely
ignore the advantages of the maximal information coefficient as a dependence measure, as they
have analyzed both the maximal information coefficient and mutual information through
mathematical models. They conclude in [3] that the maximal information coefficient requires
prior testing on each dataset before it is used to find associations in large datasets.
The authors of [5] state that they identified fundamental problems with R² equitability and
hence introduced the term self-equitability, which mutual information satisfies. The authors of
[5] prove in their paper that neither mutual information nor the maximal information
coefficient satisfies R² equitability in any mathematical sense. Thus, the authors of [5] are
not ignoring the claims of the authors of [1], as they analyze the maximal information
coefficient and mutual information based on mathematical calculations with every simulation
taken into account.
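To make the idea of mutual information as a dependence measure for two random variables concrete, the following is a minimal sketch of a naive plug-in estimate from equal-width bins. It is an illustration only and is not the KNN estimator discussed in [3]; the function name `binned_mutual_information` and the default of 10 bins are assumptions made for this example.

```python
import math
from collections import Counter


def binned_mutual_information(xs, ys, bins=10):
    """Plug-in estimate of I(X;Y) in bits using equal-width bins (illustrative only)."""

    def bin_index(value, lo, hi):
        # Map a value to one of `bins` equal-width cells; the maximum goes in the last cell.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [(bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)) for x, y in zip(xs, ys)]

    n = len(cells)
    joint = Counter(cells)                 # counts per (x-bin, y-bin) cell
    count_x = Counter(i for i, _ in cells)  # marginal counts for x-bins
    count_y = Counter(j for _, j in cells)  # marginal counts for y-bins

    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over occupied cells.
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi
```

For independent samples the estimate should be close to zero, while for a noiseless relationship it approaches the entropy of the binned variable. Binned estimates of this kind are sensitive to the number of bins and the sample size.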

Sensible criticism
The authors of [2] present sensible criticism, but without a detailed explanation. They state
that the maximal information coefficient has drawbacks in several important situations.
They compare the maximal information coefficient with Pearson correlation and the distance
correlation (dcor) measure, and describe the maximal information coefficient as having lower
power than dcor. However, they do not describe the situations in which the maximal information
coefficient breaks down, nor do they offer a practical approach to them.
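For readers unfamiliar with dcor, the sample distance correlation can be sketched in a few lines. This is a minimal, unoptimized illustration of the standard double-centering definition, not the code used in [2]; the function names are assumptions made for this example, and the quadratic cost makes it suitable only for small samples.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column and grand means."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]  # matrix is symmetric, so also column means
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation between two equal-length numeric sequences."""
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)   # squared sample distance covariance
    dvar_x = mean_product(a, a)  # squared sample distance variance of xs
    dvar_y = mean_product(b, b)  # squared sample distance variance of ys
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))
```

Unlike Pearson correlation, this quantity is nonzero for a symmetric parabola, which is one reason it is used as a comparison point for general dependence measures.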
Equitability and MIC
Therefore, the concept of equitability is a useful heuristic property for dependence measures
used to find associations in large datasets. This is because equitability helps to identify
equally noisy relationships of different types [5]. Equitability is satisfied by the maximal
information coefficient because the maximal information coefficient is required to test each
binning scheme for every data set before estimating associations in large datasets.
Equitability is, however, not satisfied by mutual information owing to the unknown properties
of noise; instead, self-equitability is satisfied [2]. The term equitability is thus useful if
it is given a clear and accurate mathematical formulation. The maximal information coefficient
is a useful measure, but a proper practical approach across different simulations is needed to
identify and understand its implications for large datasets.
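The search over binning schemes mentioned above can be illustrated with a heavily simplified sketch. It assumes equal-width grids only, whereas the statistic described in [1] optimizes over grid partitions, so the value returned here should be read as a crude lower bound rather than as MIC itself. The function names and the grid-size limit of n to the power 0.6 are assumptions made for this example.

```python
import math
from collections import Counter


def _grid_mutual_information(xs, ys, bins_x, bins_y):
    """Plug-in mutual information (bits) on an equal-width bins_x by bins_y grid."""

    def index(value, lo, hi, k):
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * k), k - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = [(index(x, x_lo, x_hi, bins_x), index(y, y_lo, y_hi, bins_y))
             for x, y in zip(xs, ys)]
    n = len(cells)
    joint = Counter(cells)
    count_x = Counter(i for i, _ in cells)
    count_y = Counter(j for _, j in cells)
    return sum((c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
               for (i, j), c in joint.items())


def mic_equal_width(xs, ys, exponent=0.6):
    """Maximum normalized grid mutual information over equal-width grids only.

    Simplification: true MIC also optimizes where the grid lines are placed.
    """
    max_cells = len(xs) ** exponent  # assumed limit on grid size (bins_x * bins_y)
    best = 0.0
    bins_x = 2
    while bins_x * 2 <= max_cells:
        bins_y = 2
        while bins_x * bins_y <= max_cells:
            mi = _grid_mutual_information(xs, ys, bins_x, bins_y)
            # Normalize by the largest value mutual information can take on this grid.
            best = max(best, mi / math.log2(min(bins_x, bins_y)))
            bins_y += 1
        bins_x += 1
    return best
```

Because every candidate grid has to be scored, the cost grows quickly with sample size, which is why practical implementations rely on approximation heuristics.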
Does data mining benefit or not
Data mining can benefit from equitability and the maximal information coefficient to some
extent, in situations where the noise properties are defined. Data mining can benefit further
when other measures, such as mutual information with self-equitability, are used conjointly,
which can provide a more complete theoretical and practical approach.
Latest pre-print of the authors of [1]
The latest pre-print by the authors of [1], given as [6], describes equitability through
interpretable intervals. The equitability proposed in [1] was not completely described using a
mathematical model, and it was criticized by the authors of [2] and [3]. The pre-print [6]
shows that, under certain moderate assumptions, equitability can be assessed between trivial
and non-trivial relationships and also between non-trivial relationships.
Exchange of letters
This exchange of letters shows that science is not a fixed body of knowledge. It has evolved
and is still evolving as circumstances change. The process of science is not limited to one
concept; rather, it is approached through various concepts, and the authors have shown in the
letters that their research is based on their own perspectives. However, the authors contradict
and criticize each other, which gives readers scope to understand the researched topics in
every possible way. The general public is being informed about the topic; however, it is also
getting confused.
Science magazine was not wrong to publish [1], as it prompted questions and explanations from
other authors on points that the authors of [1] had not explained. It should not retract the
paper, as it was not misleading, only insufficiently explained. It was acceptable for PNAS to
publish [3], as it provided a better understanding of what the authors of [1] had not
explained. [2] was not as valuable as [3] and [5], but [6] was valuable for seeing how the
authors of [1] would explain the term equitability.

A compression algorithm
The proposal by the authors of [7] to use Google as a compression algorithm does not make
sense, because a very large number of words are searched on Google and several algorithms
already exist for computing with them. This compression algorithm has several problems in
transition, so using it will not provide any benefit; rather, it will add complexity in
transition. Many words are searched on Google every day, and they are all managed by different
algorithms. Kolmogorov complexity is defined in this paper, and a compression algorithm is used
to compute the lengths of strings. However, a compression algorithm is not needed for computing
the lengths of strings, as compression algorithms have issues related to the transition of
string pairs.
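For context, the distance defined in [7] is computed from search page counts, which play the role that compressed lengths play in compression-based distances. The following is a minimal sketch of that formula. The function and parameter names are assumptions made for this example, and the counts in the usage note are made-up numbers, not real search results.

```python
import math


def normalized_google_distance(count_x, count_y, count_xy, total_pages):
    """Normalized Google Distance from page counts, following the formula in [7].

    count_x, count_y: pages containing each term (hypothetical inputs)
    count_xy: pages containing both terms
    total_pages: normalizing factor, the number of indexed pages
    """
    if count_xy == 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    log_x, log_y, log_xy = math.log(count_x), math.log(count_y), math.log(count_xy)
    numerator = max(log_x, log_y) - log_xy
    denominator = math.log(total_pages) - min(log_x, log_y)
    return numerator / denominator
```

With made-up counts of 1000 pages for each term, 10 pages containing both, and one million indexed pages, the function returns 2/3. If both terms always appear together, it returns 0.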

References
[1] D. Reshef, Y. Reshef, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander,
M. Mitzenmacher and P. Sabeti, "Detecting Novel Associations in Large Data
Sets", Science, vol. 334, no. 6062, pp. 1518-1524, 2011.
[2] N. Simon and R. Tibshirani, "Comment on 'Detecting Novel Associations in Large Data
Sets' by Reshef et al., Science Dec 16, 2011", arXiv preprint arXiv:1401.7645, 2014.
[3] J. Kinney and G. Atwal, "Equitability, mutual information, and the maximal information
coefficient", Proceedings of the National Academy of Sciences, vol. 111, no. 9, pp. 3354-
3359, 2014.
[4] D. Reshef, Y. Reshef, M. Mitzenmacher and P. Sabeti, "Cleaning up the record on the
maximal information coefficient and equitability", Proceedings of the National Academy
of Sciences, vol. 111, no. 33, pp. E3362-E3363, 2014.
[5] J. Kinney and G. Atwal, "Reply to Reshef et al.: Falsifiability or bust", Proceedings of
the National Academy of Sciences, vol. 111, no. 33, p. E3364, 2014.
[6] Y. Reshef, D. Reshef, P. Sabeti and M. Mitzenmacher, "Equitability, interval estimation,
and statistical power", arXiv preprint arXiv:1505.02212, 2015.
[7] R. Cilibrasi and P. Vitanyi, "The Google Similarity Distance", IEEE Transactions on
Knowledge and Data Engineering, vol. 19, no. 3, pp. 370-383, 2007.
