Data Snooping Research: Analysis, Bias, and Solutions (IMAT5238)

Added on  2022/08/13

Discussion Board Post
AI Summary
This discussion board post provides an analysis of data snooping, a critical issue in data mining. It defines data snooping as the reuse of a single dataset for inferential purposes, which can lead to misleading results due to chance correlations. The post explores the occurrence of data snooping, particularly in non-experimental sciences and investment contexts, and emphasizes the dangers of repeatedly testing models on the same data. It suggests methods to avoid data snooping, such as strict data separation, higher data frequency, and the Bonferroni method. Furthermore, the post explains the concept of data snooping bias, highlighting factors like unmeasured confounders and missing factors. The importance of curve fitting and dividing data into in-sample and out-of-sample datasets for back-testing is also discussed. This post is intended to help students understand and avoid data snooping to ensure the integrity of their data mining research.
N A M E
I N S T I T U T I O N
Data Snooping
Introduction
Data snooping occurs when a single data set is used
more than once for inference or model selection (White, 2000).
When data are reused, a satisfactory result may be
obtained by chance rather than through any inherent
merit of the method that produced it.
The problem is practically unavoidable in time series
analysis, where only a single history of the data is available.
Empirical researchers widely acknowledge that data
snooping is a dangerous practice that should be avoided.
However, practical methods for assessing its potential
dangers in a given situation have been lacking.
Occurrence of data snooping
Data snooping occurs in most non-experimental sciences.
In the investment context it is nearly inevitable, because
the same historical data must be re-examined (White, 2000).
When many models are tried repeatedly to explain and
predict the same data, the probability that at least one
appears to work approaches 100 percent, even when the
models are utterly worthless.
Data reuse can take place at both the individual-researcher
and the collective level.
Researchers look to past studies to get an idea of what
works and what does not work in the data.
Further refinements are then made on the same data until
a satisfactory model is found.
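
The claim that the chance of finding at least one apparently successful model approaches 100 percent can be illustrated with a short calculation. The sketch below is only an illustration and rests on two assumptions that are not in the post: each model is tested at a 5 percent significance level, and the tests are independent (in practice, models fitted to the same data are usually correlated). The function name is hypothetical.

```python
def prob_at_least_one_false_positive(num_models: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_models` worthless models passes a test
    at significance level `alpha`, ASSUMING the tests are independent."""
    # Each worthless model fails the test with probability (1 - alpha);
    # all of them fail together with probability (1 - alpha) ** num_models.
    return 1.0 - (1.0 - alpha) ** num_models


if __name__ == "__main__":
    for m in (1, 10, 50, 100):
        p = prob_at_least_one_false_positive(m)
        print(f"{m:>3} models tried -> P(at least one looks good) = {p:.3f}")
```

Under these assumptions the probability is already above 99 percent once 100 worthless models have been tried on the same data.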
What can be done about data snooping
Data snooping can be avoided by maintaining strict
separation between the data used to develop hypotheses
and train models and the data used to test them.
Using higher-frequency data can also help, as it may at
times yield a sufficient quantity of new data.
Another way to account for data snooping is the
Bonferroni method (White, 2000).
It is appropriate for accounting for data snooping
across the whole universe of trading rules
considered (Hsu, Han, Wu & Cao, 2018).
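
As a rough illustration of the Bonferroni method, the sketch below divides the overall significance level by the number of rules tested, so that a rule only counts as significant if its p-value clears the stricter threshold. The function name and the p-values are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return one flag per test: True if the test is still significant after
    the Bonferroni correction, i.e. if p <= alpha / (number of tests)."""
    num_tests = len(p_values)
    if num_tests == 0:
        return []
    threshold = alpha / num_tests  # stricter per-test threshold
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for four trading rules tested on the same data.
    p_values = [0.001, 0.02, 0.04, 0.30]
    print(bonferroni_significant(p_values))
```

With four rules the per-rule threshold becomes 0.05 / 4 = 0.0125, so three of these rules would pass an uncorrected 5 percent test but only the first passes the corrected one. The correction is conservative, and it becomes more so as the number of rules grows.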
Data snooping bias
Bias refers to systematic error in an
analysis (Dichtl et al., 2019).
Data snooping bias arises from tuning many
parameters on a single data set to improve
a system's performance.
Related sources of bias include unmeasured
confounders, missing factors, and loss to
follow-up.
Avoiding curve fitting (over-tuning a model to past
data) helps in avoiding data snooping bias.
Conclusion
For back-testing, the data can be divided into two
samples.
In-sample. This is the data sample used to
back-test every combination generated from
the initial trading rules.
Out-of-sample. This is new data used to test
the rules that performed best in-sample
(Dichtl et al., 2019).
This acts as a filter: rules that do not perform
are rejected, and rules that perform are
accepted.
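
The in-sample and out-of-sample filter can be sketched as follows. This is a minimal sketch under stated assumptions, not a method taken from the cited papers: the returns are synthetic random noise, the candidate rules are a hypothetical family of simple momentum rules that differ only in their lookback length, the split is 70/30 in time order, and a rule is "accepted" if its average out-of-sample return is positive. All function names are illustrative.

```python
import random
from statistics import mean


def rule_returns(returns, lookback):
    """Per-period returns of a hypothetical momentum rule: go long (+1) if the
    sum of the previous `lookback` returns is positive, otherwise short (-1)."""
    out = []
    for t in range(lookback, len(returns)):
        signal = 1 if sum(returns[t - lookback:t]) > 0 else -1
        out.append(signal * returns[t])
    return out


def split_in_out(returns, in_fraction=0.7):
    """Chronological split: the earlier part is in-sample, the later part is
    out-of-sample. No shuffling, so no future data leaks into rule selection."""
    cut = int(len(returns) * in_fraction)
    return returns[:cut], returns[cut:]


def backtest_filter(returns, lookbacks, in_fraction=0.7):
    """Pick the best rule in-sample, then check it once on out-of-sample data."""
    in_sample, out_sample = split_in_out(returns, in_fraction)
    in_scores = {k: mean(rule_returns(in_sample, k)) for k in lookbacks}
    best = max(in_scores, key=in_scores.get)
    out_score = mean(rule_returns(out_sample, best))
    accepted = out_score > 0  # assumed acceptance criterion
    return best, in_scores[best], out_score, accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic returns that are pure noise, so no rule has genuine merit.
    returns = [rng.gauss(0.0, 0.01) for _ in range(1000)]
    best, in_score, out_score, accepted = backtest_filter(returns, range(1, 21))
    print(f"best lookback in-sample: {best}")
    print(f"in-sample mean return:   {in_score:.5f}")
    print(f"out-of-sample mean:      {out_score:.5f}")
    print(f"accepted: {accepted}")
```

Because the best rule is chosen from twenty candidates on the in-sample data, its in-sample score is biased upward; the out-of-sample score is the more honest estimate. The out-of-sample data should be used only once, since re-running the selection until the out-of-sample result looks good reintroduces data snooping.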
References
Dichtl, H., Drobetz, W., Neuhierl, A., &
Wendt, V. S. (2019). Data Snooping in Equity
Premium Prediction. Available at SSRN
2972011.
Hsu, P. H., Han, Q., Wu, W., & Cao, Z. (2018).
Asset allocation strategies, data snooping,
and the 1/N rule. Journal of Banking &
Finance, 97, 257-269.
White, H. (2000). A reality check for data
snooping. Econometrica, 68(5), 1097-1126.