Psychometric Data Analysis Report: Addressing Missing Data Challenges

Added on 2022/10/15

PSYCHOMETRIC
STUDENT ID:

A key issue in data analysis is missing data. How serious the problem is, and which methods are appropriate for resolving it, depends on the amount of missing data, the pattern of missingness and the reason the data are missing. If the data are missing at random, the problem is usually not serious. If the data are missing non-randomly, however, the problem is serious irrespective of how much data is missing, because the remaining sample can become biased. Attempts should therefore be made during data collection to avoid non-random missing data (Flick, 2015).
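The difference between random and non-random missingness can be illustrated with a small simulation. The Python sketch below is illustrative only, and every number in it is a hypothetical assumption: incomes are drawn from a normal distribution, and in the non-random scenario lower earners are assumed to skip the income item more often than others. Under random missingness the mean of the remaining cases stays close to the full-sample mean, whereas under non-random missingness it drifts upwards, which is the bias described above.

```python
import random
import statistics


def simulate_income_survey(n=10_000, seed=1):
    """Return (full-sample mean, mean after random missingness,
    mean after non-random missingness) for a hypothetical income item."""
    rng = random.Random(seed)
    # Hypothetical population: incomes in thousands, normal with mean 50 and SD 15.
    incomes = [rng.gauss(50, 15) for _ in range(n)]
    # Random missingness: every respondent skips the item with probability 0.3.
    random_kept = [v for v in incomes if rng.random() >= 0.3]
    # Non-random missingness (assumed): respondents below 40 skip with
    # probability 0.6, everyone else with probability 0.1.
    nonrandom_kept = [v for v in incomes if rng.random() >= (0.6 if v < 40 else 0.1)]
    return (statistics.mean(incomes),
            statistics.mean(random_kept),
            statistics.mean(nonrandom_kept))


full, random_only, nonrandom = simulate_income_survey()
print(f"full sample mean:             {full:.1f}")
print(f"mean, random missingness:     {random_only:.1f}")
print(f"mean, non-random missingness: {nonrandom:.1f}")
```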
One method of dealing with missing data is to delete the cases that have missing values. This is an appropriate technique if the missing values occur in only a small number of subjects and in variables that are not central to the researcher's question. However, if missing values are scattered throughout the dataset and across many variables, deleting cases could substantially reduce the number of subjects (Howell, n.d.). This is particularly problematic in experimental research, where selecting and matching subjects is very time-consuming. Hence, when missing values are numerous, case deletion distorts the data and is not the preferred approach (Tabachnick & Fidell, 2013).
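The loss of subjects under case deletion can be sketched as follows. Assuming, purely for illustration, that each value in a 200-subject, 10-variable dataset is independently missing with probability 0.05, only about (1 - 0.05)^10, or roughly 60%, of the subjects have complete data, even though 95% of the individual values are present. The function names and data are hypothetical.

```python
import random


def listwise_delete(rows):
    """Keep only the cases (rows) with no missing value (None) in any variable."""
    return [row for row in rows if all(value is not None for value in row)]


def simulate_scattered_missingness(n_cases=200, n_vars=10, p_missing=0.05, seed=0):
    """Hypothetical dataset in which each cell is independently missing."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < p_missing else rng.gauss(0, 1) for _ in range(n_vars)]
        for _ in range(n_cases)
    ]


rows = simulate_scattered_missingness()
complete = listwise_delete(rows)
print(len(rows), "cases before deletion,", len(complete), "after deletion")
# Expected share of complete cases: (1 - p_missing) ** n_vars.
print(f"expected retention: {(1 - 0.05) ** 10:.0%}")
```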
The alternative approach is to estimate the missing values. Several methods exist, all with the common objective of estimating the missing values in the dataset accurately. One of the most common approaches is to fill in missing values on the basis of prior knowledge. This is a reasonable procedure only if the researcher has extensive knowledge of and experience in the field and the number of missing values is small. If prior knowledge about the variable or subject is lacking, this method is not recommended because it could introduce bias (Baraldi & Enders, 2010). Another approach is to replace each missing value with the mean of the variable computed from the available data. The key benefit of this method is that the mean of the variable does not change, so the method is conservative. However, because every imputed value sits exactly at the mean, the variance, and hence the standard deviation, of the variable is reduced, and this reduction can be substantial when the amount of missing data is large (Tabachnick & Fidell, 2013).
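The effect of mean substitution on the mean and standard deviation can be checked with a few hypothetical scores. In this Python sketch, the observed scores and the mean-imputed scores both have a mean of 13, but the sample standard deviation falls from about 3.16 to about 2.36 once the four missing values are set to the mean.

```python
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_values = [v for v in values if v is not None]
    m = statistics.mean(observed_values)
    return [m if v is None else v for v in values]


# Hypothetical scores with 4 of 10 values missing.
scores = [12, None, 15, 9, None, 18, None, 11, 13, None]
observed = [v for v in scores if v is not None]
imputed = mean_impute(scores)

print("mean:", statistics.mean(observed), "->", statistics.mean(imputed))
print("SD:  ", round(statistics.stdev(observed), 2), "->", round(statistics.stdev(imputed), 2))
```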
A more sophisticated technique for estimating missing values is regression estimation. A regression equation is estimated from the complete cases, with the variable that has missing values as the dependent variable and one or more other variables as independent variables. The missing values of the dependent variable are then obtained by plugging the corresponding values of the independent variables into the equation. A key advantage of this method is its objectivity, since it does not involve any guesswork on the part of the researcher (Howell, n.d.). However, the method requires reliable independent variables for the dependent variable concerned, and these may not be readily available. Also, values predicted by a regression model lie closer to the mean than the real values would, so they are not necessarily good indicators of the real values (Tabachnick & Fidell, 2013).
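Regression estimation with a single independent variable can be sketched in Python as below. The line is fitted by ordinary least squares to the complete cases only, and each missing value of the dependent variable is replaced by its predicted value. The data and function names are hypothetical. Because every imputed value lies exactly on the fitted line, the imputed cases carry none of the scatter that real observations would show, which is why regression estimates sit closer to the mean than the real values would.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_impute(x, y):
    """Replace missing y values (None) with predictions from a regression
    of y on x fitted to the complete cases only."""
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    slope, intercept = fit_line([c[0] for c in complete], [c[1] for c in complete])
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical data: anxiety ratings (independent) and test scores (dependent).
anxiety = [2, 4, 5, 7, 8, 10]
score = [80, 74, None, 65, None, 55]
for a, s in zip(anxiety, regression_impute(anxiety, score)):
    print(a, round(s, 1))
```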
For values that are missing at random, a popular method is Expectation Maximisation (EM). This technique estimates a correlation or covariance matrix for the dataset under the assumption that the data follow a particular distribution, and the missing values are then estimated from their likelihood under that assumed distribution. A key issue with this technique is that the imputed data set is biased because no random error is added to the imputed values (Howell, n.d.). Despite this shortcoming, several packages, such as IBM SPSS MVA, NORM, SAS and SOLAS MDA, offer estimation of missing values based on EM. Multiple imputation methods can also be used to improve the estimates and offer several advantages: multiple imputation can be applied to longitudinal data and does not assume that the missing values are random. Finally, a missing data correlation matrix can be used to handle missing values in certain cases where the software package offers this option (Tabachnick & Fidell, 2013).
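The EM approach can be sketched for the simplest case: two variables assumed to follow a bivariate normal distribution, with x fully observed and some values of y missing. The Python sketch below is illustrative only and is not the algorithm as implemented in SPSS MVA or the other packages named above; the simulated data, the missingness mechanism (y is more often missing when x is high, so missingness depends only on observed data) and all names are assumptions. The E-step computes the expected values of y and y squared for incomplete cases given the current estimates, and the M-step re-estimates the means and covariances from those expectations. The last two printed lines show the shortcoming noted above: the EM variance estimate accounts for residual error, but the variance of the data set filled in with conditional means is smaller, because no error is added to the imputed values.

```python
import random
import statistics


def em_bivariate(x, y, max_iter=500, tol=1e-10):
    """EM estimates of the means and covariances of (x, y), assuming a bivariate
    normal distribution, x fully observed and missing y values coded as None.
    Returns (mu_x, mu_y, s_xx, s_xy, s_yy, imputed_y); imputed_y holds the
    conditional means E[y | x] for incomplete cases, with no random error added."""
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n  # maximum-likelihood (divide-by-n) variance
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    # Starting values for the y parameters come from the complete cases.
    mu_y = sum(yi for _, yi in observed) / len(observed)
    s_yy = sum((yi - mu_y) ** 2 for _, yi in observed) / len(observed)
    s_xy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in observed) / len(observed)

    for _ in range(max_iter):
        beta = s_xy / s_xx                        # slope of the regression of y on x
        resid_var = max(s_yy - beta * s_xy, 0.0)  # Var(y | x) under current estimates
        # E-step: expected y and y squared for every case, given its x.
        e_y, e_y2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                e_y.append(m)
                e_y2.append(m * m + resid_var)
            else:
                e_y.append(yi)
                e_y2.append(yi * yi)
        # M-step: re-estimate the y parameters from the expected sufficient statistics.
        new_mu_y = sum(e_y) / n
        new_s_yy = sum(e_y2) / n - new_mu_y ** 2
        new_s_xy = sum(xi * ey for xi, ey in zip(x, e_y)) / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_s_yy - s_yy), abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break

    beta = s_xy / s_xx
    imputed_y = [mu_y + beta * (xi - mu_x) if yi is None else yi for xi, yi in zip(x, y)]
    return mu_x, mu_y, s_xx, s_xy, s_yy, imputed_y


# Hypothetical simulated data: y = 0.8x + noise, so the true mean of y is 0 and Var(y) is 1.
rng = random.Random(42)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]
# Assumed mechanism: y is missing more often when x is high.
y = [None if rng.random() < (0.7 if xi > 0.5 else 0.1) else yi for xi, yi in zip(x, y_full)]

cc_mean = statistics.mean(yi for yi in y if yi is not None)
mu_x, mu_y, s_xx, s_xy, s_yy, imputed_y = em_bivariate(x, y)
print(f"complete-case mean of y:      {cc_mean:.3f}")
print(f"EM estimate of mean of y:     {mu_y:.3f}")
print(f"EM estimate of variance of y: {s_yy:.3f}")
print(f"variance of EM-imputed data:  {statistics.pvariance(imputed_y):.3f}")
```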
The key lesson from the above analysis is that simpler methods, such as case deletion, replacement of missing values by the researcher and mean substitution, are acceptable only when the amount of missing data is small and the missingness is primarily random. When missing data are more extensive, and especially when they are non-random, EM-based imputation techniques, along with the missing data correlation matrix, offer more reliable estimates of the missing data (Tabachnick & Fidell, 2013).
Further, data collection personnel should always be wary of non-random missing values. To address this at the collection stage, it makes sense to identify high-risk subjects on the basis of empirical studies. For instance, poorer respondents are less likely to report their income, and people with limited educational qualifications are more likely to skip questions about their education. The personnel involved in data collection should take special care to ensure that enough subjects from each such group provide data, so that some missing values can be dealt with. These concerns need to be incorporated when framing the strategy and allocating resources in the data collection process (Eriksson & Kovalainen, 2015).
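The resource-allocation point can be made concrete with a simple planning calculation: if a group is expected to provide usable data at a given rate, the number of subjects approached must be scaled up accordingly (for example, 200 subjects for a group expected to respond at 50% if 100 complete responses are needed). The response rates in this Python sketch are hypothetical placeholders, not empirical figures; in practice they would come from the empirical studies used to identify high-risk groups.

```python
import math


def recruits_needed(target_complete, expected_response_rate):
    """Subjects to approach so that, on average, target_complete of them provide
    usable data, given an assumed response rate for their group."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    return math.ceil(target_complete / expected_response_rate)


# Hypothetical planning figures: 100 complete responses wanted per group.
assumed_rates = {"low income": 0.50, "middle income": 0.75, "high income": 0.85}
for group, rate in assumed_rates.items():
    print(group, recruits_needed(100, rate))
```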
References
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5-37.
Eriksson, P., & Kovalainen, A. (2015). Quantitative methods in business research (3rd ed.). London: Sage Publications.
Flick, U. (2015). Introducing research methodology: A beginner's guide to doing a research project (4th ed.). NY: Sage Publications.
Howell, D. C. (n.d.). Treatment of missing data--Part 1. Retrieved from https://www.uvm.edu/~dhowell/StatPages/Missing_Data/Missing.html
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Upper Saddle River, NJ: Pearson.