Data Quality Management - Importance, Dimensions, Standardization, and Challenges

Added on  2023/06/09

This article discusses the importance of high-quality information, dimensions of information quality, standardization, and challenges in data quality management. It also provides real-life examples to explain the concepts.
Data quality management
2. Which one of the following is a reason why high-quality information is important?
Ans. All of the above. (d)
3. Which records point to the same entity?
1. John Doe, 144 Main St., Little Rock, AR
2. John M. Doe, 268 Green Mountain Dr., Little Rock, AR
3. John Doe, 357 11th St., Houston, Texas
4. J. Doe, 268 Green Mountain Dr., Little Rock, AR
Ans. None of the Above (d)
4. A data quality tool finds a number of null values in the State field of a
database. To which one of the listed IQ dimensions does this problem correspond?
Ans. Completeness (a)
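A completeness problem of this kind can be measured as the share of records whose field is actually filled in. The sketch below is a minimal illustration, not the behavior of any particular data quality tool: the `completeness` helper, the sample rows, and the choice to treat both `None` and the empty string as null are assumptions made for the example.

```python
def completeness(records, field):
    """Return the fraction of records whose `field` is present and non-empty."""
    if not records:
        return 1.0  # nothing is missing from an empty table
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


# Hypothetical sample rows; None and "" both stand in for null State values.
rows = [
    {"name": "John Doe", "state": "AR"},
    {"name": "John M. Doe", "state": None},
    {"name": "J. Doe", "state": ""},
    {"name": "Jane Roe", "state": "TX"},
]

state_completeness = completeness(rows, "state")
print(state_completeness)  # 0.5
```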
5. Profiling usually does NOT help uncover problems with
Ans. Values (content) (a)
6. For the spacecraft Columbia, certain analysis charts covering the problems
with the foam were omitted, and pictures were not available for analysis. Which
dimension of Information Quality covers this problem?
Ans. Completeness (c)
For the following question, select the corresponding Information Quality Dimension
I. Accuracy
II. Consistency
III. Completeness
7. O-rings were misclassified and misreported
Ans. I and II (b)
8. What is the Jaro score between Martha and Marhta? Show the work.
a <- "Martha"
b <- "Marhta"

A <- 6  # length of a
B <- 6  # length of b
m <- 6  # number of matching characters
t <- 1  # number of transpositions (T and H are swapped)

# Jaro similarity
jaro <- function(A, B, m, t) {
  (1/3) * (m/A + m/B + (m - t)/m)
}

l <- 3    # number of matching characters at the beginning ("Mar")
p <- 0.1  # prefix scaling factor

# Jaro-Winkler similarity: boost the Jaro score for a shared prefix
jw <- function(A, B, m, t, l, p) {
  s <- jaro(A, B, m, t)
  s + l * p * (1 - s)
}

jaro(A, B, m, t)
# [1] 0.9444444
jw(A, B, m, t, l, p)
# [1] 0.9611111

# Optional cross-check, assuming the stringdist package is available:
stringdist::stringsim(a, b, method = "jw", p = 0.1)
# [1] 0.9611111
The Jaro score for matching MARTHA and MARHTA is worked out as follows.
cs1      M A R T H A
matches  0 1 2 4 3 5
cs2      M A R H T A
All six characters match (m = 6), and the swapped pair T/H counts as one transposition (t = 1).
Jaro Proximity (MARTHA, MARHTA) = 1/3 * (6/6 + 6/6 + (6 - 1)/6) = 0.944
The three initial characters, MAR, match, giving a Jaro-Winkler proximity of:
JaroWinkler Proximity (MARTHA, MARHTA) = 0.944 + 0.1 * 3 * (1.0 - 0.944) = 0.961
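The hand calculation takes the match count m and the transposition count t as given. The sketch below shows one common way to derive them from the two strings, using the usual matching window of half the longer length minus one. The function names, the default scaling factor p = 0.1, and the prefix cap of four characters are conventional choices assumed here, not something the assignment specifies.

```python
def jaro(s1, s2):
    """Jaro similarity: (m/|s1| + m/|s2| + (m - t)/m) / 3."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    seq1 = [c for c, flag in zip(s1, matched1) if flag]
    seq2 = [c for c, flag in zip(s2, matched2) if flag]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity: boost the Jaro score for a shared prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


jaro_score = round(jaro("MARTHA", "MARHTA"), 3)
jw_score = round(jaro_winkler("MARTHA", "MARHTA"), 3)
print(jaro_score, jw_score)  # 0.944 0.961
```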
9. What is the bi-gram score between Hello and Hallo? Show the work.
a <- "Hello"
b <- "Hallo"

# Split a string into its overlapping two-character substrings (bigrams)
bigrams <- function(s) substring(s, 1:(nchar(s) - 1), 2:nchar(s))

A <- bigrams(a)            # "He" "el" "ll" "lo"
B <- bigrams(b)            # "Ha" "al" "ll" "lo"
shared <- intersect(A, B)  # "ll" "lo"

# Bi-gram score taken here as the Dice coefficient: 2 * shared / (|A| + |B|)
2 * length(shared) / (length(A) + length(B))
# [1] 0.5
Hello and Hallo each contain four bigrams and share two of them (ll and lo), so the
bi-gram score, taken here as the Dice coefficient, is 2 * 2 / (4 + 4) = 0.5. If the
score is instead defined as the ratio of shared to distinct bigrams (Jaccard), the
value is 2 / 6 = 0.333.
10. Do a search on Phonetic Matching Algorithms (PMA). List two problems/challenges
if a user opts for PMA instead of String Matching Algorithms.
Phonetics concerns the sounds of the vowels and consonants produced when letters are
pronounced. Phonetic Matching Algorithms (PMA) use the sound of a word, rather than its
exact spelling, to decide whether two words match. This is a shift away from rule-based
string comparison, which depends strongly on the language's spelling and risks missing
valid matches.
Two problems of PMA compared with String Matching Algorithms are:
First, it cannot distinguish words that differ only in the vowels placed between
consonants.
Second, while it can provide valid matches, it fails to handle inflection and accents,
which String Matching Algorithms can take into account.
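Soundex, one widely used phonetic matching algorithm, makes the first problem concrete. The sketch below is a minimal implementation of the standard American Soundex rules, written for illustration only; the sample names are assumptions chosen to show one false match and one missed match.

```python
# Consonant groups and the digit each group maps to in American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex key of `name`."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit:
            if digit != prev:  # adjacent letters with the same code count once
                code += digit
            prev = digit
        elif ch not in "HW":   # vowels separate repeated codes; H and W do not
            prev = ""
    return (code + "000")[:4]


# Vowels are ignored, so different names collapse to the same key.
print(soundex("Robert"), soundex("Rupert"))        # R163 R163
# The first letter is kept as-is, so same-sounding names can still differ.
print(soundex("Catherine"), soundex("Katherine"))  # C365 K365
```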
11. Why do we need to standardize data? Explain briefly and provide a real-life
example.
Standardization rescales data to have a mean of zero and a standard deviation of one. A
standardized variable is called a z-score, also known as a standard score. Standardization
is important because model stability and the precision of parameter estimates suffer in
multivariate analysis when variables measured on different scales are used. For example, in
boundary detection, a variable that ranges between 0 and 100 will outweigh a variable that
ranges between 0 and 1. Using variables without standardization gives variables with larger
ranges greater importance in the analysis. Transforming the data to comparable scales
prevents this problem (Sanders et al., 2015). A real-life example is clinical research data,
where standardization has been shown to change the pattern of clinical research. The process
is carried out through intensive quality research on the data. Data quality management
improves data integration and reusability. It also facilitates data exchange with partners
and supports wider use of software tools. This leads to improvements in team communication
and team management, and it facilitates regulatory reviews and audits. Meredith Nahm, an
associate director for clinical research informatics at Duke Translational Medicine
Institute, also emphasized the value of data sharing and the reusability of data.
Standardized data can be used for purposes other than those intended by the people who
collected it, which underlines the need for standardized data.
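The scale effect described above can be sketched in a few lines. This is a minimal illustration using the population standard deviation; the `standardize` helper and the two sample variables are assumptions made for the example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant variable carries no spread
    return [(v - mu) / sigma for v in values]


# Two hypothetical variables with the same shape but very different ranges.
wide = [0, 25, 50, 75, 100]
narrow = [0.0, 0.25, 0.5, 0.75, 1.0]

z_wide = standardize(wide)
z_narrow = standardize(narrow)

# After standardization both variables sit on the same scale.
print([round(z, 3) for z in z_wide])
print([round(z, 3) for z in z_narrow])
```

Both lists print the same z-scores, so neither variable dominates an analysis merely because of its original range.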
12. Agree or disagree with the following statement and explain why: Poor-quality data
might affect a business's accounting database but cannot affect something as advanced
as NASA's Space Shuttle Program.
I disagree with the statement. Poor-quality data will affect an advanced program like
NASA's Space Shuttle Program as acutely as it will affect a business accounting database.
NASA is a highly digitalized organization that relies on computer simulations and many
forms of data to make calculations and plan space-related activities. Inaccurate input data
produces inaccurate results, and because NASA deals with large volumes of data from
multiple sources, even a small inaccuracy can compound into a large deviation from the
correct result. The Columbia and O-ring cases in questions 6 and 7 illustrate this.
13. What is the issue of redundancy in a database from an information quality
perspective? Explain briefly and provide a real-life example.
Data redundancy is the repetition or superfluity of data in a database. It can happen for
various reasons. The problem with redundancy is that it can corrupt otherwise accurate data.
Because every copy is stored, redundancy increases the size of the data. Because the copies
can be changed independently of one another, it can also cause data inconsistency, thereby
reducing the quality of the data (Horn, 2016). A real-life example is duplicate data, which
is one of the most common forms of error and can arise in many ways. For example, if one
employee's data is inserted into a department's records and the data for the rest of the
employees is automatically repeated along with it, the same employees are counted as
multiple records.
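The link between redundancy and inconsistency can be sketched as a check for keys that appear more than once with conflicting values. The `find_conflicts` helper, the field names, and the sample employee rows below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def find_conflicts(records, key, field):
    """Return keys stored more than once with different values for `field`."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key]].add(record[field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical employee table in which employee 1 was entered twice and
# only one of the two copies was later updated.
employees = [
    {"emp_id": 1, "dept": "Sales"},
    {"emp_id": 2, "dept": "Finance"},
    {"emp_id": 1, "dept": "Marketing"},
]

conflicts = find_conflicts(employees, "emp_id", "dept")
print(conflicts)  # {1: ['Marketing', 'Sales']}
```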
14. Agree or disagree with the following statement and explain why: Once data has been
declared to be accurate, it will always be accurate.
I disagree with the above statement. Declaring data accurate at one point in time does not
guarantee that it will remain accurate, or that later data will reach the same level of
accuracy and meet the same standards. For this reason there should be a proper data
monitoring panel that supervises the data and ensures, through routine checks, that it
continues to attain the required accuracy and meet the required standards.
15. Does sharing information across departments and organizations cause information
quality problems or help improve information quality? Explain your answer.
Sharing information along a proper line, with a specified purpose, across departments will
improve information quality. When information is shared among different departments, each
department effectively performs its own quality analysis, and this automatic audit process
helps to modify, correct, or improve the data. However, the departments with which the data
is shared must be relevant to the data process, or else the quality checks will be wasted.
There should therefore be a proper line along which data is shared between departments.
16. Agree or disagree with the following statement and explain why: Poor accuracy of
data can adversely affect decisions, but suspicions of low accuracy do not affect
decisions. Justify your answer.
I do not agree with this statement. Poor accuracy of data will adversely affect decisions,
but a suspicion about the accuracy of data can cause problems as well. When the data is
under suspicion, decisions lack conviction, and decisions in a company should be taken
without much doubt. Any suspicion adds risk to the decision-making, so a risk management
process must be attached to the decision, which in turn modifies the decision-making
process. There will also be actions to audit the data and remove the suspicion. All these
factors have a major impact on the decision-making process, and modifying decisions in this
way adds cost.
17. What is the difference between data auditing vs. data monitoring?
| Data Auditing | Data Monitoring |
| Data auditing is a process of assessing the quality of data and its utility and prescribed purpose. | Data monitoring is a practice of routinely supervising and checking the data so that it maintains the required quality and meets the required standards. |
| Data auditing is quality assurance. | Data monitoring is quality control (Bautista Gomez & Cappello, 2014). |
| Various agencies and associations, such as the Joint Information Systems Committee (JISC), promote data audit protocols in different fields (Akhtar & Iqbal, 2014). | Data monitoring is generally set on the standards that are prescribed and maintained by the company. |
| Data auditing generally occurs at the end of the final data formation. | Data monitoring is a routine supervision of the data. |
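The timing difference in the last row can be sketched in code: an audit assesses the finished data set once, while monitoring repeats the same check on every incoming batch. The function names, the non-null ratio used as the quality check, the 0.7 threshold, and the sample batches are all assumptions made for illustration.

```python
def non_null_ratio(values):
    """Quality check used in this sketch: share of values that are not null."""
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def monitor(batches, check, threshold):
    """Routine supervision: flag every batch whose score is below threshold."""
    return [i for i, batch in enumerate(batches) if check(batch) < threshold]


def audit(batches, check):
    """One-off assessment of the final, combined data set."""
    return check([v for batch in batches for v in batch])


# Hypothetical daily loads of a State field; None marks a missing value.
daily_batches = [
    ["AR", "TX", "AR"],
    ["AR", None, None],
    ["TX", "AR", None, "AR"],
]

alerts = monitor(daily_batches, non_null_ratio, threshold=0.7)
audit_score = audit(daily_batches, non_null_ratio)
print(alerts)       # [1]
print(audit_score)  # 0.7
```

In this sketch the audit reports a single overall score at the end, whereas monitoring pinpoints the second batch as soon as it falls below the threshold.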
References
Akhtar, S., & Iqbal, J. (2014). An empirical analysis of pre and post merger or acquisition
impact on financial performance: a case study of Pakistan telecommunication
limited. European Journal of Accounting Auditing and Finance Research, 3(1), 69-80.
Bautista Gomez, L., & Cappello, F. (2014, February). Detecting silent data corruption
through data dynamic monitoring for scientific applications. In ACM SIGPLAN Notices (Vol.
49, No. 8, pp. 381-382). ACM.
Sanders, A., Childs, M., Traub, E., & Jones, J. (2015). An analysis of long term data
consistency and a proposal to standardize flower survey methods for the EISI pollinator
project.
Horn, R. L. (2016). U.S. Patent No. 9,268,657. Washington, DC: U.S. Patent and Trademark
Office.