Data Analysis for Disease Research
Added on 2020/10/22
AI Summary
The provided document is a solved assignment that demonstrates data analysis for disease research. It starts with an introduction, followed by a literature review of relevant studies on disease analysis. The report then presents a data analysis section, which includes descriptive statistics, ANOVA, and correlation analysis. Scatter diagrams are used to visualize the results, and conclusions are drawn based on the findings.
DATA ANALYSIS
Table of Contents
INTRODUCTION
  Descriptive statistics
  ANOVA
  T-test
  Correlation
CONCLUSION
REFERENCES
INTRODUCTION
Data analysis is an essential part of any statistical study, because it turns raw facts into
usable conclusions. It is the process of examining data so that an organisation can arrive at
meaningful results. The primary objective of statistical analysis is to provide basic information
about the variables in a given dataset; it also helps to identify potential relationships between
two sets of data. This report applies statistical tests to hypotheses and presents the results
through facts, tables and figures. The analysis relies on descriptive statistics, ANOVA, a t-test
and correlation, supported by relevant figures, to examine how the variables relate to one
another.
Descriptive statistics
Injuries (DALYs lost)
Mean 3806411.972
Standard Error 428890.4457
Median 2349519.381
Mode #N/A
Standard Deviation 3860014.011
Sample Variance 1.49E+13
Kurtosis -1.533941453
Skewness 0.502392268
Range 9517166.563
Minimum 3452.657492
Maximum 9520619.22
Sum 308319369.7
Count 81
Non-communicable diseases (NCDs) (DALYs lost)
Mean 24636200.97
Standard Error 3481456.299
Median 5810289.944
Mode #N/A
Standard Deviation 31333106.69
Sample Variance 9.82E+14
Kurtosis -1.432106009
Skewness 0.73105033
Range 78556354.46
Minimum 20578.1445
Maximum 78576932.6
Sum 1995532278
Count 81
Communicable, maternal, neonatal, and nutritional diseases (DALYs lost)
Mean 4357059.6
Standard Error 380914.21
Median 4672039.1
Mode #N/A
Standard Deviation 3428227.9
Sample Variance 1.18E+13
Kurtosis -1.4481559
Skewness -0.1562678
Range 9444026.7
Minimum 3082.9998
Maximum 9447109.7
Sum 352921825
Count 81
The descriptive analysis above reports measures of central tendency and dispersion for a
sample of 81 observations per category. The mean DALYs lost are 3806411.972 for injuries,
24636200.97 for non-communicable diseases and 4357059.6 for communicable diseases. The
corresponding medians are 2349519.381, 5810289.944 and 4672039.1. Injuries and
non-communicable diseases are positively skewed (skewness 0.50 and 0.73), which is why their
means exceed their medians; communicable diseases are slightly negatively skewed (-0.16), so the
median is greater than the mean. The standard deviations are 3860014.011, 31333106.69 and
3428227.9 respectively. If the data were normally distributed, almost all observations would fall
within three standard deviations on either side of the mean.
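The summary figures above can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `describe` and the toy values are hypothetical and are not the DALY dataset used in this report. The last lines check one relationship that the reported Injuries figures should satisfy, namely that the standard error equals the standard deviation divided by the square root of the count.

```python
import math
import statistics


def describe(values):
    """Return a few of the summary statistics listed in the tables above."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "count": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),  # standard error of the mean
        "range": max(values) - min(values),
    }


# Hypothetical toy data, used only to show the call.
toy = [2.0, 4.0, 4.0, 5.0, 10.0]
summary = describe(toy)

# Consistency check on the reported Injuries row: SE = SD / sqrt(n).
reported_sd, reported_n = 3860014.011, 81
implied_se = reported_sd / math.sqrt(reported_n)
print(summary)
print(round(implied_se, 4))
```

A mean that is larger than the median, as in the toy data, is the usual sign of positive skew, which is the pattern seen for injuries and non-communicable diseases.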
                   Mean           Standard deviation
Non-communicable   24636200.97    31333106.7
Communicable       4357059.567    3428227.86
Injuries           3806411.972    3860014.01
The column chart above presents all three disease categories on a single chart. The mean DALYs
lost are 24636200.97 for non-communicable diseases, 4357059.567 for communicable diseases and
3806411.972 for injuries. The chart shows that the burden attributable to non-communicable
diseases is considerably higher than that of the other two categories.
ANOVA
Anova: Single Factor

SUMMARY
Groups                                             Count   Sum          Average       Variance
Non-communicable diseases (NCDs) (DALYs lost)      54      1994830654   36941308.41   1.02E+15
Communicable, maternal, neonatal, and
nutritional diseases (DALYs lost)                  54      352827023    6533833.757   3.26E+12
Injuries (DALYs lost)                              54      308220326    5707783.82    1.14E+13

ANOVA
Source of Variation   SS         df    MS         F             F crit
Between Groups        3.42E+16   2     1.71E+16   49.64509084   3.05289081
Within Groups         5.48E+16   159   3.45E+14
Total                 8.90E+16   161
The table above reports a single-factor (one-way) ANOVA across three groups: communicable
diseases, non-communicable diseases and injuries. The test indicates whether the group means are
broadly similar or differ significantly from one another. Here the F statistic is 49.645, which is
far greater than the critical value (F crit) of 3.053 at the 5% significance level, so the null
hypothesis of equal means is rejected: there is a statistically significant difference among the
three groups. ANOVA shows that at least one group mean differs, but it does not show which
groups differ from each other; a post hoc comparison would be needed for that.
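The F statistic discussed above is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from scratch in plain Python; `one_way_anova` and the toy groups are hypothetical illustrations, not the spreadsheet procedure used for this report. The final lines apply the decision rule to the F and F crit values reported in the table, and recompute F from the rounded mean squares, which gives a value close to, but not exactly, the reported one.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        g_mean = sum(g) / len(g)
        # Variation of the group mean around the grand mean.
        ss_between += len(g) * (g_mean - grand_mean) ** 2
        # Variation of observations around their own group mean.
        ss_within += sum((x - g_mean) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy groups, used only to show the call.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_toy, df_b, df_w = one_way_anova(toy_groups)

# Decision rule applied to the values reported in the ANOVA table.
F_REPORTED, F_CRIT = 49.64509084, 3.05289081
reject_null = F_REPORTED > F_CRIT

# F recomputed from the rounded mean squares shown in the table.
f_from_rounded_ms = 1.71e16 / 3.45e14
print(f_toy, df_b, df_w, reject_null, round(f_from_rounded_ms, 2))
```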
T-test
t-Test: Paired Two Sample for Means

                               Non-communicable diseases   Communicable, maternal, neonatal, and
                               (NCDs) (DALYs lost)         nutritional diseases (DALYs lost)
Mean                           36941308.41                 6533834
Variance                       1.02E+15                    3.26E+12
Observations                   54                          54
Pearson Correlation            -0.928229397
Hypothesized Mean Difference   0
df                             53
t Stat                         6.649257379
P(T<=t) one-tail               8.27E-09
t Critical one-tail            1.674116237
P(T<=t) two-tail               1.65E-08
t Critical two-tail            2.005745995

                            Mean       Standard deviation
Non-communicable diseases   36941308   31333106.69
Communicable diseases       6533834    3428227.857
The paired t-test above is used to determine whether the mean DALYs lost differ between two
groups, here non-communicable and communicable diseases. A p-value greater than .05 would mean
that the difference between the two group means is not statistically significant. In this case the
t statistic is 6.649, which exceeds the critical values of 1.674 (one-tail) and 2.006 (two-tail),
and the p-values (8.27E-09 one-tail, 1.65E-08 two-tail) are far below .05. The results therefore
show a statistically significant difference between the means of the two variables.
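The t statistic in the table can be approximately reconstructed from the other rows of the same output. The sketch below assumes the standard paired-sample formula, in which the variance of the differences is built from the two variances and the Pearson correlation; the helper name `paired_t_from_summary` is hypothetical. Because the variances in the table are rounded to three significant figures, the result only approximates the reported value of 6.649.

```python
import math


def paired_t_from_summary(mean_x, mean_y, var_x, var_y, r, n):
    """Paired t statistic computed from summary statistics.

    Var(X - Y) = Var(X) + Var(Y) - 2 * r * SD(X) * SD(Y)
    """
    var_diff = var_x + var_y - 2.0 * r * math.sqrt(var_x) * math.sqrt(var_y)
    std_error = math.sqrt(var_diff / n)  # standard error of the mean difference
    return (mean_x - mean_y) / std_error


# Values copied from the t-test table above (variances are rounded there).
t_stat = paired_t_from_summary(
    mean_x=36941308.41,
    mean_y=6533834.0,
    var_x=1.02e15,
    var_y=3.26e12,
    r=-0.928229397,
    n=54,
)

T_CRIT_TWO_TAIL = 2.005745995
significant = abs(t_stat) > T_CRIT_TWO_TAIL
print(round(t_stat, 2), significant)
```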
The chart above represents the two groups of disease, communicable and non-communicable. It is
clear from the chart that non-communicable diseases account for the larger share of the health
burden: the mean values shown are 36941308 DALYs lost for non-communicable diseases and
6533834 for communicable diseases.
Correlation
Correlation is a method used to measure the strength and direction of the relationship
between two or more variables. The analysis below is based on the groups of diseases, which are
categorised according to their impact.
ANOVA
             df   SS        MS        F         Significance F
Regression   1    62.7807   62.7807   1.05287   0.30969
Residual     51   3041.03   59.6281
Total        52   3103.81

            Coefficients  Standard Error  t Stat   P-value    Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   2005.874      2.772951        723.372  5.76E-104  2000.307   2011.441   2000.307     2011.441
6953934.2   -4.69E-07     4.57E-07        -1.0261  0.3096888  -1.39E-06  4.48E-07   -1.39E-06    4.48E-07
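The regression output above can be cross-checked with simple arithmetic. The sketch below uses only figures copied from the table; the variable names are hypothetical, and the R-squared value is derived here rather than reported in the source. For a regression with a single predictor, the F statistic should equal the square of the slope's t statistic, up to rounding.

```python
# Sums of squares and degrees of freedom copied from the regression ANOVA table.
ss_reg, df_reg = 62.7807, 1
ss_res, df_res = 3041.03, 51

ms_reg = ss_reg / df_reg  # mean square for the regression
ms_res = ss_res / df_res  # mean square for the residual
f_stat = ms_reg / ms_res  # should match the reported F of 1.05287

# Share of total variation explained (derived, not reported in the source).
r_squared = ss_reg / (ss_reg + ss_res)

# Slope t statistic = coefficient / standard error (both rounded in the table).
t_slope = -4.69e-07 / 4.57e-07

print(round(f_stat, 5), round(r_squared, 4), round(t_slope, 3))
```

With a Significance F of about 0.31, the slope is not distinguishable from zero at the 5% level, which matches the low share of variation explained.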
                                                    NCDs        Communicable   Injuries
Non-communicable diseases (NCDs) (DALYs lost)       1
Communicable, maternal, neonatal, and
nutritional diseases (DALYs lost)                   0.1751408   1
Injuries (DALYs lost)                               0.9770977   0.35245        1
The regression output above suggests a negative relationship for the predictor variable, since
its coefficient is -4.69E-07; however, with a p-value of 0.31 and 52 total degrees of freedom, this
relationship is not statistically significant. A correlation coefficient measures the strength of
the relationship between two variables: the stronger the relationship, the easier it is to imagine
a single line passing through all the dots on a scatter diagram. Each variable has a correlation of
1 with itself, which is why the diagonal of the matrix is 1. The correlation between communicable
and non-communicable diseases is 0.175, which is weak. The correlation between non-communicable
diseases and injuries is 0.977, which is very strong, and that between communicable diseases and
injuries is 0.352.
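The coefficients in the correlation matrix are Pearson correlations. As a minimal sketch of how such a coefficient is computed, the plain-Python function below divides the co-variation of two series by the product of their individual variations; `pearson_r` and the toy series are hypothetical and are not taken from the report's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_var = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return co_var / math.sqrt(var_x * var_y)


# Hypothetical toy series, used only to show the call.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # rises with a
c = [8.0, 6.0, 4.0, 2.0]  # falls as a rises

r_self = pearson_r(a, a)  # a variable against itself, as on the matrix diagonal
r_up = pearson_r(a, b)
r_down = pearson_r(a, c)
print(r_self, r_up, r_down)
```

Values near 1 or -1 indicate that the points lie close to a straight line, while values near 0, such as the 0.175 reported above, indicate a weak linear relationship.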
The chart above presents the regression values of these variables in a scatter diagram. The
series that moves upward indicates the impact of non-communicable diseases, while the lower
series of dots is based on communicable diseases.
CONCLUSION
This report concludes that data analysis is a crucial part of any research. Accordingly, tools
such as descriptive statistics, ANOVA, the t-test and correlation were applied to the data. Charts
were also used so that patterns in the data could be identified easily. Taken together, the
analysis is sufficient to obtain specific results and to support effective decisions, which in turn
assists the examination of disease-related issues.
REFERENCES
Books and Journals:
Chambers, J. M., 2017. Graphical Methods for Data Analysis: 0. Chapman and Hall/CRC.
Gelman, A. et al., 2013. Bayesian data analysis. Chapman and Hall/CRC.
Miles, M. B., Huberman, A. M. and Saldana, J., 2013. Qualitative data analysis. Sage.