Biostatistics Assignment: DENT70001 Statistical Analysis and Report

Added on  2022/12/30

Homework Assignment
AI Summary
This biostatistics assignment analyzes survey data collected from PG dentistry students in the DENT70001 course. The assignment begins with data familiarization, including identifying the number of students enrolled and responding to the surveys, and creating a complete dataset. It then delves into presenting the data, identifying variable types, suggesting suitable summary statistics and plots. Descriptive statistics, stratified by gender, are presented to describe exercise patterns and attitudes toward statistics. The assignment further explores students' attitudes toward using statistics through cross-tabulation and statistical tests, including the proportion of students likely to use statistics in their careers and any changes in proportions between surveys. The analysis then examines the impact of the course on student exercise regimes, using histograms, boxplots, and descriptive statistics to assess changes in exercise levels before and after the course, and a paired t-test to test for significant differences. The assignment includes the interpretation of statistical results and the drawing of conclusions based on the data analysis, including the impact of the DENT70001 course on students' exercise patterns and their perceptions of statistics.
Document Page
Biostatistics
Student Name:
Instructor Name:
Course Number:
7 May 2019
[Question 1] Familiarity with the data [7]
(1.1) How many students have been enrolled into the course? How many students responded to
the first wave of the questionnaire collected in the first F2F session? How many students
responded to the second wave of data collection? [3]
Answer
From the data we can see that 40 students had been enrolled into the course. Out of the 40
students, 28 students responded to the first wave of the questionnaire collected in the first
F2F session.
Table 1: Count on LikeStat1
Row Labels Count of LikeStat1
1 1
2 5
3 12
4 6
5 4
NA 12
Grand Total 40
Out of the 40 students, 29 students responded to the second wave of data collection.
Table 2: Count on LikeStat2
Row Labels Count of LikeStat2
1 1
2 3
3 17
4 7
5 1
NA 11
Grand Total 40
(1.2) Create a subset dataset, which contains only those students who had provided answers to
both waves of the survey (1st and 2nd F2F Session). Name this new dataset
“COMPLETE”. How many students’ records are there in this new dataset? How many
variables are there in this new COMPLETE dataset?
Answer
There are 22 student records in this new dataset.
The number of variables remains the same: there are still 10 variables in the new
COMPLETE dataset. The variables are listed below:
Table 3: New variable list
1 ID
2 Gender1
3 LikeStat1
4 ConfStat1
5 Exercise1
6 UseStatJob1
7 LikeStat2
8 ConfStat2
9 Exercise2
10 UseStatJob2
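The subsetting step in (1.2) can be sketched as a simple complete-case filter. This is a minimal illustration only: the records below are hypothetical (not the survey data), the helper name complete_cases is made up for this example, and it assumes that a missing LikeStat answer marks non-response to that wave, as in Tables 1 and 2.

```python
# Hypothetical records; None marks a missing (NA) answer. These are NOT the survey data.
records = [
    {"ID": 1, "LikeStat1": 3, "LikeStat2": 4},
    {"ID": 2, "LikeStat1": None, "LikeStat2": 3},
    {"ID": 3, "LikeStat1": 2, "LikeStat2": None},
    {"ID": 4, "LikeStat1": 5, "LikeStat2": 3},
]


def complete_cases(rows, wave1_key="LikeStat1", wave2_key="LikeStat2"):
    """Keep only the rows that have a non-missing answer in both waves."""
    return [r for r in rows if r[wave1_key] is not None and r[wave2_key] is not None]


COMPLETE = complete_cases(records)
print(len(COMPLETE))     # number of student records kept
print(len(COMPLETE[0]))  # number of variables per record (unchanged by the filter)
```

Filtering rows in this way changes the number of records but not the number of variables, which is why the variable count stays the same.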
[Question 2] Presenting the data [22]
(2.1) For the list of variables collected from the first F2F: “Gender1" "LikeStat1" “ConfStat1”
"Exercise1" "UseStatJob1” [10]
(2.1.1) Identify each of these variables’ data types (e.g. Qualitative/Quantitative; Discrete
/Categorical /Continuous; Binary/Ordinal/Nominal
etc.).
Answer
Description of the variables.
Table 4: Variable description
Variable label   Variable type   Variable class   Measurement
Gender1          Qualitative     Categorical      Nominal
LikeStat1        Qualitative     Categorical      Ordinal
ConfStat1        Quantitative    Discrete         Ordinal
Exercise1        Quantitative    Continuous       Ratio
UseStatJob1      Qualitative     Categorical      Binary
(2.1.2) State the most suitable summary statistics to explore each of these variables.
Answer
The following table presents the variables and the most suitable summary
statistics to explore each of the variables.
Table 5: Suitable summary statistics table
Variable label   Suitable summary statistics
Gender1          Frequencies/percentages
LikeStat1        Median; frequencies/percentages
ConfStat1        Mean/median
Exercise1        Mean/median
UseStatJob1      Frequencies/percentages
(2.1.3) Identify the most appropriate potential plots/figures to explore each of these
variables.
Answer
The following table presents the variables and the most appropriate potential
plots/figures to explore each of the variables.
Table 6: Suitable plot table
Variable label   Suitable plot
Gender1          Pie chart
LikeStat1        Bar chart
ConfStat1        Bar chart
Exercise1        Histogram
UseStatJob1      Pie chart
(2.2) Design a table with relevant summary/descriptive statistics, stratified by gender, to
describe the current cohort of PG dentistry students’ exercise pattern and their attitude
towards statistics at the beginning of the course. This table should contain the variables:
“Gender1", "LikeStat1", "Exercise1", "UseStatJob1”, “ConfStat1”. Write a paragraph (no
more than 250 words) to describe all these variables and any findings from this table. [12]
Answer
Table 7: Summary statistics
Frequency (n) Percent (%)
Gender
Female 10 45.5
Male 12 54.5
Total 22 100.0
I like statistics
Strongly Agree 1 4.5
Agree 4 18.2
Neutral 9 40.9
Disagree 5 22.7
Strongly Disagree 3 13.6
Total 22 100.0
I am likely to use statistics in my career
Yes 13 59.1
No 9 40.9
Total 22 100.0
Variable                             Mean    Median   Mode
I am confident in using statistics   29.41   22.5     10
Exercise per week (minutes)          57.05   2.5      0
From Table 7, slightly more than half of the respondents (54.5%, n = 12) were male and
45.5% (n = 10) were female. The largest group (40.9%, n = 9) was neutral on whether they
like statistics; only 22.7% (n = 5) agreed or strongly agreed that they like statistics,
while 36.4% (n = 8) disagreed or strongly disagreed. Nevertheless, 59.1% (n = 13)
reported that they are likely to use statistics in their career. Confidence in using
statistics (on a scale of 0 to 100) had a mean score of 29.41, a median of 22.5 and a
mode of 10. Lastly, the mean exercise time per week was 57.05 minutes, with a median of
2.5 minutes and a mode of 0 minutes.
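The percentages quoted from Table 7 can be checked with a short calculation. The sketch below uses the "I like statistics" counts from the table; the helper name percentages is hypothetical. It computes the combined categories from the summed counts, which avoids the small error that arises when already-rounded percentages are added together.

```python
# Counts for "I like statistics", copied from Table 7.
like_counts = {
    "Strongly Agree": 1,
    "Agree": 4,
    "Neutral": 9,
    "Disagree": 5,
    "Strongly Disagree": 3,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * c / total, 1) for label, c in counts.items()}


total = sum(like_counts.values())
like_pct = percentages(like_counts)

# Combine categories on the raw counts first, then convert to a percentage.
agree_pct = round(100 * (like_counts["Strongly Agree"] + like_counts["Agree"]) / total, 1)
disagree_pct = round(100 * (like_counts["Disagree"] + like_counts["Strongly Disagree"]) / total, 1)

print(like_pct)
print(agree_pct, disagree_pct)
```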
[Question 3] [18]
This question focuses on the students’ attitude towards using Statistics. We examine the
variables “UseStatJob1” and “UseStatJob2”.
(3.1) Cross-tabulate the variables “UseStatJob1” and “UseStatJob2”. Include the table in your
report.
Answer
Table 8: Cross tabulation table
                UseStatJob2
UseStatJob1     No     Yes    Grand Total
No              6      3      9
Yes             4      9      13
Grand Total     10     12     22
(3.2) What is the proportion of PG students who report that they are likely to use statistics in
their job in the first survey? What is this proportion in the second survey? What are the
changes in proportions of who would use statistics between the two surveys?
Answer
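Proportion in the first survey = 13/22 = 0.5909
Proportion in the second survey = 12/22 = 0.5455
Change in the proportion = 0.5909 − 0.5455 = 0.0454, that is, the proportion fell by
about 4.5 percentage points between the two surveys.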
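The two proportions are the row and column margins of the cross-tabulation in Table 8. As a minimal sketch (the variable names are chosen for this example only), they can be recomputed directly from the four cell counts:

```python
# Cell counts from Table 8; keys are (UseStatJob1, UseStatJob2).
crosstab = {
    ("No", "No"): 6,
    ("No", "Yes"): 3,
    ("Yes", "No"): 4,
    ("Yes", "Yes"): 9,
}

n = sum(crosstab.values())
yes_first = sum(c for (first, _), c in crosstab.items() if first == "Yes")
yes_second = sum(c for (_, second), c in crosstab.items() if second == "Yes")

p_first = yes_first / n
p_second = yes_second / n
change = p_second - p_first  # a negative value means a drop between the surveys

print(round(p_first, 4), round(p_second, 4), round(change, 4))
```

Working from the unrounded counts, the change is exactly 1/22 of the sample, which corresponds to one fewer student answering "Yes" in the second survey.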
(3.3) Conduct a statistical test to answer the question: Is there sufficient evidence that PG
students’ had changed their opinions in using statistics in their future career, before and
after they attend the DENT70001 course? Write down the null and alternative
hypotheses, conduct the analysis, report the estimates, and write a paragraph regarding
your interpretation.
Answer
In this section, we test whether there is sufficient evidence that PG students changed
their opinions about using statistics in their future career between the start of the
DENT70001 course and after attending it. The following hypotheses were tested:
Null hypothesis (H0): There is no difference in the proportion of students who said yes,
they are likely to use statistics in their career, before and after attending the
DENT70001 course.
Alternative hypothesis (HA): There is a difference in the proportion of students who
said yes, they are likely to use statistics in their career, before and after attending
the DENT70001 course.
This was tested at the 5% level of significance.
The results of the test are presented in the table below:
                      UseStatJob1        UseStatJob2        Difference
Sample proportion     0.5909             0.5455             0.0454
95% CI (asymptotic)   0.3854 to 0.7964   0.3374 to 0.7536   -0.2473 to 0.3381
z-value               0.3
P-value               0.7611
Interpretation        Not significant; fail to reject the null hypothesis that the
                      sample proportions are equal
n by pi               n * pi > 5, test ok
From the above analysis, the p-value is 0.761, which is greater than the 5% level of
significance. We therefore fail to reject the null hypothesis and conclude that there is
no significant difference in the proportion of students who said yes, they are likely to
use statistics in their career, before and after attending the DENT70001 course.
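Because the same 22 students answered both surveys, the two proportions are paired rather than independent. One commonly used alternative for paired binary data is McNemar's test, which uses only the students who changed their answer (the off-diagonal cells of Table 8). The sketch below is an illustration of that alternative, not the analysis reported above; the function name mcnemar_exact is hypothetical, and the exact binomial form is used here because there are only 7 discordant pairs.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under H0 each discordant pair is equally likely to switch in either
    direction, so the smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant cells of Table 8: No -> Yes = 3 students, Yes -> No = 4 students.
p_value = mcnemar_exact(3, 4)
print(p_value)
```

With 3 students switching one way and 4 the other, this paired test also gives no evidence of a change in opinion, so the conclusion above is unchanged.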
[Question 4]
This question concerns the level of exercise (period of time) that Manchester PG Dentistry
students have conducted per week. We examine the variables “Exercise1” and “Exercise2”.
(4.1) For each of these 2 variables “Exercise1”, “Exercise2”: [15]
(4.1.1) Produce two types of appropriate figures/plots for these variables. [4]
Answer
Figure 1: Histogram for Exercise 1
Figure 2: Histogram for Exercise 2
(4.1.2) Calculate summary statistics. [4]
Answer
Table 9: Descriptive Statistics
Statistic                 Exercise2    Exercise1
N Valid                   22           22
N Missing                 0            0
Mean                      75.2273      57.0455
Median                    60.0000      2.5000
Mode                      60.00        0.00
Std. Deviation            67.97194     110.99827
Variance                  4620.184     12320.617
Skewness                  2.060        2.070
Std. Error of Skewness    0.491        0.491
Kurtosis                  5.164        3.008
Std. Error of Kurtosis    0.953        0.953
Range                     290.00       360.00
Minimum                   10.00        0.00
Maximum                   300.00       360.00
25th percentile           28.7500      0.0000
50th percentile           60.0000      2.5000
75th percentile           90.0000      65.0000
(4.1.3) Describe the shape of the distribution for each of these two variables. [3]
Answer
Based on the shape of the histograms and the skewness values (2.070 for Exercise1 and
2.060 for Exercise2), neither variable follows a normal distribution. Both distributions
are right-skewed, with a longer tail to the right.
(4.1.4) Write a paragraph (no more than 200 words) to summarize UoM PG Dentistry students
exercise level Before and After the DENT70001 course started, using the above
information.
Answer
From Table 9, exercise levels among the Dentistry students were higher after the
DENT70001 course started than before. The average weekly exercise time rose from 57.05
minutes (standard deviation 111.00) before the course to 75.23 minutes (standard
deviation 67.97) after it. The median rose from 2.5 to 60 minutes, and the minimum from
0 to 10 minutes. Because both distributions are strongly right-skewed, the medians
describe a typical student better than the means.
(4.2) Create a new variable “ExeDiff” according to the following: ExeDiff=Exercise2-
Exercise1.
(4.2.1) Calculate summary statistics for this new variable.
Answer
Table 10 below presents the summary statistics for the new variable ExeDiff.
Table 10: Summary Statistics
Statistic                 ExeDiff
N Valid                   22
N Missing                 0
Mean                      18.18
Median                    12.50
Mode                      60
Std. Deviation            62.935
Variance                  3960.823
Skewness                  -1.085
Std. Error of Skewness    0.491
Kurtosis                  1.213
Std. Error of Kurtosis    0.953
Range                     240
Minimum                   -150
Maximum                   90
25th percentile           -2.50
50th percentile           12.50
75th percentile           60.00
(4.2.2) Produce a boxplot and histogram for this variable.
Answer
Figure 3: Histogram for ExeDiff
Figure 4: Boxplot for ExeDiff
(4.2.3) Write a paragraph (no more than 200 words) to summarize the changes of exercise
pattern PG Dentistry student show since the DENT70001 course started. [4]
Answer
From the summary table (Table 10), the mean change in weekly exercise time (Exercise2
minus Exercise1) for the PG Dentistry students was 18.18 minutes, and the median change
was 12.50 minutes. The standard deviation of the change was 62.94 minutes, more than
three times the mean, so the changes are widely spread: they range from a decrease of
150 minutes to an increase of 90 minutes. The skewness was -1.085; a value below -1
indicates that the distribution of the change is heavily skewed to the left. Both the
histogram and the boxplot confirm this skewness. The boxplot also shows a number of
outliers, which could be the cause of the skewness in the differences in exercise
levels.
(4.3) Since the Division of Dentistry cares about student welfare and is keen to know if starting
the DENT70001 course has any impact on student exercise regimes, the HoD would like
to test the following hypothesis: the length of doing exercise has changed among PG
students since DENT70001 started.
(4.3.1) Select a parametric statistical test to test the hypothesis. Write down the null and
alternative hypotheses, conduct the analysis, and write a paragraph regarding your
interpretation.
Answer
The parametric test used for this hypothesis is the paired t-test. The hypotheses are as
follows:
Null hypothesis (H0): There is no difference in the average length of exercise before
and after the DENT70001 course started.
Alternative hypothesis (HA): There is a difference in the average length of exercise
before and after the DENT70001 course started.
This was tested at the 5% level of significance.
The results of the paired t-test are presented below:
Table 11: Paired Samples Statistics
Mean N Std. Deviation Std. Error Mean
Pair 1 Exercise2 75.2273 22 67.97194 14.49167
Exercise1 57.0455 22 110.99827 23.66491
Table 12: Paired Samples Correlations
N Correlation Sig.
Pair 1 Exercise2 & Exercise1 22 .860 .000
Table 13: Paired Samples Test
Pair 1: Exercise2 - Exercise1
Paired Differences
  Mean                                         18.1818
  Std. Deviation                               62.9351
  Std. Error Mean                              13.4178
  95% Confidence Interval of the Difference
    Lower                                      -9.7220
    Upper                                      46.0857
t                                              1.355
df                                             21
Sig. (2-tailed)                                .190
From Table 13, the p-value for the paired t-test is 0.190 (t = 1.355, df = 21). This
value is greater than the 5% level of significance, so we fail to reject the null
hypothesis and conclude that there is no significant difference in the average length of
exercise before and after the DENT70001 course started. The mean increase of 18.18
minutes has a 95% confidence interval of -9.72 to 46.09 minutes, which includes zero.
There is therefore no evidence that the DENT70001 course has had a significant impact on
student exercise regimes, or that the length of exercise has changed among PG students
since DENT70001 started.
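The paired t statistic and its confidence interval follow directly from the summary values of the differences. The sketch below recomputes them from the mean, standard deviation and sample size reported in Table 13; the critical value for 21 degrees of freedom is hard-coded from a standard t table, which is an assumption of this example rather than something computed here.

```python
from math import sqrt

# Summary values for the paired differences (Exercise2 - Exercise1), from Table 13.
n = 22
mean_diff = 18.1818
sd_diff = 62.9351

se = sd_diff / sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se  # paired t statistic with n - 1 = 21 degrees of freedom

# Two-sided 5% critical value of the t distribution with 21 degrees of freedom,
# taken from a t table (hard-coded assumption, not computed here).
t_crit = 2.0796
ci_lower = mean_diff - t_crit * se
ci_upper = mean_diff + t_crit * se

print(round(se, 4), round(t_stat, 3), round(ci_lower, 2), round(ci_upper, 2))
```

Since the absolute t statistic is smaller than the critical value, the interval contains zero, which is the same conclusion as the reported p-value of 0.190.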
(4.3.2) Select a non-parametric statistical test to help the Division Head to answer the same
question. Write down the null and alternative hypotheses, conduct the analysis, and write
a paragraph regarding your interpretation.
Answer
The non-parametric statistical test to be performed to help the Division Head answer the
same question as in (4.3.1) is the Wilcoxon signed-rank test. The hypothesis to be tested
is given as follows;