Report on Statistical Analysis of Smoking, Age, and Survival Rates
Added on 2022/11/14
Report
AI Summary
This report presents a statistical analysis examining the association between smoking, age, and 20-year survival rates. The study utilizes a dataset with variables including age category, smoking status, and 20-year survival outcomes. Methods employed include exploratory data analysis with charts and statistical tests like logistic regression and Pearson chi-square tests. The findings reveal a significant correlation between smoking status and survival, indicating that smokers face a higher risk of mortality within the 20-year timeframe. The report also highlights the influence of age on survival rates, with older individuals showing a higher incidence of death. The results underscore the detrimental effects of smoking on health and emphasize the need for public health interventions to mitigate smoking-related risks and promote healthy aging. The report concludes with a discussion of the study's implications and recommendations for future research.

Question 1
In this case, Y was body fat percentage (%BF), X1 was body mass index (BMI), and X2 was
gender. The first fitted model was y = 1.6597x - 3.916, which indicated that %BF increased
by 1.6597 units per unit increase in BMI. The model had an R-squared of 74.83%, implying that
74.83% of the variability in %BF was explained by BMI and the remainder by factors not
included in the model (Bock & Diday, 2012).
The second fitted model was y = 1.6838x - 21.759, which was interpreted to mean that the %BF
for males was 1.6838 more than that for females. Additionally, the model had an R-squared of
88.41%, implying that a greater share of the variability in %BF was explained when gender was
taken into account (Bock & Diday, 2012).
The chart indicated that there was a difference in %BF between genders. The scatterplot
indicated that the %BF values for males and females were uncorrelated. It was also clear that,
at a given BMI, the %BF for females was higher than that for males.
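As a minimal sketch of how the first fitted line can be read, the snippet below evaluates y = 1.6597x - 3.916 at two BMI values one unit apart and confirms that the predicted %BF differs by the slope. The helper name predict_bf and the BMI values 25 and 26 are illustrative assumptions and are not taken from the dataset.

```python
# Coefficients of the first fitted line reported in Question 1.
SLOPE = 1.6597
INTERCEPT = -3.916


def predict_bf(bmi: float) -> float:
    """Predicted body fat percentage for a given BMI (hypothetical helper)."""
    return SLOPE * bmi + INTERCEPT


# Illustrative BMI values, NOT taken from the dataset.
bf_at_25 = predict_bf(25.0)
bf_at_26 = predict_bf(26.0)

# A one-unit increase in BMI changes the prediction by exactly the slope.
difference = bf_at_26 - bf_at_25
print(round(bf_at_25, 4), round(bf_at_26, 4), round(difference, 4))
```

The intercept only shifts the line up or down; the per-unit change in predicted %BF is the slope regardless of the starting BMI.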
Question 2
1.
The chart below represents the association between 20-year survival time and smoking status.

The bar chart indicates that, among smokers, more participants were alive than dead after 20
years. Similarly, among non-smokers, more were alive than dead. Comparing 20-year survival
between the two groups, more non-smokers than smokers had died.
The figure below is a bar chart showing the number of survey participants in each age
category.

The chart indicated that few participants were in the youngest age categories. The number of
people who participated in the survey increased as the age category increased. The chart also
showed that the number of deaths was highest among those in the age category of 65+ years.
2.
To investigate whether 20-year survival was associated with smoking status, a simple
logistic regression model was fitted; the output of the model is given below.
Variables in the Equation
                     B      S.E.   Wald     df   Sig.   Exp(B)
Step 1a   Smoking    -.379  .126   9.076    1    .003   .685
          Constant   -.781  .080   96.092   1    .000   .458
a. Variable(s) entered on step 1: Smoking.
The model indicated that the odds of dying within 20 years for smokers were 0.685 times those
for non-smokers (B = -0.379 on the log-odds scale). The p-value for the variable was 0.003,
which was less than the 0.05 level of significance; this indicated that smoking status was
statistically significant in explaining 20-year survival (Bock & Diday, 2012).
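The columns of the "Variables in the Equation" table are linked by simple formulas, and the following sketch recomputes them from the reported B and S.E. values. It assumes the usual conventions that Exp(B) is the exponential of B and that the Wald statistic is (B / S.E.) squared with 1 degree of freedom. Because the inputs are rounded to three decimals, the recomputed Wald value comes out close to, but not exactly, the 9.076 in the table.

```python
import math

# Values copied from the "Variables in the Equation" table.
B_SMOKING, SE_SMOKING = -0.379, 0.126
B_CONSTANT = -0.781

odds_ratio = math.exp(B_SMOKING)      # Exp(B) for Smoking
baseline_odds = math.exp(B_CONSTANT)  # Exp(B) for the constant
wald = (B_SMOKING / SE_SMOKING) ** 2  # Wald statistic with 1 df

# Upper-tail probability of a chi-square variable with 1 df:
# P(Z^2 > w) = erfc(sqrt(w / 2)) for a standard normal Z.
p_value = math.erfc(math.sqrt(wald / 2.0))

print(round(odds_ratio, 3), round(baseline_odds, 3))
print(round(wald, 3), round(p_value, 3))
```

An Exp(B) below 1 means the odds of the modelled outcome are lower in the group coded 1 than in the reference group; a value above 1 would mean they are higher.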
3.
A Pearson chi-square test was carried out to determine whether there was an association between
smoking status and 20-year survival. The following hypotheses were used for the test.
H0: there is no association between 20-year survival and smoking status
Versus
H1: there is an association between 20-year survival and smoking status
4.
The test had 1 degree of freedom and a p-value of 0.03, which was less than the 0.05 level of
significance. Hence the null hypothesis was rejected. This led to the conclusion that there was
an association between 20-year survival and smoking status (Bock & Diday, 2012).
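The mechanics of the Pearson chi-square test on a 2x2 table can be sketched as follows. The counts in the example are HYPOTHETICAL placeholders, since the report's contingency table is not reproduced in the text, and the function name is an assumption made for illustration. No continuity correction is applied.

```python
import math


def pearson_chi_square_2x2(table):
    """Return (statistic, p_value) for a 2x2 table of counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0 (no association).
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# HYPOTHETICAL counts, not the study data: rows are two groups,
# columns are the two outcomes.
example = [[10, 20], [20, 10]]
statistic, p_value = pearson_chi_square_2x2(example)
reject_h0 = p_value < 0.05
print(round(statistic, 4), reject_h0)
```

The decision rule is the one used in the text: the null hypothesis of no association is rejected when the p-value falls below the 0.05 level of significance.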
5.
The test indicated that 20-year survival depended on the smoking status of the participant;
that is, the proportion surviving differed between smokers and non-smokers. This was also
evident in the bar chart presented previously, which indicated that a person's survival was
related to their smoking status.
6.
The table indicated that there was an association between survival and the age category of
the person: the number of people who died increased with age.
7.
This conclusion agrees with the relationship shown in the bar chart, in which the number of
deaths within 20 years increased with age. The overall test for the logistic regression
indicated that there was an association between 20-year survival and smoking status.
8.
Introduction
Over the past years, many diseases have been associated with smoking. The numerous
studies that have been conducted on smoking indicate that it is harmful to both active
and passive smokers (Maydeu-Olivares & Joe, 2014). Many deaths are associated with
smoking. Furthermore, smoking varies with the age of a person.
The objective of this paper is to investigate whether smoking is associated with death,
taking a person's age into account. The paper also aims to investigate whether there is any
relationship between the age of a person and their smoking status.
Materials and methods
The data used in this study were collected using questionnaires. There were four variables in
the dataset: age category, smoking status, 20-year survival, and count.
Additionally, there were 24 observations. Simple random sampling was used to select the
participants in the survey so that each member of the population had an equal chance of being
included in the sample (Maydeu-Olivares & Joe, 2014).

Logistic regression was used to model the relationship between the response and the
predictor variables. This was necessary because the response was a binary variable. A
chi-square test was performed to investigate whether there was an association between
20-year survival and smoking status.
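Because the response is binary, a logistic model works on the log-odds scale, and fitted probabilities are recovered with the logistic function. The sketch below applies this to the coefficients reported in the "Variables in the Equation" table. It assumes that the modelled outcome is death within 20 years and that Smoking is coded 1 for smokers and 0 for non-smokers; the coding is not stated in the report, and the function name is hypothetical.

```python
import math

# Coefficients from the "Variables in the Equation" table.
B_CONSTANT = -0.781
B_SMOKING = -0.379


def death_probability(smoker: int) -> float:
    """Fitted probability of the modelled outcome (hypothetical helper).

    ASSUMPTION: smoker is coded 1 for smokers and 0 for non-smokers.
    """
    log_odds = B_CONSTANT + B_SMOKING * smoker
    # Logistic function: converts log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


p_non_smoker = death_probability(0)
p_smoker = death_probability(1)
print(round(p_non_smoker, 3), round(p_smoker, 3))
```

The logistic function keeps every fitted value between 0 and 1, which a straight-line model of a binary response would not guarantee.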
Results
The charts that were used for EDA indicated how the variables were associated with each other.
The second figure in part 1 of Question 2 indicated that 20-year survival was strongly
related to age: more deaths occurred among older people than among younger people. The
chi-square test that was carried out had a p-value < 0.05, which indicated that 20-year
survival depended on the smoking status of the person (Maydeu-Olivares & Joe, 2014). The
logistic regression model that was fitted to the data indicated that smoking status was a
significant predictor of 20-year survival; in this model, which did not adjust for age, the
estimated odds of dying were lower for smokers than for non-smokers (Exp(B) = 0.685).
Generally, there was an association between the three variables in the dataset.
Discussion
Finally, based on the above analysis and the exploratory data analysis, it was clear
that various factors are associated with 20-year survival. The smoking status of a person
and their age category were both closely associated with 20-year survival.
This implies that those who smoke should be made aware that their behaviour is associated
with 20-year survival. Further, measures should be taken to reduce the number of
non-smokers who die at old age. This suggests that death at old age may also be associated
with variables that were not included in the study.
References
Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data
analysis. Multivariate Behavioral Research, 49(4), 305-328.
Bock, H. H., & Diday, E. (Eds.). (2012). Analysis of symbolic data: exploratory methods for
extracting statistical information from complex data. Springer Science & Business
Media.