Research Paper: Logistic Regression and Heart Attack Prediction, MCA

Added on  2021/09/27

AI Summary
This research paper investigates the application of logistic regression in predicting the possibility of heart attacks. The study begins with an introduction to the problem, highlighting the significance of heart attack prediction and the role of machine learning. It discusses the historical development of logistic regression, tracing its origins and evolution. The core of the paper focuses on a dataset containing various factors related to heart health, such as age, sex, blood pressure, and cholesterol levels. The methodology involves using Python and the sklearn module to implement logistic regression. The model is trained on 70% of the dataset, achieving an accuracy of 79% in predicting heart attack possibilities. The paper provides an analysis of the model's results, discussing its advantages, such as ease of implementation and interpretation, as well as its limitations, including potential overfitting and challenges with non-linear data. The conclusion emphasizes the efficiency of logistic regression for specific datasets and its potential in healthcare, contingent on the collection of efficient data. The report includes references to the dataset used and relevant sources.
A Research Paper on
Application of Logistic Regression on Heart Attack Possibility
Submitted to: Prof. Kamaljit Singh Saini
Submitted by: Jitender Kumar (20MCA1050)
University Institute of Computing, Chandigarh University, Gharuan, Punjab, India
Introduction:
People are greatly concerned about their health, and heart attacks are among the major causes
of death today. A large part of the population is prone to them. However, with advances in
machine learning, the possibility of a heart attack can be predicted for an individual. By
studying factors such as age, sex, blood pressure, cholesterol level and maximum heart rate
achieved, we can estimate the possibility of a heart attack. Logistic regression is one of the
ways to achieve this. We therefore discuss whether logistic regression, as a machine learning
algorithm, is useful in the area of health care, and to what extent.
The logistic model is used in statistics to model the probability of a particular class or event,
such as win/lose, pass/fail, dead/alive or healthy/sick. It can be extended to model several
classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each
object detected in the image is assigned a probability between 0 and 1. Logistic regression is
one of the statistical models most widely used in machine learning. In its basic form, it uses
a logistic function to model a binary dependent variable.
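As a minimal sketch of the idea above, the logistic function and its use for a binary decision can be written in a few lines of Python. The function names, the weights and the 0.5 cut-off are assumptions made for illustration only; they are not taken from the model built in this paper.

```python
import math


def logistic(z: float) -> float:
    """Map any real number z to a value in the range 0 to 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(weights, bias, features):
    """Probability of the positive class for one observation."""
    # Linear combination of the features, passed through the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def classify(probability: float, threshold: float = 0.5) -> int:
    """Turn a probability into a 0/1 label using a cut-off (0.5 is an assumed default)."""
    return 1 if probability >= threshold else 0


# Hypothetical weights and feature values, for illustration only.
p = predict_probability(weights=[0.8, -0.4], bias=0.1, features=[1.5, 2.0])
label = classify(p)
```

The output of `predict_probability` can be read as the estimated probability of the event (for example, sick rather than healthy), and `classify` converts it into one of the two classes.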
History:
The name "logistic" was given by Pierre François Verhulst, who developed the logistic
function in the 1830s and 1840s under the guidance of Adolphe Quetelet. In his earliest work,
Verhulst did not specify how he fitted the curves to the data. Later he determined the three
parameters of the model by making the curve pass through three observed points, which
resulted in poor predictions. The logistic function was rediscovered as a model of population
growth in 1920 by Raymond Pearl and Lowell Reed, which led to its wide use in modern
statistics. The probit model was developed by Chester Ittner Bliss in the 1930s. It influenced
the further development of the logit model, and the two models competed with each other.
The logit model was initially used as an alternative to the probit model in bioassay and was
treated as inferior to it, but it gradually surpassed the probit model as it gained popularity
outside bioassay. In 1973, Daniel McFadden linked the multinomial logit to the theory of
discrete choice, showing that the multinomial logit followed from the assumption of
independence of irrelevant alternatives and interpreting odds of alternatives as relative
preferences, which gave a theoretical foundation for logistic regression.
Research:
We use a dataset on the possibility of heart attack. It contains various factors that can be
studied, and from them the possibility of a heart attack can be predicted with logistic regression.
The original database contains 76 attributes, but all published experiments use a subset of only
14 of them. The Cleveland database is the only one that has been used by ML researchers to
date. The TARGET field refers to the chance of heart disease in the patient. It is integer
valued, 0 or 1: 0 means there is little or no probability of a heart attack, whereas 1 means there
is a high possibility of a heart attack.
The attributes are as follows:
1. Age
2. Sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. old peak = ST depression induced by exercise relative to rest
11. slope of the peak exercise ST segment
12. number of major vessels (0-3) coloured by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
14. target: 0 = less chance of heart attack; 1 = more chance of heart attack
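The coded attributes in the list above can be checked before training. The following is a rough sketch, not part of the original experiment: the short column names (`restecg`, `ca`, `thal` and so on) are assumed for illustration, and the 0/1 coding of `fbs` and `exang` is an assumption based on how they are used as flags.

```python
# Allowed codes for the coded attributes, taken from the attribute list above.
# The short column names are hypothetical and used only for this illustration.
ALLOWED_CODES = {
    "fbs": {0, 1},          # fasting blood sugar > 120 mg/dl (assumed 0/1 flag)
    "restecg": {0, 1, 2},   # resting electrocardiographic results
    "exang": {0, 1},        # exercise induced angina (assumed 0/1 flag)
    "ca": {0, 1, 2, 3},     # number of major vessels coloured by fluoroscopy
    "thal": {0, 1, 2},      # 0 = normal, 1 = fixed defect, 2 = reversible defect
    "target": {0, 1},       # 0 = less chance, 1 = more chance of heart attack
}


def find_invalid_fields(record):
    """Return the sorted names of coded fields that are missing or out of range."""
    problems = []
    for name, allowed in ALLOWED_CODES.items():
        # A missing field yields None, which is never an allowed code.
        if record.get(name) not in allowed:
            problems.append(name)
    return sorted(problems)


# A made-up record used only to show the check; it is not a row from the dataset.
example = {"fbs": 0, "restecg": 1, "exang": 0, "ca": 2, "thal": 7, "target": 1}
print(find_invalid_fields(example))
```

Chest pain type is left out of the check because the list states only that it takes 4 values, not which codes are used.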
The following images show some of the distribution of data:
Results:
We implemented logistic regression using Python and the sklearn module. We divided the
dataset into training and test sets in a 70:30 ratio and trained the model on the 70% training
portion. After training, the predictions for the test data were compared with the actual test
labels, and the accuracy of our model was found to be 79%, which is quite acceptable. Thus,
we can predict whether a person is prone to a heart attack or not.
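The procedure described above can be sketched with sklearn as follows. This is an illustration under stated assumptions, not the exact script used for this paper: the data here is a synthetic stand-in generated with `make_classification` (300 rows and 13 features are arbitrary choices), and `random_state`, `stratify` and `max_iter` are assumed settings, so the accuracy it prints will not match the 79% reported.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart dataset: 13 predictor columns and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# 70:30 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# Fit logistic regression on the training portion only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Compare the predictions for the test portion with the actual test labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

With the real dataset, `X` would hold the 13 predictor columns and `y` the target column.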
Consider a person with age=56, sex=male (0), chest pain type=1, resting blood
pressure=140, serum cholesterol of about 150, fasting blood sugar >120 (1), resting
electrocardiographic results=2, maximum heart rate achieved of about 170, exercise-induced
angina present (1), oldpeak=1.8, slope of the peak exercise ST segment=2, number of major
vessels=2 and thal normal (0).
We can put these values into our model in order to predict the heart attack possibility.
The model predicted that the person is prone to heart attack.
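To pass such a person to a trained model, the values have to be arranged in the same column order that was used for training. The sketch below shows one way to do this; the short column names and their order are assumptions for illustration, and the fitted classifier `model` mentioned in the final comment is hypothetical and not defined here.

```python
# Assumed column order of the 13 predictors (hypothetical short names).
FEATURE_ORDER = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Values of the example person, exactly as given in the text.
patient = {
    "age": 56, "sex": 0, "cp": 1, "trestbps": 140, "chol": 150,
    "fbs": 1, "restecg": 2, "thalach": 170, "exang": 1,
    "oldpeak": 1.8, "slope": 2, "ca": 2, "thal": 0,
}


def to_feature_row(record, order=FEATURE_ORDER):
    """Arrange one record as a list of floats in the training column order."""
    missing = [name for name in order if name not in record]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    return [float(record[name]) for name in order]


row = to_feature_row(patient)

# With a fitted scikit-learn classifier named `model` (not defined in this sketch),
# the prediction for this person would be obtained as: model.predict([row])[0]
```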
Analysis of Results:
Logistic regression is a very efficient model for prediction. However, there are some
challenges that we have to face, which are discussed below. If the number of observations is
smaller than the number of features, logistic regression should not be used, as it may overfit,
and overfitting leads to inaccurate results on new data. The model constructs linear decision
boundaries, which is a disadvantage. Logistic regression is limited to discrete outcomes, i.e.
we cannot use it where we want a continuous output. In real-world scenarios, linearly
separable data is rarely found, and this model is suited only to linear problems, as non-linear
problems cannot be solved with it unless the features are transformed. It is also difficult to
capture very complex relationships using logistic regression.
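The limitation of a linear boundary can be illustrated with the classic XOR pattern, in which no straight line separates the two classes. The four points below are a toy example chosen for illustration and are unrelated to the heart dataset; since no linear boundary can classify more than three of the four points correctly, the accuracy cannot exceed 75%.

```python
from sklearn.linear_model import LogisticRegression

# XOR pattern: the class is 1 only when exactly one of the two inputs is 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Fit on all four points and measure accuracy on the same points.
model = LogisticRegression().fit(X, y)
accuracy = model.score(X, y)
print(f"Accuracy on XOR: {accuracy:.2f}")
```

Adding a transformed feature, such as the product of the two inputs, is one common way to make such a pattern linearly separable.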
However, there are also some positive aspects. Logistic regression is easy to implement and
interpret and very efficient to train. It can be easily extended to multiple classes (multinomial
regression) and offers a natural probabilistic view of class predictions. It provides not only a
measure of how relevant a predictor is but also its direction of association, i.e. positive or
negative. It is very efficient and accurate on simple datasets, especially where the classes are
linearly separable. Logistic regression is less prone to overfitting, but it can overfit on
high-dimensional data. One may consider regularization techniques to avoid overfitting in
these situations.
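As a rough sketch of regularization in sklearn, the parameter `C` of `LogisticRegression` is the inverse of the regularization strength, so a smaller `C` shrinks the coefficients more strongly. The data below is synthetic, with deliberately more features than observations; the sizes and the two `C` values are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# High-dimensional synthetic data: 40 observations but 60 features.
X, y = make_classification(
    n_samples=40, n_features=60, n_informative=5, random_state=0
)


def coefficient_norm(C):
    """Fit logistic regression (default L2 penalty) and return the coefficient norm."""
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X, y)
    return float(np.linalg.norm(model.coef_))


weak_norm = coefficient_norm(C=100.0)    # weak regularization
strong_norm = coefficient_norm(C=0.01)   # strong regularization

print(f"Coefficient norm with C=100:  {weak_norm:.3f}")
print(f"Coefficient norm with C=0.01: {strong_norm:.3f}")
```

In practice, a suitable value of `C` is usually chosen by validation on held-out data rather than fixed in advance.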
Conclusion:
The logistic regression model has both advantages and disadvantages, but it can be used
effectively with certain kinds of data. If the data is linearly separable, it is efficient to use.
After training, its accuracy can be checked on test data, and if the accuracy is high for the data
at hand, the model can be accepted. This method can be helpful in medicine and healthcare if
suitable data is collected.
References:
Dataset Name: Health Care: Dataset on Heart attack possibility. License: Reddit API teams.
Dataset URL: https://www.kaggle.com/nareshbhat/health-care-data-set-on-heart-attack-possibility