Data Analysis of Heart Attack: Descriptive, Predictive & Prescriptive Analysis
The analysis of heart attack risk is important because it can help reduce the chances of a heart attack or a related death. Myocardial infarction (MI), commonly known as a heart attack, is a serious and life-threatening condition in which blood flow to part of the heart is blocked. Like many other diseases, MI has symptoms, including chest pressure, shortness of breath, cold sweats, fatigue, and dizziness. However, these symptoms sometimes go unnoticed, leading to complications (Merritt, De Zoysa and Hutton, 2017).
Descriptive analysis
The descriptive analysis is performed to understand the characteristics of the data collected. The numerical variables are first explored.
The summary indicates that the mean age of the participants is 54.37 years and the median is 55 years; the middle 50% of the participants are aged between 47.5 and 61 years. On average, the resting blood pressure is 131.62 mmHg and the cholesterol level is 246.26 mg/dl. The participants' average maximum heart rate achieved is 149.65 beats per minute, with a minimum of 71 and a maximum of 202. Some visual summaries of the numerical data are shown below.
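Figures like these are a five-number summary plus the mean. A minimal Python sketch, assuming the values sit in a plain list (the ages below are hypothetical, not the study data), computes them with the standard-library statistics module. Passing method="inclusive" interpolates quartiles with the same linear rule as R's default quantile() (type 7) and NumPy's default percentile(), so the quartiles should agree across those tools.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, mean, Q3 and max of a list of numbers.

    method="inclusive" uses linear interpolation between order statistics
    (the common "type 7" quartile rule).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }

# Hypothetical ages for illustration only; not the article's data.
ages = [29, 41, 45, 48, 52, 55, 57, 60, 63, 70]
summary = five_number_summary(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The distance between q1 and q3 is the interquartile range, i.e. the span covering the middle 50% of participants.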
The histograms show that age and resting blood pressure have roughly bell-shaped distributions, although resting blood pressure has a longer right tail, which indicates that it is positively skewed.
The box plots show that although cholesterol level and maximum heart rate achieved are also roughly bell-shaped, both are somewhat skewed: cholesterol level is positively skewed, with a few outliers on the upper side, whereas maximum heart rate achieved is negatively skewed.
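Skewness and box-plot outliers can also be checked numerically rather than only by eye. A minimal sketch, assuming moment-based (Fisher-Pearson) skewness and Tukey's 1.5 × IQR fences, which is the rule box plots commonly use to draw individual outlier points (plotting tools compute the quartiles in slightly different ways). The cholesterol readings are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Moment-based (Fisher-Pearson) skewness g1 = m3 / m2**1.5.

    Positive values indicate a longer right tail, negative values a longer left tail.
    """
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(x - mean) ** 2 for x in values])
    m3 = statistics.fmean([(x - mean) ** 3 for x in values])
    return m3 / m2 ** 1.5

def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical cholesterol readings (mg/dl) with a long right tail.
chol = [180, 200, 210, 220, 230, 240, 250, 260, 275, 420]
print(round(sample_skewness(chol), 3))
print(iqr_outliers(chol))
```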
The data contain 96 females and 207 males. In addition, 204 participants do not have exercise-induced angina (99 do), and 258 participants have fasting blood sugar of 120 mg/dl or less (45 have fasting blood sugar greater than 120 mg/dl).
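Counts like these come from tabulating binary-coded columns, and they are only meaningful once it is clear which code stands for which category. A small sketch with Python's collections.Counter; the column names (sex, exng, fbs), the meaning of each 0/1 code, and the records themselves are assumptions for illustration and should be checked against the dataset's own documentation.

```python
from collections import Counter

# Hypothetical records; column names and 0/1 codings are assumptions for illustration.
rows = [
    {"sex": 1, "exng": 0, "fbs": 0},
    {"sex": 0, "exng": 1, "fbs": 0},
    {"sex": 1, "exng": 0, "fbs": 1},
    {"sex": 1, "exng": 1, "fbs": 0},
    {"sex": 0, "exng": 0, "fbs": 0},
]

# Assumed meaning of each code; verify against the dataset description.
labels = {
    "sex": {0: "female", 1: "male"},
    "exng": {0: "no exercise-induced angina", 1: "exercise-induced angina"},
    "fbs": {0: "fasting blood sugar <= 120 mg/dl", 1: "fasting blood sugar > 120 mg/dl"},
}

def tabulate(rows, column):
    """Count each coded value of `column` and attach its human-readable label."""
    counts = Counter(row[column] for row in rows)
    return {labels[column][code]: n for code, n in sorted(counts.items())}

for column in labels:
    print(column, tabulate(rows, column))
```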
Predictive Analysis
A binary logistic regression model is fitted to the data to predict the likelihood of a heart attack. The response variable is output, and all the other variables are used as predictors. The fitted model is summarized below.
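Logistic regression models the log-odds of the outcome as a linear function of the predictors, log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk, and estimates the coefficients by maximum likelihood. Statistical software such as R's glm() does this with iteratively reweighted least squares; the sketch below is a swapped-in, simpler technique, plain batch gradient descent on the same log-likelihood, which reaches the same estimates on small, well-behaved data but more slowly. The single standardized predictor and the outcomes are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Maximise the log-likelihood by batch gradient descent on the mean log-loss.

    X is a list of feature rows, y a list of 0/1 outcomes.
    Returns (intercept, list of slopes).
    """
    n, k = len(X), len(X[0])
    b0, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b0, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed outcome.
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b0 += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b0 -= lr * grad_b0 / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b0, w

def predict_proba(b0, w, x):
    """Predicted probability of the outcome for one feature row x."""
    return sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical standardized predictor and 0/1 outcomes (not separable, so the MLE is finite).
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, w = fit_logistic(X, y)
print(f"intercept = {b0:.3f}, slope = {w[0]:.3f}")
print(f"P(outcome | x = 1.0) = {predict_proba(b0, w, [1.0]):.3f}")
```

A positive slope means the predicted risk rises with the predictor; at the maximum-likelihood solution the fitted probabilities sum to the number of observed events.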
The model summary indicates that some of the predictors are significant, whereas others are not; for instance, age and fasting blood sugar are not significant. All the other variables are significant at the 0.05 level, except for one, which is significant only at the 0.10 level. The model is then used to predict whether an individual will have a heart attack, and its accuracy is evaluated on the test data set; the results are tabulated below.
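Significance statements like these usually come from the Wald z-test that logistic-regression summaries report for each coefficient: z = estimate / standard error, with a two-sided p-value taken from the standard normal distribution. A minimal sketch, assuming each coefficient and its standard error are already available; the predictor names and numbers are hypothetical placeholders, not the fitted model's estimates.

```python
import math

def wald_p_value(coef, std_err):
    """Two-sided p-value of the Wald z-test for H0: coefficient = 0."""
    z = coef / std_err
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def significance(p, levels=(0.05, 0.10)):
    """Report the strictest listed level at which p is significant."""
    for level in levels:
        if p < level:
            return f"significant at {level:.2f}"
    return "not significant"

# Hypothetical (coefficient, standard error) pairs, not the article's estimates.
estimates = {
    "predictor_a": (0.90, 0.30),
    "predictor_b": (0.50, 0.28),
    "predictor_c": (0.05, 0.20),
}
for name, (coef, se) in estimates.items():
    p = wald_p_value(coef, se)
    print(f"{name}: z = {coef / se:.2f}, p = {p:.4f}, {significance(p)}")
```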
The test-set results show that the model's accuracy is 81.97%, with a sensitivity of 82.05% and a specificity of 81.82%, so sensitivity and specificity are well balanced. However, the kappa value is well below 1, which means that the agreement between predicted and actual outcomes, after allowing for chance, is far from perfect, and the model's accuracy should be treated with caution. Here, any person with a predicted probability greater than 0.50 is classified as at risk of a heart attack; this threshold could be reviewed and, if appropriate, adjusted to trade sensitivity against specificity.
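All of these rates come from the four cells of a confusion matrix, and Cohen's kappa compares the observed accuracy p_o with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). The sketch below also makes the 0.50 cut-off an explicit parameter so its effect can be inspected. The cell counts 32, 7, 18 and 4 are an assumption (a 61-case test set that happens to reproduce 81.97% accuracy, 82.05% sensitivity and 81.82% specificity), not the article's actual table, and the probabilities used for the threshold comparison are hypothetical.

```python
def confusion_counts(probs, labels, threshold=0.5):
    """Classify prob > threshold as 1 and return (tp, fn, tn, fp) against 0/1 labels."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p > threshold else 0
        if y == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Cohen's kappa from confusion-matrix cells."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # share of actual positives that are caught
    specificity = tn / (tn + fp)  # share of actual negatives correctly cleared
    # Chance agreement: P(both positive) + P(both negative) if prediction ignored the truth.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


# Assumed cell counts that reproduce the reported rates (not the article's actual table).
print(classification_metrics(tp=32, fn=7, tn=18, fp=4))

# Hypothetical predicted probabilities and true outcomes, to show the threshold trade-off.
probs = [0.9, 0.8, 0.65, 0.45, 0.35, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
for threshold in (0.3, 0.5, 0.7):
    m = classification_metrics(*confusion_counts(probs, labels, threshold))
    print(threshold, round(m["sensitivity"], 2), round(m["specificity"], 2))
```

Lowering the threshold flags more people as at risk, which raises sensitivity at the cost of specificity; in practice the cut-off is often chosen from an ROC curve or from the relative cost of a missed case versus a false alarm.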
Prescriptive Analysis
Classification trees (decision trees) are fitted and plotted to show which factors to check in order to avoid MI. This helps to separate those at higher risk of a heart attack from those at lower risk.
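Classification trees are usually grown greedily: at each node the algorithm tries candidate cut points and keeps the one that makes the child nodes purest, and the path of splits from the root to a leaf is what becomes a rule such as "age higher than 57 years". A minimal sketch for a single numeric variable, assuming Gini impurity as the purity measure (the default for classification in CART-style tools such as R's rpart and scikit-learn's DecisionTreeClassifier); the ages and outcomes are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split x <= t vs x > t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (lo + hi) / 2, score
    return best_t, best_score

# Hypothetical ages and outcomes (1 = heart attack), for illustration only.
ages = [40, 45, 50, 52, 55, 58, 60, 63, 66, 70]
outcome = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
threshold, impurity = best_split(ages, outcome)
print(f"split at age <= {threshold}: weighted Gini = {impurity:.3f}")
```

A full tree repeats this search over every variable at every node and stops when nodes become too small or stop improving; categorical variables such as chest pain type are split by grouping their levels rather than by a numeric threshold.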
The fitted tree indicates that chest pain type is the first factor that individuals should check. Those with chest pain are more likely to have a heart attack if they are older than 57 years. Males with chest pain are 22% more likely to have a heart attack, whereas those with a cholesterol level higher than 272 are 13% more likely to experience MI.
A person with chest pain who is female, with cholesterol higher than 272 and equal to 1, has a 10% chance of getting a heart attack. Finally, those without chest pain but with a Thal value greater than or equal to 3 and one or more major vessels have a 20% chance of getting a heart attack.
Thus, several factors should be monitored to reduce the chances of a heart attack: first chest pain, then age above 57 years, male sex, and cholesterol level.
The adequacy of the tree model is checked using a confusion matrix, and the results are tabulated below.
The accuracy of this model is 86.89%, which is higher than the 81.97% of the logistic model. Its sensitivity is slightly lower than its specificity. Overall, this model performs better than the binary logistic model, since it also has a larger kappa value of 0.7221.
The source code for this project can be found at:
Merritt, C. J., De Zoysa, N., & Hutton, J. M. (2017). A qualitative study of younger men’s experience of heart attack (myocardial infarction). British Journal of Health Psychology, 22(3), 589–608.
The dataset was obtained from:
Thank you for reading!!!