COM00148M: Big Data Analytics Report - Data Analysis and Results

Added on  2022/08/24

Report
AI Summary
This report presents an analysis of a dataset containing attributes of diabetic patients from 130 US hospitals between 1999 and 2008. The analysis addresses three research questions: the relationship between lab procedures and hospital admission time, the classification of HbA1c test results based on various factors, and the optimal number of clusters for HbA1c results. The report explores the data, performs classification using different Bayes classifiers, and identifies NaiveBayesMultinomialText as the most accurate. Clustering with the k-means and FarthestFirst algorithms indicates that two clusters are optimal. Linear regression assesses the relationship between hospital time and lab procedures and finds only a weak correlation. Overall, the results indicate that HbA1c results can potentially be classified from other variables, that two clusters are optimal, and that time spent in the hospital depends only weakly on the number of tests performed. The report also discusses threats to validity, such as the potential for missing variables and the need for more comprehensive statistical methods.
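As a rough illustration of the regression step mentioned in the summary, the sketch below fits a least-squares line and computes the Pearson correlation between two numeric columns. It is not the report's own workflow: the function name `fit_line`, the variable names, and the toy values are all assumptions made for illustration, and none of the numbers come from the diabetic-patient dataset.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain more than one distinct value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical toy values, NOT taken from the report's dataset
lab_procedures = [10, 25, 40, 41, 55, 60, 70, 33]
days_in_hospital = [2, 1, 5, 3, 2, 8, 4, 6]

slope, intercept, r = fit_line(lab_procedures, days_in_hospital)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} r^2={r * r:.3f}")
```

A value of r close to 0 (and therefore a small r²) would correspond to the kind of weak dependency the summary describes, whereas values near 1 or -1 would indicate a strong linear relationship.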