SIT717: Clustering Analysis of BBC Health Twitter Data Report


Added on  2022/12/15

Report
AI Summary
This report presents a clustering analysis of BBC Health Twitter data, applying data mining techniques to classify and segment health-related tweets. The analysis uses the bbchealth.txt dataset from the UCI Machine Learning Repository, which contains tweet IDs, timestamps, and health-related tweet text. Due to limitations, the analysis primarily uses the tweet IDs and timestamps. The core method is the K-means clustering algorithm, run in Weka, which segments the data into clusters using time-based criteria. The report describes the data preprocessing steps, explains the rationale for choosing K-means, and compares it with the fuzzy c-means algorithm. The results section presents the clustered model, selected on the basis of minimum sum of squared errors and model building time. The report also covers the dataset summary, data mining techniques, results, evaluation, and conclusions.
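
To make the time-based clustering idea concrete, here is a minimal Python sketch. It is not the report's Weka workflow. It assumes each line of bbchealth.txt has the layout `id|timestamp|text` with timestamps such as `Thu Apr 09 01:31:50 +0000 2015`. The sample rows, the choice of hour of day as the single feature, and the helper names `parse_hour` and `kmeans_1d` are all hypothetical and used for illustration only.

```python
import random
from datetime import datetime

# Hypothetical rows in the assumed "id|timestamp|text" layout (not real data).
SAMPLE_LINES = [
    "1001|Thu Apr 09 01:30:00 +0000 2015|sample headline a",
    "1002|Thu Apr 09 02:00:00 +0000 2015|sample headline b",
    "1003|Thu Apr 09 01:00:00 +0000 2015|sample headline c",
    "1004|Thu Apr 09 13:00:00 +0000 2015|sample headline d",
    "1005|Thu Apr 09 14:00:00 +0000 2015|sample headline e",
    "1006|Thu Apr 09 13:30:00 +0000 2015|sample headline f",
]


def parse_hour(line):
    """Return the hour of day as a float from one 'id|timestamp|text' line."""
    # maxsplit=2 keeps any '|' characters inside the tweet text intact.
    _tweet_id, stamp, _text = line.split("|", 2)
    ts = datetime.strptime(stamp, "%a %b %d %H:%M:%S %z %Y")
    return ts.hour + ts.minute / 60.0


def kmeans_1d(values, k, iterations=50, seed=0):
    """Plain K-means on one numeric feature; returns centroids, labels, SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k distinct data points

    def nearest(v):
        return min(range(k), key=lambda i: (v - centroids[i]) ** 2)

    for _ in range(iterations):
        # Assignment step: group each value with its closest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated

    labels = [nearest(v) for v in values]
    # Sum of squared errors: squared distance of each value to its centroid.
    sse = sum((v - centroids[l]) ** 2 for v, l in zip(values, labels))
    return centroids, labels, sse


hours = [parse_hour(line) for line in SAMPLE_LINES]
centroids, labels, sse = kmeans_1d(hours, k=2)
print(sorted(centroids), labels, sse)
```

Running the sketch for several values of `k` and comparing the resulting sums of squared errors mirrors, in spirit, the SSE-based comparison the summary describes. Note that hour of day is treated here as a plain number, so times just before and just after midnight are considered far apart. That is a simplification of this sketch, not a claim about the report's method.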