
Introduction to Data Mining Project

   

Added on  2019-09-19

Data Science and Big Data
Introduction to Data Mining: Project Overview

1) Read a delimited file (pipe or comma delimited) into a data frame. Consider using Hospital Compare data as a data source: https://data.medicare.gov/data/hospital-compare (click on "download csv flat files").

BONUS CREDIT: Create a table or tables in Postgres, populate the table(s) with insert statements, and read the data into a data frame using R. Submit the DDL and insert statements with the assignment. The more elaborate the database, the more bonus credit you are likely to receive (e.g. creating two tables and joining them together is worth more than a single table).

2) Apply some cursory validations (checking for nulls and blanks) and rename your columns if necessary.

3) Split your data into a training and a testing dataset (80% training and 20% testing).
Hint: Use the "subset" function in R.

4) Using a library, implement an algorithm that we've discussed in class using the 80% training data. Model options include:
- Regression (Linear, Logistic)
- Naive Bayes (Bernoulli, Multinomial, MLE)
- Clustering (Hierarchical, k-Means)
- k-Nearest Neighbors (as a classifier or predictor)
- TF-IDF
- Other (approval needed)

5) Apply the model to the 20% testing data and provide some measure of model performance. Note that for clustering, a testing/training split is not necessary. Performance measures include:
- Z-test
- Confusion Matrix
- ROC Curve
- Inter-cluster SS (sum of squares)
- Precision/Recall, Specificity & Sensitivity

6) Visualize the model in some way with a simple plot, for example:
- Scatterplots
- Correlation Matrix
- Histograms

7) Provide a one-paragraph write-up on what business problem is being solved with your project and why the model was selected.

BONUS CREDIT: Use R-Shiny to present the data in a browser. The more elaborate the UI (from a functionality and style perspective), the more bonus credit you are likely to receive.

Submission Instructions:
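For orientation, the core workflow described in the project overview (read a delimited file, validate, split 80/20, fit a model, measure performance) might be sketched in base R as follows. This is only an illustration under stated assumptions, not a model answer: the data are synthetic stand-ins for a Hospital Compare download, and the column names ("Hospital Rating", "Bed Count", "Readmit"), the choice of logistic regression, and the 0.5 classification threshold are all hypothetical.

```r
# ASSUMPTION: synthetic stand-in data. A real project would read a downloaded
# Hospital Compare CSV instead of this generated pipe-delimited file.
set.seed(42)
n <- 200
raw <- data.frame(
  "Hospital Rating" = sample(1:5, n, replace = TRUE),
  "Bed Count"       = round(runif(n, 20, 500)),
  check.names = FALSE
)
# Hypothetical binary outcome loosely tied to the rating.
raw$Readmit <- rbinom(n, 1, plogis(1.5 - 0.6 * raw[["Hospital Rating"]]))
raw[["Bed Count"]][c(3, 17)] <- NA          # inject two blank values

path <- tempfile(fileext = ".txt")
write.table(raw, path, sep = "|", row.names = FALSE, quote = FALSE, na = "")

# Read the pipe-delimited file into a data frame; blanks become NA.
df <- read.delim(path, sep = "|", check.names = FALSE,
                 na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Cursory validation: count nulls/blanks per column, rename, drop bad rows.
print(colSums(is.na(df)))
names(df) <- c("rating", "beds", "readmit")
df <- df[complete.cases(df), ]

# 80/20 split: flag a random 80% of rows, then use subset() as hinted.
train_idx <- sample(nrow(df), size = floor(0.8 * nrow(df)))
df$is_train <- seq_len(nrow(df)) %in% train_idx
train <- subset(df, is_train, select = -is_train)
test  <- subset(df, !is_train, select = -is_train)

# Fit a logistic regression on the training rows only.
fit <- glm(readmit ~ rating + beds, data = train, family = binomial)

# Apply the model to the held-out rows and build a confusion matrix.
prob   <- predict(fit, newdata = test, type = "response")
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$readmit, levels = c(0, 1))
confusion <- table(Predicted = pred, Actual = actual)
accuracy  <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Swapping read.delim(path, sep = "|") for read.csv on a real comma-delimited download should leave the remaining steps unchanged, apart from the column names and the model formula.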
