Introduction to Data Mining: Project Overview

Added on 2019-09-19

This project overview asks students to read a delimited file into a data-frame, apply validations, split the data into training and testing datasets, implement an algorithm, evaluate model performance, and visualize the model in R. It does not name a specific business problem; each student identifies one in a one-paragraph write-up.
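Reading a pipe- or comma-delimited file into a table can be sketched as follows. The course uses R, so this is only a language-neutral illustration in Python's standard library. Here a list of dictionaries stands in for a data-frame.

The following are assumptions, not taken from the assignment or the Hospital Compare files:
- the helper name read_delimited;
- the delimiter heuristic, which counts "|" versus "," in the header line;
- the sample column names and values.

```python
import csv
import io


def read_delimited(text):
    """Parse pipe- or comma-delimited text into a list of row dicts.

    A list of dicts is used here as a stand-in for an R data-frame (assumption).
    """
    header = text.splitlines()[0]
    # Heuristic (assumption): whichever delimiter appears more often in the
    # header line is treated as the file's delimiter.
    delimiter = "|" if header.count("|") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample data; real Hospital Compare column names will differ.
pipe_text = "Provider ID|Hospital Name|State\n10001|Example Hospital A|AL\n10005|Example Hospital B|AL\n"
comma_text = "Provider ID,Hospital Name,State\n10001,Example Hospital A,AL\n"

rows = read_delimited(pipe_text)
print(len(rows), rows[0]["Hospital Name"])  # 2 Example Hospital A
print(read_delimited(comma_text)[0]["State"])  # AL
```

To read an actual downloaded file, open it with newline="" and pass its contents to the helper. The csv module then handles quoted fields that contain the delimiter.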
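Cursory validation means checking each column for nulls and blanks, then optionally renaming columns. Below is a minimal Python sketch of that step; the course itself uses R.

The following are illustrative choices, not requirements of the assignment:
- the helper names count_missing, is_complete, to_snake_case and rename_columns (all hypothetical);
- dropping incomplete rows rather than imputing them;
- converting column names to snake_case.

A null here is a Python None. This is what csv.DictReader fills in by default when a row has fewer fields than the header.

```python
import re


def count_missing(rows):
    """Count null (None) and blank (empty or whitespace-only) values per column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column in counts:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                counts[column] += 1
    return counts


def is_complete(row):
    """True when no value in the row is null or blank."""
    return all(value is not None and str(value).strip() != "" for value in row.values())


def to_snake_case(name):
    """Normalize a column name, e.g. 'Provider ID' -> 'provider_id'."""
    return re.sub(r"\W+", "_", name.strip()).strip("_").lower()


def rename_columns(rows, rename=to_snake_case):
    """Return copies of the rows with every column name passed through rename."""
    return [{rename(key): value for key, value in row.items()} for row in rows]


# Hypothetical rows: one clean, one with a blank score, one with a blank ID and a null score.
rows = [
    {"Provider ID": "10001", "Score": "3"},
    {"Provider ID": "10005", "Score": "  "},
    {"Provider ID": "", "Score": None},
]
print(count_missing(rows))  # {'Provider ID': 1, 'Score': 2}
clean = rename_columns([row for row in rows if is_complete(row)])
print(clean)  # [{'provider_id': '10001', 'score': '3'}]
```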
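The assignment asks for an 80% training / 20% testing split. Its hint points to R's subset function, which keeps the rows that satisfy a logical condition (for example, a randomly drawn membership flag).

The Python sketch below reaches the same result by shuffling a copy of the rows with a fixed seed and cutting at 80%. The function name split_train_test and the seed value are hypothetical; a fixed seed simply makes the split reproducible.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```

Every row lands in exactly one of the two sets, so the sets never overlap.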
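k-Nearest Neighbors is one of the listed model options. Used as a classifier, it labels a new point by majority vote among the k closest training points.

The assignment says to use a library; in R, for instance, the knn function from the class package. The hand-written Python version below only shows the mechanics. The function name knn_predict, the choice of Euclidean distance, and the toy data are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify point x by majority vote among its k nearest training points.

    Distance is Euclidean (math.dist). On a tied vote, the label that is
    encountered first in nearest-first order wins.
    """
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with binary labels.
train_X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5)]
train_y = [0, 0, 1, 1]
test_X = [(1.1, 0.9), (5.2, 4.8)]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(predictions)  # [0, 1]
```

Because distances are compared directly, features on very different scales should usually be standardized first. An odd k avoids tied votes for two classes.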
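Several of the listed performance measures come from one binary confusion matrix:
- precision = TP / (TP + FP);
- recall, also called sensitivity = TP / (TP + FN);
- specificity = TN / (TN + FP).

This Python sketch (the course uses R) tallies the four counts from actual and predicted labels. The function name binary_metrics and the label vectors are hypothetical.

```python
def binary_metrics(actual, predicted, positive=1):
    """Confusion-matrix counts plus derived rates for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Guard against division by zero when a class never occurs.
        return num / den if den else 0.0

    return {
        "confusion": {"TP": tp, "FP": fp, "FN": fn, "TN": tn},
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall_sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 1, 1, 0]
m = binary_metrics(actual, predicted)
print(m["confusion"])  # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
print(m["precision"], m["recall_sensitivity"], m["specificity"])  # 0.6 0.75 0.5
```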
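For the clustering option, the assignment notes that no train/test split is needed. The Python sketch below (the course uses R) runs plain k-means (Lloyd's algorithm) and then computes two quantities:
- within-cluster SS: squared distances from points to their own centroid;
- between-cluster SS: cluster sizes times squared distances from centroids to the overall mean.

R's kmeans reports these as tot.withinss and betweenss. Whether the assignment's "Inter-cluster SS" refers to one of them is an assumption. The function names, seed, and sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's k-means on tuples of floats; stops at convergence or max_iterations."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def sums_of_squares(centroids, clusters):
    """Return (within-cluster SS, between-cluster SS, total SS)."""
    points = [p for c in clusters for p in c]
    grand_mean = tuple(sum(axis) / len(points) for axis in zip(*points))
    within = sum(math.dist(p, centroids[i]) ** 2 for i, c in enumerate(clusters) for p in c)
    between = sum(len(c) * math.dist(centroids[i], grand_mean) ** 2 for i, c in enumerate(clusters))
    total = sum(math.dist(p, grand_mean) ** 2 for p in points)
    return within, between, total


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
within, between, total = sums_of_squares(centroids, clusters)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Once the centroids are the cluster means, within-cluster SS plus between-cluster SS equals total SS. A larger between-cluster share therefore means better-separated clusters.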
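TF-IDF is also on the list of model options. It weights a term by how often it appears in a document (tf), discounted by how many documents contain it (idf).

Several variants exist. The Python sketch below (the course uses R) assumes one common form: tf = count / document length and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term. The function name tf_idf and the sample phrases are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score each term in each tokenized document with tf * idf.

    Variant (assumption): tf = raw count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(
            {term: (count / len(doc)) * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return scores


docs = [
    "wait time emergency room".split(),
    "emergency room readmission".split(),
    "patient survey star rating".split(),
]
scores = tf_idf(docs)
# A term found in only one document outscores one shared by two documents.
print(scores[0]["wait"] > scores[0]["emergency"])  # True
```

Under this variant, a term that appears in every document gets idf = ln(1) = 0 and so carries no weight.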

1) Read a delimited file (pipe or comma delimited) into a data-frame. Consider using Hospital Compare data as a data source: https://data.medicare.gov/data/hospital-compare (click on “download csv flat files”).
BONUS CREDIT: Create a table or tables in Postgres, populate the table(s) with insert statements, and read the data into a data-frame using R. Submit the DDL and insert statements with the assignment. The more elaborate the database, the more bonus credit you are likely to receive (e.g., creating two tables and joining them together is worth more than a single table).

2) Apply some cursory validations (checking for nulls and blanks) and rename your columns if necessary.

3) Split your data into a training and a testing dataset (80% training and 20% testing).
Hint: Use the subset function in R.

4) Using a library, implement an algorithm that we’ve discussed in class, trained on the 80% training set. Model options include:
   - Regression (Linear, Logistic)
   - Naive Bayes (Bernoulli, Multinomial, MLE)
   - Clustering (Hierarchical, k-Means)
   - k-Nearest Neighbors (as a classifier or predictor)
   - TF-IDF
   - Other (approval needed)

5) Apply the model to the 20% testing set and provide some measure of model performance. Note that for clustering, a testing/training split is not necessary. Measures include:
   - Z-test
   - Confusion Matrix
   - ROC Curve
   - Inter-cluster SS (sum of squares)
   - Precision/Recall, Specificity & Sensitivity

6) Visualize the model in some way with a simple plot, for example:
   - Scatterplots
   - Correlation Matrix
   - Histograms

7) Write a one-paragraph summary of the business problem your project solves and why you selected the model.

BONUS CREDIT: Use R-Shiny to present the data in a browser. The more elaborate the UI (from a functionality and style perspective), the more bonus credit you are likely to receive.

Submission Instructions: