This project overview explains how to read a delimited file into a data frame, apply validations, split the data into training and testing datasets, implement an algorithm, evaluate model performance, and visualize the model in R. The overview does not name a specific business problem; each student describes the problem being solved in a one-paragraph write-up.
Introduction to Data Mining: Project Overview

1) Read a delimited file (pipe or comma delimited) into a data frame. Consider using Hospital Compare data as a data source: https://data.medicare.gov/data/hospital-compare (click on “download csv flat files”).

BONUS CREDIT: Create a table or tables in Postgres, populate the table(s) with insert statements, and read the data into a data frame using R. The DDL and insert statements should be submitted with the assignment. The more elaborate the database, the more bonus credit you are likely to receive (e.g. creating two tables and joining them together is worth more than a single table).

2) Apply some cursory validations (checking for nulls and blanks) and rename your columns if necessary.

3) Split your data into a training and a testing dataset (80% training and 20% testing).
Hint: Use the “subset” function in R.

4) Using a library, implement an algorithm that we’ve discussed in class on the 80% training data. Model options include:
- Regression (Linear, Logistic)
- Naive Bayes (Bernoulli, Multinomial, MLE)
- Clustering (Hierarchical, k-Means)
- k-Nearest Neighbors (as a classifier or predictor)
- TF-IDF
- Other (approval needed)

5) Apply the model to the remaining 20% of the data and provide some measure of model performance. Note that for clustering, a testing/training split is not necessary. Measures include:
- Z-test
- Confusion Matrix
- ROC Curve
- Inter-cluster SS (sum of squares)
- Precision/Recall, Specificity & Sensitivity

6) Visualize the model in some way with a simple plot:
- Scatterplots
- Correlation Matrix
- Histograms

7) Provide a one-paragraph write-up on what business problem is being solved with your project and why the model was selected.

BONUS CREDIT: Use R-Shiny to present the data in a browser. The more elaborate the UI (from a functionality and style perspective), the more bonus credit you are likely to receive.

Submission Instructions:
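As an illustration of the read, validate, split, model, evaluate, and plot workflow described in the project overview, here is a minimal sketch in base R. It is not a reference solution. It assumes a synthetic pipe-delimited file generated on the fly in place of a Hospital Compare download, and the column names (bed_count, wait_minutes, high_rating) are hypothetical. Logistic regression, a confusion matrix with precision and recall, and a scatterplot are simply one choice from each list of options.

```r
set.seed(42)  # make the synthetic data and the split reproducible

# Read: build a small pipe-delimited file, then load it into a data frame.
# ASSUMPTION: synthetic data with made-up columns, not real Hospital Compare fields.
n <- 500
beds <- round(runif(n, 20, 600))
wait <- round(rnorm(n, mean = 60, sd = 15))
high <- rbinom(n, 1, plogis(-1 + 0.004 * beds - 0.02 * (wait - 60)))
synthetic <- data.frame(beds, wait, high)
names(synthetic) <- c("Bed Count", "Wait Minutes", "High Rating")
synthetic[sample(n, 10), "Wait Minutes"] <- NA  # inject some missing values
path <- tempfile(fileext = ".txt")
write.table(synthetic, path, sep = "|", row.names = FALSE, quote = FALSE, na = "")

hospitals <- read.delim(path, sep = "|", na.strings = c("", "NA"),
                        check.names = FALSE, stringsAsFactors = FALSE)

# Validate: rename columns, count nulls and blanks, drop incomplete rows.
names(hospitals) <- c("bed_count", "wait_minutes", "high_rating")
missing_counts <- sapply(hospitals, function(col) {
  sum(is.na(col) | trimws(as.character(col)) == "")
})
print(missing_counts)
hospitals <- hospitals[complete.cases(hospitals), ]

# Split: flag a random 80% of rows, then use subset() as the hint suggests.
train_rows <- sample(nrow(hospitals), floor(0.8 * nrow(hospitals)))
hospitals$is_train <- seq_len(nrow(hospitals)) %in% train_rows
train <- subset(hospitals, is_train, select = -is_train)
test  <- subset(hospitals, !is_train, select = -is_train)

# Model: logistic regression fitted on the training rows only.
model <- glm(high_rating ~ bed_count + wait_minutes,
             data = train, family = binomial)

# Evaluate: confusion matrix, precision, and recall on the held-out rows.
prob   <- predict(model, newdata = test, type = "response")
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$high_rating, levels = c(0, 1))
confusion <- table(predicted = pred, actual = actual)
precision <- confusion["1", "1"] / sum(confusion["1", ])
recall    <- confusion["1", "1"] / sum(confusion[, "1"])
print(confusion)
cat("precision:", precision, " recall:", recall, "\n")

# Visualize: predicted probability against one predictor, colored by outcome.
plot(test$bed_count, prob,
     col = ifelse(test$high_rating == 1, "blue", "red"),
     xlab = "Bed count", ylab = "Predicted probability of high rating",
     main = "Logistic regression on held-out data")
legend("bottomright", legend = c("high rating", "not high rating"),
       col = c("blue", "red"), pch = 1)
```

For a real submission, the synthetic file would be replaced by the downloaded CSV (for example with read.csv), and the 0.5 classification threshold is an arbitrary choice that could be tuned with an ROC curve.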