SAS Enterprise and R: Dataset Analysis and Association Rule Project


Added on 2023/04/21

AI Summary

This project focuses on classifying and clustering a dataset using SAS Enterprise and R code, and applying association rules to the data. The project utilizes an online retail dataset containing information such as invoice numbers, customer IDs, stock codes, descriptions, unit prices, and country. The R code is used for data classification, displaying the data in a tree structure. SAS Enterprise is employed for data access and preparation, and for implementing the association rule mining. The project covers dataset classification using R, clustering techniques, and the application of association rules to identify patterns within the dataset. The results are presented through visualizations and graphical representations, including cumulative lift charts and graphical outputs for clustering analysis and association rule analysis based on support and confidence.


Contents
Introduction
SAS Enterprise and R
Dataset Classification
Clustering
Association Rule
Conclusion
References

Introduction
The main goal of this project is to classify and cluster a dataset using SAS Enterprise and R code, and to apply association rules to the given dataset. The dataset is classified using R code, which displays the result as a tree structure. The dataset contains online retail details such as invoice number, customer ID, stock code, description, unit price and country. Association rules are then applied to the same dataset.
SAS Enterprise and R
R code is run through SAS Enterprise, which is used for data access and data preparation. SAS Enterprise has an option that allows the Predictive Model Markup Language (PMML) to be used with R packages (Bacardit and Llorà, 2013). The output is displayed in visual and graphical form. First, the given dataset is imported so that the tree can be built. The R code is used to classify the data, and association rules are applied to the given dataset.
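Association rules of the form "invoices that contain item X also contain item Y" are usually scored by support (the share of invoices containing both items), confidence (the share of invoices containing X that also contain Y) and lift (confidence divided by the overall share of invoices containing Y). The Python sketch below is not the project's R or SAS code: it assumes each invoice has already been reduced to a set of stock codes, uses hypothetical codes A, B and C and hypothetical thresholds, and counts pairs by brute force instead of using the Apriori algorithm, so it only finds one-item-to-one-item rules.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Mine one-item -> one-item rules from baskets (each basket = set of stock codes).

    Brute-force pair counting, not Apriori; the thresholds are illustrative assumptions.
    Returns (lhs, rhs, support, confidence, lift) tuples sorted by confidence.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n  # share of invoices containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)  # confidence / P(rhs)
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical invoices, each reduced to the set of stock codes it contains.
example_baskets = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
rules = pair_rules(example_baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Every rule in this toy example has a lift below 1, meaning the items appear together slightly less often than they would if purchases were independent.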
Dataset Classification
R code is used to classify the dataset. The online retail dataset is imported into SAS Enterprise, and the R packages needed to build the tree are installed (Celebi, 2016). The next step is to write and run the R code, which takes the dataset and displays the output as a tree. The parent node is stock code, and the child nodes are quantity, customer ID and unit price. Finally, the data is classified based on the parent and child nodes, and the output is visualized. The R code divides the data into two classes, and the graph shows the count for each: the first class has 400 records and the second has 200.
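A classification tree grows by repeatedly choosing the split that makes its child nodes as pure as possible. The source does not say which R package or splitting criterion was used, so the Python sketch below assumes a CART-style binary split scored by weighted Gini impurity on a single numeric variable; the quantities and the two class labels are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Find the threshold t (split: value <= t vs value > t) with the lowest weighted Gini.

    Returns (t, weighted_gini); t is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbouring values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical quantities with a two-class label.
quantity = [1, 2, 3, 10, 12, 15]
label = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(quantity, label))  # prints (6.5, 0.0)
```

A full tree would run this search over every candidate variable (for example quantity, customer ID and unit price), keep the best split, and recurse into each child until a stopping rule is met.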
Procedure
Open SAS Enterprise Miner to classify the dataset. First, import the file using File Import and edit the variables. Then connect the StatExplore node and run the project. Next, connect the Data Partition node to split the given dataset. Then drag the control panel button onto the diagram and connect it to the Data Partition node. Finally, the decision tree is implemented on the output of the previous step.
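The Data Partition step divides the records into training, validation and, optionally, test sets by percentage. The percentages used in this project are not stated, so the Python sketch below assumes 60% training and 30% validation with the remainder as test, plus a fixed random seed so the split is reproducible. It illustrates the idea only, not SAS Enterprise Miner's own sampling method.

```python
import random


def partition(rows, train_frac=0.6, valid_frac=0.3, seed=12345):
    """Shuffle rows reproducibly and cut them into training, validation and test lists.

    The 60/30/10 fractions and the seed are illustrative assumptions.
    """
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private RNG: same seed, same split
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever is left over
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # prints 60 30 10
```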

Result
First, the model is trained on the training dataset and then checked against the validation dataset. If the dataset is valid, the project runs. It displays the cumulative lift for the training and validation datasets and calculates the corresponding percentages for each. The graphical result displays the worth percentage for the given dataset. In R, code is used to train and test the data, starting with viewing the given dataset.
Sampling is done with replacement (TRUE), based on probability. Only the quantity and unit price data are used, so the TRUE statement and the output relate only to quantity and unit price.
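Cumulative lift is commonly computed by ranking records from the highest to the lowest predicted probability of the target event, then dividing the event rate within the top d% of records by the overall event rate; a lift of 2 at the top 10% means that slice has twice the event rate of the data as a whole. The Python sketch below assumes a binary target coded 1/0 and uses hypothetical scores; it shows the calculation, not SAS Enterprise Miner's internal code.

```python
def cumulative_lift(scores, actual, bins=10):
    """Cumulative lift at each depth (top 1/bins, 2/bins, ... of records by score).

    scores: predicted probability of the event; actual: 1 if the event occurred, else 0.
    """
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(a for _, a in ranked) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when no events occur")
    lifts = []
    for b in range(1, bins + 1):
        depth = max(1, round(n * b / bins))  # number of top-ranked records included
        top = ranked[:depth]
        rate = sum(a for _, a in top) / depth
        lifts.append(rate / overall_rate)
    return lifts


# Hypothetical scores for 10 records, 3 of which are events.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print([round(x, 2) for x in cumulative_lift(scores, actual)])
```

Computing the same curve separately for the training and validation partitions, as the project does, shows whether the model ranks unseen records as well as the ones it was trained on.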


The next step is to test the given dataset. Unit price and quantity make up the training data, and the same two variables are used as test data. The integer value 2 is declared for the online retail dataset, and the TRUE statement is also tested and displayed for the unit price and quantity data. Both variables are trained and tested using R code.
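The description above (an integer value 2, a TRUE setting and probabilities) resembles a common R idiom, sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3)), which labels every row 1 or 2 so that rows labelled 1 become training data and rows labelled 2 become test data. That reading is an assumption, as are the 70/30 probabilities and the seed. The Python sketch below reproduces the idea with random.choices; because each row is labelled independently, the split is only approximately 70/30.

```python
import random


def assign_groups(n_rows, probs=(0.7, 0.3), seed=1234):
    """Label each row 1 (train) or 2 (test) independently, sampling with replacement.

    Assumed Python analogue of R's sample(2, n_rows, replace = TRUE, prob = probs);
    not the project's own code.
    """
    rng = random.Random(seed)
    return rng.choices([1, 2], weights=list(probs), k=n_rows)


def split_by_group(rows, groups):
    """Return (train, test): rows labelled 1 and rows labelled 2."""
    train = [row for row, g in zip(rows, groups) if g == 1]
    test = [row for row, g in zip(rows, groups) if g == 2]
    return train, test


# Hypothetical (quantity, unit price) rows.
rows = [(q, round(0.5 + q / 100, 2)) for q in range(1, 1001)]
groups = assign_groups(len(rows))
train, test = split_by_group(rows, groups)
print(len(train), len(test))  # approximately 700 and 300
```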

Finally, the counts for the quantity and unit price data are displayed graphically.

The rows of the given dataset are selected and displayed using R code.


The online retail dataset is trained and tested, and all trained values in the dataset are displayed.

Clustering
Clustering is a data mining technique based on statistical data analysis. In this task the online retail dataset is used for clustering (CHEN and HUANG, 2011), and the output is visualized. Clustering forms groups from the data, analyses each group, and displays the output for every group. Several types of clustering can be implemented in SAS Enterprise (Comparing EM Clustering Algorithm with Density Based Clustering Algorithm Using WEKA Tool, 2016): hierarchical clustering, centroid-based clustering, distribution-based clustering and density-based clustering. This task uses hierarchical clustering.
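Agglomerative hierarchical clustering starts with every record in its own cluster and repeatedly merges the two closest clusters. The source does not state the linkage or distance used, so the Python sketch below assumes single linkage (the distance between two clusters is the distance between their closest pair of points) with Euclidean distance on hypothetical (quantity, unit price) pairs, and stops when k clusters remain. This naive version recomputes every pairwise cluster distance at each merge, so it only suits small samples.

```python
import math


def single_linkage_clusters(points, k):
    """Agglomerative clustering with single linkage until k clusters remain.

    points: list of numeric tuples; returns clusters as sorted lists of point indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical (quantity, unit price) records.
records = [(1, 2.5), (2, 2.0), (1, 3.0), (50, 0.5), (48, 0.6), (52, 0.4)]
print(single_linkage_clusters(records, k=2))  # prints [[0, 1, 2], [3, 4, 5]]
```

Because quantity and unit price are on very different scales, standardizing both variables before clustering is usually advisable; the example skips that step for brevity.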
Procedure
Open RStudio and use R code to import the file and display its head and tail. The file is also separated and displayed. Next, open SAS Enterprise Miner and import the file. Then drag the Cluster node onto the diagram and edit the variables. Run the Cluster node to display the graphical output for the given dataset.
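In R, functions such as read.csv(), names(), head() and tail() import a file and show its column names and its first and last rows (six by default). The Python sketch below mirrors those steps with the standard csv module; the column names and the eight generated rows are hypothetical stand-ins for the online retail file, which is read from a string here so the example runs without the original file.

```python
import csv
import io


def load_rows(text):
    """Parse CSV text into (column_names, list_of_row_dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


def head(rows, n=6):
    """First n rows (R's head() also defaults to 6)."""
    return rows[:n]


def tail(rows, n=6):
    """Last n rows (R's tail() also defaults to 6)."""
    return rows[-n:] if n > 0 else []


# Hypothetical stand-in for the online retail file.
header = "InvoiceNo,StockCode,Description,Quantity,UnitPrice,CustomerID,Country"
lines = [header] + [
    f"{1000 + i},S{i},Item {i},{i + 1},{1.5 * i:.2f},C{i % 3},UK" for i in range(8)
]
names, rows = load_rows("\n".join(lines))
print(names)
for row in head(rows):
    print(row["InvoiceNo"], row["Quantity"], row["UnitPrice"])
```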
Result
Clustering forms the data into groups, here using the quantity and unit price data. The segment variable gives the percentage for quantity and unit price, and the graphical output is displayed only for quantity and unit price. First, the dataset is displayed in RStudio. Then a command is used to display only the column names, so the dataset headings are shown, followed by the head and tail of the dataset.
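Once each record has a cluster (segment) number, a segment profile reports what share of the records falls in each segment and how the clustering variables differ between segments. The Python sketch below assumes the segment labels were already produced by a clustering step and uses hypothetical quantity and unit price values; reporting each segment's share of records together with its mean quantity and mean unit price is an assumption about what the segment percentages in this project contain.

```python
from collections import defaultdict


def segment_profile(segments, quantity, unit_price):
    """Per segment: share of all records (%) and mean quantity / unit price."""
    groups = defaultdict(list)
    for seg, q, p in zip(segments, quantity, unit_price):
        groups[seg].append((q, p))
    n = len(segments)
    profile = {}
    for seg, members in sorted(groups.items()):
        size = len(members)
        profile[seg] = {
            "percent": 100.0 * size / n,
            "mean_quantity": sum(q for q, _ in members) / size,
            "mean_unit_price": sum(p for _, p in members) / size,
        }
    return profile


# Hypothetical cluster labels with their quantity and unit price values.
segments = [1, 1, 1, 2, 2]
quantity = [1, 2, 3, 50, 52]
unit_price = [2.5, 2.0, 3.0, 0.5, 0.4]
profile = segment_profile(segments, quantity, unit_price)
for seg, stats in profile.items():
    print(seg, f"{stats['percent']:.1f}%", stats["mean_quantity"], round(stats["mean_unit_price"], 2))
```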

The dataset has 500,000 records. Its summary and a description of the dataset are displayed in RStudio.
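R's summary() on a numeric column reports the minimum, first quartile, median, mean, third quartile and maximum. The Python sketch below reproduces those six figures with the statistics module; method="inclusive" interpolates quartiles the same way as R's default quantile type 7. The values are a hypothetical quantity column, not figures from the project.

```python
import statistics


def r_style_summary(values):
    """Six-number summary in the style of R's summary() for a numeric vector."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


quantity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical values
for name, value in r_style_summary(quantity).items():
    print(f"{name:>8}: {value}")
```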
