
BUS5PA Predictive Analysis: Building and Evaluating Predictive Models Using SAS Enterprise Miner Assignment 2

Building and evaluating predictive models using SAS Enterprise Miner for a target marketing case study.


Added on  2022-10-01

About This Document

This document explains how to predict customer behavior using decision tree and regression modeling in SAS Enterprise Miner. It covers the roles of variables, setting up the project, and insights into the optimal tree and the most important variables for regression modeling. The objective is to predict which customer segments are likely to purchase a new line of organic products that the supermarket is about to introduce.

BUS5PA Predictive Analysis
Building and Evaluating
Predictive Models Using SAS
Enterprise Miner
Assignment 2
By
(Name of Student)
(Institutional Affiliation)
Objectives:
To predict, using Decision Tree and Regression modelling,
which customer segments are likely to purchase a new line of organic
products that the supermarket is about to introduce.
1. Setting up the project and exploratory analysis.
1.A.1&2: In SAS Enterprise Miner Workstation, a new project named
BUS5PA_Assignment1_19139507 is created, followed by a diagram
called Organics. A SAS library is then created and the given dataset
‘Organics’ is selected as the data source for the project. On analysing the
dataset, SAS Enterprise Miner found 22223 observations and 13 variables.
The roles of the 13 variables have been set as follows:
Figure 1.A.2 - Roles & Measurement Level of Variables
Variables with a Nominal measurement level contain categorical data, while
variables with an Interval measurement level contain numeric data.
The target (response) variable TargetBuy has a Binary measurement level, with 1
indicating Yes and 0 indicating No.
1.A.3: Distribution of Target variables [Appendix – Figure 1.A.3 (2)]
Figure 1.A.3 - Summary of Distribution of TargetBuy
1.A.4: DemCluster has been set to Rejected because DemClusterGroup contains
a collapsed version of DemCluster and, based on past evidence,
DemClusterGroup is sufficient for the modelling.
1.B: TargetBuy is a collapsed (binary) form of the data contained in TargetAmt.
Using TargetAmt as an input would cause target leakage and an unreliable model:
the model would find a very strong correlation between the input (TargetAmt) and
the target (TargetBuy) simply because the target is derived from that input.
Hence, TargetAmt should not be used as an input and should be
set as Rejected.
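The leakage argument in 1.B can be illustrated with a minimal Python sketch. The purchase amounts below are hypothetical, and the rule that TargetBuy is 1 whenever TargetAmt is greater than 0 is an assumption made for illustration; neither is taken from the Organics dataset.

```python
# Hypothetical purchase amounts; not taken from the Organics dataset.
target_amt = [0, 0, 2, 0, 1, 3, 0, 1]

# Assumed derivation: TargetBuy is 1 whenever any organic purchase was made.
target_buy = [1 if amt > 0 else 0 for amt in target_amt]


def predict_from_amount(amt):
    """A trivial 'model' that only looks at the leaked input."""
    return 1 if amt > 0 else 0


predictions = [predict_from_amount(amt) for amt in target_amt]
accuracy = sum(p == y for p, y in zip(predictions, target_buy)) / len(target_buy)
print(accuracy)  # perfect accuracy, but only because the input encodes the target
```

A model scoring new customers would not know TargetAmt in advance, so this apparent accuracy could not be reproduced in practice.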
2. Decision tree based modelling and analysis.
2.A: After dragging the Organics data source onto the Organics diagram, we
connect a Data Partition node to it. 50% of the data is
used for training and the remaining 50% for validation
(Appendix – Figure 2.A). The training set is used to build a set of models, while
the validation set is used to select the best of the models built on the training
set.
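As a rough illustration of the 50/50 split, the following Python sketch partitions 22223 row indices into training and validation sets. It uses a simple random shuffle; the function name `partition` and the seed are hypothetical, and the Data Partition node may use a different sampling method (for example, stratifying on the target), so this is not a reproduction of what SAS Enterprise Miner does.

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Shuffle the rows and split them into (training, validation) lists."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = rows[:]          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(22223))       # stand-in for the 22223 observations
train, valid = partition(rows)
print(len(train), len(valid))
```

Because 22223 is odd, the two halves cannot be exactly equal in size; the sketch places the extra row in the validation set.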
Figure 2.A (2) – Adding Data Partition to the Organics data source.
2.B: A Decision Tree node is then connected to the Data Partition node (Appendix
– Figure 2.B).
2.C.1: The optimal tree has 29 leaves. This Decision Tree has been
created using Average Square Error (ASE) as the subtree Assessment
Measure (Appendix – Figure 2.C.). The assessment measure specifies the
method used to select the best subtree. ASE selects the subtree that produces
the smallest average square error.
Figure 2.C.1 – Optimal Tree based on Average Square error as the Subtree
Assessment.
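The selection rule described in 2.C.1 can be sketched in Python as follows. The outcomes and predicted probabilities are hypothetical, and the ASE here is simplified to the mean squared difference between the 0/1 outcome and the predicted probability of a purchase; it is a sketch of the idea, not SAS Enterprise Miner's exact computation.

```python
def average_square_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical 0/1 outcomes for five customers.
actual = [1, 0, 0, 1, 0]

# Hypothetical predicted purchase probabilities from three candidate subtrees,
# keyed by their number of leaves.
candidates = {
    2: [0.4, 0.4, 0.4, 0.4, 0.4],
    3: [0.7, 0.2, 0.2, 0.7, 0.2],
    4: [1.0, 0.0, 0.5, 0.5, 0.0],
}

ase = {leaves: average_square_error(actual, preds)
       for leaves, preds in candidates.items()}

# The subtree with the smallest ASE is chosen as the optimal tree.
best_leaves = min(ase, key=ase.get)
print(best_leaves, ase[best_leaves])
```

Note that the largest subtree is not automatically the best one: in this toy example the 4-leaf subtree has a higher ASE than the 3-leaf subtree.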
2.C.2: The variable DemAge was used for the first split because it is the variable
that gives the best split in terms of ‘purity’ (Appendix – Figure 2.C.2).
Based on the logworth of each input variable, the competing splits for the first
split (DemAge) of the first decision tree are DemAffl and DemGender.
Logworth is the negative base-10 logarithm of the p-value of the split's
significance test, so a higher logworth indicates a variable that creates
more homogeneous subgroups.
Figure 2.C.2 – Logworth of Input Variables
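To make the logworth ranking more concrete, here is a minimal Python sketch that computes logworth as -log10 of the p-value of a Pearson chi-square test on a two-branch split of a binary target. The counts are hypothetical and the function name `logworth_2x2` is an illustrative assumption; the sketch also omits any adjustments SAS Enterprise Miner may apply to the p-value, so the values will not match the figure.

```python
import math


def logworth_2x2(a, b, c, d):
    """Logworth of a two-way split of a binary target.

    a, b = buyers and non-buyers in the left branch
    c, d = buyers and non-buyers in the right branch
    """
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return -math.log10(p_value)


# Hypothetical counts: a split that separates buyers well, and one that does not.
strong = logworth_2x2(40, 10, 10, 40)
weak = logworth_2x2(26, 24, 24, 26)
print(round(strong, 2), round(weak, 2))
```

The split that produces purer branches has the smaller p-value and therefore the larger logworth, which is why the input with the highest logworth is chosen for the split.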
2.D.1: The Maximum Branch property of the second decision tree has been
changed to 3. This means each splitting rule can divide a node into up to 3
branches (Appendix – Figure 2.D.1).
2.D.2: The second Decision Tree has been created using Average Square
Error (ASE) as the subtree Assessment Measure (Appendix – Figure 2.D.2).
The assessment measure specifies the method used to select the
best subtree. ASE selects the subtree that produces the smallest average square
error.
Figure 2.D.2 (2) – Adding the second Decision Tree node.
2.D.3: The optimal tree for Decision Tree 2 using Average Square Error as
the model assessment statistic contains 33 leaves.
Figure 2.D.3 – Leaves on Optimal Tree based on Average Square Error for Decision
Tree 2.
The two Decision Tree models differ because their maximum number of branches per
split (2 vs 3) is different. This results in a different number of leaves in the
optimal tree of each model: the first Decision Tree contains 29 leaves,
whereas Decision Tree 2 contains 33 leaves.
The set of rules or classification set by the first Decision Tree can be
summarized as –
Female customers under the age of 44.5 years with an affluence grade
of more than 9.5 (or missing) are likely to purchase organic products (Nodes
36, 37, 38 & 39, Appendix – Figure 2.D.3).
Female customers under the age of 39.5 years with an affluence grade
of less than 9.5 but more than 6.5 (or missing) are likely to purchase
organic products (Node 32, Appendix – Figure 2.D.3).
The set of rules or classification set by the second Decision Tree can be
summarized as –
Customers under the age of 39.5 years who have an affluence grade of
more than 14.5 are very likely to purchase organic products [Node 7,
Appendix – Figure 2.D.3 (2)].
Customers under the age of 39.5 years with an affluence grade of less
than 14.5 but more than 9.5 (or missing) are likely to purchase organic
products. However, if such a customer is female, she is 22%
more likely to buy organic products than a male customer
with the same attributes [Nodes 17 & 18, Appendix – Figure 2.D.3 (2)].
2.E: Average square error is the average of the squared
differences between the predicted outcome and the actual outcome of the leaf
nodes. The lower the average square error, the better the model, as it indicates the