
Predictive Analytics Assignment

   

Added on  2020-04-29


Chapter 1: Predictive analysis
1.1 Introduction
This research aims to support the decision-making process of a national veterans’
organisation so that it can target donors more effectively. If the organisation can identify
the most likely donors, it can solicit only that group and save the money and effort of
contacting everyone. According to the given information, the organisation has a database of
more than 3.5 million individuals. In this research, probable donors are identified on the
basis of their previous behaviour. The analysis focuses on individuals who donated recently
(between 12 and 24 months ago).
1.2 Part A: Build decision tree and regression-based predictive models and predict the donors who should be solicited for donations
1) The PVA97NK data set was used for the analysis.
a) As shown in the table below, the class level count threshold was kept at 2. This was
done so that binary variables are treated as categorical variables.
b) Similarly, the table shows that the reject level count threshold was set to 100, so
any variable with more than 100 distinct values is rejected.
Table 1 Results from SAS as per the given conditions
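The effect of the two thresholds can be sketched in Python. This is a minimal illustration under stated assumptions, not the rule set SAS actually applies: the function name `assign_level` and the labels it returns are hypothetical, and the reject rule is applied to every column for simplicity, following the description above.

```python
def assign_level(values, class_threshold=2, reject_threshold=100):
    """Return a rough measurement level for one column of values.

    Hypothetical helper: it mirrors the two thresholds described in the
    text and is not the exact logic used by SAS.
    """
    distinct = {v for v in values if v is not None}  # ignore missing values
    n_levels = len(distinct)
    if n_levels <= class_threshold:
        return "binary"      # treated as a categorical (class) variable
    if n_levels > reject_threshold:
        return "rejected"    # too many distinct values to be used
    return "keep"            # level is left for the analyst to decide


print(assign_level([0, 1, 1, 0, None]))
print(assign_level(range(500)))
print(assign_level(["A", "B", "C"]))
```

With these settings a 0/1 flag is classed as binary, while a column with several hundred distinct values is rejected.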
Figure 1 List of variables included in the data set
Figure 2 Results from the descriptive analysis of the Median Income variable
2.
a) What is unusual about the distribution of this variable?
Answer: The variable examined here is “Median Income Region (DemMedIncome)”. As shown in
the figure above, this variable has a very skewed distribution, and its mode is zero.
b) What could cause this anomaly to occur?
Answer: As discussed in the previous section, the most frequent value of the median income
is zero. A median income of zero is implausible, however, because the median is the middle
value of the incomes in a region. Since the data relate to probable donors, their incomes
are expected to be higher than the average income of the population. The zeros therefore
suggest that there may have been a problem when the data were collected (Kara 2013).
c) What do you think should be done to rectify the situation?
Answer: Firstly, a sanity check should be carried out to establish whether the data were
collected properly. If the zeros are a code for some group of people (for example, a missing
value), this should be stated clearly so that it does not confuse the analyst.
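A sanity check of this kind could start by measuring how common the zeros are. The sketch below is an assumption-laden illustration: `zero_report` is a hypothetical helper and the sample values are made up, not taken from PVA97NK.

```python
from statistics import mode


def zero_report(values):
    """Summarise how often zero occurs among the non-missing values."""
    present = [v for v in values if v is not None]
    zeros = sum(1 for v in present if v == 0)
    return {
        "n": len(present),
        "zeros": zeros,
        "zero_share": zeros / len(present) if present else 0.0,
        "mode": mode(present) if present else None,
    }


sample = [0, 0, 0, 41000, 56500, 0, 73200]  # made-up values for illustration
print(zero_report(sample))
```

A large share of exact zeros in an income variable is a hint that zero is being used as a placeholder rather than as a real value.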
3. Use the Replacement node from the Modify tab to replace the value of
DemMedIncome with ‘missing’ only when it equals zero.
Table 2 Replacement node to replace the value of the target variable
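Outside SAS, the same replacement rule can be sketched in a few lines of Python. This is only an illustration of the rule stated above: `zero_to_missing` is a hypothetical name, `None` stands in for a missing value, and the income figures are made up.

```python
def zero_to_missing(values):
    """Replace exact zeros with None (missing); leave other values unchanged."""
    return [None if v == 0 else v for v in values]


incomes = [0, 38750, 0, 71509, 52106]  # made-up illustrative values
cleaned = zero_to_missing(incomes)
print(cleaned)
```

Only values equal to zero are changed, so genuine incomes and values that are already missing pass through untouched.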
4) Data Partition
After the income variable was rectified, the next step was data partition. As per the given
information, the data set was partitioned into training and test data, with each set
allotted 50% of the original data.
Figure 3 Results from data partition
Target variable exploration:
Table 3 Summary of the data partition
As shown in the table above, the data partition was carried out. There were a total of 4844
observations in the data set, of which 2420 were taken as the training data.
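A 50/50 partition of this kind can be sketched as follows. The sketch assumes a simple random split with a fixed seed; the partitioning method SAS used here (for example, stratifying on the target) is not stated in the text, and `partition` and its parameters are hypothetical.

```python
import random


def partition(rows, train_fraction=0.5, seed=12345):
    """Shuffle row indices and split the rows into training and test sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(10))                 # stand-in for the observations
train, test = partition(rows)
print(len(train), len(test))
```

Every observation ends up in exactly one of the two sets, and rerunning with the same seed reproduces the same split.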
5) Decision Tree
After the data were partitioned, two different decision tree models were constructed. The
results are shown in the figure below (Fokin & Hagrot 2016).
