Data Mining and Visualization for Business Intelligence Assignment II

Added on  2020/03/23

Homework Assignment
AI Summary
This document presents a comprehensive solution to a Data Mining and Visualization assignment. The first part analyzes utilities data using Principal Component Analysis (PCA), discussing the selection of significant features based on eigenvalues and the advantages and disadvantages of the PCA method. The second part focuses on business intelligence, using training data to calculate probabilities related to credit cards, online banking services, and personal loan offers. It utilizes pivot tables to determine the probability of customers accepting loan offers based on their credit card usage and online banking behavior, calculating Naive Bayes probability to identify the most optimal customer profile for loan offers. The solution demonstrates the application of data mining techniques to extract valuable insights for business decision-making.
Data Mining and Visualization for Business Intelligence
Assignment - II
Student id
[Pick the date]
Question 1
(a) The XLMiner output for the utilities data is shown below.
In the above output, the principal components matrix lists the coefficients (loadings) obtained
for the various features on each of the principal components. The magnitude of these
coefficients is the basis for shortlisting the critical features. Thus, for principal component 1,
the most significant features are x1 and x2, since their coefficients are the two largest. The key
utilities parameters can be identified in the same way for any other principal component. It is
noteworthy that the features that dominate a component collectively represent a particular aspect
of the functioning of utilities, which allows their relative importance to be understood.
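The selection rule described above (rank the features of a component by the size of their coefficients) can be sketched as follows. The loadings used here are hypothetical placeholders, not the XLMiner output, and the helper name `top_features` is only illustrative.

```python
# Hypothetical loadings for one principal component (NOT the XLMiner output).
loadings = {"x1": 0.62, "x2": -0.55, "x3": 0.31, "x4": 0.12, "x5": -0.08}


def top_features(component_loadings, k=2):
    """Return the k feature names with the largest absolute loading."""
    ranked = sorted(
        component_loadings,
        key=lambda name: abs(component_loadings[name]),
        reverse=True,
    )
    return ranked[:k]


top_two = top_features(loadings)
print(top_two)
```

The absolute value is used so that a large negative coefficient counts as much as a large positive one.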
A key consideration in PCA is that the variables are measured on different scales, so their
absolute variances differ. In certain cases this gives an intrinsic advantage to variables with
large values, because their importance tends to be overestimated in the total variance. However,
this is not evident here: principal component 1 accounts for less than 30% of the total variance,
which augurs well. Further, normalising the utilities data did not yield better results, so
normalisation does not appear to be required here.
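The scale effect mentioned above can be illustrated with a small sketch. It compares each variable's share of the total variance before and after standardisation; it is not a full PCA, and the two columns are made-up values rather than the utilities data.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical columns on very different scales (NOT the utilities data).
sales = [9077.0, 5088.0, 9212.0, 6423.0, 3300.0]
ratio = [1.06, 0.89, 1.43, 1.02, 1.49]


def variance_shares(columns):
    """Share of the total variance contributed by each column."""
    variances = [pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]


def standardize(col):
    """Rescale a column to zero mean and unit (population) variance."""
    mu, sigma = mean(col), pstdev(col)
    return [(x - mu) / sigma for x in col]


raw_shares = variance_shares([sales, ratio])
std_shares = variance_shares([standardize(sales), standardize(ratio)])
print(raw_shares, std_shares)
```

On the raw values the large-scale column accounts for almost all of the variance; after standardisation each column contributes an equal share.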
(b)
Advantages of PCA method:
It is a reduction method that condenses a large number of variables into a smaller number of
variables, known as principal components.
PCA provides a detailed description of the structure of the variable set.
It is recommended when the variables exhibit linear relationships.
It is based on a maximum-variance technique: the focus is on retaining the directions of
greatest variance and discarding those with the least variance, which makes the critical
components easier to identify.
The magnitudes of the new components are easy to find because the resulting PCA summary
contains an orthogonal matrix.
Disadvantages of PCA method:
PCA is useful mainly when the aim is to determine the magnitude of the components; it
cannot provide a reliable estimate of the direction of the components because of the
complex distribution of the data cloud in the dimensional space.
Because variables are reduced by considering their variance only, this technique may eliminate
important decision variables, and such elimination can have a significant effect on the
analysis process.
This technique is not useful in the following scenarios:
The data come from an unknown source
The mean and variance are undefined
Categorical variables are present in the data set
The data follow a distribution unrelated to the Gaussian
Question 2
In order to perform the various analyses, the training data has been taken from the “resource.”
Hence, it is not necessary to partition the data, because the data provided already corresponds
to the standard 60% training partition.
List of predictors
Credit card of bank
Online bank service of bank
(a) The pivot table highlighted below has two row fields (Credit card first and Loan
second) and Online as the column field.
Variable = 1: indicates YES
Variable = 0: indicates NO
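As a rough illustration of how such a pivot table is built, the sketch below counts customers for each combination of the three 0/1 variables. The records are hypothetical and are not the assignment's training data; the `cell` helper is only illustrative.

```python
from collections import Counter

# Hypothetical (credit_card, loan, online) records, each coded 1 = YES, 0 = NO.
# These are NOT the assignment's training data.
records = [
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 1),
    (1, 0, 0),
    (0, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
]

# Each key is one cell of the pivot table: rows (credit_card, loan), column online.
pivot = Counter(records)


def cell(credit_card, loan, online):
    """Number of customers in one cell of the pivot table."""
    return pivot[(credit_card, loan, online)]


print(cell(1, 1, 1))
```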
(b) Probability (customer accepts the offer of personal loan, given that the customer owns a
credit card and uses the online service) = ?
From the pivot table of part (a), there are 51 customers who have a credit card, subscribe to
and use the online service, and would accept the offer of personal loan.
The total number of customers who use a credit card for transactions and the online banking
service = 522
Probability = 51 / 522 = 0.0977
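The same ratio can be checked with a couple of lines of code. The two counts are the ones quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for customers with a credit card who use online banking.
accepted_loan = 51      # of those customers, the ones who accept the loan offer
all_customers = 522     # all customers with a credit card and online banking

probability = accepted_loan / all_customers
print(round(probability, 4))
```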
(c) The pivot tables presented below provide the values of the quantities that need to
be determined.
Table 1: Row Loan, Column Credit Card.
Table 2: Row Loan, Column Online.
Probability computation
(d) The Naïve Bayes probability is determined from the probability values
highlighted above.
Positive case = (0.305 × 0.601 × 0.101) = 0.0185
Total possible cases = (0.305 × 0.601 × 0.101) + (0.296 × 0.588 × 0.898) = 0.1748
Naïve Bayes Probability = Positive case / Total possible cases
= 0.0185 / 0.1748 ≈ 0.106 (about 10.6%)
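A minimal sketch of this calculation is given below. It assumes the three factors in each product are, in some order, the two conditional probabilities (credit card and online service given the loan outcome) and the prior probability of that loan outcome; the function and parameter names are illustrative, not taken from the assignment.

```python
def naive_bayes_posterior(p_cc_loan, p_online_loan, p_loan,
                          p_cc_no_loan, p_online_no_loan, p_no_loan):
    """P(loan | credit card, online) under the naive independence assumption."""
    positive = p_cc_loan * p_online_loan * p_loan
    negative = p_cc_no_loan * p_online_no_loan * p_no_loan
    return positive / (positive + negative)


# Rounded factors quoted in the text; their interpretation is an assumption.
posterior = naive_bayes_posterior(0.305, 0.601, 0.101, 0.296, 0.588, 0.898)
print(round(posterior, 3))
```

With the rounded factors quoted in the text, the ratio comes out at roughly 0.106.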
(e) The conditional probability of accepting a loan is higher when customers actively
use the online services and also hold a credit card issued by the bank, so this
appears to be the optimal profile of a customer to whom the loan should be
offered.