Data Mining and Visualization for Business Intelligence

Added on 2020/03/23

AI Summary

This assignment focuses on applying data mining techniques to analyze business intelligence scenarios. Students are tasked with interpreting Principal Component Analysis (PCA) results to understand patterns in utility performance data. Furthermore, they analyze customer data from Universal Bank using pivot tables and probability calculations to determine the association between personal loans and online/credit card usage. The objective is to demonstrate an understanding of data mining techniques and their application in business decision-making.

ITC 516 DATA MINING AND VISUALISATION FOR BUSINESS INTELLIGENCE
Assessment - II
Student Id & Name
[Pick the date]

Question 1
a) The principal component analysis result is as highlighted below.
Analysis of the PCA matrix
• The first four principal components together explain about 80% of the variance, so the
analysis of the PCA matrix can be restricted to these four components.
• The first principal component, judging by its key parameters (x1 and x2), appears to
represent the utility's financial performance.
• The second principal component, judging by its key parameters (x4 and x8), appears to
represent the utility's operational performance.
• The third principal component, judging by its key parameters (x3 and x7), appears to
represent the utility's production cost for electricity.
• The fourth principal component, judging by its key parameters (x1 and x3), appears to
represent the utility's fixed cost in relation to electricity.
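The 80% criterion in the first bullet can be checked mechanically from the variances (eigenvalues) of the components. The following is a minimal sketch only: the eigenvalues are hypothetical placeholders rather than the values from the PCA output, and `components_needed` is a helper name introduced here for illustration.

```python
from itertools import accumulate

def components_needed(variances, threshold=0.80):
    """Return the smallest number of leading components whose cumulative
    share of total variance reaches `threshold`, plus the cumulative shares."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = [running / total for running in accumulate(ordered)]
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k, cumulative
    return len(ordered), cumulative

# Hypothetical eigenvalues for eight standardized variables (they sum to 8).
example_variances = [2.2, 1.8, 1.4, 1.1, 0.6, 0.5, 0.3, 0.1]
k, shares = components_needed(example_variances)
print(k, [round(s, 3) for s in shares])
```

With these placeholder values the threshold is first crossed at the fourth component, which mirrors the reasoning used above for retaining four components.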

Need for Normalization
Normalization is needed when the variables are measured on different scales, because PCA
emphasizes the directions of highest variance and therefore magnifies the effect of large-scale
variables. This is usually reflected in the total variance matrix, which reports the contribution
of each principal component. In this case, however, normalization prior to PCA does not appear
to be necessary, as no single principal component contributes disproportionately and renders
the other components insignificant.
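One way to examine whether scale differences matter is to compare the first component's share of variance before and after standardizing the variables. The sketch below does this for just two variables, where the eigenvalues of the 2x2 covariance matrix have a closed form. The numbers are made up for illustration and are not the utilities data; `first_component_share` and `standardize` are names introduced here.

```python
import math
import statistics

def first_component_share(xs, ys):
    """Share of total variance captured by the first principal component
    of two variables, from the eigenvalues of their 2x2 covariance matrix."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(xs)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    radius = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest = half_trace + radius
    return largest / (var_x + var_y)

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Made-up data: one variable in large units, one in small units.
sales = [9077.0, 5088.0, 9212.0, 6423.0, 3300.0, 11127.0]
ratio = [1.06, 0.89, 1.43, 1.02, 1.49, 1.32]

raw_share = first_component_share(sales, ratio)
scaled_share = first_component_share(standardize(sales), standardize(ratio))
print(round(raw_share, 4), round(scaled_share, 4))
```

On the raw values the large-scale variable accounts for almost all of the variance, while after standardizing the share depends only on the correlation between the two variables. A large gap between the two figures would suggest that scale, rather than structure, is driving the components.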
(b) Advantages and disadvantages of applying PCA are listed below:
Advantages
• It provides an estimate of the structure of the data.
• The principal components are easy to visualize in m- and p-dimensional space.
• It reduces the risk of over-fitting.
• The principal components are orthogonal to one another and hence easier to interpret.
Disadvantages/limitations
• It cannot be used when the data variables show a non-linear association.
• It is difficult to examine the exact direction of a principal component.
• Data that are not well summarized by a Gaussian distribution cannot be analyzed by
PCA.
• It is difficult to identify the principal component with the highest variance.

Question 2
The data set contains several variables describing 5000 customers of Universal Bank.
The main objective of the analysis is to find the association between the variable “Personal Loan”
and two predictors, “Online” and “Credit Card.”
The analysis has to be run on training data, so the data must first be partitioned.
Using the XLMiner analytical tool, the data for the 5000 customers were divided into 60% training
and 40% validation data. Part of the generated result is presented below:
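The 60/40 split can be sketched outside XLMiner as a seeded shuffle followed by a cut. This is only an analogous illustration: XLMiner's own partitioning procedure and seed are not shown in the report, and the `partition` helper, the seed value and the stand-in customer IDs are assumptions made here.

```python
import random

def partition(records, train_fraction=0.6, seed=12345):
    """Shuffle a copy of `records` and split it into training and validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

customer_ids = range(1, 5001)  # stand-in for the 5000 customer rows
train, valid = partition(customer_ids)
print(len(train), len(valid))
```

For 5000 rows this yields 3000 training and 2000 validation records, matching the training size used in the probability calculations.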
(a) The pivot table for the training data, built with the following row and column variables, is
shown below:
Row 1 – CC
Row 2 – Loan
Column – Online
A value of 0 (zero) indicates that the customer does not use the respective service, and a value
of 1 (one) indicates that the customer does.
For example:
CC = 1 means the customer uses a credit card.
Loan = 0 means the customer does not take the loan.
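The same three-way layout (CC and Loan as rows, Online as columns) can be reproduced by counting each combination of the 0/1 codes. The sketch below is an illustration with a handful of hypothetical rows, not the bank's training data, and `pivot_counts` is a name introduced here.

```python
from collections import Counter

def pivot_counts(records):
    """Count customers for every (CC, Loan, Online) combination."""
    return Counter((r["CC"], r["Loan"], r["Online"]) for r in records)

# Hypothetical rows; the real training partition has 3000 customers.
sample = [
    {"CC": 1, "Loan": 1, "Online": 1},
    {"CC": 1, "Loan": 0, "Online": 1},
    {"CC": 1, "Loan": 0, "Online": 1},
    {"CC": 0, "Loan": 0, "Online": 0},
    {"CC": 0, "Loan": 1, "Online": 1},
]
counts = pivot_counts(sample)

# Lay the counts out with (CC, Loan) as rows and Online as columns.
print("CC Loan | Online=0 Online=1")
for cc in (0, 1):
    for loan in (0, 1):
        print(f"{cc:>2} {loan:>4} | {counts[(cc, loan, 0)]:>8} {counts[(cc, loan, 1)]:>8}")
```

A `Counter` returns 0 for combinations that never occur, so empty cells of the table need no special handling.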
(b) Number of customers who hold a credit card, use online services and take the loan:
$n(\mathrm{CC}=1, \mathrm{Online}=1, \mathrm{Loan}=1) = 51$
Number of customers who hold a credit card and use online services:
$n(\mathrm{CC}=1, \mathrm{Online}=1) = 522$
$P(\mathrm{Loan}=1 \mid \mathrm{CC}=1, \mathrm{Online}=1) = \frac{51}{522} = 0.0977$
There is a 9.77% probability that a customer who holds a credit card and uses online services
will take the loan.
(c) Pivot tables and probabilities computation
Pivot table 1, built from the training data with the following row and column variables, is
shown below:
Row – Online
Column – Loan
Pivot table 2, built from the training data with the following row and column variables, is
shown below:
Row – Loan
Column – Credit Card
Probabilities computation
The requisite values are highlighted in the pivot tables.
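(i) $P(\mathrm{CC}=1 \mid \mathrm{Loan}=1) = \frac{93}{304} = 0.306$
(ii) $P(\mathrm{Online}=1 \mid \mathrm{Loan}=1) = \frac{183}{304} = 0.602$
(iii) $P(\mathrm{Loan}=1) = \frac{304}{3000} = 0.101$
(iv) $P(\mathrm{CC}=1 \mid \mathrm{Loan}=0) = \frac{800}{2696} = 0.297$
(v) $P(\mathrm{Online}=1 \mid \mathrm{Loan}=0) = \frac{1586}{2696} = 0.588$
(vi) $P(\mathrm{Loan}=0) = \frac{2696}{3000} = 0.899$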
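(d) Naïve Bayes probability, i.e. $P(\mathrm{Loan}=1 \mid \mathrm{CC}=1, \mathrm{Online}=1)$
\[
P(\mathrm{Loan}=1 \mid \mathrm{CC}=1, \mathrm{Online}=1)
= \frac{\frac{93}{304} \cdot \frac{183}{304} \cdot \frac{304}{3000}}
       {\frac{93}{304} \cdot \frac{183}{304} \cdot \frac{304}{3000}
        + \frac{800}{2696} \cdot \frac{1586}{2696} \cdot \frac{2696}{3000}}
= 0.106
\]
The naïve Bayes estimate of $P(\mathrm{Loan}=1 \mid \mathrm{CC}=1, \mathrm{Online}=1)$ is
therefore 10.6%.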
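The calculation in (d) can be reproduced directly from the pivot-table counts, which avoids carrying rounded intermediate values. The sketch below is a minimal illustration using the counts reported in parts (b) and (c); the function name and parameter names are introduced here and are not part of the XLMiner output.

```python
def naive_bayes_loan_probability(n_loan, n_no_loan, cc_loan, cc_no_loan,
                                 online_loan, online_no_loan):
    """P(Loan=1 | CC=1, Online=1) under the naive Bayes independence assumption."""
    total = n_loan + n_no_loan
    # P(CC=1 | Loan) * P(Online=1 | Loan) * P(Loan) for each class.
    accept = (cc_loan / n_loan) * (online_loan / n_loan) * (n_loan / total)
    reject = (cc_no_loan / n_no_loan) * (online_no_loan / n_no_loan) * (n_no_loan / total)
    return accept / (accept + reject)

# Counts reported in parts (b) and (c) for the 3000 training customers.
naive = naive_bayes_loan_probability(
    n_loan=304, n_no_loan=2696,
    cc_loan=93, cc_no_loan=800,
    online_loan=183, online_no_loan=1586,
)
exact = 51 / 522  # direct pivot-table estimate from part (b)
print(round(naive, 3), round(exact, 3))
```

The naïve Bayes estimate (about 0.106) is close to, but not the same as, the direct estimate from part (b) (about 0.098). The difference arises because naïve Bayes treats CC and Online as independent within each loan class.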
(e) Issuing a credit card to the customer and encouraging active use of online services can be
regarded as the best possible strategy when offering a bank loan to the customer.