Data Mining Business Case Analysis - Credit & Loan Prediction

Added on 2020/04/01

AI Summary

This assignment provides a comprehensive analysis of a business case using data mining techniques. The analysis begins with Principal Component Analysis (PCA), identifying key features and determining the need for normalization. It then explores the advantages and disadvantages of PCA. The project further utilizes XLMiner to analyze customer data, focusing on credit card usage, online service utilization, and loan applications. The analysis includes the calculation of probabilities and the design of pivot tables to understand the relationships between these factors. Finally, the assignment computes the Naive Bayes probability to determine the likelihood of a customer taking a loan based on credit card ownership and online service usage. This detailed analysis provides valuable insights into customer behavior and potential loan outcomes.

DATA MINING
BUSINESS CASE ANALYSIS
STUDENT NAME
[Pick the date]

Question 1
(a) The principal component matrix is a central output of PCA. It allows the user to
identify the useful features by examining the eigenvalues and the values reported for each
component, as follows.
First, the number of principal components to retain is decided from the share of variance
contributed by each principal component. Here, five principal components have been retained,
with the aim of accounting for 88% of the variance.
Then, based on the values highlighted in the principal component matrix, the two main features
of each retained principal component have been identified. For each component, the features
with the highest absolute magnitude are selected, as listed below.
PC1 – x1, x2 (Indicating financial aspects)
PC2 – x6, x8 (Indicating operational aspects)
PC3 – x3, x7 (Indicating generation mix and cost)
PC4 – x1, x3 (Indicating fixed cost related aspects)
PC5 – x1, x4 (Indicating financial performance)
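The selection procedure described above can be sketched in a few lines of Python. This is only an illustration: the eigenvalues and loadings below are made-up numbers, not the assignment's XLMiner output, and the helper names `components_needed` and `top_features` are hypothetical.

```python
from itertools import accumulate


def components_needed(eigenvalues, target=0.88):
    """Smallest number of leading components whose cumulative share
    of the total variance reaches `target`."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= target:
            return k
    return len(ordered)


def top_features(loadings, names, n=2):
    """Names of the n features with the largest absolute loading
    on a single component."""
    ranked = sorted(zip(names, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical eigenvalues for eight variables (assumed, not from the source).
eigenvalues = [2.4, 1.8, 1.2, 0.9, 0.8, 0.5, 0.3, 0.1]
k = components_needed(eigenvalues, target=0.88)

# Hypothetical loadings of x1..x8 on one component (assumed, not from the source).
names = [f"x{i}" for i in range(1, 9)]
pc_loadings = [0.62, -0.58, 0.10, 0.21, -0.05, 0.30, 0.12, -0.33]
selected = top_features(pc_loadings, names)

print(k, selected)
```

With these assumed numbers the first five components reach the 88% target, and the two features with the largest absolute loadings are picked regardless of sign.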
It also needs to be considered whether normalisation before PCA is required. Usually it is
required in order to remove the effect of differing measurement scales: the variable measured
on the largest scale also tends to have the largest variance, so in PCA it dominates the total
variance and makes the other variables appear relatively insignificant. However, considering
the PCA results in the non-normalised form, the need for normalisation does not arise here,
because the scales of the variables are not too skewed.
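The scale effect described above can be illustrated with a small sketch. The two columns below are invented values on very different scales (they are not the assignment's data), and `variance_shares` and `standardise` are hypothetical helper names.

```python
from statistics import mean, pstdev, pvariance


def variance_shares(columns):
    """Share of the total variance contributed by each column."""
    variances = {name: pvariance(values) for name, values in columns.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


def standardise(values):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Made-up data: one variable in the thousands, one close to 1.
raw = {
    "large_scale": [9077.0, 5088.0, 9212.0, 6423.0, 3300.0],
    "small_scale": [1.06, 0.89, 1.43, 1.02, 1.49],
}

before = variance_shares(raw)
after = variance_shares({name: standardise(v) for name, v in raw.items()})

print("before:", before)
print("after: ", after)
```

Before standardisation the large-scale column accounts for almost all of the total variance, so it would dominate the first principal component. After standardisation each column contributes an equal share.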
(b) The critical advantages and disadvantages of PCA are listed below:
Advantages
• Can be used to reveal the structure of a dataset
• Reduces a large set of variables to a lower-dimensional dataset
• Minimises the risk of over-lapping or over-fitting of data
• Maximises the variance captured by the retained components
• Represents the data on orthogonal axes, which helps in examining the magnitudes of the
principal components
Disadvantages
• Not valid for datasets which have a “non-linear relationship”
• The principal components in the point cloud do not exhibit the exact direction, because
each component has its own principal axis, which is at 90° (a right angle) to the other
components. It is therefore difficult to examine the direction of the key significant
component.
• Not valid for datasets measured on a categorical scale.
Question 2
Number of observations = 5000
Partition (standard partition in XLMiner): training = 60%, validation = 40%
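As a rough sketch of what a 60/40 random partition does, the following Python splits 5000 row indices into a training and a validation set. The `partition` helper and the seed are assumptions for illustration, and the result will not reproduce the rows XLMiner actually selected.

```python
import random


def partition(n_rows, train_fraction=0.6, seed=12345):
    """Shuffle the row indices and split them into training and validation sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed value is arbitrary
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, valid_idx = partition(5000)
print(len(train_idx), len(valid_idx))  # 3000 2000
```

The training set therefore holds 3000 observations, which is the denominator used for the loan proportions later in this question.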
Output of XLMiner
(a) The pivot table has two row labels: row label 1 is Credit Card (CC) and row label 2 is
Personal Loan (Loan). The column label is Online.
Notation:
Row label 1
CC = 0: the customer does not use a credit card (CC).
CC = 1: the customer uses a credit card (CC).
Row label 2
Loan = 0: the customer does not take the loan offer.
Loan = 1: the customer takes the loan offer.
Column label
Online = 0: the customer does not use Universal Bank’s online banking service.
Online = 1: the customer uses Universal Bank’s online banking service.
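A pivot table with this layout is simply a count of records for each (CC, Loan, Online) combination. The sketch below builds one with the standard library. The eight records are toy values invented for illustration, not the bank's data, and `pivot` is a hypothetical helper.

```python
from collections import Counter

# Toy records in the order (CC, Loan, Online); invented for illustration.
records = [
    (0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1),
    (0, 1, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1),
]


def pivot(rows):
    """Count records with (CC, Loan) as row labels and Online as the column label."""
    table = {}
    for (cc, loan, online), n in Counter(rows).items():
        table.setdefault((cc, loan), {0: 0, 1: 0})[online] += n
    return table


table = pivot(records)
for cc, loan in sorted(table):
    row = table[(cc, loan)]
    print(f"CC={cc} Loan={loan} | Online=0: {row[0]}  Online=1: {row[1]}")
```

Each printed line corresponds to one row of the pivot table, with the two Online columns side by side.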
(b) Based on the above notation and pivot table, the probability is computed below:
P(Loan = 1 | CC = 1, Online = 1) = ?
Customers who hold a credit card and use the online service, i.e. count(CC = 1, Online = 1) = 522
Customers who hold a credit card, use the online service and take the loan, i.e.
count(CC = 1, Online = 1, Loan = 1) = 51
P(Loan = 1 | CC = 1, Online = 1) = 51/522 = 0.0977
Thus, there is a 0.0977 or 9.77% probability that a customer who holds a credit card and uses
the online service will also take the loan.
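The calculation is a ratio of two pivot-table counts. A minimal check in Python, using the counts 51 and 522 quoted above (the helper name `conditional_probability` is hypothetical):

```python
def conditional_probability(joint_count, condition_count):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count <= 0:
        raise ValueError("the conditioning event has no observations")
    return joint_count / condition_count


# count(CC=1, Online=1, Loan=1) = 51 and count(CC=1, Online=1) = 522
p_loan = conditional_probability(51, 522)
print(f"{p_loan:.4f}")
```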
(c) The pivot tables based on the specified column and row labels, and the requisite
probabilities, are described below:
The first pivot table has Online as the column label and Loan as the row label.
The second pivot table has Credit Card as the column label and Loan as the row label.
S. No.   Proportion or probability    Value from pivot table
(i)      P(CC = 1 | Loan = 1)         93/304 = 0.306
(ii)     P(Online = 1 | Loan = 1)     183/304 = 0.602
(iii)    P(Loan = 1)                  304/3000 = 0.101
(iv)     P(CC = 1 | Loan = 0)         800/2696 = 0.297
(v)      P(Online = 1 | Loan = 0)     1586/2696 = 0.588
(vi)     P(Loan = 0)                  2696/3000 = 0.899
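The six proportions can be recomputed directly from the pivot-table counts as a check. The sketch below uses only the counts quoted above. The dictionary layout and variable names are illustrative choices.

```python
TRAIN_N = 3000                   # training partition size
LOAN_1, LOAN_0 = 304, 2696       # customers with Loan = 1 and Loan = 0

proportions = {
    "P(CC=1 | Loan=1)": 93 / LOAN_1,
    "P(Online=1 | Loan=1)": 183 / LOAN_1,
    "P(Loan=1)": LOAN_1 / TRAIN_N,
    "P(CC=1 | Loan=0)": 800 / LOAN_0,
    "P(Online=1 | Loan=0)": 1586 / LOAN_0,
    "P(Loan=0)": LOAN_0 / TRAIN_N,
}

for name, value in proportions.items():
    print(f"{name} = {value:.3f}")

# The two class priors must sum to 1 because every customer is in one class.
assert abs(proportions["P(Loan=1)"] + proportions["P(Loan=0)"] - 1.0) < 1e-12
```

The printed values are rounded to three decimals. Carrying the unrounded fractions forward avoids rounding error in later steps.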
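(d) Based on the quantities determined above, the Naïve Bayes probability is computed as
shown below:
Naïve Bayes probability P(Loan = 1 | CC = 1, Online = 1)
= [P(CC = 1 | Loan = 1) × P(Online = 1 | Loan = 1) × P(Loan = 1)]
  / [P(CC = 1 | Loan = 1) × P(Online = 1 | Loan = 1) × P(Loan = 1)
     + P(CC = 1 | Loan = 0) × P(Online = 1 | Loan = 0) × P(Loan = 0)]
= (93/304 × 183/304 × 304/3000)
  / [(93/304 × 183/304 × 304/3000) + (800/2696 × 1586/2696 × 2696/3000)]
= 0.1063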
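The same Naïve Bayes calculation can be sketched in Python from the pivot-table counts. The function `naive_bayes_posterior` is a hypothetical helper written for this illustration. It assumes, as Naïve Bayes does, that the features are conditionally independent given the class. Working from the unrounded fractions gives a value close to 0.106.

```python
def naive_bayes_posterior(priors, likelihoods):
    """Posterior probability of each class.

    priors:      {class_label: P(class)}
    likelihoods: {class_label: [P(feature_i | class), ...]}
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for likelihood in likelihoods[label]:
            score *= likelihood  # conditional independence assumption
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior(
    priors={1: 304 / 3000, 0: 2696 / 3000},
    likelihoods={
        1: [93 / 304, 183 / 304],      # P(CC=1 | Loan=1), P(Online=1 | Loan=1)
        0: [800 / 2696, 1586 / 2696],  # P(CC=1 | Loan=0), P(Online=1 | Loan=0)
    },
)
print(f"P(Loan=1 | CC=1, Online=1) = {posterior[1]:.4f}")
```

The Naïve Bayes estimate differs from the direct pivot-table estimate in part (b) because it multiplies the credit card and online likelihoods separately instead of counting customers who satisfy both conditions.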
(e) Fulfilment of the following two conditions maximises the chances of a customer taking the
loan:
• The customer holds a credit card issued by the same bank
• The customer actively uses the online services