Data Mining Business Case Analysis - Credit & Loan Prediction


Added on  2020/04/01

Project
AI Summary
This assignment provides a comprehensive analysis of a business case using data mining techniques. The analysis begins with Principal Component Analysis (PCA), identifying key features and determining the need for normalization. It then explores the advantages and disadvantages of PCA. The project further utilizes XLMiner to analyze customer data, focusing on credit card usage, online service utilization, and loan applications. The analysis includes the calculation of probabilities and the design of pivot tables to understand the relationships between these factors. Finally, the assignment computes the Naive Bayes probability to determine the likelihood of a customer taking a loan based on credit card ownership and online service usage. This detailed analysis provides valuable insights into customer behavior and potential loan outcomes.
DATA MINING
BUSINESS CASE ANALYSIS
STUDENT NAME
[Pick the date]
Question 1
(a) The principal component matrix is a critical output of PCA. It allows users to identify
the useful features by considering the eigenvalues, as indicated below.
The number of principal components to retain is decided from the share of the variance that
each principal component contributes. On that basis, five principal components have been
retained, assuming the aim is to account for 88% of the variance.
Then, based on the loadings highlighted in the principal component matrix, the two main features
of each principal component have been identified by selecting the two entries of highest
magnitude, as listed below.
PC1 – x1, x2 (Indicating financial aspects)
PC2 – x6, x8 (Indicating operational aspects)
PC3 – x3, x7 (Indicating generation mix and cost)
PC4 – x1, x3 (Indicating fixed cost related aspects)
PC5 – x1, x4 (Indicating financial performance)
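The two selection steps described above (retain enough components to reach a target share of variance, then pick the two largest-magnitude entries in each component) can be sketched in a few lines of Python. This is only an illustrative sketch: the variance shares and loadings are hypothetical numbers, not the values from the report's XLMiner output, and the function names are made up for this example.

```python
# Hypothetical explained-variance shares per component (NOT from the report).
explained = [0.40, 0.20, 0.13, 0.09, 0.06, 0.05, 0.04, 0.03]

def components_needed(shares, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    total = 0.0
    for k, share in enumerate(shares, start=1):
        total += share
        # Small tolerance guards against floating-point rounding in the running sum.
        if total >= target - 1e-12:
            return k
    return len(shares)

def top_features(loadings, names, n=2):
    """Names of the n features with the largest absolute loading."""
    ranked = sorted(zip(names, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:n]]

names = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
# Hypothetical loadings for a single component (illustrative values only).
pc1_loadings = [0.62, -0.55, 0.10, 0.21, -0.05, 0.30, 0.12, -0.38]

k = components_needed(explained, 0.88)
main_features = top_features(pc1_loadings, names)
print(k, main_features)
```

Ranking by absolute value matters because a large negative loading is as informative as a large positive one.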
It must also be decided whether normalisation is required before PCA. Normalisation is usually
required to remove the effect of differing measurement scales: the variable measured on the
largest scale also has the largest absolute variance, so it contributes disproportionately to
the total variance in PCA and makes the other variables appear relatively insignificant.
However, judging by the PCA results in non-normalised form, the need for normalisation does not
arise here because the scales are not too skewed for any of the variables.
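The scale effect described above can be illustrated with a small sketch. The two variables below are hypothetical (one in currency units, one a ratio) and are not taken from the report's dataset; the point is only that the large-scale variable accounts for almost all of the raw variance, while after standardisation each variable contributes equally.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical data for five units: one variable in dollars, one as a ratio.
sales = [120000.0, 95000.0, 143000.0, 110000.0, 87000.0]
load_factor = [0.62, 0.55, 0.71, 0.48, 0.66]

def variance_share(columns):
    """Share of the total variance contributed by each column."""
    variances = [pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]

def standardise(col):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

raw_share = variance_share([sales, load_factor])
scaled_share = variance_share([standardise(sales), standardise(load_factor)])
print(raw_share)     # the dollar-valued column dominates
print(scaled_share)  # roughly equal shares after standardisation
```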
(b) The key advantages and disadvantages of PCA are listed below:
Advantages
Used to identify the structure of the dataset
Used to reduce a large set of variables to a lower-dimensional dataset
Minimises the risk of overfitting the data
Maximises the variance captured
Represents the data orthogonally, which helps in examining the magnitudes of the principal components
Disadvantages
Not valid for datasets which have a “non-linear relationship”
The principal components fitted to the point cloud do not exhibit the exact direction, because each component has its own principal axis which is at 90° (a right angle) to the other components. Therefore, it becomes difficult to examine the direction of the key significant component.
Not valid for datasets measured on a categorical scale.
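The non-linearity limitation can be seen in a minimal sketch. PCA works from covariances, so a relationship that is exact but non-linear can be invisible to it. In the hypothetical example below, y is fully determined by x (y = x²), yet their covariance is essentially zero, so PCA would treat the two variables as unrelated.

```python
from statistics import mean

# Hypothetical data: x is symmetric around zero and y depends on x exactly.
x = [i / 100 for i in range(-100, 101)]
y = [v * v for v in x]

def covariance(a, b):
    """Population covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

cov_xy = covariance(x, y)
print(abs(cov_xy) < 1e-9)  # the linear association is effectively zero
```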
Question 2
Number of observations = 5000
Partition (standard partition, XLMiner): training = 60% and validation = 40%, i.e. 3000 training and 2000 validation observations
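For readers without XLMiner, a 60/40 random split of 5000 row indices can be sketched as follows. This is not XLMiner's own procedure; the function name and the seed are assumptions chosen for illustration.

```python
import random

def partition(n_rows, train_fraction, seed=12345):
    """Shuffle row indices and split them into training and validation sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, hypothetical choice
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, valid_idx = partition(5000, 0.60)
print(len(train_idx), len(valid_idx))
```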
Output of XLMiner
(a) The pivot table has two row labels: row label 1 is Credit Card (CC) and row
label 2 is Personal Loan (Loan).
Notation:
Row label 1
CC = 0: the customer does not use a credit card (CC).
CC = 1: the customer uses a credit card (CC).
Row label 2
Loan = 0: the customer does not take the loan offer.
Loan = 1: the customer takes the loan offer.
Column label
Online = 0: the customer does not use Universal Bank’s online banking service.
Online = 1: the customer uses Universal Bank’s online banking service.
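The layout described by this notation (CC and Loan as row labels, Online as the column label, counts in the cells) can be sketched with a counter over records. The eight records below are hypothetical and stand in for the 3000 training rows; the function name is an assumption for this example.

```python
from collections import Counter

# Hypothetical records in the order (CC, Loan, Online); not the report's data.
records = [
    (0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1),
    (1, 0, 0), (1, 0, 1), (1, 0, 1), (1, 1, 1),
]

counts = Counter(records)

def pivot(counts):
    """Rows keyed by (CC, Loan); each row holds the counts for Online=0 and Online=1."""
    table = {}
    for cc in (0, 1):
        for loan in (0, 1):
            table[(cc, loan)] = [counts[(cc, loan, online)] for online in (0, 1)]
    return table

table = pivot(counts)
for (cc, loan), row in table.items():
    print(f"CC={cc} Loan={loan}  Online=0: {row[0]}  Online=1: {row[1]}")
```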
(b) Based on the above notation and pivot table, the probability is computed below.
Required: the probability that a customer who holds a credit card and uses the online service
takes the loan, i.e. P(Loan=1 | CC=1, Online=1).
Number of customers who hold a credit card and use the online service (CC=1, Online=1) = 522
Number of customers who hold a credit card, use the online service and take the loan
(CC=1, Online=1, Loan=1) = 51
\[
P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1) = \frac{51}{522} \approx 0.0977
\]
Thus, there is a probability of 0.0977, or 9.77%, that a customer who holds a credit card and
uses the online service will also take the loan.
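The same calculation can be checked directly from the two pivot-table counts quoted above (51 and 522). The helper name below is an assumption for this sketch.

```python
def conditional_probability(joint_count, condition_count):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count == 0:
        raise ValueError("the conditioning event never occurs in the data")
    return joint_count / condition_count

# Counts from the pivot table: 51 loan takers among 522 customers with CC=1 and Online=1.
p_loan = conditional_probability(51, 522)
print(round(p_loan, 4))
```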
(c) The pivot tables designed with the specified column and row labels, and the requisite
probabilities, are shown below:
The first pivot table has Online as the column label and Loan as the row label.
The second pivot table has Credit Card as the column label and Loan as the row label.
Proportions or probabilities

S. No.   Proportion or probability    Value in pivot table
(i)      P(CC=1 | Loan=1)             93/304 = 0.305
(ii)     P(Online=1 | Loan=1)         183/304 = 0.601
(iii)    P(Loan=1)                    304/3000 = 0.101
(iv)     P(CC=1 | Loan=0)             800/2696 = 0.296
(v)      P(Online=1 | Loan=0)         1586/2696 = 0.588
(vi)     P(Loan=0)                    2696/3000 = 0.898
(d) Based on the quantities determined above, the Naïve Bayes probability is computed as
shown below:
\[
P(\text{Loan}=1 \mid \text{CC}=1, \text{Online}=1)
= \frac{0.305 \times 0.601 \times 0.101}{(0.305 \times 0.601 \times 0.101) + (0.296 \times 0.588 \times 0.898)}
\approx 0.106
\]
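As a cross-check, the Naïve Bayes calculation can be sketched in Python using the six quantities from part (c). The function name is an assumption for this example. Working from the raw counts gives about 0.1063, and working from the three-decimal table values gives about 0.1059, so the two routes agree to three decimal places.

```python
def naive_bayes_posterior(p_cc_1, p_on_1, prior_1, p_cc_0, p_on_0, prior_0):
    """P(Loan=1 | CC=1, Online=1), assuming CC and Online are independent given Loan."""
    accept = p_cc_1 * p_on_1 * prior_1   # P(CC=1|Loan=1) * P(Online=1|Loan=1) * P(Loan=1)
    reject = p_cc_0 * p_on_0 * prior_0   # P(CC=1|Loan=0) * P(Online=1|Loan=0) * P(Loan=0)
    return accept / (accept + reject)

# Using the raw counts from the pivot tables.
from_counts = naive_bayes_posterior(
    93 / 304, 183 / 304, 304 / 3000,
    800 / 2696, 1586 / 2696, 2696 / 3000,
)

# Using the three-decimal values listed in the table.
from_table = naive_bayes_posterior(0.305, 0.601, 0.101, 0.296, 0.588, 0.898)

print(round(from_counts, 3), round(from_table, 3))
```

The estimate differs from the direct pivot-table value in part (b) because Naïve Bayes treats credit card ownership and online usage as conditionally independent given the loan outcome.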
(e) Fulfilling the following two conditions maximises the chance that a customer takes the
loan:
Having a credit card, which must have been issued by the same bank
Actively using the bank’s online services