Data Mining and Visualization Assessment: PCA & Naive Bayes

Added on  2019/11/25

This document presents a comprehensive solution to a data mining and visualization assessment. The assignment focuses on two key areas: dimension reduction using Principal Component Analysis (PCA) and classification using the Naive Bayes classifier. The PCA section analyzes a dataset, highlighting the variance captured by the principal components and suggesting a reduction to the most significant components. It discusses the advantages and disadvantages of PCA, including considerations for data normalization. The Naive Bayes section analyzes customer data, predicting the probability of a customer taking a personal loan based on online banking usage and credit card ownership. The solution provides detailed calculations of probabilities and demonstrates how to apply the Naive Bayes formula to determine the likelihood of a customer taking a loan offer. The document concludes with recommendations for optimizing loan offers based on the analysis.
Data Mining and Visualization
Assessment: II
Student id
[Pick the date]
Data Mining
Question 1
(Dimension Reduction)
Provided data
Description of variables
(a) Principal component analysis was run using the XLMiner data mining tool, and the generated output is presented as follows.
Summary output
PCA Components
PCA Scores
By analysing the PCA Components table, it can be seen that the first six principal components together capture around 95% of the total variance. Hence, it is suggested that the principal component table be reduced to a matrix showing only those six principal components.
Reduced principal component matrix
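The 95% rule used above can be sketched as a small helper that picks the smallest number of components whose cumulative variance reaches a threshold. This is a minimal illustration, assuming the component variances (eigenvalues) are already sorted in decreasing order; the function name `components_for_threshold` and the eigenvalues below are hypothetical and are not taken from the XLMiner output.

```python
def components_for_threshold(explained_variance, threshold=0.95):
    """Return the smallest k such that the first k components capture at least
    `threshold` of the total variance. `explained_variance` holds the variance
    (eigenvalue) of each principal component, sorted in decreasing order."""
    total = sum(explained_variance)
    cumulative = 0.0
    for k, var in enumerate(explained_variance, start=1):
        cumulative += var
        if cumulative / total >= threshold:
            return k
    # Only reached if rounding keeps the ratio just below the threshold.
    return len(explained_variance)


# Hypothetical eigenvalues (NOT the assignment's output); they sum to 10.
eigenvalues = [4.1, 2.3, 1.5, 0.9, 0.6, 0.4, 0.15, 0.05]
print(components_for_threshold(eigenvalues))  # 6
```

With these made-up eigenvalues, the first five components reach 94% of the variance and the sixth brings the cumulative share to 98%, so six components are kept.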
Based on the table above, the key significant features associated with these components are given below:
Further analysis of the PCA matrix could narrow these features down more precisely.
Regarding normalization, the total variance is not dominated by the contribution of a single variable; other variables contribute to it as well. Hence, normalizing the data is not recommended in this case. Normalization makes sense when one variable has a much larger individual variance than the others, because that variable then dominates the total variance and distorts the principal components.
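The normalization criterion described above (check whether a single variable dominates the total variance) can be expressed as a quick diagnostic. This is a sketch under stated assumptions: the data are held as plain Python lists per variable, `variance_shares` and `dominated_by_one_variable` are hypothetical helper names, and the 0.5 cut-off is an arbitrary illustration rather than a standard rule.

```python
from statistics import pvariance


def variance_shares(columns):
    """Map each variable to its share of the total variance, where the total is
    the sum of the individual variances (the trace of the covariance matrix)."""
    variances = {name: pvariance(values) for name, values in columns.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


def dominated_by_one_variable(columns, max_share=0.5):
    # max_share is a hypothetical cut-off chosen only for illustration.
    return max(variance_shares(columns).values()) > max_share


# Hypothetical data: "income" is on a much larger scale than "age".
data = {
    "age": [25, 32, 47, 51, 38],
    "income": [40_000, 52_000, 88_000, 95_000, 61_000],
}
print(variance_shares(data))
print(dominated_by_one_variable(data))  # True
```

If the check returns True, standardizing each variable (equivalently, running PCA on the correlation matrix instead of the covariance matrix) stops that variable from dominating the components.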
(b) Major Advantages
Reduces multidimensional variables to a lower-dimensional representation
Allows easy visualization of the data as a point cloud
Makes the spread around the mean easy to interpret
The orthogonal form of the component matrix gives freedom in analysing the results
The direction of each principal vector is easily recognized because each axis is at right angles to the others
Major disadvantages
Cannot be used for non-linear or complex relationships between variables
Difficult to examine the actual direction of the principal component vectors
Cannot be used when the data come from unknown distributions that are unlike the Gaussian
Cannot be used for categorical variables
Question 2
(Naïve Bayes Classifier)
The training data of 2500 customers are analysed using two predictors, Online and Credit Card (CC), together with the third variable, Personal Loan.
(a) Online: column label
Credit card (CC): first row label
Loan: second row label
Online = 0 (customer does not use the online banking service)
Online = 1 (customer uses the online banking service)
Credit card (CC) = 0 (customer does not hold the bank's credit card)
Credit card (CC) = 1 (customer holds the bank's credit card)
Loan = 0 (customer does not accept the loan offer)
Loan = 1 (customer accepts the loan offer)
(b) The objective is to determine the probability that a customer who uses the online banking service and holds the bank's credit card would accept the loan offer.
Favorable cases = 44
Total cases = 441
Probability = Favorable cases / Total cases = 44/441 ≈ 0.0998
Therefore, the probability that a customer who uses the online banking service and holds the bank's credit card would accept the loan offer is approximately 9.98%.
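The exact probability in part (b) is simply a ratio of two pivot-table cells. The sketch below shows how those cells could be counted from raw records and then used; the record field names `CreditCard`, `Online` and `PersonalLoan` and both helper functions are assumptions for illustration, and only the two cell counts (44 and 441) come from the analysis above.

```python
from collections import Counter


def pivot_counts(records):
    """Count customers in each (CreditCard, Online, PersonalLoan) cell,
    mirroring the three-way pivot table from part (a)."""
    return Counter((r["CreditCard"], r["Online"], r["PersonalLoan"]) for r in records)


def exact_loan_probability(counts, cc=1, online=1):
    """Exact P(Loan = 1 | CC = cc, Online = online) read from the pivot cells."""
    favorable = counts[(cc, online, 1)]
    total = favorable + counts[(cc, online, 0)]
    return favorable / total


# Tiny hypothetical records, only to show how pivot_counts is used.
sample = [
    {"CreditCard": 1, "Online": 1, "PersonalLoan": 0},
    {"CreditCard": 1, "Online": 1, "PersonalLoan": 1},
    {"CreditCard": 0, "Online": 1, "PersonalLoan": 0},
]
print(pivot_counts(sample)[(1, 1, 1)])  # 1

# The cell counts reported in part (b): 44 acceptors among 441 customers.
counts = Counter({(1, 1, 1): 44, (1, 1, 0): 441 - 44})
print(round(exact_loan_probability(counts), 4))  # 0.0998
```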
(c) The two pivot tables are shown below:
Table 1:
Online: column label
Loan: row label
Table 2:
Credit card: column label
Loan: row label
Computation of the required probabilities:
The 2500 training customers split into 256 with Loan = 1 and 2244 with Loan = 0, so both prior probabilities, P(Loan = 1) and P(Loan = 0), use 2500 as the total.

S. no.  Probability               Favorable cases  Total cases  Calculation  Probability (proportion)
(i)     P(CC = 1 | Loan = 1)      75               256          75/256       0.2930
(ii)    P(Online = 1 | Loan = 1)  158              256          158/256      0.6172
(iii)   P(Loan = 1)               256              2500         256/2500     0.1024
(iv)    P(CC = 1 | Loan = 0)      652              2244         652/2244     0.2906
(v)     P(Online = 1 | Loan = 0)  1336             2244         1336/2244    0.5954
(vi)    P(Loan = 0)               2244             2500         2244/2500    0.8976

(d) On the basis of the quantities determined in part (c), the Naïve Bayes probability is:

$$P(\text{Loan}=1 \mid CC=1, \text{Online}=1) = \frac{P(CC=1 \mid \text{Loan}=1)\,P(\text{Online}=1 \mid \text{Loan}=1)\,P(\text{Loan}=1)}{P(CC=1 \mid \text{Loan}=1)\,P(\text{Online}=1 \mid \text{Loan}=1)\,P(\text{Loan}=1) + P(CC=1 \mid \text{Loan}=0)\,P(\text{Online}=1 \mid \text{Loan}=0)\,P(\text{Loan}=0)}$$

Numerator = (0.2930)(0.6172)(0.1024) ≈ 0.018518
Denominator = 0.018518 + (0.2906)(0.5954)(0.8976) ≈ 0.018518 + 0.155306 = 0.173824
Probability = 0.018518 / 0.173824 ≈ 0.1065
Hence, the Naïve Bayes probability is approximately 10.65%.
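The Naïve Bayes calculation in parts (c) and (d) can be reproduced directly from the pivot-table counts. This is a minimal sketch assuming two binary predictors and the counts listed in part (c); the function name `naive_bayes_loan_probability` and its parameter names are hypothetical. Working from the raw counts rather than rounded proportions avoids rounding drift.

```python
def naive_bayes_loan_probability(n_cc1_loan1, n_online1_loan1, n_loan1,
                                 n_cc1_loan0, n_online1_loan0, n_loan0):
    """P(Loan = 1 | CC = 1, Online = 1) under the naive assumption that
    CC and Online are conditionally independent given Loan."""
    n = n_loan1 + n_loan0  # total training customers
    # Likelihood of (CC = 1, Online = 1) times the prior, for each class.
    score_1 = (n_cc1_loan1 / n_loan1) * (n_online1_loan1 / n_loan1) * (n_loan1 / n)
    score_0 = (n_cc1_loan0 / n_loan0) * (n_online1_loan0 / n_loan0) * (n_loan0 / n)
    # Normalize so the two class posteriors sum to one.
    return score_1 / (score_1 + score_0)


p = naive_bayes_loan_probability(75, 158, 256, 652, 1336, 2244)
print(round(p, 4))  # 0.1065
```

Because the priors are derived from the same counts (n = n_loan1 + n_loan0), P(Loan = 1) and P(Loan = 0) always sum to one.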
(e) The chances of a customer taking up the personal loan offer can be optimized by focusing on the following two factors:
Active use of online banking services
Possession of a credit card issued by the bank